Opened 12 years ago
Closed 12 years ago
#7915 closed enhancement (fixed)
[PATCH] Automatically discard some TIGER tags on upload
Reported by: | ToeBee | Owned by: | team |
---|---|---|---|
Priority: | normal | Milestone: | |
Component: | Core | Version: | latest |
Keywords: | | Cc: | ToeBee |
Description
There are some tags that were uploaded in the original TIGER import which are now entirely useless; they just take up space and make the tag list harder to manage as we add more and more real-world tags to ways. I propose extending JOSM's silent delete feature for the created_by tag to include these TIGER tags, so that the tags are dropped as people edit these objects.
In particular, these tags:
tiger:upload_uuid - This tag is a hash that was used to group uploads. It was useful during the import process (it was essentially a changeset identifier for API 0.5, which lacked changesets), but it now accounts for about 1 GB in the planet file and has absolutely no value to anyone. Since it is a random string, it also degrades the compressibility of the file, so it is a double hit. It has already been removed from many ways when the name expansion bot was run on them.
tiger:tlid - This is a foreign key to the original TIGER data. In theory it could be used to synchronize with future TIGER data sets or for other cross-dataset analysis. However, the TIGER data model has changed since the import and this field no longer exists. Also, as ways have been split and combined, the value in this tag has been mangled, sometimes to the point where it exceeds the API length limit, at which point the user must either truncate or delete the tag anyway.
tiger:source - Again, a key that held some potentially useful information at import time. But especially after a user has edited the way, this tag becomes unimportant and crufty.
tiger:separated - In theory, this tag indicates whether a road is a dual carriageway. In practice, it is wrong most of the time, and it was put on all ways from residential up, so on 95% of the ways it is completely uninteresting data anyway.
In addition, I need to wear more tin foil around my brain because the Canadians caught wind of my plans. They came up with two tags to add to the list from their imports:
geobase:datasetName and geobase:uuid
I am supplying a patch. I implemented this similarly to the "uninteresting" tags: there is a list in OsmPrimitive, and OsmWriter checks whether a given key is in that list instead of hard-coding a check for "created_by" as was done before.
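A minimal sketch of the writer-side approach, assuming a simplified model: `WriterSideDiscard`, `DISCARDABLE_KEYS` and `writeTags` are hypothetical names, not JOSM's actual OsmPrimitive/OsmWriter API. It shows the key property of this design: the discardable tags are skipped in the uploaded output, but the local tag map is never changed.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class WriterSideDiscard {
    // Hypothetical list of keys silently dropped on upload (not JOSM's real API).
    static final List<String> DISCARDABLE_KEYS = List.of(
            "created_by",
            "tiger:upload_uuid", "tiger:tlid", "tiger:source", "tiger:separated",
            "geobase:datasetName", "geobase:uuid");

    static boolean isDiscardable(String key) {
        return DISCARDABLE_KEYS.contains(key);
    }

    // Builds the <tag/> lines for the upload, skipping discardable keys.
    // The caller's tag map is left untouched. No XML escaping in this sketch.
    static String writeTags(Map<String, String> tags) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : tags.entrySet()) {
            if (isDiscardable(e.getKey())) {
                continue; // dropped from the upload only, not from local data
            }
            sb.append("<tag k='").append(e.getKey())
              .append("' v='").append(e.getValue()).append("' />\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> tags = new LinkedHashMap<>();
        tags.put("highway", "residential");
        tags.put("tiger:tlid", "12345");
        tags.put("created_by", "JOSM");
        System.out.print(writeTags(tags));
        // The local copy still carries the dropped keys.
        System.out.println("local tags: " + tags.keySet());
    }
}
```

Because only the output is filtered, the local dataset and the server disagree after upload, which is why the tags silently vanish on redownload.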
Links to the relevant mailing list threads where this has been discussed among the affected communities:
US: http://lists.openstreetmap.org/pipermail/talk-us/2012-July/008830.html
Canada: http://lists.openstreetmap.org/pipermail/talk-ca/2012-July/004948.html
Now to see if I can make a usable patch file... I've been spoiled by pull requests :)
Attachments (1)
Change History (9)
follow-up: 3 comment:1 by , 12 years ago
comment:2 by , 12 years ago
I've never had a conflict because of this. If I redownload the area after uploading, the tags just silently vanish.
comment:3 by , 12 years ago
comment:4 by , 12 years ago
As stoecker said, the change should be reflected in the local dataset. We could add a command to the undo stack that removes the unwanted tags on modified primitives right before upload (or something like that).
by , 12 years ago
Attachment: | discard_tags.patch added |
---|
Alright, let's try this on for size. Implemented as an upload hook instead.
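A minimal sketch of the upload-hook idea, under a simplified model: `Primitive`, `RemoveTagsCommand` and `checkUpload` are hypothetical stand-ins, not JOSM's actual classes. The hook removes discardable tags from modified primitives in the local data right before upload and records an undoable command, so the local dataset matches what the server receives.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class DiscardTagsUploadHook {
    // Hypothetical simplified primitive: a tag map plus a "modified" flag.
    static class Primitive {
        final Map<String, String> tags = new LinkedHashMap<>();
        boolean modified;
    }

    static final Set<String> DISCARDABLE = Set.of(
            "created_by", "tiger:upload_uuid", "tiger:tlid", "tiger:source",
            "tiger:separated", "geobase:datasetName", "geobase:uuid");

    // Undoable command: remembers the removed tags so undo can restore them.
    static class RemoveTagsCommand {
        private final Map<Primitive, Map<String, String>> removed = new LinkedHashMap<>();

        void execute(List<Primitive> toUpload) {
            for (Primitive p : toUpload) {
                if (!p.modified) {
                    continue; // only touch objects the user actually edited
                }
                Map<String, String> dropped = new LinkedHashMap<>();
                Iterator<Map.Entry<String, String>> it = p.tags.entrySet().iterator();
                while (it.hasNext()) {
                    Map.Entry<String, String> e = it.next();
                    if (DISCARDABLE.contains(e.getKey())) {
                        dropped.put(e.getKey(), e.getValue());
                        it.remove();
                    }
                }
                if (!dropped.isEmpty()) {
                    removed.put(p, dropped);
                }
            }
        }

        void undo() {
            removed.forEach((p, dropped) -> p.tags.putAll(dropped));
        }

        boolean isEmpty() {
            return removed.isEmpty();
        }
    }

    // Hypothetical hook run before upload; returning true lets the upload proceed.
    static boolean checkUpload(List<Primitive> toUpload, List<RemoveTagsCommand> undoStack) {
        RemoveTagsCommand cmd = new RemoveTagsCommand();
        cmd.execute(toUpload);
        if (!cmd.isEmpty()) {
            undoStack.add(cmd); // only record a command if something changed
        }
        return true;
    }

    public static void main(String[] args) {
        Primitive way = new Primitive();
        way.modified = true;
        way.tags.put("highway", "residential");
        way.tags.put("tiger:upload_uuid", "abc");
        List<RemoveTagsCommand> undoStack = new ArrayList<>();
        checkUpload(List.of(way), undoStack);
        System.out.println(way.tags); // {highway=residential}
        undoStack.get(0).undo();
        System.out.println(way.tags); // {highway=residential, tiger:upload_uuid=abc}
    }
}
```

Unlike filtering in the writer, the removal happens in the local data, so JOSM knows the tags are gone and a later redownload does not produce a mismatch.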
comment:5 by , 12 years ago
Summary: | Automatically discard some TIGER tags on upload → [PATCH] Automatically discard some TIGER tags on upload |
---|
comment:6 by , 12 years ago
Much better. After adding odbl and odbl:note, it should be ready to apply. bastiK?
"odbl" can also be dropped (see #7906).
If we update this for more tags, then it should be done properly. The problem now is that JOSM does not know that it dropped a tag, so there is potential for conflicts here. The tags should be dropped from JOSM's dataset as well (i.e. on OK from the server upload).