Opened 14 years ago
Closed 13 years ago
#5642 closed enhancement (fixed)
[patch] add checks for duplicate relations, same relations and same ways to validator
Reported by: | bilbo | Owned by: | team |
---|---|---|---|
Priority: | normal | Milestone: | |
Component: | Core | Version: | latest |
Keywords: | validator | Cc: | skyper |
Description (last modified by )
This patch adds three checks to validator:
Same ways: similar to duplicate ways (ways having same nodes), but regardless of tags. Severity warning.
It is quite common in some places to find duplicated ways, but with one of the copies having slightly different tags (like one being residential, one having name, while the second not, etc ...). There is no automatic fix, user have to solve this manually (although usually the proper fix is just to delete one of the ways).
Duplicate relations: 2 or more relations having exactly same tags and exactly same members in exactly same order and same roles - these duplicates usually come from botched import uploads. Severity error.
Automatic fix is to delete all relations except the one with lowest ID (similarly like it is done with duplicated ways)
Same relations: same as above, but with severity warning and checking only members and their roles, not tags. And no automatic fix for this one (though in the map usually these duplicates differ in relatively uninteresting tags like source, etc ...).
For duplicated relation check, nodes are duplicates if they have same tags and position, regardless of their id's. Same is for ways - ways are same if they have same array of nodes (with same positions) and tags
Attachments (3)
Change History (18)
by , 14 years ago
Attachment: | validator-sw-dr-check-ni.patch added |
---|
comment:1 by , 14 years ago
Now for a single duplicate Way you get "Duplicated Ways" error, "Overlapping Ways" warning and "Ways with same position" warning.
In this case the error ("Duplicated Ways") would be enough, but I'm not 100% sure how to do that.
by , 14 years ago
Attachment: | validator-sw-dr-check-ni_(updated).patch added |
---|
validator is in JOSM core now - here is the updated patch
comment:2 by , 14 years ago
Well, overlapping ways shows only if ways have same nodes. If the duplicate ways have also duplicate nodes with same position (i.e. usual case if somebody accidentally uploads new way twice), it is detected only as same & duplicate way, but not as overlapping way.
However, it can be fixed by reporting same way only if tags differ (and if they are same, report as duplicate way)
comment:4 by , 14 years ago
Maybe we can show an error whenever the tags differ only by uninteresting keys? (OsmPrimitive.getUninterestingKeys())
And offer a fix whenever one way's tag set is superset of all other ways' tag sets.
comment:5 by , 14 years ago
But some of the uninteresting keys (those that are considered to be not enough to consider way tagged) can be actually interesting if the way is already tagged. For example something like "fixme=bad position, needs review" may be considered interesting.
So perhaps we should make difference between "junk" tags like "created_by", "converted_by" or "watch" that does not contain any information useful to mappers or data users and those that usually contain some possibly important information (note, fixme, ...). I am not sure where to put source/attribution tags in this ....
comment:6 by , 14 years ago
Maybe like this:
First get rid of "junk" tags - no one cares about these. Then, if there is a way A with
highway=primary maxspeed=30 source=yahoo fixme=really primary?
way B with
highway=primary source=yahoo
way C
highway=primary maxspeed=30
and way D (untagged) then way B, C and D can probably be removed as all information is already in way A.
comment:7 by , 14 years ago
Type: | defect → enhancement |
---|
comment:8 by , 14 years ago
Component: | Core validator → Core |
---|---|
Keywords: | validator added |
Please also add a check and warning if there exists relations of the same type and with the same name but different members.
Thanks skyper
comment:10 by , 14 years ago
Summary: | [patch] add checks for duplicate relations, same relations and same ways to validator → [patch needs rework] add checks for duplicate relations, same relations and same ways to validator |
---|
There are still open issues. Any progress?
comment:11 by , 14 years ago
Cc: | added |
---|
follow-up: 13 comment:12 by , 14 years ago
I've started to rework the patch, I've joined the same way test as subclass of the duplicate way tast and also same way reports only if there is at least one difference between tags.
I've thought of some further improvements:
1) add some more tags (user configurable?) in addition to created_by - for example, if there are two same ways, differing just by source, one of them should be usually removed.
2) "check and warning if there exists relations of the same type and with the same name but different members."
-> this could be useful, but I am not sure if this won't bring many false positives. Perhaps limit this check to only some types of relations?
I also plan to add check for relations with same tags and members which are ways with same positions (to detect cases when someone uploads bunch of new ways, then uploads a relation putting them into a multipoly, then uploads bunch of (same) new ways and new relation with those new ways)
bastiK> Deciding which tags are junk may be untrivial as would be fiunding out some "optimal superset" - as this may add "junk" tags (fixme, converted_by, ...) that were removed from some of the versions among the duplicates. But perhaps I could add some GUI that will show all the duplicated way's tags and offer some solution (for N ways N buttons "keep only this one" and one button to "keep superset of all values") - what do you think of this idea?
comment:13 by , 14 years ago
Replying to bilbo:
I've started to rework the patch, I've joined the same way test as subclass of the duplicate way tast and also same way reports only if there is at least one difference between tags.
I've thought of some further improvements:
1) add some more tags (user configurable?) in addition to created_by - for example, if there are two same ways, differing just by source, one of them should be usually removed.
2) "check and warning if there exists relations of the same type and with the same name but different members."
-> this could be useful, but I am not sure if this won't bring many false positives. Perhaps limit this check to only some types of relations?
I also plan to add check for relations with same tags and members which are ways with same positions (to detect cases when someone uploads bunch of new ways, then uploads a relation putting them into a multipoly, then uploads bunch of (same) new ways and new relation with those new ways)
sounds good
bastiK> Deciding which tags are junk may be untrivial as would be fiunding out some "optimal superset" - as this may add "junk" tags (fixme, converted_by, ...) that were removed from some of the versions among the duplicates. But perhaps I could add some GUI that will show all the duplicated way's tags and offer some solution (for N ways N buttons "keep only this one" and one button to "keep superset of all values") - what do you think of this idea?
I wouldn't call fixme
or source
a junk tag. These are "unphysical" but can be important nevertheless. About the dialog: Sometimes there are a few thousand pairs of duplicate ways. It should be possible to fix these all at once.
Imho such a resolution gui dialog is an overkill here. The "superset" strategy should cover most cases that are relevant in practice. The remaining case is where there are duplicate objects and both "versions" have been tagged differently (with non-junk tags). We can let the user resolve the mess manually in this rare case.
If you like to write this dialog anyway, go ahead, but take care that it supports "batch fixing".
comment:14 by , 14 years ago
Description: | modified (diff) |
---|---|
Summary: | [patch needs rework] add checks for duplicate relations, same relations and same ways to validator → [patch] add checks for duplicate relations, same relations and same ways to validator |
Attaching fixed patch.
For duplicated relation check, when cheking members, nodes are duplicates if they have same tags and position, regardless of their id's. Same is for ways - ways are same if they have same array of nodes (with same positions) and tags.
This way the duplicated relations resulting from interrupted and retried uploads should be always detected, regardless of at which stage the dupplication occur (if it is just duplicate relation, or there are some duplicated ways (or even nodes) that are duplicated too and that can be easily fixed afterwards), entire tree of objects (nodes and ways in relations and for ways also it's nodes) is examined (note that relations are still compared only by id's, as 1) relations inside relations are rare 2) there is risk of entering an infinite cycle)
I decided to do no automatic fixing for same ways with different tags, as the decision how to fix is often different (sometimes it is to delete first or second way, sometimes merging tags, sometimes they ways are best kept as is.) and I can't think of reliable automated algorithm that would not be insanely complex and will decide what to do.
Patch to add checks for duplicated/same relations and same ways