Modify

Opened 14 years ago

Closed 13 years ago

#5642 closed enhancement (fixed)

[patch] add checks for duplicate relations, same relations and same ways to validator

Reported by: bilbo Owned by: team
Priority: normal Milestone:
Component: Core Version: latest
Keywords: validator Cc: skyper

Description (last modified by bilbo)

This patch adds three checks to validator:

Same ways: similar to duplicate ways (ways having same nodes), but regardless of tags. Severity warning.

It is quite common in some places to find duplicated ways, but with one of the copies having slightly different tags (like one being residential, one having name, while the second not, etc ...). There is no automatic fix, user have to solve this manually (although usually the proper fix is just to delete one of the ways).

Duplicate relations: 2 or more relations having exactly same tags and exactly same members in exactly same order and same roles - these duplicates usually come from botched import uploads. Severity error.
Automatic fix is to delete all relations except the one with lowest ID (similarly like it is done with duplicated ways)

Same relations: same as above, but with severity warning and checking only members and their roles, not tags. And no automatic fix for this one (though in the map usually these duplicates differ in relatively uninteresting tags like source, etc ...).

For duplicated relation check, nodes are duplicates if they have same tags and position, regardless of their id's. Same is for ways - ways are same if they have same array of nodes (with same positions) and tags

Attachments (3)

validator-sw-dr-check-ni.patch (12.9 KB ) - added by bilbo 14 years ago.
Patch to add checks for duplicated/same relations and same ways
validator-sw-dr-check-ni_(updated).patch (13.4 KB ) - added by bastiK 14 years ago.
validator is in JOSM core now - here is the updated patch
validator-sw-dr-m4.patch (15.6 KB ) - added by bilbo 13 years ago.
Finished version

Download all attachments as: .zip

Change History (18)

by bilbo, 14 years ago

Patch to add checks for duplicated/same relations and same ways

comment:1 by bastiK, 14 years ago

Now for a single duplicate Way you get "Duplicated Ways" error, "Overlapping Ways" warning and "Ways with same position" warning.

In this case the error ("Duplicated Ways") would be enough, but I'm not 100% sure how to do that.

by bastiK, 14 years ago

validator is in JOSM core now - here is the updated patch

comment:2 by bilbo, 14 years ago

Well, overlapping ways shows only if ways have same nodes. If the duplicate ways have also duplicate nodes with same position (i.e. usual case if somebody accidentally uploads new way twice), it is detected only as same & duplicate way, but not as overlapping way.

However, it can be fixed by reporting same way only if tags differ (and if they are same, report as duplicate way)

comment:3 by bastiK, 14 years ago

sounds good

comment:4 by bastiK, 14 years ago

Maybe we can show an error whenever the tags differ only by uninteresting keys? (OsmPrimitive.getUninterestingKeys())

Version 0, edited 14 years ago by bastiK (next)

comment:5 by bilbo, 14 years ago

But some of the uninteresting keys (those that are considered to be not enough to consider way tagged) can be actually interesting if the way is already tagged. For example something like "fixme=bad position, needs review" may be considered interesting.

So perhaps we should make difference between "junk" tags like "created_by", "converted_by" or "watch" that does not contain any information useful to mappers or data users and those that usually contain some possibly important information (note, fixme, ...). I am not sure where to put source/attribution tags in this ....

comment:6 by bastiK, 14 years ago

Maybe like this:

First get rid of "junk" tags - no one cares about these. Then, if there is a way A with

highway=primary
maxspeed=30
source=yahoo
fixme=really primary?

way B with

highway=primary
source=yahoo

way C

highway=primary
maxspeed=30

and way D (untagged) then way B, C and D can probably be removed as all information is already in way A.

comment:7 by bastiK, 14 years ago

Type: defectenhancement

comment:8 by anonymous, 14 years ago

Component: Core validatorCore
Keywords: validator added

Please also add a check and warning if there exists relations of the same type and with the same name but different members.
Thanks skyper

comment:9 by anonymous, 14 years ago

Ticket #4745 has been marked as a duplicate of this ticket.

comment:10 by stoecker, 14 years ago

Summary: [patch] add checks for duplicate relations, same relations and same ways to validator[patch needs rework] add checks for duplicate relations, same relations and same ways to validator

There are still open issues. Any progress?

comment:11 by skyper, 14 years ago

Cc: skyper added

comment:12 by bilbo, 14 years ago

I've started to rework the patch, I've joined the same way test as subclass of the duplicate way tast and also same way reports only if there is at least one difference between tags.

I've thought of some further improvements:

1) add some more tags (user configurable?) in addition to created_by - for example, if there are two same ways, differing just by source, one of them should be usually removed.

2) "check and warning if there exists relations of the same type and with the same name but different members."

-> this could be useful, but I am not sure if this won't bring many false positives. Perhaps limit this check to only some types of relations?

I also plan to add check for relations with same tags and members which are ways with same positions (to detect cases when someone uploads bunch of new ways, then uploads a relation putting them into a multipoly, then uploads bunch of (same) new ways and new relation with those new ways)

bastiK> Deciding which tags are junk may be untrivial as would be fiunding out some "optimal superset" - as this may add "junk" tags (fixme, converted_by, ...) that were removed from some of the versions among the duplicates. But perhaps I could add some GUI that will show all the duplicated way's tags and offer some solution (for N ways N buttons "keep only this one" and one button to "keep superset of all values") - what do you think of this idea?

in reply to:  12 comment:13 by bastiK, 14 years ago

Replying to bilbo:

I've started to rework the patch, I've joined the same way test as subclass of the duplicate way tast and also same way reports only if there is at least one difference between tags.

I've thought of some further improvements:

1) add some more tags (user configurable?) in addition to created_by - for example, if there are two same ways, differing just by source, one of them should be usually removed.

2) "check and warning if there exists relations of the same type and with the same name but different members."

-> this could be useful, but I am not sure if this won't bring many false positives. Perhaps limit this check to only some types of relations?

I also plan to add check for relations with same tags and members which are ways with same positions (to detect cases when someone uploads bunch of new ways, then uploads a relation putting them into a multipoly, then uploads bunch of (same) new ways and new relation with those new ways)


sounds good

bastiK> Deciding which tags are junk may be untrivial as would be fiunding out some "optimal superset" - as this may add "junk" tags (fixme, converted_by, ...) that were removed from some of the versions among the duplicates. But perhaps I could add some GUI that will show all the duplicated way's tags and offer some solution (for N ways N buttons "keep only this one" and one button to "keep superset of all values") - what do you think of this idea?

I wouldn't call fixme or source a junk tag. These are "unphysical" but can be important nevertheless. About the dialog: Sometimes there are a few thousand pairs of duplicate ways. It should be possible to fix these all at once.

Imho such a resolution gui dialog is an overkill here. The "superset" strategy should cover most cases that are relevant in practice. The remaining case is where there are duplicate objects and both "versions" have been tagged differently (with non-junk tags). We can let the user resolve the mess manually in this rare case.

If you like to write this dialog anyway, go ahead, but take care that it supports "batch fixing".

by bilbo, 13 years ago

Attachment: validator-sw-dr-m4.patch added

Finished version

comment:14 by bilbo, 13 years ago

Description: modified (diff)
Summary: [patch needs rework] add checks for duplicate relations, same relations and same ways to validator[patch] add checks for duplicate relations, same relations and same ways to validator

Attaching fixed patch.

For duplicated relation check, when cheking members, nodes are duplicates if they have same tags and position, regardless of their id's. Same is for ways - ways are same if they have same array of nodes (with same positions) and tags.

This way the duplicated relations resulting from interrupted and retried uploads should be always detected, regardless of at which stage the dupplication occur (if it is just duplicate relation, or there are some duplicated ways (or even nodes) that are duplicated too and that can be easily fixed afterwards), entire tree of objects (nodes and ways in relations and for ways also it's nodes) is examined (note that relations are still compared only by id's, as 1) relations inside relations are rare 2) there is risk of entering an infinite cycle)

I decided to do no automatic fixing for same ways with different tags, as the decision how to fix is often different (sometimes it is to delete first or second way, sometimes merging tags, sometimes they ways are best kept as is.) and I can't think of reliable automated algorithm that would not be insanely complex and will decide what to do.

comment:15 by stoecker, 13 years ago

Resolution: fixed
Status: newclosed

In [4051/josm]:

fix #5642 - patch by bilbo - improve validator checks

Modify Ticket

Change Properties
Set your email in Preferences
Action
as closed The owner will remain team.
as The resolution will be set.
The resolution will be deleted. Next status will be 'reopened'.

Add Comment


E-mail address and name can be saved in the Preferences .
 
Note: See TracTickets for help on using tickets.