#20425 closed defect (fixed)
[Patch] Duplicate Relation test is too lazy/too aggressive
Reported by: | GerdP | Owned by: | team |
---|---|---|---|
Priority: | normal | Milestone: | 24.02 |
Component: | Core validator | Version: | |
Keywords: | Cc: |
Description
Follow up of #20393, #20424 and #19956
The test that produces message "Duplicated relations" is a quite misleading. With the current implementation (r17476)
- it ignores all tag keys returned by
AbstractPrimitive.getUninterestingKeys()
(e.g.description, note, fixme, comment
) - it ignores members which appear multiple times (eg. two times in one relation, only once in the other)
- it ignores the order of members (wich is good for multipolygons but not for route relations)
It's too lazy because
- it will flag two multipolygon relations with the same tags and the same geometry if they use different ways, but only when those different ways have the same nodes in the same order and the same tags, so I am not sure if the test really tries to find the equal geometries.
- it ignores relations with identical member lists when any of the members isn't downloaded (#20242)
The major problem: It offers an autofix which also ignores these differences and might remove the "better" version of a relation, e.g. it might keep the one with the fixme tag.
I wonder if I should try to improve the code or better simply remove the autofix or the complete code.
Attachments (4)
Change History (12)
comment:1 by , 4 years ago
comment:2 by , 15 months ago
Looking at this again. According to #5642 the test was written to find duplicated multipolygons in imports. My thoughts:
- The test should be changed to produce only one error for a given
pairset of relations, either code 1901 "Duplicated relations" or code 1902 "Relations with same members" or maybe another new message, see below. Edit: This can be tricky, so I consider it less important. - The test that produces the informational message "Relations with same members" should ignore the order of the members
and also the number of occurences. It should only compare roles and member types and unique ids and not there geometry. This allows to compare incomplete relations. - Relations with different tags but identical member lists should have a different text, maybe "Relations with identical member lists". In this case order and number of occurences of members should matter.
- A message "Duplicated relations" requires knowledge about the meaning of the order of the members. If the order matters it should be compared, else ignored, but different numbers of the same member should never be ignored. I think tags like source=* or fixme=* should not be ignored here, but those which we call "discardable" can be ignored.
- The current test tries to compare the geometry, this requires a pair of complete relations. It seems reasonable to ignore the "uninteresting" tags like source in this case and also to offer an autofix which removes the newer relation. I am just not happy with the message "Duplicated relations". I'd prefer to have something like "Relations with equal meaning describe identical geometry". I wonder if this geometry test should better be done in
MultipolygonTest
? - Regarding autofix: Besides the special case above I think only two relations with identical list of members and equals tags can be autofixed by removing the newer relation. Or do we have to ignore "discardable" tags here?
by , 15 months ago
Attachment: | dup-bus-route.osm added |
---|
two bus routes, same tags, different order of members
follow-up: 4 comment:3 by , 15 months ago
@skyper: Please try attached dup-bus-route.osm. Current code in tested r18969 reports a duplicate relation although the order of the members is different (reversed). I wonder how to improve this. I think two (or more) route relations with the same tags are more or less always problematic unless they appear in completely different areas?
Do we have other relation types where the order of the members does matter?
comment:4 by , 15 months ago
Replying to GerdP:
I wonder how to improve this.
Maybe add more detailed warnings when the order matters:
- identical (same tags and same order)
- duplicate (same tags but different order)
- similar (different tags but same members in same order)
- same members (same members but different tags and order)?
I think two (or more) route relations with the same tags are more or less always problematic unless they appear in completely different areas?
Yes, but the tags could be incomplete, e.g. most route should have an operator and a ref (symbol or name) which should make the difference in most cases. You example is the same. It could be two incompletely mapped relations (missing the stops) going in opposite directions. Theoretically you could have a two relations completely mapped using only stop_position as "stop" and same tags but even than they should have at least one tag describing the difference.
Currently, we do not count all tags as description
, FIXME
, note
plus some more are ignored which needs to be thought about, again, and which could lead to more categories above if only "uninteresting" tags are different.
Do we have other relation types where the order of the members does matter?
From current presets in core there is additionally only type=waterway
. Though, I have to say that I lost track of all used/proposed relations.
by , 15 months ago
Attachment: | 20425.patch added |
---|
comment:5 by , 15 months ago
Summary: | Duplicate Relation test is too lazy/too aggressive → [Patch] Duplicate Relation test is too lazy/too aggressive |
---|
comment:6 by , 15 months ago
Ths patch has a performance issue with large numbers of relations. So work in progress...
The patch was simply wrong so that super.endTest() was not called. That means the progress bar is not updated while further tests are running.
I keep forgetting that I should not use if ... return;
in endTest()
Fixed with 20425.3.patch
by , 15 months ago
Attachment: | 20425.3.patch added |
---|
comment:8 by , 15 months ago
Milestone: | → 24.02 |
---|
In order to delete one of the relations you need to take a look at the history of both. Please, remove the autofix.