Modify

Opened 12 years ago

Closed 12 years ago

Last modified 12 years ago

#2245 closed enhancement (fixed)

Speed up merge operation for large datasets (> 50k)

Reported by: balrog-kun Owned by: dmuecke
Priority: major Milestone:
Component: Core Version: latest
Keywords: merge Cc: Gubaer

Description

Merging two layers where there have been nodes in the exact same places results in broken dataset. It seems to me, but I'm not sure, that the "action" tag isn't updated and always takes the value from the first layer of the two.

So for example if you work with two layers and there's a way that is in both layers, if you delete the way from the first layer, so as to avoid breakage, then after the merge the nodes of the way will all be deleted (and the way itself, not necesarily, resulting in an undeleted way between deleted nodes, which looks unusually in JOSM). Very dangerous for big merges :-(

(Same if the nodes on the first layer have not been modified, and the nodes on the second layer had a tag added... the results will have the new tags, but will be marked unmodified)

Attachments (0)

Change History (15)

comment:1 Changed 12 years ago by dmuecke

Owner: changed from team to dmuecke
Status: newassigned

comment:2 Changed 12 years ago by Gubaer

Cc: Gubaer added

Merge logic has been reworked recently, see MergeVisitor.

There is also a UnitTest class for different merge cases, see MergeVisitorTest.

comment:3 in reply to:  2 Changed 12 years ago by anonymous

Replying to Gubaer:

Merge logic has been reworked recently, see MergeVisitor.

There is also a UnitTest class for different merge cases, see MergeVisitorTest.

Class org.openstreetmap.josm.io.MultiFetchServerObjectReaderTest not part of repository. Could you add it?

comment:4 Changed 12 years ago by Gubaer

Cc: Gubaer removed

There was an unnecessary reference to org.openstreetmap.josm.io.MultiFetchServerObjectReaderTest in MergeVisitorTest. I've removed it but I've committed MultiFetchServerObjectReaderTest anyway.

See r1703

comment:5 Changed 12 years ago by dmuecke

For all reported issues I have found a test case so my guess is that your problems doesn't longer exist. Can you confirm otherwise let me know how to reproduce it!

comment:6 Changed 12 years ago by dmuecke

I would consider this issue as fixed and close it.

comment:7 Changed 12 years ago by balrog-kun

Seems to have improved but still far from fixed.

I just tested the following six scenarios in JOSM r1713:

Scenario 1:
test0.osm contains node id=1, node id=2, way id=3 containing both nodes, all objects have action=modify
test1.osm contains node id=1, action=delete

Merging test0 + test1 gives "There were 1 conflicts during import". I resolve the import by choosing delete node id=1 and I get a corrupt dataset with way id=3 containing a deleted node.

Scenario 2:
Same as above but now test1 is the first layer. Merging test1 + test0 gives the corrupt dataset right away.

Scenario 3:
Same as scenario 1 but now the node in test1.osm has id=-1. Merging is OK (previously it would get the corrupt dataset again).

Scenario 4:
Same as scenario 3 but test1 is the first layer. Merging is OK.

Scenario 5:
Same as scenario 3 but now the node in test1.osm has action=modify. Merging gives duplicate node (previously it would merge the two nodes into one because their coordinates are identical).

Scenario 6:
Same as scenario 1 but now the node in test0.osm has id=-1. Merging is OK (previously it would give corrupt dataset).

comment:8 in reply to:  7 Changed 12 years ago by anonymous

Replying to balrogg@…:

Scenario 1:
test0.osm contains node id=1, node id=2, way id=3 containing both nodes, all objects have action=modify
test1.osm contains node id=1, action=delete

Err, actually this scenario is okay, after resolving the conflict to deleted, the way gets deleted too (I tested this one with some last week's revision)

comment:9 Changed 12 years ago by anonymous

thanks, good test cases. I'll integrate them in the Unit tests for the merge logic.

And, of course, I'll try to fix the defects :-)

comment:10 in reply to:  7 ; Changed 12 years ago by Gubaer

Replying to balrogg@…:

Scenario 5:
Same as scenario 3 but now the node in test1.osm has action=modify. Merging gives duplicate node (previously it would merge the two nodes into one because their coordinates are identical).

This is a feature, not a bug. In both cases, Scenario 3 and Scenario 5 you should end up with a dataset which includes both a node id=1 and a node id=0 at the same position. The node with id=1 is already known on the server who assigned it its id. The node id=-1 is a new node, because it has negative id. Internally, JOSM assigns all these nodes the id 0. A node with id=0 should not be merged onto a node with id>0, but two nodes with id=0 and equal coordinates can be merged.

comment:11 Changed 12 years ago by Gubaer

Resolution: fixed
Status: assignedclosed

fixed in r1753

comment:12 Changed 12 years ago by Gubaer

Cc: Gubaer added

comment:13 in reply to:  10 ; Changed 12 years ago by balrog-kun

Replying to Gubaer:

Replying to balrogg@…:

Scenario 5:
Same as scenario 3 but now the node in test1.osm has action=modify. Merging gives duplicate node (previously it would merge the two nodes into one because their coordinates are identical).

This is a feature, not a bug. In both cases, Scenario 3 and Scenario 5 you should end up with a dataset which includes both a node id=1 and a node id=0 at the same position. The node with id=1 is already known on the server who assigned it its id. The node id=-1 is a new node, because it has negative id. Internally, JOSM assigns all these nodes the id 0. A node with id=0 should not be merged onto a node with id>0, but two nodes with id=0 and equal coordinates can be merged.

Ok, I have two comments then:

  • Since this is a change from the behaviour of older JOSM versions (where it would merge nodes that had the same position), it seems there's no more any excuse for the merge operation to take quadratic time relative to number of nodes (for two 50k nodes layers, that means practically the operation will never finish), but it takes that much time for me :-(
  • Shouldn't nodes with id=0 and id=1 be merged? I understand why nodes id=1 and id=2 should stay separate but for a new node and existing node perhaps it would be useful for them to get merged? (My use case is: I have data to import for a square area that has been split into several files by cutting all lines on given latitude or longitude. I upload one piece. Then I load another piece into JOSM and download data on the boundary of the piece and have to manually merge the nodes on the boundary)

comment:14 in reply to:  13 ; Changed 12 years ago by Gubaer

Priority: criticalmajor
Summary: Merge layers doesn't update the "action" tagsSpeed up merge operation for large datasets (> 50k)
Type: defectenhancement
  • Since this is a change from the behaviour of older JOSM versions (where it would merge nodes that had the same position), it seems there's no more any excuse for the merge operation to take quadratic time relative to number of nodes (for two 50k nodes layers, that means practically the operation will never finish), but it takes that much time for me :-(

Good point. We should have a look at that. I'll reopen the issue and change the summary and I'll change it into an enhancement.

  • Shouldn't nodes with id=0 and id=1 be merged? I understand why nodes id=1 and id=2 should stay separate but for a new node and existing node perhaps it would be useful for them to get merged? (My use case is: I have data to import for a square area that has been split into several files by cutting all lines on given latitude or longitude. I upload one piece. Then I load another piece into JOSM and download data on the boundary of the piece and have to manually merge the nodes on the boundary)

In my opinion: no, they shouldn't get merged because they are simply "too different".

comment:15 in reply to:  14 Changed 12 years ago by anonymous

Replying to Gubaer:

  • Since this is a change from the behaviour of older JOSM versions (where it would merge nodes that had the same position), it seems there's no more any excuse for the merge operation to take quadratic time relative to number of nodes (for two 50k nodes layers, that means practically the operation will never finish), but it takes that much time for me :-(

Good point. We should have a look at that. I'll reopen the issue and change the summary and I'll change it into an enhancement.

Thanks, makes sense.

One more nit pick is, I noticed the other scenarios from comment:7 now work correctly except in scenario 1 after resolving the conflict, undo doesn't exactly undo the operation.

Modify Ticket

Change Properties
Set your email in Preferences
Action
as closed The owner will remain dmuecke.
as The resolution will be set.
The resolution will be deleted.

Add Comment


E-mail address and name can be saved in the Preferences.

 
Note: See TracTickets for help on using tickets.