Modify

Opened 12 years ago

Closed 12 years ago

Last modified 12 years ago

#8643 closed defect (fixed)

[patch] Very slow Purge command - O(N^2)

Reported by: bilbo Owned by: team
Priority: normal Milestone:
Component: Core Version: latest
Keywords: patch Cc:

Description (last modified by bilbo)

There is bad algorithm used in Purge command, with complexity O(N2), causing purge with about 176000 ways to take over 15 minutes.

Problem is in toposort, where the algorithm restart iteration over the hashset after every deleted way, but in Javaadoc for HashMap it says:

Iteration over collection views requires time proportional to the "capacity" of the HashMap instance (the number of buckets) plus its size (the number of key-value mappings).

As the capacity still remains the same as the collection becomes more and more empty, this effectively is O(N2) algorithm, as it takes longer and longer time to iterate to first way (not mentioning skipping all the nodes again and again before encountering a corresponding way).

The patch fixes the topopSort and reduces the time it needs to process the data

For purging collection of about 176000 ways and 800000 nodes the time for toposort is reduced from about 17 minutes to 500 milliseconds.

Attachments (2)

purgemod2.diff (2.4 KB ) - added by bilbo 12 years ago.
Much faster Purge command
purgemod3.diff (3.8 KB ) - added by bilbo 12 years ago.
Patch that fixes both O(N2) loops

Download all attachments as: .zip

Change History (7)

by bilbo, 12 years ago

Attachment: purgemod2.diff added

Much faster Purge command

comment:1 by bilbo, 12 years ago

Description: modified (diff)

Note: there seems to be one still some O(n2) part in the Purge command, though not that bad one:

The while loop, with comment:

 // Add referrer, unless the object to purge is not new
 // and the parent is a relation

When purging 176000 ways: needs 20 secs
When purging 550000 ways: needs 260 secs
(in both cases the purged ways are about 99% of the loaded dataset, i.e. not much is left afterwards

and the characteristics of the data is the same - OSM extracts with only buildings in them)

Now I see the reason for this is the same (iteration restarted after every removal, so effectively O(n2) part too), I guess I'll make another patch to fix this one too.

by bilbo, 12 years ago

Attachment: purgemod3.diff added

Patch that fixes both O(N2) loops

comment:2 by bilbo, 12 years ago

Attached patch purgemod3.diff​ that fixes both of the O(N2) loops, making them much faster O(N)

Time to execute the second loop for 70000 ways and 470000 nodes decreased from 233 seconds to 150 milliseconds with the patch.

So with the patch, Purge action is finally usable even with large datasets (millions of primitives).

comment:3 by bastiK, 12 years ago

Resolution: fixed
Status: newclosed

In 5905/josm:

applied #8643 - Very slow Purge command - O(N2) (patch by bilbo)

comment:4 by bastiK, 12 years ago

Thanks for the patch, great improvement!

comment:5 by Don-vip, 12 years ago

Wow, this is awesome ! Thanks a lot !

Modify Ticket

Change Properties
Set your email in Preferences
Action
as closed The owner will remain team.
as The resolution will be set.
The resolution will be deleted. Next status will be 'reopened'.

Add Comment


E-mail address and name can be saved in the Preferences .
 
Note: See TracTickets for help on using tickets.