#8643 closed defect (fixed)
[patch] Very slow Purge command - O(N^2)
Reported by: | bilbo | Owned by: | team |
---|---|---|---|
Priority: | normal | Milestone: | |
Component: | Core | Version: | latest |
Keywords: | patch | Cc: |
Description
The Purge command uses a bad algorithm with complexity O(N^2), causing a purge of about 176000 ways to take over 15 minutes.
The problem is in toposort, where the algorithm restarts iteration over the HashSet after every deleted way, but the Javadoc for HashMap says:
Iteration over collection views requires time proportional to the "capacity" of the HashMap instance (the number of buckets) plus its size (the number of key-value mappings).
As the capacity remains the same while the collection becomes more and more empty, this is effectively an O(N^2) algorithm: it takes longer and longer to iterate to the first remaining way (not to mention skipping all the nodes again and again before encountering a corresponding way).
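To illustrate the effect described above, here is a minimal, self-contained sketch. It is not the JOSM code: the class and method names are made up for the demonstration, and plain integers stand in for primitives. It only contrasts "restart the iteration after every removal" with a single pass that uses `Iterator.remove()`.

```java
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;

// Hypothetical demo class, not part of JOSM.
class RestartIterationDemo {

    // Slow pattern: take one element, remove it, then start a fresh iteration.
    // Every fresh iterator starts at bucket 0 and has to walk past all buckets
    // emptied so far; the table is never shrunk on removal, so the total work
    // grows roughly quadratically.
    static long drainWithRestart(Set<Integer> set) {
        long removed = 0;
        while (!set.isEmpty()) {
            Integer first = set.iterator().next();
            set.remove(first);
            removed++;
        }
        return removed;
    }

    // Single pass: one iterator visits each bucket once, and Iterator.remove()
    // deletes the current element without restarting the iteration.
    static long drainSinglePass(Set<Integer> set) {
        long removed = 0;
        for (Iterator<Integer> it = set.iterator(); it.hasNext();) {
            it.next();
            it.remove();
            removed++;
        }
        return removed;
    }

    // Builds a set containing 0..n-1.
    static Set<Integer> filled(int n) {
        Set<Integer> set = new HashSet<>();
        for (int i = 0; i < n; i++) {
            set.add(i);
        }
        return set;
    }

    public static void main(String[] args) {
        int n = 20_000;
        long t0 = System.nanoTime();
        drainWithRestart(filled(n));
        long t1 = System.nanoTime();
        drainSinglePass(filled(n));
        long t2 = System.nanoTime();
        System.out.printf("restart: %d ms, single pass: %d ms%n",
                (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000);
    }
}
```

The absolute timings depend on the machine and JVM, but doubling `n` should roughly quadruple the first number while only about doubling the second.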
The patch fixes topoSort and reduces the time it needs to process the data.
For purging a collection of about 176000 ways and 800000 nodes, the time for toposort is reduced from about 17 minutes to 500 milliseconds.
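For reference, a topological sort can be done in linear time with a work queue instead of repeated scans of a shrinking set. The following is only a generic sketch (Kahn's algorithm), not the attached patch: the class name, the `children` map as input shape, and the "parents before children" output order are all assumptions made for the example, and the elements are assumed to be distinct.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not the JOSM implementation.
class TopoSortSketch {

    // children.get(p) lists the elements that p refers to (e.g. the nodes of a way).
    // Returns the elements ordered so that every parent precedes its children.
    static <T> List<T> topoSort(Collection<T> elements, Map<T, List<T>> children) {
        // For every element, count the parent references not yet emitted.
        Map<T, Integer> pendingParents = new HashMap<>();
        for (T e : elements) {
            pendingParents.put(e, 0);
        }
        for (T p : elements) {
            for (T c : children.getOrDefault(p, List.of())) {
                if (pendingParents.containsKey(c)) {
                    pendingParents.merge(c, 1, Integer::sum);
                }
            }
        }

        // Elements without pending parents can be emitted right away.
        Deque<T> ready = new ArrayDeque<>();
        for (T e : elements) {
            if (pendingParents.get(e) == 0) {
                ready.add(e);
            }
        }

        // Each element is queued once and each reference is visited once: O(N + E).
        List<T> sorted = new ArrayList<>(elements.size());
        while (!ready.isEmpty()) {
            T p = ready.poll();
            sorted.add(p);
            for (T c : children.getOrDefault(p, List.of())) {
                if (pendingParents.containsKey(c)
                        && pendingParents.merge(c, -1, Integer::sum) == 0) {
                    ready.add(c);
                }
            }
        }
        if (sorted.size() != elements.size()) {
            throw new IllegalStateException("cycle detected");
        }
        return sorted;
    }

    public static void main(String[] args) {
        Map<String, List<String>> children = new HashMap<>();
        children.put("w1", List.of("n1", "n2", "n1")); // closed way: n1 appears twice
        children.put("w2", List.of("n2", "n3"));
        System.out.println(topoSort(List.of("n1", "n2", "n3", "w1", "w2"), children));
    }
}
```

A child that is referenced twice by the same parent is counted twice and decremented twice, so closed ways do not upset the bookkeeping.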
Attachments (2)
Change History (7)
by , 12 years ago
Attachment: | purgemod2.diff added |
---|
comment:1 by , 12 years ago
Description: | modified (diff) |
---|
Note: there still seems to be another O(n^2) part in the Purge command, though not as bad:
The while loop, with comment:
// Add referrer, unless the object to purge is not new // and the parent is a relation
When purging 176000 ways: needs 20 secs
When purging 550000 ways: needs 260 secs
(in both cases the purged ways are about 99% of the loaded dataset, i.e. not much is left afterwards,
and the characteristics of the data are the same: OSM extracts with only buildings in them)
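As a rough check of how these two timings scale: if the running time is assumed to follow t = c * n^k, two measurements give k = ln(t2/t1) / ln(n2/n1). The small sketch below (hypothetical class name, only the two data points above as input) computes that exponent.

```java
// Hypothetical helper, only for estimating the growth exponent from two timings.
class ScalingEstimate {

    // Assumes t = c * n^k, so k = ln(t2 / t1) / ln(n2 / n1).
    static double exponent(double n1, double t1, double n2, double t2) {
        return Math.log(t2 / t1) / Math.log(n2 / n1);
    }

    public static void main(String[] args) {
        // 176000 ways in 20 secs, 550000 ways in 260 secs
        double k = exponent(176_000, 20, 550_000, 260);
        System.out.printf("estimated exponent k = %.2f%n", k);
    }
}
```

The result is about 2.25, i.e. at least quadratic growth. With only two data points this is just an indication, not a measurement of the exact complexity.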
Now I see that the reason for this is the same (iteration is restarted after every removal, so this part is effectively O(n^2) too). I guess I'll make another patch to fix this one as well.
comment:2 by , 12 years ago
Attached patch purgemod3.diff, which fixes both of the O(N^2) loops, making them a much faster O(N).
Time to execute the second loop for 70000 ways and 470000 nodes decreased from 233 seconds to 150 milliseconds with the patch.
So with the patch, the Purge action is finally usable even with large datasets (millions of primitives).
Much faster Purge command