Opened 15 years ago
Last modified 4 years ago
#4626 new enhancement
parallelize validator checks
Reported by: | Owned by: | team | |
---|---|---|---|
Priority: | normal | Milestone: | |
Component: | Core validator | Version: | latest |
Keywords: | performance, java7 | Cc: | Don-vip, GerdP, skyper |
Description (last modified by )
Hi, I run JOSM on a quad-core system and I want to validate some very large pieces of OSM data. Some checks with validator run for a long time, and only one CPU core is in use while the check runs.
It would be nice if checks could be run in parallel to utilize the other idle CPU cores on my system. (Perhaps even some "fix" operations could be parallelized for performance gains.)
Thanks!
Attachments (2)
Change History (29)
by , 12 years ago
Attachment: | 4626_alpha.patch added |
---|
comment:1 by , 12 years ago
Description: | modified (diff) |
---|
This could also be interesting to see if tests taking a long time to perform could benefit from the new Fork/Join framework introduced in Java 7.
comment:2 by , 12 years ago
Keywords: | java7 added |
---|
comment:3 by , 11 years ago
Milestone: | → 14.05 |
---|
comment:4 by , 11 years ago
Cc: | added |
---|
comment:6 by , 10 years ago
Milestone: | 14.06 → 14.07 |
---|
Move all tickets for which no work has been done yet to next milestone
comment:9 by , 10 years ago
Milestone: | 14.08 |
---|
comment:10 by , 6 years ago
Currently, MapCSS validation takes way too long on large datasets (few hundred MB). Is it possible to run the check as a multithreaded process?
My 4 core, 8 threaded CPU used 15% while running MapCSS validation. But the speedup welcomed on the other checks as well.
comment:19 by , 4 years ago
Cc: | added |
---|
Running all tests (that is, subclasses of org.openstreetmap.josm.data.validation.Test
) in parallel yields only a
@@ ValidationTaskPerformanceTest -[8307, 7321, 7285, 7293, 7280, 7410, 7424, 7298] ms +[7016, 6383, 6421, 6369, 6307, 6121, 6156, 6384] ms
Running MapCSSTagChecker parallel on all primitives yields a nice performance gain, but sporadically triggers some ConcurrentModificationException (the various HashMap instances make a parallel usage dangerous/impossible)
+[3876, 3400, 3000, 3045, 2938, 2919, 3027, 3222] ms
Running MapCSSTagChecker parallel on the MapCSSTagChecker#checks yields a smaller speed-up:
+[2950, 3691, 3129, 4401, 5933, 4363, 3287, 4877] ms
by , 4 years ago
Attachment: | 4626.patch added |
---|
comment:20 by , 4 years ago
Some tests may require lots of memory per primitive, esp. the geometry tests for overlaps or insidness. Please make sure that you test this as well.
follow-up: 25 comment:22 by , 4 years ago
Performance improvemant with 4626.patch really looks good. So far I've not seen a ConcurrentModificationException. Did you find a fix for that already?
A few things that I noticed:
- Cancel button doesn't work
- tests seem to be executed in random order? Might cause trouble because we have some logic that expects the current order
- Progress monitor doesn't show the MapCSSTests at all and "blames" a random - typically fast - java test for quite a while, probably when a complex geometry test takes long
- I see wrong results when I use upload and my data contains new objects which should produce warnings, e.g. with two new water areas inside the same water area I get only one warning
comment:23 by , 4 years ago
Reg. upload:
This seems to help, maybe you have an idea how to avoid the code duplication?
-
src/org/openstreetmap/josm/actions/upload/ValidateUploadHook.java
19 19 import org.openstreetmap.josm.data.validation.Severity; 20 20 import org.openstreetmap.josm.data.validation.Test; 21 21 import org.openstreetmap.josm.data.validation.TestError; 22 import org.openstreetmap.josm.data.validation.tests.MapCSSTagChecker; 22 23 import org.openstreetmap.josm.data.validation.util.AggregatePrimitivesVisitor; 23 24 import org.openstreetmap.josm.gui.ExtendedDialog; 24 25 import org.openstreetmap.josm.gui.MainApplication; … … 63 64 test.setBeforeUpload(true); 64 65 test.setPartialSelection(true); 65 66 test.startTest(null); 66 test.visit(selection); 67 if (test instanceof MapCSSTagChecker) { 68 // VARIANT 1 69 selection.parallelStream() 70 .filter(test::isPrimitiveUsable) 71 .forEach(p -> p.accept(test)); 72 } else { 73 test.visit(selection); 74 } 75 67 76 test.endTest(); 68 77 if (ValidatorPrefHelper.PREF_OTHER.get() && ValidatorPrefHelper.PREF_OTHER_UPLOAD.get()) { 69 78 errors.addAll(test.getErrors());
I see a potential problem regarding the HashSet surrounding in MapCSSTagChecker:
With the default validator rules there is only geometry.mapcss that requires this, but if a user adds his own spatial tests in another *.mapcss file we'll see problems with concurrent updates. Same is probably true for mpAreaCache.
comment:24 by , 4 years ago
Ah, stop. The new code simply skips the test of the surrounding objects with a partial selection, so this requires more work.
comment:25 by , 4 years ago
Replying to GerdP:
So far I've not seen a ConcurrentModificationException. Did you find a fix for that already?
The synchronized (errors)
accommodates for the exceptions I've seen while testing. Unsynchronised access to HashMap
is a ticking time-bomb, as have shown various bugs in the past.
A few things that I noticed:
- Cancel button doesn't work
- Progress monitor doesn't show the MapCSSTests at all and "blames" a random - typically fast - java test for quite a while, probably when a complex geometry test takes long
I've deliberately left out those aspects for the proof-of-concept.
- tests seem to be executed in random order? Might cause trouble because we have some logic that expects the current order
Yeah. If there are inter-dependencies between different tests, parallelising the validator will never work unless we ensure those dependencies via e.g. Future
or CompletableFuture
.
Sigh, in principle everything sounds easy, but JOSM has accumulated various peculiarities over 15 years...
comment:26 by , 4 years ago
Reg. the order I might be wrong. There was a patch that required this but I think it wasn't committed yet.
Some work in progress I had some time ago. Solution may be need to be redesigned to implement UI stuff