Modify

Opened 12 years ago

Last modified 6 months ago

#4626 new enhancement

parallelize validator checks

Reported by: Matt Toups <mtoups@…> Owned by: team
Priority: normal Milestone:
Component: Core validator Version: latest
Keywords: performance, java7 Cc: Don-vip, GerdP, skyper

Description (last modified by Don-vip)

Hi, I run JOSM on a quad-core system and I want to validate some very large pieces of OSM data. Some checks with validator run for a long time, and only one CPU core is in use while the check runs.

It would be nice if checks could be run in parallel to utilize the other idle CPU cores on my system. (Perhaps even some "fix" operations could be parallelized for performance gains.)

Thanks!

Attachments (2)

4626_alpha.patch (6.7 KB) - added by Don-vip 8 years ago.
Some work in progress I had some time ago. Solution may be need to be redesigned to implement UI stuff
4626.patch (7.7 KB) - added by simon04 6 months ago.

Download all attachments as: .zip

Change History (29)

Changed 8 years ago by Don-vip

Attachment: 4626_alpha.patch added

Some work in progress I had some time ago. Solution may be need to be redesigned to implement UI stuff

comment:1 Changed 8 years ago by Don-vip

Description: modified (diff)

This could also be interesting to see if tests taking a long time to perform could benefit from the new Fork/Join framework introduced in Java 7.

comment:2 Changed 8 years ago by Don-vip

Keywords: java7 added

comment:3 Changed 8 years ago by Don-vip

Milestone: 14.05

comment:4 Changed 8 years ago by skyper

Cc: skyper added

comment:5 Changed 7 years ago by Don-vip

Milestone: 14.0514.06

Too complicated/risky for this release.

comment:6 Changed 7 years ago by Don-vip

Milestone: 14.0614.07

Move all tickets for which no work has been done yet to next milestone

comment:7 Changed 7 years ago by Don-vip

Milestone: 14.0714.08

Move some tickets to next milestone

comment:8 Changed 7 years ago by Don-vip

In 7423/josm:

see #9680, see #4626 - multithreaded multipolygon creation

comment:9 Changed 7 years ago by Don-vip

Milestone: 14.08

comment:10 Changed 3 years ago by anonymous

Currently, MapCSS validation takes way too long on large datasets (few hundred MB). Is it possible to run the check as a multithreaded process?
My 4 core, 8 threaded CPU used 15% while running MapCSS validation. But the speedup welcomed on the other checks as well.

comment:11 Changed 6 months ago by simon04

In 17615/josm:

see #4626 - Extract PerformanceTestUtils.getNeubrandenburgDataSet

comment:12 Changed 6 months ago by simon04

In 17616/josm:

see #4626 - Extract class ValidationTask

comment:13 Changed 6 months ago by simon04

In 17617/josm:

see #4626 - Add ValidationTaskPerformanceTest

comment:14 Changed 6 months ago by simon04

In 17618/josm:

see #4626 - PerformanceTestUtils.PerformanceTestTimer: use Stopwatch

comment:15 Changed 6 months ago by simon04

In 17619/josm:

see #4626 - Extract class MapCSSTagCheckerRule

comment:16 Changed 6 months ago by simon04

In 17620/josm:

see #4626 - Extract interface MapCSSTagCheckerFixCommand

comment:18 Changed 6 months ago by simon04

In 17625/josm:

see #4626 - Fix StyleCacheTest (cannot use PerformanceTestUtils)

comment:19 Changed 6 months ago by simon04

Cc: Don-vip GerdP added

Running all tests (that is, subclasses of org.openstreetmap.josm.data.validation.Test) in parallel yields only a

@@ ValidationTaskPerformanceTest
-[8307, 7321, 7285, 7293, 7280, 7410, 7424, 7298] ms
+[7016, 6383, 6421, 6369, 6307, 6121, 6156, 6384] ms

Running MapCSSTagChecker parallel on all primitives yields a nice performance gain, but sporadically triggers some ConcurrentModificationException (the various HashMap instances make a parallel usage dangerous/impossible)

+[3876, 3400, 3000, 3045, 2938, 2919, 3027, 3222] ms

Running MapCSSTagChecker parallel on the MapCSSTagChecker#checks yields a smaller speed-up:

+[2950, 3691, 3129, 4401, 5933, 4363, 3287, 4877] ms

Changed 6 months ago by simon04

Attachment: 4626.patch added

comment:20 Changed 6 months ago by GerdP

Some tests may require lots of memory per primitive, esp. the geometry tests for overlaps or insidness. Please make sure that you test this as well.

comment:22 Changed 6 months ago by GerdP

Performance improvemant with 4626.patch really looks good. So far I've not seen a ConcurrentModificationException. Did you find a fix for that already?
A few things that I noticed:

  • Cancel button doesn't work
  • tests seem to be executed in random order? Might cause trouble because we have some logic that expects the current order
  • Progress monitor doesn't show the MapCSSTests at all and "blames" a random - typically fast - java test for quite a while, probably when a complex geometry test takes long
  • I see wrong results when I use upload and my data contains new objects which should produce warnings, e.g. with two new water areas inside the same water area I get only one warning

comment:23 Changed 6 months ago by GerdP

Reg. upload:
This seems to help, maybe you have an idea how to avoid the code duplication?

  • src/org/openstreetmap/josm/actions/upload/ValidateUploadHook.java

     
    1919import org.openstreetmap.josm.data.validation.Severity;
    2020import org.openstreetmap.josm.data.validation.Test;
    2121import org.openstreetmap.josm.data.validation.TestError;
     22import org.openstreetmap.josm.data.validation.tests.MapCSSTagChecker;
    2223import org.openstreetmap.josm.data.validation.util.AggregatePrimitivesVisitor;
    2324import org.openstreetmap.josm.gui.ExtendedDialog;
    2425import org.openstreetmap.josm.gui.MainApplication;
     
    6364            test.setBeforeUpload(true);
    6465            test.setPartialSelection(true);
    6566            test.startTest(null);
    66             test.visit(selection);
     67            if (test instanceof MapCSSTagChecker) {
     68                // VARIANT 1
     69                selection.parallelStream()
     70                        .filter(test::isPrimitiveUsable)
     71                        .forEach(p -> p.accept(test));
     72            } else {
     73                test.visit(selection);
     74            }
     75
    6776            test.endTest();
    6877            if (ValidatorPrefHelper.PREF_OTHER.get() && ValidatorPrefHelper.PREF_OTHER_UPLOAD.get()) {
    6978                errors.addAll(test.getErrors());

I see a potential problem regarding the HashSet surrounding in MapCSSTagChecker:
With the default validator rules there is only geometry.mapcss that requires this, but if a user adds his own spatial tests in another *.mapcss file we'll see problems with concurrent updates. Same is probably true for mpAreaCache.

comment:24 Changed 6 months ago by GerdP

Ah, stop. The new code simply skips the test of the surrounding objects with a partial selection, so this requires more work.

comment:25 in reply to:  22 Changed 6 months ago by simon04

Replying to GerdP:

So far I've not seen a ConcurrentModificationException. Did you find a fix for that already?

The synchronized (errors) accommodates for the exceptions I've seen while testing. Unsynchronised access to HashMap is a ticking time-bomb, as have shown various bugs in the past.

A few things that I noticed:

  • Cancel button doesn't work
  • Progress monitor doesn't show the MapCSSTests at all and "blames" a random - typically fast - java test for quite a while, probably when a complex geometry test takes long

I've deliberately left out those aspects for the proof-of-concept.

  • tests seem to be executed in random order? Might cause trouble because we have some logic that expects the current order

Yeah. If there are inter-dependencies between different tests, parallelising the validator will never work unless we ensure those dependencies via e.g. Future or CompletableFuture.

Sigh, in principle everything sounds easy, but JOSM has accumulated various peculiarities over 15 years...

comment:26 Changed 6 months ago by GerdP

Reg. the order I might be wrong. There was a patch that required this but I think it wasn't committed yet.

comment:27 Changed 6 months ago by GerdP

Found it, see last patch for #17528

Modify Ticket

Change Properties
Set your email in Preferences
Action
as new The owner will remain team.
as The resolution will be set.
to The owner will be changed from team to the specified user.
The owner will change to Matt Toups <mtoups@alumni.cmu.edu>
as duplicate The resolution will be set to duplicate.The specified ticket will be cross-referenced with this ticket
The owner will be changed from team to anonymous.

Add Comment


E-mail address and name can be saved in the Preferences.

 
Note: See TracTickets for help on using tickets.