Modify

Opened 4 months ago

Closed 2 months ago

#24046 closed defect (fixed)

Slow geometry check in canvec area

Reported by: taylor.smock Owned by: team
Priority: normal Milestone: 25.02
Component: Core Version:
Keywords: performance Cc:

Description

A user (jmarchon) on OSM JOSM (OSM World Discord, I think) has an issue with mapcss geometry checks taking a long time in some areas.

I've attached the test area he provided and I was able to reproduce with.

Attachments (3)

034G07-08_canvec_example.osm.gz (5.6 MB ) - added by taylor.smock 4 months ago.
Test area
24046.patch (5.7 KB ) - added by taylor.smock 3 months ago.
Extend MultipolygonCache to cache JoinedPolygon calculations
24046.1.patch (14.2 KB ) - added by taylor.smock 3 months ago.
Cache stored in test

Change History (12)

by taylor.smock, 4 months ago

Test area

comment:1 by taylor.smock, 4 months ago

In 19272/josm:

See #24046: Reduce cost of Geometry#filterInsidePolygon when a primitive is a relation

This was done by checking that the bounds of the polygon contain the bounds of
the multipolygon ring before creating an area from the multipolygon ring.

If the polygon does not contain the bounds of the ring, then at least part
of the ring is outside the polygon, and thus is not inside the polygon.

Measurements using validators at (57.5183581,-75.0512982),(57.2181217,-73.9434821):

  • CPU: -87%
  • Memory: -62%

Please note that the test area is still very expensive in
Geometry#filterInsideMultipolygon (note the Multipolygon).

comment:2 by taylor.smock, 4 months ago

Milestone: 25.01

I was able to decrease the cost for Geometry#filterInsideMultipolygon by 98% CPU and 99% memory.

I don't like this solution. I'll have to spend some time figuring out how to do the cache, and (maybe) how to pass it in.

I'm inclined to do something like MultipolygonCache (or maybe extend it), but I'll think on it.

  • src/org/openstreetmap/josm/tools/Geometry.java

    IDEA additional info:
    Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
    <+>UTF-8
    diff --git a/src/org/openstreetmap/josm/tools/Geometry.java b/src/org/openstreetmap/josm/tools/Geometry.java
    a b  
    1414import java.util.Collection;
    1515import java.util.Collections;
    1616import java.util.Comparator;
     17import java.util.HashMap;
    1718import java.util.Iterator;
    1819import java.util.LinkedHashSet;
    1920import java.util.List;
     21import java.util.Map;
    2022import java.util.Set;
    2123import java.util.TreeSet;
    2224import java.util.function.Predicate;
     
    12111213        return res;
    12121214    }
    12131215
     1216    private static Map<Relation, Pair<List<JoinedPolygon>, List<JoinedPolygon>>> jpCache = new HashMap<>();
    12141217    /**
    12151218     * Find all primitives in the given collection which are inside the given multipolygon. Members of the multipolygon are
    12161219     * ignored. Unclosed ways and multipolygon relations with unclosed outer rings are ignored.
     
    12251228            return res;
    12261229
    12271230        final Pair<List<JoinedPolygon>, List<JoinedPolygon>> outerInner;
    1228         try {
    1229             outerInner = MultipolygonBuilder.joinWays(multiPolygon);
    1230         } catch (MultipolygonBuilder.JoinedPolygonCreationException ex) {
    1231             Logging.trace(ex);
    1232             Logging.debug("Invalid multipolygon " + multiPolygon);
    1233             return res;
     1231        if (jpCache.containsKey(multiPolygon)) {
     1232            outerInner = jpCache.get(multiPolygon);
     1233        } else {
     1234            try {
     1235                outerInner = MultipolygonBuilder.joinWays(multiPolygon);
     1236            } catch (MultipolygonBuilder.JoinedPolygonCreationException ex) {
     1237                Logging.trace(ex);
     1238                Logging.debug("Invalid multipolygon " + multiPolygon);
     1239                return res;
     1240            }
     1241            jpCache.put(multiPolygon, outerInner);
    12341242        }
    12351243
    12361244        Set<OsmPrimitive> members = multiPolygon.getMemberPrimitives();

comment:3 by stoecker, 4 months ago

Is this really worth the effort? Typically you will only fill such a cache, but never use the results. Normally you will check different elements with the next upload.

comment:4 by jmarchon, 4 months ago

I often end up running the validator 2-3 times on the same data when I'm working on importing Canvec.

in reply to:  3 comment:5 by taylor.smock, 4 months ago

Replying to stoecker:

Is this really worth the effort? Typically you will only fill such a cache, but never use the results. Normally you will check different elements with the next upload.

What I put in comment:2 was more of a POC. In that case, in the event that someone does the following:

  1. Download data
  2. Edit
  3. Validate/Upload
  4. Delete current layer
  5. Download more data

your issue with only filling the cache but never reusing the results would be valid (if we were able to avoid rebuilding the multipolygon area multiple times).

In all other cases, it is not. And for the above case, extending MultipolygonCache to have another per-dataset cache would fix that particular issue.

I'll try to remember to check to see which geometry check(s) are causing the slowdown and why, but my suspicion is that it is a contains check -- e.g. check if water area contained in water multipolygon. In that case, we do the following:

  1. Get any water multipolygons that intersect the bounding box for the water area we are checking
  2. Pass in a singleton list of the water area and the multipolygon to the method
  3. Repeat for each water area

The problem is in step 2, where we rebuild an Area for the multipolygon outer/inner areas for each way that we are checking.

If we are able to change the check such that we can pass multiple water areas to the method with a single multipolygon, then that would also fix the problem. But that would (IMO) be much more complicated, since we would have to create a map in the check and then go back to it when we "end" the test to perform the actual checks. Something like Map<Relation, List<OsmPrimitive>> would work for the map.

Alternatively, we can add something similar to the mpAreaCache(?) on the environment that would cache the multipolygon calculations used by filterInsideMultipolygon. Note that we cannot reuse mpAreaCache(?) since it caches an area calculation that isn't (directly) used by filterInsideMultipolygon. I did try rewriting filterInsideMultipolygon to use mpAreaCache(?), and it didn't work as well as I had hoped.

FTR, I also tried making the area calculations lazy, and while that helped, it was at most 10% improvement with the problem data, so I thought it wasn't worth the potential issues (the area used is calculated at object initialization, and is a public variable used directly).

@jmarchon's test area is an egregious example, and most people won't run into it. But when it slows down the validator enough that users are wondering what they would lose if they disable the geometry validations, we have a problem.

Note: mpAreaCache is what I remember the cache map being called, but I could be wrong. That is why I put (?) after it. I'm not on my standard dev machine right now, and I probably won't be until after the new year, but (IIRC), mpAreaCache was a Map<IPrimitive, Area> object. This doesn't work for multipolygons since it would need to be Map<IPrimitive, List<Area>>, and (more importantly) the JoinedWay objects used by filterInsideMultipolygon do the calculation for a sorted collection of way nodes.

by taylor.smock, 3 months ago

Attachment: 24046.patch added

Extend MultipolygonCache to cache JoinedPolygon calculations

by taylor.smock, 3 months ago

Attachment: 24046.1.patch added

Cache stored in test

comment:6 by taylor.smock, 3 months ago

I like attachment:24046.patch due to simplicity, and every caller gets the optimizations automatically.
I like attachment:24046.1.patch since it doesn't keep unnecessary data in memory after a validation run. I'm leaning towards the second patch, just to avoid adding to the static memory size.

comment:7 by stoecker, 3 months ago

Milestone: 25.0125.02

Ticket retargeted after milestone closed

comment:8 by jmarchon, 3 months ago

I've created an account now. Just commenting here so I get an email if there are any updates.

comment:9 by stoecker, 2 months ago

Resolution: fixed
Status: newclosed

In 19336/josm:

fix #24046 - improve speed of multipolygon validator - patch by taylor.smock

Modify Ticket

Change Properties
Set your email in Preferences
Action
as closed The owner will remain team.
as The resolution will be set.
The resolution will be deleted. Next status will be 'reopened'.

Add Comment


E-mail address and name can be saved in the Preferences .
 
Note: See TracTickets for help on using tickets.