Opened 4 months ago
Closed 2 months ago
#24046 closed defect (fixed)
Slow geometry check in canvec area
Reported by: | taylor.smock | Owned by: | team |
---|---|---|---|
Priority: | normal | Milestone: | 25.02 |
Component: | Core | Version: | |
Keywords: | performance | Cc: |
Description
A user (jmarchon) on OSM JOSM (OSM World Discord, I think) has an issue with mapcss geometry checks taking a long time in some areas.
I've attached the test area he provided and I was able to reproduce with.
Attachments (3)
Change History (12)
by , 4 months ago
Attachment: | 034G07-08_canvec_example.osm.gz added |
---|
comment:2 by , 4 months ago
Milestone: | → 25.01 |
---|
I was able to decrease the cost for Geometry#filterInsideMultipolygon
by 98% CPU and 99% memory.
I don't like this solution. I'll have to spend some time figuring out how to do the cache, and (maybe) how to pass it in.
I'm inclined to do something like MultipolygonCache
(or maybe extend it), but I'll think on it.
-
src/org/openstreetmap/josm/tools/Geometry.java
IDEA additional info: Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP <+>UTF-8 diff --git a/src/org/openstreetmap/josm/tools/Geometry.java b/src/org/openstreetmap/josm/tools/Geometry.java
a b 14 14 import java.util.Collection; 15 15 import java.util.Collections; 16 16 import java.util.Comparator; 17 import java.util.HashMap; 17 18 import java.util.Iterator; 18 19 import java.util.LinkedHashSet; 19 20 import java.util.List; 21 import java.util.Map; 20 22 import java.util.Set; 21 23 import java.util.TreeSet; 22 24 import java.util.function.Predicate; … … 1211 1213 return res; 1212 1214 } 1213 1215 1216 private static Map<Relation, Pair<List<JoinedPolygon>, List<JoinedPolygon>>> jpCache = new HashMap<>(); 1214 1217 /** 1215 1218 * Find all primitives in the given collection which are inside the given multipolygon. Members of the multipolygon are 1216 1219 * ignored. Unclosed ways and multipolygon relations with unclosed outer rings are ignored. … … 1225 1228 return res; 1226 1229 1227 1230 final Pair<List<JoinedPolygon>, List<JoinedPolygon>> outerInner; 1228 try { 1229 outerInner = MultipolygonBuilder.joinWays(multiPolygon); 1230 } catch (MultipolygonBuilder.JoinedPolygonCreationException ex) { 1231 Logging.trace(ex); 1232 Logging.debug("Invalid multipolygon " + multiPolygon); 1233 return res; 1231 if (jpCache.containsKey(multiPolygon)) { 1232 outerInner = jpCache.get(multiPolygon); 1233 } else { 1234 try { 1235 outerInner = MultipolygonBuilder.joinWays(multiPolygon); 1236 } catch (MultipolygonBuilder.JoinedPolygonCreationException ex) { 1237 Logging.trace(ex); 1238 Logging.debug("Invalid multipolygon " + multiPolygon); 1239 return res; 1240 } 1241 jpCache.put(multiPolygon, outerInner); 1234 1242 } 1235 1243 1236 1244 Set<OsmPrimitive> members = multiPolygon.getMemberPrimitives();
follow-up: 5 comment:3 by , 4 months ago
Is this really worth the effort? Typically you will only fill such a cache, but never use the results. Normally you will check different elements with the next upload.
comment:4 by , 4 months ago
I often end up running the validator 2-3 times on the same data when I'm working on importing Canvec.
comment:5 by , 4 months ago
Replying to stoecker:
Is this really worth the effort? Typically you will only fill such a cache, but never use the results. Normally you will check different elements with the next upload.
What I put in comment:2 was more of a POC. In that case, in the event that someone does the following:
- Download data
- Edit
- Validate/Upload
- Delete current layer
- Download more data
your issue with only filling the cache but never reusing the results would be valid (if we were able to avoid rebuilding the multipolygon area multiple times).
In all other cases, it is not. And for the above case, extending MultipolygonCache
to have another per-dataset cache would fix that particular issue.
I'll try to remember to check to see which geometry check(s) are causing the slowdown and why, but my suspicion is that it is a contains check -- e.g. check if water area contained in water multipolygon
. In that case, we do the following:
- Get any water multipolygons that intersect the bounding box for the water area we are checking
- Pass in a singleton list of the water area and the multipolygon to the method
- Repeat for each water area
The problem is in step 2, where we rebuild an Area
for the multipolygon outer/inner areas for each way that we are checking.
If we are able to change the check such that we can pass multiple water areas to the method with a single multipolygon, then that would also fix the problem. But that would (IMO) be much more complicated, since we would have to create a map in the check and then go back to it when we "end" the test to perform the actual checks. Something like Map<Relation, List<OsmPrimitive>>
would work for the map.
Alternatively, we can add something similar to the mpAreaCache
(?) on the environment that would cache the multipolygon calculations used by filterInsideMultipolygon
. Note that we cannot reuse mpAreaCache
(?) since it caches an area calculation that isn't (directly) used by filterInsideMultipolygon
. I did try rewriting filterInsideMultipolygon
to use mpAreaCache
(?), and it didn't work as well as I had hoped.
FTR, I also tried making the area calculations lazy, and while that helped, it was at most 10% improvement with the problem data, so I thought it wasn't worth the potential issues (the area
used is calculated at object initialization, and is a public variable used directly).
@jmarchon's test area is an egregious example, and most people won't run into it. But when it slows down the validator enough that users are wondering what they would lose if they disable the geometry validations, we have a problem.
Note: mpAreaCache
is what I remember the cache map being called, but I could be wrong. That is why I put (?)
after it. I'm not on my standard dev machine right now, and I probably won't be until after the new year, but (IIRC), mpAreaCache
was a Map<IPrimitive, Area>
object. This doesn't work for multipolygons since it would need to be Map<IPrimitive, List<Area>>
, and (more importantly) the JoinedWay
objects used by filterInsideMultipolygon
do the calculation for a sorted collection of way nodes.
by , 3 months ago
Attachment: | 24046.patch added |
---|
Extend MultipolygonCache
to cache JoinedPolygon
calculations
comment:6 by , 3 months ago
I like attachment:24046.patch due to simplicity, and every caller gets the optimizations automatically.
I like attachment:24046.1.patch since it doesn't keep unnecessary data in memory after a validation run. I'm leaning towards the second patch, just to avoid adding to the static memory size.
comment:8 by , 3 months ago
I've created an account now. Just commenting here so I get an email if there are any updates.
Test area