Modify

Opened 6 years ago

Last modified 4 years ago

#17388 new enhancement

[WIP PATCH] gpx_distance() is slow when there are many gpx points

Reported by: taylor.smock Owned by: team
Priority: normal Milestone:
Component: Core Version:
Keywords: mapcss Cc:

Description

I added a bunch of gpx tracks to josm, and then ran a validator.

The total distance covered was in excess of 400 km (so probably about 13,000 gpx points and around 180,000 primitives for around 2,340,000,000 operations per gpx_distance() call) and it was a conglomeration of several different tracks.

I'm thinking that a possible solution would be to make bounding boxes for the gpx tracks and iterate through them to find the closest bounding box (probably find the bounding boxes that overlap with the closest bounding box) and iterate through the bounding boxes.

The reason it is slow is due to an n2 algorithm (for each osm primitive, we have to loop through the gpx tracks at least once).

Workaround: Reduce the number of osmprimitives/gpx points that are loaded at any one time.

I probably should have tested something like this, but I didn't think to do so.

Attachments (8)

GpxDistance_speed.patch (4.6 KB ) - added by taylor.smock 6 years ago.
Untested patch that adds a method to get the distance between the centroid of an OsmPrimitive and a Bounds, it also starts work on finding the closest Bounds.
GpxDistance_speed_v2.patch (5.8 KB ) - added by taylor.smock 6 years ago.
Working patch (VERY SLOW), but it does cut down on the iterations significantly in many cases (I don't know where the slowdown is).
17388.projection_cache.patch (1.0 KB ) - added by taylor.smock 4 years ago.
17388.geometry_sqrt_cache.patch (11.9 KB ) - added by taylor.smock 4 years ago.
Avoid new node, cache east north (only affects future runs), modify Geometry to avoid unnecessary Math.sqrt
17388.geometry_sqrt_cache_valid.patch (17.8 KB ) - added by taylor.smock 4 years ago.
Precalculate isValid for EastNorth, add new function that takes maxDistance for getLowestDistance in GpxDistance, modify BBox to have a bounds method that takes a ILatLon and deprecate the method that uses LatLon, max gpx_distance as deprecated while adding one that takes a distance, avoid unnecessary Math.sqrt in Geometry
17388.4.patch (18.6 KB ) - added by taylor.smock 4 years ago.
Avoid calculating validity until isValid is called, at which point caching may occur.
17388.5.patch (17.8 KB ) - added by taylor.smock 4 years ago.
Same as attachment:17388.4.patch, except with two booleans
17388.6.patch (17.7 KB ) - added by taylor.smock 4 years ago.
Same as attachment:17388.4.patch, except with one Boolean

Download all attachments as: .zip

Change History (30)

by taylor.smock, 6 years ago

Attachment: GpxDistance_speed.patch added

Untested patch that adds a method to get the distance between the centroid of an OsmPrimitive and a Bounds, it also starts work on finding the closest Bounds.

by taylor.smock, 6 years ago

Attachment: GpxDistance_speed_v2.patch added

Working patch (VERY SLOW), but it does cut down on the iterations significantly in many cases (I don't know where the slowdown is).

comment:1 by GerdP, 4 years ago

Is this still an issue?

I added a bunch of gpx tracks to josm, and then ran a validator.

I don't know any test that would care about gpx layers. Please give more details.

I don't know where the slowdown is

You can use VisualVM to find out.

comment:2 by taylor.smock, 4 years ago

I suspect it is still a problem, but it only ever shows up when a test looks for distance to nearest gpx point. I originally wrote this to help me quickly go through street level imagery to add missing information.

For example, if I'm adding surfaces or lanes to roads, I might have a test that looks something like this:

  way[highway=~/^(motorway|trunk|primary|secondary|tertiary|residential|unclassified)$/][!surface][gpx_distance() < 30 || gpx_distance() >= 50000000],
  way[highway=~/^.*_link$/][!surface][gpx_distance() < 30 || gpx_distance() >= 50000000] {
    throwWarning: tr("Missing surface on a road");
  }
  way[highway=~/^(service|living_street)$/][!surface][gpx_distance() < 30 || gpx_distance() >= 50000000] {
    throwOther: tr("Missing surface on a minor road");
  } 

(Yes, that is a test that I have used in the past).

I'll double check this, but the workaround was easy enough that I never was able to prioritize this bug. I'll see if I can work this in somewhere, just so this is actually fixed. I think this may have also have been ameliorated by reducing the number of gpx points in the generated track (we have a branch of the tasking manager which lets us create tasks based off of Mapillary sequences).

comment:3 by GerdP, 4 years ago

Ah, yes, looks like a lot of iterations. I have no idea why gpx_distance() is used here but that's a different problem.
VisualVM claims that most of the time is spent in org.openstreetmap.josm.data.osm.INode.getEastNorth(). I wonder why this method is called instead of org.openstreetmap.josm.data.osm.Node.getEastNorth(Projecting projection) which uses the cached value.

comment:4 by GerdP, 4 years ago

Yes, in fact, it seems to be the calculation of the eastNorth values for the node created from the gpx data. This calculation uses Math.atan() wich is very slow.

in reply to:  3 comment:5 by taylor.smock, 4 years ago

Replying to GerdP:

Ah, yes, looks like a lot of iterations. I have no idea why gpx_distance() is used here but that's a different problem.
VisualVM claims that most of the time is spent in org.openstreetmap.josm.data.osm.INode.getEastNorth(). I wonder why this method is called instead of org.openstreetmap.josm.data.osm.Node.getEastNorth(Projecting projection) which uses the cached value.

It looks like INode.getEastNorth uses getEastNorth(ProjectionRegistry.getProjection()). So it should be cached.

Yes, in fact, it seems to be the calculation of the eastNorth values for the node created from the gpx data. This calculation uses Math.atan() wich is very slow.

That makes sense. :(

by taylor.smock, 4 years ago

comment:6 by GerdP, 4 years ago

I tried something similar and it was even slower. It would be good if you could avoid the creation of the node instance.

in reply to:  6 comment:7 by taylor.smock, 4 years ago

Replying to GerdP:

I tried something similar and it was even slower. It would be good if you could avoid the creation of the node instance.

Fair enough. I'll still double check that, but it would be good to avoid the create of a node instance anyway. That was just (literally) first thing that came to mind when you indicated that the Node.getEastNorth was getting hit a lot.

I'll make some modifications in Geometry next so that an eastnorth or latlon can be sent in. I'll have to refresh my memory of the code there first.

I'll also get a "base" runtime with the current state, the state with the patch I just added, and then whatever comes next. I think I'll upload both the gps file and the osm area to make things a bit easier to reproduce.

comment:8 by taylor.smock, 4 years ago

PatchFirst RunSecond run
No patch 2h 57m 27s N/A
17388.projection_cache.patch
17388.geometry_sqrt_cache.patch59m 39s
17388.geometry_sqrt_cache_valid.patch 29m 25s

I'll be updating this table as things finish running.

Notes:

  • GPX layer has ~826000 points.
  • First run of 17388.geometry_sqrt_cache_valid.patch had a debugger attached and was paused for some time
Last edited 4 years ago by taylor.smock (previous) (diff)

by taylor.smock, 4 years ago

Avoid new node, cache east north (only affects future runs), modify Geometry to avoid unnecessary Math.sqrt

comment:9 by taylor.smock, 4 years ago

@GerdP:

The documentation for EastNorth indicates that it is immutable ("This class is immutable"). Do we need to calculate the isValid every time isValid is called, or can we cache the result? Or better yet, calculate on creation, and store the boolean value at that time.

About the only reason I can think of to not cache the result or precalculate it would be memory usage. boolean isn't well defined for size, and bytes are 8 bits, so worst case scenario, each boolean would increase memory usage by a byte.

The reason I'm asking is that most of the time is now spent in Double.isNaN and Double.isInfinite (both are used in EastNorth.isValid).

TBH, Don-vip might be a better person to ask for that (he's the only one who has touched that function, as far as I can tell).

comment:10 by GerdP, 4 years ago

I don't think that memory is a problem if you talk about class EastNorth. There should not be many instances.
Node has its own fields for the values.

by taylor.smock, 4 years ago

Precalculate isValid for EastNorth, add new function that takes maxDistance for getLowestDistance in GpxDistance, modify BBox to have a bounds method that takes a ILatLon and deprecate the method that uses LatLon, max gpx_distance as deprecated while adding one that takes a distance, avoid unnecessary Math.sqrt in Geometry

comment:11 by GerdP, 4 years ago

Your patch will decrease performance in other situations where isValid() is not used. I would simply cache the result.

I am still trying to understand what the value returned by gpx_distance() means in a mapcss rule. I have a bunch of gpx tracks from my cycle tours. I load them into JOSM and download an area near my home, so that many ways are "covered" by gpx data. Is that the scenario we are talking about? I don't see how the method detects if I cycled along an OSM way or if I just crossed it a few times.

comment:12 by GerdP, 4 years ago

I would simply cache the result.

Hmm, probably not that easy with an immutable class. Would a new class EastNorthValidated solve this?

comment:13 by GerdP, 4 years ago

I wonder if getEastNorth() could/should be changed to return null instead of a new EastNorth instance with invalid data. A null check is much easier to handle.
Edit: Wouldn't help. We have lots of methods which calculdate east/north values to create new EastNorth instances.

Last edited 4 years ago by GerdP (previous) (diff)

by taylor.smock, 4 years ago

Attachment: 17388.4.patch added

Avoid calculating validity until isValid is called, at which point caching may occur.

comment:14 by GerdP, 4 years ago

Why don't you use a Boolean?

by taylor.smock, 4 years ago

Attachment: 17388.5.patch added

Same as attachment:17388.4.patch, except with two booleans

by taylor.smock, 4 years ago

Attachment: 17388.6.patch added

Same as attachment:17388.4.patch, except with one Boolean

in reply to:  14 comment:15 by taylor.smock, 4 years ago

Replying to GerdP:

Why don't you use a Boolean?

Short answer: I am pretty certain that the number of EastNorth objects can easily reach one million, so I'm attaching a few patches with different methods prior to running them under VisualVM.

Depending upon how the VM does things, I would expect 17388.6.patch to be the most efficient (they all refer to the same Boolean.TRUE or Boolean.FALSE instance, but then there is the size of the pointer to the Boolean instances). If that is not the most memory efficient, I would then expect 17388.5.patch to be the most efficient, unless the size of boolean is large, in which case, 17388.4.patch would become the "most" efficient, which would be surprising. But we have a Java Bugs page for a reason.

comment:16 by GerdP, 4 years ago

See e.g. http://java-performance.info/overview-of-memory-saving-techniques-java/
It probably doesn't matter which version you use unless you add more fields to EastNorth.

I am pretty certain that the number of EastNorth objects can easily reach one million

That would be bad. My understanding is that class EastNorth is used to pass arguments between methods. Which classes allocate EastNorth instances?

in reply to:  16 comment:17 by taylor.smock, 4 years ago

Replying to GerdP:

See e.g. http://java-performance.info/overview-of-memory-saving-techniques-java/
It probably doesn't matter which version you use unless you add more fields to EastNorth.

I am pretty certain that the number of EastNorth objects can easily reach one million

That would be bad. My understanding is that class EastNorth is used to pass arguments between methods. Which classes allocate EastNorth instances?

For whatever reason, I assumed that WayPoint/Node cached the created EastNorth value. Since they both create a new EastNorth value, I'm now wondering how the runtime was reduced.

Probable locations of runtime reduction:

  • Avoiding Math.sqrt where possible
  • Using the method that gives a maximum distance to search

As far as your link goes:

Prefer primitive types to their Object wrappers. The main cause of wrapper types usage are JDK collections, so consider using one of primitive type collection frameworks like Trove.

and

Numeric wrappers occupy 12 bytes plus size of the underlying type. Byte/Short/Character/Integer/Long are cached by JDK, so the actual memory consumption may be smaller for values in the range [-128; 127]. Anyway, these type may be the source of serious memory overhead in the collection-based applications:

are the truly relevant sections. Boolean is probably cached, but may or may not be.

In any case, without actual caching of the EastNorth values, it becomes largely pointless to try to cache isValid. Maybe weakreferences?

comment:18 by GerdP, 4 years ago

Sorry, I didn't try the last patches because I think that the calculation itself is rather useless.
I don't understand what the distance between an OSM way and the points of a collection of gpx tracks means. Instead of optimizing the details it might be better to avoid the computation when the result is not usable.

in reply to:  18 comment:19 by taylor.smock, 4 years ago

Replying to GerdP:

Sorry, I didn't try the last patches because I think that the calculation itself is rather useless.
I don't understand what the distance between an OSM way and the points of a collection of gpx tracks means. Instead of optimizing the details it might be better to avoid the computation when the result is not usable.

The actual code uses both gpx layers and geoimage layers. Its just that everything refers to Gpx. There is a faux GpxLayer created for GeoImage so that it can also be used with this function.

So a better example would be driving/biking around taking time-lapse pictures, and then using the example validator.mapcss to add surfaces/lanes to OSM.

EDIT: So, instead of using potentially outdated overhead, you can quickly check roads that may be visible in the photos.

Last edited 4 years ago by taylor.smock (previous) (diff)

comment:20 by GerdP, 4 years ago

OK, now I understand what this is about. Did not make the connection to pictures so far. So, each gpx point represents a picture which possible shows an area of x meters around the point and you try to calculate OSM ways which are within this radius. If many gpx points exist it might be faster to calculate once the area covered by the circles around the points and find ways which are intersecting this area (or the opposite). This could be done by calculating a small halo along the way like the renderer does and use something like java.awt.geom.Area.intersect(Area rhs). But these calculations are also very slow when the areas are complex, so you may have to use some simplifications to reduce the complexity of the polygons.

Anyway, I'll try 17388.6.patch with and without the changes in EastNorth now.

comment:21 by GerdP, 4 years ago

My results:

  • Without the changes in EastNorth the performance is better.
  • The patch improves significantly the performance compared to unpatched version when using the rules from comment:2 (1.7 secs instead of 4.9 secs with my small test set)

I don't know how to use the new mapcss function. Please let me know the modified rules.

comment:22 by GerdP, 4 years ago

reg. the "enlarged bbox": see also code in UnconnectedWays.MyWaySegment (dist is given in meters):

  BBox bounds = this.getBounds(dist * (360.0d / (Ellipsoid.WGS84.a * 2 * Math.PI)));

Something like this should work for a complete way:

  double extraSpace = dist * (360.0d / (Ellipsoid.WGS84.a * 2 * Math.PI));
  BBox bbox = new BBox();
  bbox.addPrimitive(way, extraSpace);
Last edited 4 years ago by GerdP (previous) (diff)

Modify Ticket

Change Properties
Set your email in Preferences
Action
as new The owner will remain team.
as The resolution will be set. Next status will be 'closed'.
to The owner will be changed from team to the specified user.
Next status will be 'needinfo'. The owner will be changed from team to taylor.smock.
as duplicate The resolution will be set to duplicate. Next status will be 'closed'. The specified ticket will be cross-referenced with this ticket.
The owner will be changed from team to anonymous. Next status will be 'assigned'.

Add Comment


E-mail address and name can be saved in the Preferences .
 
Note: See TracTickets for help on using tickets.