Modify

Opened 16 months ago

Last modified 6 months ago

#22614 new defect

[WIP patch] Combine ways is very slow

Reported by: GerdP Owned by: team
Priority: minor Milestone:
Component: Core Version:
Keywords: template_report performance Cc: gaben, stoecker

Description

What steps will reproduce the problem?

  1. load attached file
  2. select all ways
  3. press C to combine them

What is the expected result?

A popup that says that the ways cannot be combined because several nodes have too many connections

What happens instead?

JOSM keeps one CPU core busy in method org.openstreetmap.josm.data.osm.NodeGraph.buildSpanningPath()

Please provide any additional information below. Attach a screenshot if possible.

Relative:URL: ^/trunk
Repository:UUID: 0c6e7542-c601-0410-84e7-c038aed88b3b
Last:Changed Date: 2022-12-08 17:09:09 +0100 (Thu, 08 Dec 2022)
Revision:18612
Build-Date:2022-12-11 02:30:55
URL:https://josm.openstreetmap.de/svn/trunk

Identification: JOSM/1.5 (18612 en) Windows 10 64-Bit
OS Build number: Windows 10 Home 2009 (19044)
Memory Usage: 335 MB / 1972 MB (206 MB allocated, but free)
Java version: 17.0.4+8-LTS, Azul Systems, Inc., OpenJDK 64-Bit Server VM
Look and Feel: com.sun.java.swing.plaf.windows.WindowsLookAndFeel
Screen: \Display0 1920×1080 (scaling 1.00×1.00)
Maximum Screen Size: 1920×1080
Best cursor sizes: 16×16→32×32, 32×32→32×32
System property file.encoding: Cp1252
System property sun.jnu.encoding: Cp1252
Locale info: en_DE
Numbers with default locale: 1234567890 -> 1234567890
VM arguments: [-Djpackage.app-version=1.5.18531, --add-modules=java.scripting,java.sql,javafx.controls,javafx.media,javafx.swing,javafx.web, --add-exports=java.base/sun.security.action=ALL-UNNAMED, --add-exports=java.desktop/com.sun.imageio.plugins.jpeg=ALL-UNNAMED, --add-exports=java.desktop/com.sun.imageio.spi=ALL-UNNAMED, --add-opens=java.base/java.lang=ALL-UNNAMED, --add-opens=java.base/java.nio=ALL-UNNAMED, --add-opens=java.base/jdk.internal.loader=ALL-UNNAMED, --add-opens=java.base/jdk.internal.ref=ALL-UNNAMED, --add-opens=java.desktop/javax.imageio.spi=ALL-UNNAMED, --add-opens=java.desktop/javax.swing.text.html=ALL-UNNAMED, --add-opens=java.prefs/java.util.prefs=ALL-UNNAMED, -Djpackage.app-path=%UserProfile%\AppData\Local\JOSM\HWConsole.exe]
Dataset consistency test: No problems found

Plugins:
+ OpeningHoursEditor (35924)
+ apache-commons (36034)
+ buildings_tools (36011)
+ contourmerge (v0.1.9)
+ ejml (35924)
+ geotools (36028)
+ jackson (36034)
+ jaxb (35952)
+ jts (36004)
+ o5m (35893)
+ opendata (36025)
+ pbf (36034)
+ poly (35976)
+ reltoolbox (35976)
+ reverter (36043)
+ undelete (36011)
+ utilsplugin2 (36011)

Validator rules:
+ c:\josm\core\resources\data\validator\combinations.mapcss
+ c:\josm\core\resources\data\validator\geometry.mapcss
+ c:\josm\core\resources\data\validator\unnecessary.mapcss
+ d:\java_tools\JOSM\mygeometry.mapcss
+ https://josm.openstreetmap.de/josmfile?page=Rules/GermanySpecific&zip=1

Last errors/warnings:
- 00000.572 W: extended font config - overriding 'filename.Myanmar_Text=mmrtext.ttf' with 'MMRTEXT.TTF'
- 00000.574 W: extended font config - overriding 'filename.Mongolian_Baiti=monbaiti.ttf' with 'MONBAITI.TTF'
- 00000.972 E: java.security.KeyStoreException: Windows-ROOT not found. Cause: java.security.NoSuchAlgorithmException: Windows-ROOT KeyStore not available

Attachments (8)

slow_combine.osm.xz (167.9 KB ) - added by GerdP 16 months ago.
22614-smaller1.osm (40.6 KB ) - added by GerdP 16 months ago.
much smaller dataset which still seems to loop forever
22614-smaller2-finishes.osm (31.7 KB ) - added by GerdP 16 months ago.
this even smaller one finishes with the expected message, but also requires a few seconds on my PC
22614.patch (843 bytes ) - added by GerdP 16 months ago.
Could it be that simple? Passes all unit tests and improves speed drastically
combineways_good.osm (1.8 KB ) - added by GerdP 16 months ago.
simple example which requires the loop
22614_faster1.patch (2.9 KB ) - added by GerdP 15 months ago.
22614_check_eulerian.patch (9.3 KB ) - added by GerdP 15 months ago.
22614-alt-algo.patch (16.6 KB ) - added by GerdP 11 months ago.
WIP patch for alternative algorithm to find spanning path

Download all attachments as: .zip

Change History (31)

by GerdP, 16 months ago

Attachment: slow_combine.osm.xz added

comment:1 by GerdP, 16 months ago

Keywords: performance added
Priority: normalminor

There seems to be something special with this data, I tried with some smaller sub sets of the ways and saw the expected immediate response.
No real problem for me, I was just experimenting with the data to produce a GPX file with all the ways I recently cycled...

comment:2 by taylor.smock, 16 months ago

I took a look, and the CPU cycles appears to be dominated in HashMap get operations and HashSet contains/remove/add methods. There is also a bit from creating/adding to an ArrayDeque. Most of the contains/remove code is in hashCode, which is also the major creator of new objects.

Some optimizations I can see:

  • Optimize NodePair hashCode -- it looks like this is hit frequently on that code path. This currently takes ~15% of the CPU cycles. Maybe a.hashCode() * 31 + b.hashCode()? This will also save an array allocation.
  • Marking NodePair as final with an intent to change it to a value/record class might help from a memory perspective -- most of the memory (over an hour profiling, I was kind of ignoring it) was in new NodePair from NodePair#swap, followed by HashSet#add, getOutboundPairs, and Deque#addAll. Note: I had changed the implementation of hashCode to be 31 * a.hashCode() + b.hashCode(), which might be why I didn't see hashCode as a major issue in the hour-long run.

But what I think is going on here is a n2 (or worse) problem. Either that, or we are magically going in circles.

In any case, it should bail at some point (see nodes at 52.9903756, 8.2774942).

comment:3 by GerdP, 16 months ago

I don't think that it will help much to optimize the code. I am pretty sure the loop doesn't end with this input.
I didn't invest much time but it seems that the algo has no special treatment for nodes with exactly two connected ways and those for nodes with more. I am not sure if the code is meant to combine segments so that they form p-shaped ways? Else any node with 3+ ways would be a clear signal to stop.

comment:4 by taylor.smock, 16 months ago

It is true that it won't help the endless loop problem to optimize the code. I was mostly looking at the optimization paths as something to look at later, once we fix the endless loop problem.

Anyway, it appears to have something to do with the number of nodes. Or maybe a specific node.

Simplify WayNodes removedTime spent
100436211ms
10358914ms
9.5356513ms
9353514ms
8.9353117ms
8.8352116ms
8.7351315ms
8.6350414ms
8.53496failed

comment:5 by GerdP, 16 months ago

I'll see if I can provide a much smaller sample that shows the loop. Maybe the algorithm that we use requires something that is not assured.

by GerdP, 16 months ago

Attachment: 22614-smaller1.osm added

much smaller dataset which still seems to loop forever

by GerdP, 16 months ago

Attachment: 22614-smaller2-finishes.osm added

this even smaller one finishes with the expected message, but also requires a few seconds on my PC

by GerdP, 16 months ago

Attachment: 22614.patch added

Could it be that simple? Passes all unit tests and improves speed drastically

comment:6 by taylor.smock, 16 months ago

That really makes me wonder why we have dupCheck.remove in the first place. (see r15574 for removeLast -- regression test was added, see r15554 for initial version -- no test added). You'd be the best person to know. :)

We should probably test the patch against attachment:Brandenburg.osm:ticket:18368.

by GerdP, 16 months ago

Attachment: combineways_good.osm added

simple example which requires the loop

comment:7 by GerdP, 16 months ago

My patch works fine with the brandenburg.osm data, but it doesn't work with the combineways_good.osm example. So, no, it's not that simple.
There is a very simply check that we don't perform yet: If there are more than two terminal nodes (my complex example contains 4) we can stop, but of course this doesn't solve the general performance problem. I'll see if I can find a better algo to build the spanning graph.

comment:8 by gaben, 16 months ago

Cc: gaben added

comment:9 by GerdP, 16 months ago

If I got that right the problem is NP-hard, so no matter how fast we implement the algo we can always hit the situation that the action will not finish in a reasonable time. So, I think we should use the PleaseWaitProgressMonitor so that the user can cancel the action.
Maybe always, maybe only when there are at least n nodes with more than two attached segments.

comment:10 by taylor.smock, 16 months ago

The fun thing is that if we simplify the ways in slow_combine.osm.xz, it works. So there has got to be something else we are missing.

I'd almost modify NodeGraph such that it builds the NodePairs with only nodes that are first/last nodes, or have more than 1 parent way.

But adding a ProgressMonitor is probably a better option.

comment:11 by GerdP, 16 months ago

Sure, the current implementation wastes lots of time with nodes where only one connection is possible, so execution time and memory consumption time depends on the number of nodes in the ways instead of the complexity of the simplified graph. OTOH it is unlikely that anybody really needs the result, I jsut stumbled over this
I just tried to implement a solution with ProgressMonitorbut I don't seem to remember how to submit a task AND wait until it is either canceled of finished.

comment:12 by taylor.smock, 16 months ago

The problem is that a user could accidentally freeze JOSM, potentially losing some work.

I just tried to implement a solution with ProgressMonitor but I don't seem to remember how to submit a task AND wait until it is either canceled of finished.

It looks like you want to use PleaseWaitRunnable.

comment:13 by GerdP, 16 months ago

Yes, but I failed to find out how to submit the task and wait for the result / cancelation.

by GerdP, 15 months ago

Attachment: 22614_faster1.patch added

comment:14 by GerdP, 15 months ago

With this rather simply patch the performance problems mentioned in this ticket and also #22651 are solved or at least very much improved.
The change in NodePair.toString() just helps while debugging data with new nodes, it is not required.

The unit test testTicket18367NeedsSplit doesn't pass, I think it checks the wrong thing. The data requires a split but it doesn't require a reversed edge, so the test should be changed.
I no longer like the code in joinWithMultipolygonCode(), this was only added to solve performance problems but it also allows to combine ways with overlapping segments and - according to #18368 - this should be allowed and it is tested with another unit test.
I'll try to find a better solution in NodeGraph to handle overlapping segments.

comment:15 by GerdP, 15 months ago

Cc: stoecker added

@all:
I don't see a way to handle overlapping way segments with the current implementation in NodePair and NodeGraph which are also used in some plugins (opendata, merge-overlap). These classes are designed to handle overlapping segments as equal, therefore it is not possible to build a path containing overlapping segments . Only the code in CombineWayAction.joinWithMultipolygonCode(Collection<Way> ways) allows this and I didn't really intend this, it was just a side effect.
Question is if I can simply remove that code so that CombineWayAction never combines ways with overlapping segments? It would allow to remove a lot of code.
The alternative would be to write new classes and unit tests with different data structures, maybe with settings so that the user can decide wether actions should refuse to create invalid OSM data or not.

Last edited 15 months ago by GerdP (previous) (diff)

comment:16 by stoecker, 15 months ago

Hmm, combining overlapping ways is a long standing issue as far as I know. In the early days it was necessary because areas have used overlaps to make holes.

I think self-overlapping ways don't have any use in OSM nowadays, so we probably don't need to support them anymore. But when we reject to support them we must prevent a proper UI, so that the user gets feedback why combining is rejected.

E.g. I sometimes join boundaries into single ways for export and it is a nightmare to find the reasons why it fails. Usually I got step by step and when an error occurs do smaller steps.

What I probably would like in case of an error is that combine combines what it can and leaves the remains, so that error tracking gets easier, because majority of stuff was done.

comment:17 by stoecker, 15 months ago

Maybe a compromise could be to support overlapping ways only in special cases, e.g. when selecting two ways and not more?

comment:18 by GerdP, 15 months ago

so that the user gets feedback why combining is rejected.

I also think this should be done. After writing the last comment I understood that the code sometimes stops too early, so another patch will follow. Maybe we should add an info like "ways are not all connected" if NodeGraph.isConnected() returns false, or "too many terminal nodes" or "way would contain overlapping segments".

What I probably would like in case of an error is that combine combines what it can and leaves the remains, so that error tracking gets easier, because majority of stuff was done.

I used this trick: Create a temporary type=route relation with all the selected ways and use the sort in the relation editor. Helps to find the gaps (with jump to next gap). I wouldn't want to duplicate all that code in CombineWaysAction.

Maybe a compromise could be to support overlapping ways only in special cases, e.g. when selecting two ways and not more?

You probably think of the very simple case that 2 (or more) ways can be connected without splitting them? That would be code similar to the one for multipolygons, but without the restriction that they must form closed rings.

comment:19 by stoecker, 15 months ago

You probably think of the very simple case that 2 (or more) ways can be connected without splitting them? That would be code similar to the one for multipolygons, but without the restriction that they must form closed rings.

I mean for 2 ways you only need to check whether one of the end-points matches the end-point of the other way. Everything else can be ignored. That may result in strange results, but that way there is a chance to produce such strange results if they are needed.

comment:20 by GerdP, 15 months ago

Hmm, I wonder if the code in NodeGraph.buildSpanningPath() tries to implement Fleury's algorithm to find a Eulerian path https://en.wikipedia.org/wiki/Eulerian_path#Fleury's_algorithm without using recursion. It probably should, but the chosen data structures seem to be wrong. I have to dig deeper...

by GerdP, 15 months ago

Attachment: 22614_check_eulerian.patch added

comment:21 by GerdP, 15 months ago

Just some findings:
The patch implements code to check if the graph has an Eulerian path or Circuit. This fast and neccessary check was missing in the unpatched code, but I don't know if this will avoid the loop.
This page https://www.geeksforgeeks.org/eulerian-path-undirected-graph/ claims to contain efficient code to find the path, including java code.
I've not yet made up my mind whether we have to support directed graphs, and the mixture of directed and undirected which we use in createNearlyUndirectedGraphFromNodeWays()is not handled in the theory. I think we have to get rid of that method first.

by GerdP, 11 months ago

Attachment: 22614-alt-algo.patch added

WIP patch for alternative algorithm to find spanning path

comment:22 by GerdP, 11 months ago

I've last worked on this patch in January, so it is not really work in progress, but maybe it is useful in combination with #21881. The problem is that it sometimes finds different results compared to the original algo and it is hard to say which result is better. I think it would be best to find a graph that doesn't require any reversing of way segments.

comment:23 by gaben, 6 months ago

Summary: Combine ways is very slow[WIP patch] Combine ways is very slow

(I'm just putting the patch existence in the title)

Modify Ticket

Change Properties
Set your email in Preferences
Action
as new The owner will remain team.
as The resolution will be set. Next status will be 'closed'.
to The owner will be changed from team to the specified user.
Next status will be 'needinfo'. The owner will be changed from team to GerdP.
as duplicate The resolution will be set to duplicate. Next status will be 'closed'. The specified ticket will be cross-referenced with this ticket.
The owner will be changed from team to anonymous. Next status will be 'assigned'.

Add Comment


E-mail address and name can be saved in the Preferences .
 
Note: See TracTickets for help on using tickets.