Opened 9 years ago
Closed 9 years ago
#11833 closed defect (fixed)
Deadlocking while loading a lot of data
Reported by: | naoliv | Owned by: | team |
---|---|---|---|
Priority: | normal | Milestone: | 15.09 |
Component: | Core | Version: | |
Keywords: | Cc: |
Description
It seems that JOSM is deadlocking while loading a lot of data.
For example, trying to load all boundary=administrative in Brazil (this file http://naoliv.iq.unesp.br/osm/boundaries.osm.bz2 - size is 21M).
Until some time ago I used to see JOSM freezing sometimes while trying to load it. But most of the time it was able to open it.
Now it is always freezing here.
Two CPU cores at 100% and with strace it's possible to see that it's stopped at this:
futex(0x7f4dcce979d0, FUTEX_WAIT, 25833, NULL%
Stopped in futex always makes me think that somehow the program is deadlocking.
Is there anything that I can do to help debugging this?
JOSM:
Build-Date: 2015-09-04 21:58:31
Revision: 8730
Is-Local-Build: true
Identification: JOSM/1.5 (8730 SVN pt_BR) Linux Debian GNU/Linux unstable (sid)
Memory Usage: 445 MB / 5461 MB (207 MB allocated, but free)
Java version: 1.7.0_79, Oracle Corporation, OpenJDK 64-Bit Server VM
Java package: openjdk-7-jre:amd64-7u79-2.5.6-1
VM arguments: [-Dawt.useSystemAAFontSettings=on]
Plugins:
- AddrInterpolation (31241)
- Create_grid_of_ways (31241)
- FastDraw (31265)
- FixAddresses (31241)
- OpeningHoursEditor (31241)
- PicLayer (31241)
- SimplifyArea (31241)
- buildings_tools (31361)
- download_along (31241)
- editgpx (31241)
- geotools (31126)
- graphview (31241)
- jts (31126)
- measurement (31289)
- merge-overlap (31241)
- opendata (31241)
- pdfimport (31241)
- poly (31241)
- reverter (31241)
- tagging-preset-tester (31241)
- todo (29154)
- turnrestrictions (31241)
- undelete (31241)
- utilsplugin2 (31463)
Attachments (2)
Change History (16)
comment:1 by , 9 years ago
comment:2 by , 9 years ago
Good to know about jstack.
And I don't know Java, but taking a look at things like this makes me think that indeed there is some kind of deadlock here:
"pool-7-thread-1" prio=10 tid=0x00007f415ca88800 nid=0x2681 waiting on condition [0x00007f41178cc000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <0x0000000681bb89d0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
The full stack is attached.
follow-up: 4 comment:3 by , 9 years ago
Shouldn't we use ConcurrentHashMap in org.openstreetmap.josm.data.osm.visitor.paint.relations.MultipolygonCache? Looks like multiple threads were trying to resize the HashMap.
follow-up: 5 comment:4 by , 9 years ago
Replying to naoliv:
And I don't know Java, but taking a look at things like this makes me think that indeed there is some kind of deadlock here:
Basically, those executors are waiting for new tasks to execute. I don't think this is related to the blocking issue. Since r8734 and r8736, instead of "pool-7-thread-1" a more sensible thread name is used.
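As an illustration of how an executor's worker threads can get a recognizable name in a stack trace, here is a minimal sketch. It is not the change made in r8734/r8736; `NamedPoolSketch`, `named` and the prefix are hypothetical names chosen for this example.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration; not JOSM code.
class NamedPoolSketch {

    /** Creates a factory whose threads are named "<prefix>-1", "<prefix>-2", ... */
    static ThreadFactory named(final String prefix) {
        final AtomicInteger counter = new AtomicInteger();
        return new ThreadFactory() {
            @Override
            public Thread newThread(Runnable task) {
                Thread thread = new Thread(task, prefix + "-" + counter.incrementAndGet());
                thread.setDaemon(true); // idle workers must not keep the JVM alive
                return thread;
            }
        };
    }

    /** Runs one task on a named single-thread pool and reports the worker's name. */
    static String nameOfFirstWorker(String prefix) {
        ExecutorService pool = Executors.newSingleThreadExecutor(named(prefix));
        try {
            return pool.submit(new Callable<String>() {
                @Override
                public String call() {
                    return Thread.currentThread().getName();
                }
            }).get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(nameOfFirstWorker("example-worker")); // example-worker-1
    }
}
```

Between tasks, such a worker shows up in jstack as WAITING (parking), just like the "pool-7-thread-1" entry quoted above, only under its own name.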
Replying to wiktorn:
Shouldn't we use ConcurrentHashMap in org.openstreetmap.josm.data.osm.visitor.paint.relations.MultipolygonCache? Looks like multiple threads were trying to resize the HashMap.
Using the non-synchronized HashMap shouldn't cause blocking issues since inserts/gets are performed immediately. Depending on the context, this can cause undesired effects (e.g., creating/inserting an element twice), but doesn't cause deadlocks. Am I wrong?
comment:5 by , 9 years ago
Replying to simon04:
Replying to wiktorn:
Shouldn't we use ConcurrentHashMap in org.openstreetmap.josm.data.osm.visitor.paint.relations.MultipolygonCache? Looks like multiple threads were trying to resize the HashMap.
Using the non-synchronized HashMap shouldn't cause blocking issues since inserts/gets are performed immediately. Depending on the context, this can cause undesired effects (e.g., creating/inserting an element twice), but doesn't cause deadlocks. Am I wrong?
I checked the code and didn't see any obvious way this could cause a deadlock. But as there are no barriers preventing reordering, you never know what can happen. I found this post somewhat relevant: http://jeremymanson.blogspot.com/2008/04/immutability-in-java.html Our case may work too, but there is nothing that guarantees it. Cliff Click from Azul Systems keeps pointing out that Java programmers rely too much on the Oracle JVM implementation, which is not as concurrent as it could be. And as he is writing an alternative JVM, I guess he has gone through many wild areas.
@naoliv
OTOH, I also couldn't reproduce the problem. The download link was broken, but I generated a 300MB file using the following Overpass query:
[out:xml]
[timeout:600]
;
area
  ["boundary"="administrative"]
  ["admin_level"="2"]
  ["name"="Brasil"]
  ["type"="boundary"]
->.boundryarea;
(
  relation
    (area.boundryarea)
    ["boundary"="administrative"]
);
out meta bb qt;
>;
out meta qt;
This loads fine, uses around 500MB of memory, and fits comfortably in a 1GB JVM heap. You could also monitor your process with VisualVM (https://visualvm.java.net/) and check that the JVM/JOSM is not stuck in garbage collection (in the Monitor tab, the blue line on the CPU usage graph is GC activity). If it is, that would suggest you should allocate more memory to JOSM.
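For a rough check without a GUI, the time spent in garbage collection can also be read from inside the JVM through the standard management beans. This is only a sketch of that idea, assuming code that runs in the same JVM; `GcTimeSketch` is a hypothetical name and nothing like it is part of JOSM.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical illustration; not JOSM code.
class GcTimeSketch {

    /** Sums the accumulated collection time, in milliseconds, over all collectors. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime(); // -1 if the collector does not report it
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println("Time spent in GC so far: " + totalGcMillis() + " ms");
    }
}
```

If this number grows almost as fast as wall-clock time while a file is loading, the process is probably short on heap rather than stuck for some other reason.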
follow-up: 9 comment:6 by , 9 years ago
@simon04:
And maybe these SO answers:
http://stackoverflow.com/questions/104184/is-it-safe-to-get-values-from-a-java-util-hashmap-from-multiple-threads-no-modi
There is this comment:
One note is that under some circumstances, a get() from an unsynchronized HashMap can cause an infinite loop. This can occur if a concurrent put() causes a rehash of the Map.
http://lightbody.net/blog/2005/07/hashmapget_can_cause_an_infini.html
Though the link is dead for me at the moment.
This was the situation observed in the first stack trace.
And this SO question is even better:
http://stackoverflow.com/questions/13695832/explain-the-timing-causing-hashmap-put-to-execute-an-infinite-loop
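The replacement suggested in comment:3 can be sketched as follows. This is a minimal example with a generic integer-keyed map, not the actual MultipolygonCache code; `ConcurrentCacheSketch` and `fillConcurrently` are hypothetical names.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration; not JOSM code.
class ConcurrentCacheSketch {

    /** Fills one shared map from several threads and returns the final entry count. */
    static int fillConcurrently(final int threads, final int entriesPerThread) {
        // ConcurrentHashMap coordinates resizing internally, so concurrent put()
        // calls cannot leave the bucket chains in a corrupted (cyclic) state.
        final Map<Integer, String> cache = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            final int offset = t * entriesPerThread; // each thread writes distinct keys
            pool.execute(new Runnable() {
                @Override
                public void run() {
                    for (int i = 0; i < entriesPerThread; i++) {
                        cache.put(offset + i, "value-" + (offset + i));
                    }
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return cache.size();
    }

    public static void main(String[] args) {
        System.out.println(fillConcurrently(4, 100_000)); // 400000
    }
}
```

With a plain HashMap in place of the ConcurrentHashMap, the same program has no defined behavior: it may lose entries or, on older JDKs, spin forever during a resize as described in the links above.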
comment:7 by , 9 years ago
Sorry. I am moving my machine right now. Probably it will be back tomorrow.
I will test it later and get back to you.
comment:8 by , 9 years ago
Using this overpass query:
[out:xml];
{{geocodeArea:brazil}}->.searchArea;
(
  node["boundary"="administrative"](area.searchArea);
  way["boundary"="administrative"](area.searchArea);
  relation["boundary"="administrative"](area.searchArea);
);
out meta;
>;
out meta qt;
it doesn't seem to be a problem with GC (I am running JOSM with -Xms256M -Xmx6g):
The heap usage keeps increasing and I can see 4 cores running at 100% too.
follow-up: 10 comment:9 by , 9 years ago
Replying to wiktorn:
One note is that under some circumstances, a get() from an unsynchronized HashMap can cause an infinite loop. This can occur if a concurrent put() causes a rehash of the Map.
How wrong could I be. :) Thanks a lot for the pointers. That SO question links to an especially interesting blog post (http://mailinator.blogspot.com/2009/06/beautiful-race-condition.html) describing how this infinite loop can come about …
Now I agree with your suggestion from comment:3. Do you think using JCS makes any sense for the multipolygon context?
comment:10 by , 9 years ago
Replying to simon04:
Now I agree with your suggestion from comment:3. Do you think using JCS makes any sense for the multipolygon context?
As a drop-in replacement for HashMap - no. ConcurrentHashMap will suffice. Keep in mind that ConcurrentHashMap, contrary to HashMap, doesn't allow null values. I haven't checked if it's the case here.
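The difference in null handling is easy to demonstrate. The following is a small standalone sketch; `NullValueSketch` is a hypothetical name and the map contents are made up.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical illustration; not JOSM code.
class NullValueSketch {

    /** Returns true if the map stores a null value, false if put() rejects it. */
    static boolean acceptsNullValue(Map<String, Object> map) {
        try {
            map.put("key", null);
            return true;
        } catch (NullPointerException e) {
            // ConcurrentHashMap throws here; it rejects null keys as well.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(acceptsNullValue(new HashMap<String, Object>()));           // true
        System.out.println(acceptsNullValue(new ConcurrentHashMap<String, Object>())); // false
    }
}
```

So any code path that caches "no result" as null would need a sentinel value or a different representation before switching map types.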
What could make a difference is if you would like to have an LRU cache here.
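For comparison, a small LRU cache can also be built from the JDK alone. This sketch uses an access-ordered LinkedHashMap behind a synchronized wrapper; it is a plain-JDK alternative for illustration, not JCS and not what JOSM does, and `LruCacheSketch` is a hypothetical name.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration; not JOSM code and not JCS.
class LruCacheSketch {

    /** Creates a thread-safe map that evicts the least recently used entry. */
    static <K, V> Map<K, V> create(final int maxEntries) {
        // accessOrder = true: get() and put() move an entry to the "newest" end.
        Map<K, V> lru = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxEntries;
            }
        };
        // In access order even get() modifies the map, so every call is synchronized.
        return Collections.synchronizedMap(lru);
    }

    public static void main(String[] args) {
        Map<String, Integer> cache = create(2);
        cache.put("a", 1);
        cache.put("b", 2);
        cache.get("a");    // "a" is now the most recently used entry
        cache.put("c", 3); // evicts "b", the least recently used entry
        System.out.println(cache.keySet()); // [a, c]
    }
}
```

The single lock makes this simpler but less scalable than ConcurrentHashMap, which is the trade-off to weigh if eviction is ever needed.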
comment:12 by , 9 years ago
Milestone: | → 15.09 |
---|
comment:14 by , 9 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
Good to know. Thank you for providing a lot of debugging information and to wiktorn for the helpful links :)
For me, loading and painting worked w/o problems, but was slow:
To obtain more debug information, you can use jstack <pid> to obtain a full stack trace.
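A true lock deadlock can also be checked for from inside a JVM with the standard ThreadMXBean. The sketch below assumes it runs in the process being examined; `DeadlockCheckSketch` is a hypothetical name.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

// Hypothetical illustration; not JOSM code.
class DeadlockCheckSketch {

    /** Returns the number of threads the JVM reports as deadlocked on locks. */
    static int countDeadlockedThreads() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        long[] ids = threads.findDeadlockedThreads(); // null when no deadlock is found
        return ids == null ? 0 : ids.length;
    }

    public static void main(String[] args) {
        System.out.println("Deadlocked threads: " + countDeadlockedThreads());
    }
}
```

A thread spinning in an endless loop holds the CPU without waiting on any lock, so it is not reported here; CPU cores at 100% together with a result of zero point to a busy loop rather than a deadlock.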