Opened 5 years ago

Closed 5 years ago

#11833 closed defect (fixed)

Deadlocking while loading a lot of data

Reported by: naoliv
Owned by: team
Priority: normal
Milestone: 15.09
Component: Core
Version:
Keywords:
Cc:

Description

It seems that JOSM is deadlocking while loading a lot of data.

For example, trying to load all boundary=administrative in Brazil (this file http://naoliv.iq.unesp.br/osm/boundaries.osm.bz2 - size is 21M).

Until some time ago I used to see JOSM freezing sometimes while trying to load it. But most of the time it was able to open it.

Now it is always freezing here.
Two CPU cores at 100% and with strace it's possible to see that it's stopped at this:

futex(0x7f4dcce979d0, FUTEX_WAIT, 25833, NULL%

Being stuck in futex always makes me think that the program is somehow deadlocking.

Is there anything that I can do to help debugging this?

JOSM:

Build-Date: 2015-09-04 21:58:31
Revision: 8730
Is-Local-Build: true

Identification: JOSM/1.5 (8730 SVN pt_BR) Linux Debian GNU/Linux unstable (sid)
Memory Usage: 445 MB / 5461 MB (207 MB allocated, but free)
Java version: 1.7.0_79, Oracle Corporation, OpenJDK 64-Bit Server VM
Java package: openjdk-7-jre:amd64-7u79-2.5.6-1
VM arguments: [-Dawt.useSystemAAFontSettings=on]

Plugins:
- AddrInterpolation (31241)
- Create_grid_of_ways (31241)
- FastDraw (31265)
- FixAddresses (31241)
- OpeningHoursEditor (31241)
- PicLayer (31241)
- SimplifyArea (31241)
- buildings_tools (31361)
- download_along (31241)
- editgpx (31241)
- geotools (31126)
- graphview (31241)
- jts (31126)
- measurement (31289)
- merge-overlap (31241)
- opendata (31241)
- pdfimport (31241)
- poly (31241)
- reverter (31241)
- tagging-preset-tester (31241)
- todo (29154)
- turnrestrictions (31241)
- undelete (31241)
- utilsplugin2 (31463)

Attachments (2)

stack.txt (34.7 KB) - added by naoliv 5 years ago.
named-stack.txt (31.8 KB) - added by naoliv 5 years ago.
New jstack with the named threads

Change History (16)

comment:1 Changed 5 years ago by simon04

For me, loading and painting worked w/o problems, but was slow:

Identification: JOSM/1.5 (8731 SVN en) Linux Arch Linux
Memory Usage: 1433 MB / 3504 MB (190 MB allocated, but free)
Java version: 1.7.0_85, Oracle Corporation, OpenJDK 64-Bit Server VM

To obtain more debug information, you can use jstack <pid> to obtain a full stack trace.
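
As a rough in-process counterpart to the jstack suggestion above, the JDK's management API can report threads that the JVM itself considers deadlocked. This is only a sketch: the class name DeadlockCheck is made up for illustration, and it inspects the JVM it runs in, so unlike jstack it cannot be pointed at an already running JOSM process.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical helper, not part of JOSM.
final class DeadlockCheck {
    /** Returns the number of threads the JVM reports as deadlocked (0 if none). */
    static int countDeadlockedThreads() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        long[] ids = threads.findDeadlockedThreads(); // null when no deadlock is detected
        return ids == null ? 0 : ids.length;
    }

    public static void main(String[] args) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        System.out.println("Deadlocked threads: " + countDeadlockedThreads());
        // List the name and state of every live thread, similar in spirit to a jstack dump.
        for (ThreadInfo info : threads.dumpAllThreads(false, false)) {
            System.out.println(info.getThreadName() + " -> " + info.getThreadState());
        }
    }
}
```

A result of 0 together with cores pinned at 100% would point towards a busy loop rather than a lock cycle, since threads blocked on each other do not consume CPU.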

Changed 5 years ago by naoliv

Attachment: stack.txt added

comment:2 Changed 5 years ago by naoliv

Good to know about jstack.
And I don't know Java, but taking a look at things like this makes me think that indeed there is some kind of deadlock here:

"pool-7-thread-1" prio=10 tid=0x00007f415ca88800 nid=0x2681 waiting on condition [0x00007f41178cc000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x0000000681bb89d0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)

The full stack is attached.

comment:3 Changed 5 years ago by wiktorn

Shouldn't we use ConcurrentHashMap in org.openstreetmap.josm.data.osm.visitor.paint.relations.MultipolygonCache? Looks like multiple threads were trying to resize the HashMap

comment:4 in reply to:  3 ; Changed 5 years ago by simon04

Replying to naoliv:

And I don't know Java, but taking a look at things like this makes me think that indeed there is some kind of deadlock here:

Basically, those executors are waiting for new tasks to execute. I don't think this is related to the blocking issue. Since r8734 and r8736, instead of "pool-7-thread-1" a more sensible thread name is used.
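
To illustrate why a parked pool thread is not suspicious by itself, here is a small sketch (class and method names are invented for this example, and it uses Java 8 lambda syntax) that runs one task on a fixed thread pool and then looks at the state of the now idle worker. Assuming the standard JDK executor implementation, the worker is expected to park while waiting for the next task.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical demo class, not part of JOSM.
final class IdleWorkerState {
    /** Runs one task, then returns the state of the pool's worker thread once it is idle. */
    static Thread.State idleWorkerState() {
        ExecutorService pool = Executors.newFixedThreadPool(1);
        AtomicReference<Thread> worker = new AtomicReference<>();
        try {
            // The task only records which thread executed it.
            pool.submit(() -> worker.set(Thread.currentThread())).get();
            Thread t = worker.get();
            // Give the worker a moment to go back to waiting on the task queue.
            long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(5);
            while (t.getState() != Thread.State.WAITING && System.nanoTime() < deadline) {
                Thread.sleep(10);
            }
            return t.getState();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("Idle worker state: " + idleWorkerState());
    }
}
```

An idle worker like this shows up in a thread dump as WAITING (parking), which matches the "pool-7-thread-1" entry quoted above.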

Replying to wiktorn:

Shouldn't we use ConcurrentHashMap in org.openstreetmap.josm.data.osm.visitor.paint.relations.MultipolygonCache? Looks like multiple threads were trying to resize the HashMap

Using the non-synchronized HashMap shouldn't cause blocking issues since inserts/gets are performed immediately. Depending on the context, this can cause undesired effects (e.g., creating/inserting an element twice), but doesn't cause deadlocks. Am I wrong?

Changed 5 years ago by naoliv

Attachment: named-stack.txt added

New jstack with the named threads

comment:5 in reply to:  4 Changed 5 years ago by wiktorn

Replying to simon04:

Replying to wiktorn:

Shouldn't we use ConcurrentHashMap in org.openstreetmap.josm.data.osm.visitor.paint.relations.MultipolygonCache? Looks like multiple threads were trying to resize the HashMap

Using the non-synchronized HashMap shouldn't cause blocking issues since inserts/gets are performed immediately. Depending on the context, this can cause undesired effects (e.g., creating/inserting an element twice), but doesn't cause deadlocks. Am I wrong?

I checked the code and didn't see any obvious way that this could cause a deadlock. But as there are no barriers preventing reordering, you never know what can happen. I found this post somewhat relevant: http://jeremymanson.blogspot.com/2008/04/immutability-in-java.html Our case may work too, but there is nothing that guarantees it. Cliff Click from Azul Systems keeps pointing out that Java programmers rely too much on the Oracle JVM implementation, which is not as concurrent as it could be. And as he is writing an alternative JVM, he has gone through many wild areas, I guess.

@naoliv
OTOH, I also couldn't reproduce the problem. The download link was broken, but I've generated a 300MB file using the following Overpass query:

[out:xml]
[timeout:600]
;
area
  ["boundary"="administrative"]
  ["admin_level"="2"]
  ["name"="Brasil"]
  ["type"="boundary"]
->.boundryarea;
(
  relation
    (area.boundryarea)
    ["boundary"="administrative"]
);
out meta bb qt;
>;
out meta qt;

This loads fine, uses around 500MB of memory and fits quite well in a 1GB JVM heap. You could also try monitoring your process with VisualVM (https://visualvm.java.net/) to check that JVM/JOSM is not stuck in garbage collection (in the Monitor tab, the blue line on the CPU usage graph is GC activity). If it is, that would suggest you should allocate more memory to JOSM.
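
For a quick numeric check of GC activity without a profiler, the JDK exposes cumulative collector times through its management beans. The sketch below is an illustration only (GcTime is a made-up class name) and measures the JVM it runs in, so to apply it to JOSM it would have to run inside that process.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper, not part of JOSM.
final class GcTime {
    /** Sums the approximate accumulated collection time of all collectors, in milliseconds. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime(); // -1 if the collector does not report it
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        long uptime = ManagementFactory.getRuntimeMXBean().getUptime();
        System.out.println("GC time: " + totalGcMillis() + " ms of " + uptime + " ms uptime");
    }
}
```

If GC time is a large share of the uptime, the process is likely short on heap; if it stays small while the CPU is busy, the time is being spent elsewhere.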

comment:6 Changed 5 years ago by wiktorn

@simon04:

And maybe this SO answer:
http://stackoverflow.com/questions/104184/is-it-safe-to-get-values-from-a-java-util-hashmap-from-multiple-threads-no-modi

It includes this comment:

One note is that under some circumstances, a get() from an unsynchronized HashMap can cause an infinite loop. This can occur if a concurrent put() causes a rehash of the Map.
http://lightbody.net/blog/2005/07/hashmapget_can_cause_an_infini.html

Though the link is dead for me ATM.

This was the situation observed in the first stack trace.

And this SO question is even better:
http://stackoverflow.com/questions/13695832/explain-the-timing-causing-hashmap-put-to-execute-an-infinite-loop
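
The failure mode described in these links concerns a plain HashMap being resized by several threads at once. As a hedged sketch of the alternative proposed in comment:3, the following example performs the same kind of concurrent put() calls against a ConcurrentHashMap, which is designed for that access pattern. The class ConcurrentPuts and its method are invented for illustration and are not the actual MultipolygonCache code.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class, not part of JOSM.
final class ConcurrentPuts {
    /** Lets several threads insert disjoint keys at once and returns the final map size. */
    static int fillConcurrently(int threads, int perThread) {
        Map<Integer, Integer> cache = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            final int base = t * perThread;
            pool.execute(() -> {
                // Many puts force repeated internal resizing while other threads write too.
                for (int i = 0; i < perThread; i++) {
                    cache.put(base + i, i);
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("writers did not finish");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return cache.size();
    }

    public static void main(String[] args) {
        System.out.println("Entries: " + fillConcurrently(4, 10_000));
    }
}
```

With a plain HashMap in place of the ConcurrentHashMap, the same code has no defined behaviour: depending on timing and JDK version it may lose entries or, as the linked posts describe, spin forever.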

comment:7 Changed 5 years ago by naoliv

Sorry. I am moving my machine right now. Probably it will be back tomorrow.

I will test it later and get back to you.

comment:8 Changed 5 years ago by naoliv

Using this overpass query:

[out:xml];
{{geocodeArea:brazil}}->.searchArea;
(
  node["boundary"="administrative"](area.searchArea);
  way["boundary"="administrative"](area.searchArea);
  relation["boundary"="administrative"](area.searchArea);
);
out meta;
>;
out meta qt;

it doesn't seem to be a problem with GC (I am running JOSM with -Xms256M -Xmx6g):

http://i.imgur.com/OA4NJWw.png

The heap usage keeps increasing and I can see 4 cores running at 100% too.

comment:9 in reply to:  6 ; Changed 5 years ago by simon04

Replying to wiktorn:

One note is that under some circumstances, a get() from an unsynchronized HashMap can cause an infinite loop. This can occur if a concurrent put() causes a rehash of the Map.


http://stackoverflow.com/questions/13695832/explain-the-timing-causing-hashmap-put-to-execute-an-infinite-loop

How wrong could I be. :) Thanks a lot for the pointers. That SO question links to the especially interesting blog post http://mailinator.blogspot.com/2009/06/beautiful-race-condition.html describing how this infinite loop can come about …

Now I agree with your suggestion from comment:3. Do you think using JCS makes any sense for the multipolygon context?

comment:10 in reply to:  9 Changed 5 years ago by wiktorn

Replying to simon04:

Now I agree with your suggestion from comment:3. Do you think using JCS makes any sense for the multipolygon context?

As a drop-in replacement for HashMap - no. ConcurrentHashMap will suffice. Keep in mind that ConcurrentHashMap, unlike HashMap, doesn't allow null keys or values. I haven't checked whether that matters here.
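
The null restriction mentioned above is easy to check in isolation. A minimal sketch, with a made-up class name and helper:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical demo class, not part of JOSM.
final class NullValues {
    /** Returns true if the given map refuses to store a null value. */
    static boolean rejectsNullValue(Map<String, String> map) {
        try {
            map.put("key", null);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("HashMap rejects null: "
                + rejectsNullValue(new HashMap<String, String>()));
        System.out.println("ConcurrentHashMap rejects null: "
                + rejectsNullValue(new ConcurrentHashMap<String, String>()));
    }
}
```

Code that used a null value as a "computed, but nothing to cache" marker would therefore need a sentinel object or a different check after switching map types.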

What could make a difference is if you wanted an LRU cache here.
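
For reference, a small LRU cache can also be sketched with plain JDK classes, without JCS. The example below is an assumption-laden illustration (LruCacheSketch and newLruCache are invented names): it uses an access-ordered LinkedHashMap that evicts its eldest entry, wrapped in Collections.synchronizedMap so that every call is serialized on one lock.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical demo class, not part of JOSM and not based on JCS.
final class LruCacheSketch {
    /** Creates a thread-safe map that keeps only the most recently used entries. */
    static <K, V> Map<K, V> newLruCache(final int maxEntries) {
        // accessOrder = true: iteration order runs from least to most recently accessed.
        return Collections.synchronizedMap(new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxEntries;
            }
        });
    }

    public static void main(String[] args) {
        Map<String, Integer> cache = newLruCache(2);
        cache.put("a", 1);
        cache.put("b", 2);
        cache.get("a");    // "a" becomes the most recently used entry
        cache.put("c", 3); // evicts "b", the least recently used entry
        System.out.println(cache.keySet()); // prints [a, c]
    }
}
```

The single lock makes this simpler but less scalable than ConcurrentHashMap, so it would only be worth it if bounding the cache size mattered more than concurrent throughput.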

comment:11 Changed 5 years ago by simon04

In 8739/josm:

see #11833 - Attempt to fix infinite loop in MultipolygonCache (thanks to wiktorn)

This might occur on concurrent put operations.

comment:12 Changed 5 years ago by simon04

Milestone: 15.09

comment:13 Changed 5 years ago by naoliv

Very good!
It's working nicely now.

comment:14 Changed 5 years ago by simon04

Resolution: fixed
Status: new → closed

Good to know. Thank you for providing a lot of debugging information and to wiktorn for the helpful links :)
