#11658 closed enhancement (needinfo)
[patch] doubles parsing
Reported by: | shinigami | Owned by: | shinigami |
---|---|---|---|
Priority: | normal | Milestone: | |
Component: | Core | Version: | latest |
Keywords: | Cc: |
Description
Double.parseDouble makes more than 15% of garbage in my test, when parsing osm file.
This patch is experimental replacement. It can parse simply writen doubles only, no NaNs, no doubles with E..., just numbers like 123.456789 and so. But if I understand well, it is enough for parsing osm files. It generates no garbage and seems to be ~3x faster than generic Double.parseDouble.
Attachments (1)
Change History (11)
by , 9 years ago
Attachment: | parsedouble.patch added |
---|
comment:1 by , 9 years ago
Primitives is not a good name, as this is used for node, way, relation in JOSM. Didn't check the general idea of the patch.
comment:2 by , 9 years ago
Summary: | doubles parsing → [patch] doubles parsing |
---|
I feel that we should be able to parse scientific notation (E...) as I expect this to be in the output of third party tools than create .osm files. (Import scripts and the like.)
Also we should have unit test to prove that the output of your optimized code is bit-identical to Double.parseDouble().
Could you explain why it is a problem to have "garbage", as it gets collected and the memory is re-used right away? What is the overall gain in runtime when loading an osm file?
Btw., your username makes me a little uneasy... ;)
comment:3 by , 9 years ago
Replying to shinigami:
seems to be ~3x faster than generic Double.parseDouble.
from which JDK? There was a big performance issue with JDK7 but it seems to have been fixed in JDK8.
follow-up: 5 comment:4 by , 9 years ago
Don-vip: jdk8u45, it still delegates parsing to some internal classes, creates buffers for it and so.
bastiK: aww, I have looked to osm file description briefly and understood it as it can contain this simple format only:/.
Problem with garbage is it must be collected;) I started with this patches because of #11492. I can't open that file at my hw (have 16GB only), so I always sample first minute of parsing. Half of time it spends collecting garbage.
comment:5 by , 9 years ago
Replying to shinigami:
bastiK: aww, I have looked to osm file description briefly and understood it as it can contain this simple format only:/.
You could first attempt to parse it in a simple way. Then, if that failed, try again with Double.parseDouble()
. This should be both fast and reliable, right?
Problem with garbage is it must be collected;) I started with this patches because of #11492. I can't open that file at my hw (have 16GB only), so I always sample first minute of parsing. Half of time it spends collecting garbage.
So basically it does not affect any memory constraints, but speeds up the process? By how much? What tools do you use to investigate?
Btw., your effort to optimize the loading of large .osm files is highly appreciated. :)
comment:6 by , 9 years ago
Owner: | changed from | to
---|---|
Status: | new → needinfo |
comment:7 by , 9 years ago
Resolution: | → needinfo |
---|---|
Status: | needinfo → closed |
comment:8 by , 3 years ago
https://github.com/wrandelshofer/FastDoubleParser (MIT License; 28 KB) claims to offer a 4x speed-up over Double.parseDouble
parsing numbers in file data/canada.txt read 111126 lines === warmup 1000 times ===== === number of trials 32 ===== FastDoubleParser MB/s avg: 341.740221, min: 303.93, max: 366.63 FastDoubleParserFromByteArray MB/s avg: 460.556911, min: 405.39, max: 481.79 Double MB/s avg: 80.854956, min: 68.74, max: 85.77 Speedup FastDoubleParser vs Double: 4.226583 Speedup FastDoubleParserFromByteArray vs Double: 5.696088
org.openstreetmap.josm.io.OsmReaderPerformanceTest#testPlain
on nodist/data/neubrandenburg.osm.bz2
-<measurement><name>load .osm-file 4 times(ms)</name><value>2851.0</value></measurement> +<measurement><name>load .osm-file 4 times(ms)</name><value>2664.0</value></measurement>
org.openstreetmap.josm.io.OsmReaderPerformanceTest#testCompressed
on montenegro-latest.osm.bz2 (44M)
-<measurement><name>load compressed (.osm.bz2) 4 times(ms)</name><value>69327.0</value></measurement> +<measurement><name>load compressed (.osm.bz2) 4 times(ms)</name><value>69156.0</value></measurement>
My conclusion: rather negligible effects for parsing OSM XML data.
-
ivy.xml
diff --git a/ivy.xml b/ivy.xml index f56c89a92..c843f6dea 100644
a b 31 31 <dependency conf="api->default" org="ch.poole" name="OpeningHoursParser" rev="0.23.2"/> 32 32 <dependency conf="api->default" org="oauth.signpost" name="signpost-core" rev="2.0.0"/> 33 33 <dependency conf="api->default" org="org.webjars.npm" name="tag2link" rev="2021.3.21"/> 34 <dependency conf="api->default" org="ch.randelshofer" name="fastdoubleparser" rev="0.3.0"/> 34 35 <!-- sources->sources --> 35 36 <dependency conf="sources->sources" org="org.openstreetmap.jmapviewer" name="jmapviewer" rev="2.15"/> 36 37 <dependency conf="sources->sources" org="javax.json" name="javax.json-api" rev="1.1.4"/> -
src/org/openstreetmap/josm/io/AbstractParser.java
diff --git a/src/org/openstreetmap/josm/io/AbstractParser.java b/src/org/openstreetmap/josm/io/AbstractParser.java index f214cd004..b725f775b 100644
a b import static org.openstreetmap.josm.tools.I18n.tr; 5 5 6 6 import java.util.Date; 7 7 8 import ch.randelshofer.fastdoubleparser.FastDoubleParser; 8 9 import org.openstreetmap.josm.data.coor.LatLon; 9 10 import org.openstreetmap.josm.data.osm.OsmPrimitiveType; 10 11 import org.openstreetmap.josm.data.osm.RelationMemberData; … … public abstract class AbstractParser extends DefaultHandler { 80 81 } 81 82 double d = 0.0; 82 83 try { 83 d = Double.parseDouble(v);84 d = FastDoubleParser.parseDouble(v); 84 85 } catch (NumberFormatException e) { 85 86 throwException(tr("Illegal value for attribute ''{0}'' of type double. Got ''{1}''.", name, v), e); 86 87 }
comment:9 by , 3 years ago
I would prefer to wait for a JDK enhancement rather than adding a new lib just for that.
patch