Modify

Opened 9 years ago

Closed 9 years ago

Last modified 3 years ago

#11658 closed enhancement (needinfo)

[patch] doubles parsing

Reported by: shinigami Owned by: shinigami
Priority: normal Milestone:
Component: Core Version: latest
Keywords: Cc:

Description

Double.parseDouble makes more than 15% of garbage in my test, when parsing osm file.

This patch is experimental replacement. It can parse simply writen doubles only, no NaNs, no doubles with E..., just numbers like 123.456789 and so. But if I understand well, it is enough for parsing osm files. It generates no garbage and seems to be ~3x faster than generic Double.parseDouble.

Attachments (1)

parsedouble.patch (3.0 KB ) - added by shinigami 9 years ago.
patch

Download all attachments as: .zip

Change History (11)

by shinigami, 9 years ago

Attachment: parsedouble.patch added

patch

comment:1 by stoecker, 9 years ago

Primitives is not a good name, as this is used for node, way, relation in JOSM. Didn't check the general idea of the patch.

comment:2 by bastiK, 9 years ago

Summary: doubles parsing[patch] doubles parsing

I feel that we should be able to parse scientific notation (E...) as I expect this to be in the output of third party tools than create .osm files. (Import scripts and the like.)

Also we should have unit test to prove that the output of your optimized code is bit-identical to Double.parseDouble().

Could you explain why it is a problem to have "garbage", as it gets collected and the memory is re-used right away? What is the overall gain in runtime when loading an osm file?

Btw., your username makes me a little uneasy... ;)

in reply to:  description comment:3 by Don-vip, 9 years ago

Replying to shinigami:

seems to be ~3x faster than generic Double.parseDouble.

from which JDK? There was a big performance issue with JDK7 but it seems to have been fixed in JDK8.

comment:4 by shinigami, 9 years ago

Don-vip: jdk8u45, it still delegates parsing to some internal classes, creates buffers for it and so.

bastiK: aww, I have looked to osm file description briefly and understood it as it can contain this simple format only:/.

Problem with garbage is it must be collected;) I started with this patches because of #11492. I can't open that file at my hw (have 16GB only), so I always sample first minute of parsing. Half of time it spends collecting garbage.

in reply to:  4 comment:5 by bastiK, 9 years ago

Replying to shinigami:

bastiK: aww, I have looked to osm file description briefly and understood it as it can contain this simple format only:/.

You could first attempt to parse it in a simple way. Then, if that failed, try again with Double.parseDouble(). This should be both fast and reliable, right?

Problem with garbage is it must be collected;) I started with this patches because of #11492. I can't open that file at my hw (have 16GB only), so I always sample first minute of parsing. Half of time it spends collecting garbage.

So basically it does not affect any memory constraints, but speeds up the process? By how much? What tools do you use to investigate?

Btw., your effort to optimize the loading of large .osm files is highly appreciated. :)

comment:6 by simon04, 9 years ago

Owner: changed from team to shinigami
Status: newneedinfo

comment:7 by simon04, 9 years ago

Resolution: needinfo
Status: needinfoclosed

comment:8 by simon04, 3 years ago

https://github.com/wrandelshofer/FastDoubleParser (MIT License; 28 KB) claims to offer a 4x speed-up over Double.parseDouble

parsing numbers in file data/canada.txt
read 111126 lines
=== warmup 1000 times =====
=== number of trials 32 =====
FastDoubleParser               MB/s avg: 341.740221, min: 303.93, max: 366.63
FastDoubleParserFromByteArray  MB/s avg: 460.556911, min: 405.39, max: 481.79
Double                         MB/s avg: 80.854956, min: 68.74, max: 85.77
Speedup FastDoubleParser              vs Double: 4.226583
Speedup FastDoubleParserFromByteArray vs Double: 5.696088

org.openstreetmap.josm.io.OsmReaderPerformanceTest#testPlain on nodist/data/neubrandenburg.osm.bz2

-<measurement><name>load .osm-file 4 times(ms)</name><value>2851.0</value></measurement>
+<measurement><name>load .osm-file 4 times(ms)</name><value>2664.0</value></measurement>

org.openstreetmap.josm.io.OsmReaderPerformanceTest#testCompressed on montenegro-latest.osm.bz2 (44M)

-<measurement><name>load compressed (.osm.bz2) 4 times(ms)</name><value>69327.0</value></measurement>
+<measurement><name>load compressed (.osm.bz2) 4 times(ms)</name><value>69156.0</value></measurement>

My conclusion: rather negligible effects for parsing OSM XML data.


  • ivy.xml

    diff --git a/ivy.xml b/ivy.xml
    index f56c89a92..c843f6dea 100644
    a b  
    3131        <dependency conf="api->default" org="ch.poole" name="OpeningHoursParser" rev="0.23.2"/>
    3232        <dependency conf="api->default" org="oauth.signpost" name="signpost-core" rev="2.0.0"/>
    3333        <dependency conf="api->default" org="org.webjars.npm" name="tag2link" rev="2021.3.21"/>
     34        <dependency conf="api->default" org="ch.randelshofer" name="fastdoubleparser" rev="0.3.0"/>
    3435        <!-- sources->sources -->
    3536        <dependency conf="sources->sources" org="org.openstreetmap.jmapviewer" name="jmapviewer" rev="2.15"/>
    3637        <dependency conf="sources->sources" org="javax.json" name="javax.json-api" rev="1.1.4"/>
  • src/org/openstreetmap/josm/io/AbstractParser.java

    diff --git a/src/org/openstreetmap/josm/io/AbstractParser.java b/src/org/openstreetmap/josm/io/AbstractParser.java
    index f214cd004..b725f775b 100644
    a b import static org.openstreetmap.josm.tools.I18n.tr;  
    55
    66import java.util.Date;
    77
     8import ch.randelshofer.fastdoubleparser.FastDoubleParser;
    89import org.openstreetmap.josm.data.coor.LatLon;
    910import org.openstreetmap.josm.data.osm.OsmPrimitiveType;
    1011import org.openstreetmap.josm.data.osm.RelationMemberData;
    public abstract class AbstractParser extends DefaultHandler {  
    8081        }
    8182        double d = 0.0;
    8283        try {
    83             d = Double.parseDouble(v);
     84            d = FastDoubleParser.parseDouble(v);
    8485        } catch (NumberFormatException e) {
    8586            throwException(tr("Illegal value for attribute ''{0}'' of type double. Got ''{1}''.", name, v), e);
    8687        }

comment:9 by Don-vip, 3 years ago

I would prefer to wait for a JDK enhancement rather than adding a new lib just for that.

comment:10 by GerdP, 3 years ago

And it seems to require Java 11?

Modify Ticket

Change Properties
Set your email in Preferences
Action
as closed The owner will remain shinigami.
as The resolution will be set.
The resolution will be deleted. Next status will be 'reopened'.

Add Comment


E-mail address and name can be saved in the Preferences .
 
Note: See TracTickets for help on using tickets.