
Opened 10 years ago

Closed 7 years ago

#11382 closed enhancement (fixed)

[Patch] Improve MapCSS speed by using BitSets

Reported by: michael2402
Owned by: michael2402
Priority: normal
Milestone:
Component: Core mappaint
Version:
Keywords: mapcss performance
Cc: Klumbumbus

Description

MapCSS is slowed down a lot because it needs to sort the list of possible rules for every element. We can improve this by sorting that list once (using a big list) and then using a BitSet to mask rules that could be applied to the current element.

Masking accuracy is improved by looking not only at key/value pairs but also at required keys.
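
For illustration, a minimal self-contained sketch of this idea (class and method names are made up here, not the actual JOSM classes): rules are registered once in sorted order, each key remembers a BitSet of rule positions, and per element only a few BitSets are ORed and their set bits walked.

    import java.util.ArrayList;
    import java.util.BitSet;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    // Rules are added once, already in sorted order; per key we only remember a BitSet of
    // positions in that list. For an element we OR the BitSets of its keys and iterate the
    // set bits in ascending order, which yields the candidates already sorted.
    class RuleIndexSketch<R> {
        private final List<R> sortedRules = new ArrayList<>();
        private final Map<String, BitSet> rulesForKey = new HashMap<>();

        void add(String key, R rule) {
            int position = sortedRules.size();
            sortedRules.add(rule);
            rulesForKey.computeIfAbsent(key, k -> new BitSet()).set(position);
        }

        List<R> candidatesFor(Iterable<String> elementKeys) {
            BitSet mask = new BitSet();
            for (String key : elementKeys) {
                BitSet bits = rulesForKey.get(key);
                if (bits != null) {
                    mask.or(bits);   // cheap word-wise OR instead of sorting per element
                }
            }
            List<R> candidates = new ArrayList<>();
            for (int i = mask.nextSetBit(0); i >= 0; i = mask.nextSetBit(i + 1)) {
                candidates.add(sortedRules.get(i));
            }
            return candidates;
        }
    }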

Attachments (4)

mapcss-use-bitmap-and-keycache.patch (12.2 KB ) - added by michael2402 10 years ago.
mapcss-use-bitmap-and-keycache-doc.patch (16.4 KB ) - added by michael2402 10 years ago.
mapcss-use-bitmap-and-keycache-size.patch (12.4 KB ) - added by anonymous 10 years ago.
mapcss-use-bitmap-and-keycache-size.2.patch (12.4 KB ) - added by michael2402 10 years ago.


Change History (37)

by michael2402, 10 years ago

comment:1 by Don-vip, 10 years ago

Have you run some benchmarks to measure the performance gain?

comment:2 by michael2402, 10 years ago

Yes. I added some more styles to the MapCSSPerformanceTest (to get a reasonable balance between key/value styles, key-only styles and some other selectors) and ran it. The styles added are:

  • Address Tags Validator
  • Coloured Streets
  • Cycleways
  • Lit Objects
  • Maxspeed
  • Parking lanes
  • Public Transport
  • Sidewalks
  • Surface - Data Entry
  • Surface
  • Tree-Crown_Diameter

I just copied the rules into one big file.

The time to run the test decreased from 4850 ms to 4300 ms.

With only the default style, the time decreased from 3300 ms to 3000 ms.

comment:3 by Don-vip, 10 years ago

Keywords: mapcss performance added
Milestone: 15.05

Nice :) Can you please add some Javadoc for new methods? This is a great enhancement!

by michael2402, 10 years ago

comment:4 by michael2402, 10 years ago

I added the new patch. I also fixed the old comment about how the index works (which was not correct any more).

comment:5 by Don-vip, 10 years ago

Resolution: fixed
Status: new → closed

In 8297/josm:

fix #11382 - Improve MapCSS speed by using BitSets (patch by michael2402)

comment:6 by bastiK, 10 years ago

Resolution: fixed
Status: closed → reopened

Sorry for responding late; I had to find some time for testing first. Looking only at phase 1 for the default style, I got an improvement of about 20 % for the key cache and another 20 % for the bitmap. This is very impressive!

The key cache is a clear enhancement; it fits very well with the current code.

Regarding the bitmap, I'm a little concerned about scalability. The current index should be O(1) in time and O(N) in space (with respect to the number of rules N). The bitmap solution is O(N) in time and O(N*N) in space.

This means, for example, that the index for a style sheet with 50000 rules would take 50000*50000 bits = 312 MB of memory. Admittedly, 50000 rules is a lot, but we are getting there: the default style has 1400 rules and, for example, the "Lane details" style has 3900 rules.

comment:7 by Klumbumbus, 10 years ago

Cc: Klumbumbus added

comment:8 by michael2402, 10 years ago

Hi.

You are right; the upper bound for the bitmap size is pretty high.

The current index uses O(n log n) time (pulling n entries from the queue), with n being the number of rules.

I did some more comparisons of the number of bits used. I did not count the hash map entries, since they are used in both implementations.

I used the following object sizes (bytes):

  • BitSet => 24 (object) + 24 (array) + <highest used word> * 8
  • ArrayList => 24 + 24 + entries * 8
  • optimized BitSet => 28 + 24 + (<highest used word> - <lowest used word>) * 8

These sizes are optimal (you always get some empty elements at the end, but those could be trimmed), so the real size is higher by a constant/linear factor. We could reduce the size further by skipping the leading words of the bitset that are zero and not storing them; this is the third option ("optimized BitSet").

This is the result of summing up all required lists (mixed for all primitive types):

JOSM default (MapCSS) - resource://styles/standard/elemstyles.mapcss
The index has 91 keys with 710 cases
Old size: 39120, new size: 64992, new size optimized: 37388
JOSM default (MapCSS) - resource://styles/standard/elemstyles.mapcss
The index has 50 keys with 602 cases
Old size: 31760, new size: 58800, new size optimized: 31612
JOSM default (MapCSS) - resource://styles/standard/elemstyles.mapcss
The index has 34 keys with 199 cases
Old size: 11712, new size: 13256, new size optimized: 10480
JOSM default (MapCSS) - resource://styles/standard/elemstyles.mapcss
The index has 3 keys with 16 cases
Old size: 816, new size: 920, new size optimized: 884
JOSM default (MapCSS) - resource://styles/standard/elemstyles.mapcss
The index has 33 keys with 452 cases
Old size: 22448, new size: 36920, new size optimized: 23636
JOSM default (MapCSS) - resource://styles/standard/elemstyles.mapcss
The index has 0 keys with 0 cases
Old size: 112, new size: 56, new size optimized: 60

The Mapnik style is the worst I found:

Mapnik (true) - https://github.com/bastik/mapcss-tools/raw/osm/mapnik2mapcss/osm-results/mapnik.zip
The index has 24 keys with 161 cases
Old size: 7904, new size: 11568, new size optimized: 8440
Mapnik (true) - https://github.com/bastik/mapcss-tools/raw/osm/mapnik2mapcss/osm-results/mapnik.zip
The index has 37 keys with 359 cases
Old size: 20480, new size: 64464, new size optimized: 18904
Mapnik (true) - https://github.com/bastik/mapcss-tools/raw/osm/mapnik2mapcss/osm-results/mapnik.zip
The index has 26 keys with 159 cases
Old size: 10560, new size: 27328, new size optimized: 8496
Mapnik (true) - https://github.com/bastik/mapcss-tools/raw/osm/mapnik2mapcss/osm-results/mapnik.zip
The index has 0 keys with 0 cases
Old size: 112, new size: 56, new size optimized: 60
Mapnik (true) - https://github.com/bastik/mapcss-tools/raw/osm/mapnik2mapcss/osm-results/mapnik.zip
The index has 24 keys with 233 cases
Old size: 11616, new size: 16368, new size optimized: 12200
Mapnik (true) - https://github.com/bastik/mapcss-tools/raw/osm/mapnik2mapcss/osm-results/mapnik.zip
The index has 0 keys with 0 cases
Old size: 112, new size: 56, new size optimized: 60

Lane features is a pretty bad example for size comparison:

Lane details, right-hand traffic, arrow version - https://josm.openstreetmap.de/josmfile?page=Styles/Lane_features&zip=1
The index has 0 keys with 0 cases
Old size: 48, new size: 48, new size optimized: 52
Lane details, right-hand traffic, arrow version - https://josm.openstreetmap.de/josmfile?page=Styles/Lane_features&zip=1
The index has 22 keys with 54 cases
Old size: 249488, new size: 4168, new size optimized: 3916
Lane details, right-hand traffic, arrow version - https://josm.openstreetmap.de/josmfile?page=Styles/Lane_features&zip=1
The index has 22 keys with 54 cases
Old size: 249488, new size: 4168, new size optimized: 3916
Lane details, right-hand traffic, arrow version - https://josm.openstreetmap.de/josmfile?page=Styles/Lane_features&zip=1
The index has 0 keys with 0 cases
Old size: 48, new size: 48, new size optimized: 52
Lane details, right-hand traffic, arrow version - https://josm.openstreetmap.de/josmfile?page=Styles/Lane_features&zip=1
The index has 0 keys with 0 cases
Old size: 48, new size: 48, new size optimized: 52
Lane details, right-hand traffic, arrow version - https://josm.openstreetmap.de/josmfile?page=Styles/Lane_features&zip=1
The index has 0 keys with 0 cases
Old size: 48, new size: 48, new size optimized: 52

The reason is that the size increases with the number of key/value pairs or keys used, not with the number of rules. So as long as your rules all use the same keys, your overall size will decrease. For the lane style, only 22 keys were identified; for these keys there were 32 values (and the rest falls into the default case). So all in all, you only need 54 BitSets for the cache.

All in all, it trades some additional memory for more speed. We could keep both implementations (since they have almost the same signature) and use the old one in such worst-case scenarios. Or we could only deal with this once someone actually writes such a worst-case style (e.g. by only splitting simple key=value rules once a key is used often, and just keeping the key bitset in other cases). We might also consider using the optimized BitSet described above.

comment:9 by bastiK, 10 years ago

A first optimization would be to not access the rules array directly in RuleCandidatesIterator, but to prepare a separate list for each index type. This way you are not wasting bit slots that aren't used anyway.

Of course this doesn't really help with the worst case scenario. (Which isn't really worst-case, but something I would expect to happen for normal use.)

One fix could be a "chunked index": the (meta-)index accepts rules until the rule number reaches, say, L=512. It then saves this as the first chunk in a list and starts a new index, saving rule number r at bit position r-L. When that index is full (rule number 2*L is reached), it starts a third chunk, and so on.

As far as I can tell, there should be no significant performance loss in time and it solves the explosion in memory use.
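
A rough sketch of how such a chunked index could look (the class, the chunk size constant and the lookup method are hypothetical, not a proposed patch):

    import java.util.ArrayList;
    import java.util.BitSet;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;
    import java.util.function.IntConsumer;

    // Rule numbers are split into chunks of CHUNK_SIZE; each chunk keeps its own key -> BitSet
    // map with bit positions relative to the chunk start. A key only pays for the chunks in
    // which it actually occurs, and no BitSet ever grows beyond CHUNK_SIZE bits.
    class ChunkedKeyIndex {
        static final int CHUNK_SIZE = 512;

        private final List<Map<String, BitSet>> chunks = new ArrayList<>();

        void add(String key, int ruleNumber) {
            int chunkIndex = ruleNumber / CHUNK_SIZE;
            while (chunks.size() <= chunkIndex) {
                chunks.add(new HashMap<>());              // start a new chunk when the previous is full
            }
            chunks.get(chunkIndex)
                  .computeIfAbsent(key, k -> new BitSet(CHUNK_SIZE))
                  .set(ruleNumber % CHUNK_SIZE);          // rule r stored at position r - chunkIndex * CHUNK_SIZE
        }

        // Candidate lookup walks the chunks in order, so results still come out sorted by rule number.
        void forEachCandidate(Iterable<String> keys, IntConsumer consumer) {
            for (int c = 0; c < chunks.size(); c++) {
                BitSet mask = new BitSet(CHUNK_SIZE);
                for (String key : keys) {
                    BitSet bits = chunks.get(c).get(key);
                    if (bits != null) {
                        mask.or(bits);
                    }
                }
                for (int i = mask.nextSetBit(0); i >= 0; i = mask.nextSetBit(i + 1)) {
                    consumer.accept(c * CHUNK_SIZE + i);
                }
            }
        }
    }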

PS: It would help to have some code along with the command line output, as I sometimes cannot tell what exactly you are printing.

PPS: Still, this is brilliant work so far. I would not have guessed that a bitmap would be the best solution here. :)

comment:10 by michael2402, 10 years ago

Hi

The code is basically:

    // measurement fragment, run for each rule index:
    int cases = 0;
    int sizeUsedNew = 0;
    int sizeUsedOld = 0;
    int sizeUsedOpt = 0;
    for (MapCSSKeyRules v : index.values()) {
        cases += 1 + v.specialRules.size();
        for (BitSet r : v.specialRules.values()) {
            sizeUsedNew += size(r);
            // presumably a copy of r minus the general key rules
            // (a fresh BitSet here would always stay empty)
            BitSet bs = (BitSet) r.clone();
            bs.andNot(v.generalRules);
            sizeUsedOpt += sizeOpt(bs);
            sizeUsedOld += sizeAsList(bs);
        }
        sizeUsedNew += size(v.generalRules);
        sizeUsedOpt += sizeOpt(v.generalRules);
        sizeUsedOld += sizeAsList(v.generalRules);
    }
    sizeUsedNew += size(remaining);
    sizeUsedOld += sizeAsList(remaining);
    sizeUsedOpt += sizeOpt(remaining);
    System.out.println(title + " - " + url);
    System.out.println("The index has " + index.size() + " keys with " + cases + " cases");
    System.out.println("Old size: " + sizeUsedOld + ", new size: " + sizeUsedNew + ", new size optimized: " + sizeUsedOpt);

    // estimated size (bytes) of a plain java.util.BitSet: object + array header + used words
    private int size(BitSet bs) {
        return 24 + 24 + ((bs.length() + 63) / 64) * 8;
    }

    // estimated size (bytes) of the "optimized" BitSet that skips leading zero words
    private int sizeOpt(BitSet bs) {
        return 24 + 28 + ((bs.length() + 63) / 64 - bs.nextSetBit(0) / 64) * 8;
    }

    // estimated size (bytes) of the old ArrayList-based representation
    private int sizeAsList(BitSet bs) {
        return 24 + 24 + bs.cardinality() * 64;
    }

There are multiple ways to address the memory size issue. The rules array is already per index. To address the size issue, we would need to examine the structure of the bitsets in more detail, which is not possible in the general case.

One solution would be to store an array of ints per entry (instead of a bitmap). That way, we only need as many entries as we have active bits. Merging can be done in O(n) by simply using BitSets.

We could also use a more generic approach (e.g. storing only indexes as long as we only have <= 3 values). Most key=value pairs only have a few rules (icon definitions, ...). That way, we would not need a bitset for them but would still have the good performance for the key=value pairs with many rules.
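
A minimal sketch of such a hybrid entry (the threshold and all names are illustrative only):

    import java.util.Arrays;
    import java.util.BitSet;

    // Entries with only a few matching rules store their rule indices as a plain int[],
    // everything else keeps a BitSet. Merging into the candidate mask is cheap either way.
    class RuleSet {
        private static final int INLINE_LIMIT = 3;   // arbitrary threshold for this sketch

        private int[] ruleIndices = new int[0];      // used while the entry stays small
        private BitSet bits;                         // used once the entry grows beyond the limit

        void add(int ruleIndex) {
            if (bits == null && ruleIndices.length < INLINE_LIMIT) {
                int[] grown = Arrays.copyOf(ruleIndices, ruleIndices.length + 1);
                grown[ruleIndices.length] = ruleIndex;
                ruleIndices = grown;
            } else {
                if (bits == null) {                  // switch representation lazily
                    bits = new BitSet();
                    for (int i : ruleIndices) {
                        bits.set(i);
                    }
                    ruleIndices = null;
                }
                bits.set(ruleIndex);
            }
        }

        void mergeInto(BitSet mask) {
            if (bits != null) {
                mask.or(bits);                       // fast word-wise OR for large entries
            } else {
                for (int i : ruleIndices) {
                    mask.set(i);                     // a handful of single-bit sets for small entries
                }
            }
        }
    }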

Your chunked index has the disadvantage that one table lookup per chunk is required. Since that is a pretty expensive part of the rule search, I would like to keep the chunks as big as possible. But it is a change that is relatively easy to implement.

The most important thing is to avoid sorting, since this involves a lot of compareTo() calls and pointer movement.

I also tried a k-way merge, but I did not get it as fast as the bitmaps (though at least faster than basic sorting). Have a look at it:
https://github.com/michaelzangl/josm-playground/blob/ea5aec8103e1b28905245cbbcc99646aed4f157f/src/org/openstreetmap/josm/gui/mappaint/mapcss/MapCSSStyleSource.java

My advice would be to accept the possibility of the memory explosion for now and keep in mind that it might happen some time and how we can solve it. For current styles, the memory footprint would even decrease in many cases, and the additional memory required is less than 100 KiB even when the user has activated several styles.

comment:11 by bastiK, 10 years ago (in reply to: comment:10)

Replying to michael2402:

> Hi
>
> The code is basically:
>
> ...

Thanks!

> The rules array is already per index.

Yes, sorry, I got confused by the inner class.

> To address the size issue, we would need to examine the structure of the bitsets in more detail, which is not possible in the general case.
>
> One solution would be to store an array of ints per entry (instead of a bitmap). That way, we only need as many entries as we have active bits. Merging can be done in O(n) by simply using BitSets.

Sounds good.

> We could also use a more generic approach (e.g. storing only indexes as long as we only have <= 3 values). Most key=value pairs only have a few rules (icon definitions, ...). That way, we would not need a bitset for them but would still have the good performance for the key=value pairs with many rules.

I'm not sure I understand this idea. Do you mean a different data structure depending on the number of rules for a single key or key=value?

> Your chunked index has the disadvantage that one table lookup per chunk is required. Since that is a pretty expensive part of the rule search, I would like to keep the chunks as big as possible.

I doubt that a HashMap lookup is expensive. In fact it should be extremely cheap. Try putting the lookup 10 times in a row; you won't see a measurable difference.

> But it is a change that is relatively easy to implement.

With a larger chunk size, this will be practically no change for most of the existing styles. But it fixes the problems with very large style sheets, even if it comes (as you think) with a runtime cost for these particular styles.

> I also tried a k-way merge, but I did not get it as fast as the bitmaps (though at least faster than basic sorting). Have a look at it:
> https://github.com/michaelzangl/josm-playground/blob/ea5aec8103e1b28905245cbbcc99646aed4f157f/src/org/openstreetmap/josm/gui/mappaint/mapcss/MapCSSStyleSource.java

I would go with the fastest solution that doesn't have memory issues.

> My advice would be to accept the possibility of the memory explosion for now and keep in mind that it might happen some time and how we can solve it. For current styles, the memory footprint would even decrease in many cases, and the additional memory required is less than 100 KiB even when the user has activated several styles.

Nope, this is a regression and should be fixed. People come up with crazy stuff all the time... I'm sure we'll have a style with ~50000 rules pretty soon, if one isn't already in private use.

comment:12 by Don-vip, 10 years ago

Summary: [Patch] Improve MapCSS speed by using BitSets → Improve MapCSS speed by using BitSets

comment:13 by michael2402, 10 years ago

Hi,

I took some time to write a patch that does not have the memory issues you discussed. I did not use a segmentation approach, but instead a mixture of bitsets and plain integer lists whose size is bounded by O(<number of set bits>). I also changed the key/value rules so that they no longer contain the key rules. This requires one additional or() when merging, but it ensures that every rule produces only a single set bit in any of the rule bitsets.

I did not notice any performance changes; in theory there should be improvements when there are many rules (since bitset merging is faster).
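
A hedged sketch of the "one additional or()" described above (field and method names are hypothetical, not taken from the attached patch):

    import java.util.BitSet;
    import java.util.HashMap;
    import java.util.Map;

    // Key rules and key=value rules are kept disjoint, so every rule sets exactly one bit
    // somewhere; candidate collection pays one additional or() per matched key.
    class KeyIndexEntry {
        final BitSet keyOnlyRules = new BitSet();               // rules matching the key regardless of value
        final Map<String, BitSet> valueRules = new HashMap<>(); // rules matching key=value, key rules excluded

        void collectCandidates(String value, BitSet mask) {
            mask.or(keyOnlyRules);                              // the additional or() mentioned above
            BitSet forValue = valueRules.get(value);
            if (forValue != null) {
                mask.or(forValue);
            }
        }
    }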


comment:14 by bastiK, 10 years ago

Looks nice!

  • RuleBitset.setBitsIn: I would expect a return after the first loop.
  • Even after adding the return, I cannot measure an improvement coming from RuleBitset.optimize. Could you check and possibly simplify the code if it isn't needed?
  • Why are you using long[] instead of java.util.BitSet?
  • Please add Javadoc for every new method (except @Override methods) and every new class. I know the existing JOSM code is not fully documented, but we are trying to get there. :)

comment:15 by bastiK, 10 years ago

See #11479 for a bug that may be caused by this optimization.

comment:16 by Don-vip, 10 years ago

Michael, can you please provide a new version taking into account Paul's remarks? :)

comment:17 by michael2402, 10 years ago

I did not find any performance improvements in using this.

I will do some more performance tests during my GSoC and will then address the issue and evaluate some alternatives. For now, the BitSet version is not slower than the version I posted in that patch.

The long[] is used because a BitSet does not allow an offset. It is faster for MapCSS files that have many rules with the same priority for the same key (like [name~=...]). For the default style, there should be no performance gain, and we could just use an int[] of the rules that are included for this key.
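
For illustration, a minimal sketch of a long[]-based bit set with a word offset (an assumption of what is meant, not the code from the patch):

    // Leading zero words before the first set bit are simply not stored; 'firstWord' records the offset.
    class OffsetBitSet {
        private final int firstWord;   // index of the first stored 64-bit word
        private final long[] words;    // words[i] covers bits (firstWord + i) * 64 .. (firstWord + i) * 64 + 63

        OffsetBitSet(int firstWord, long[] words) {
            this.firstWord = firstWord;
            this.words = words;
        }

        boolean get(int bit) {
            int word = bit / 64 - firstWord;
            return word >= 0 && word < words.length && (words[word] & (1L << (bit & 63))) != 0;
        }
    }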

comment:18 by bastiK, 10 years ago (in reply to: comment:17)

Replying to michael2402:

> I did not find any performance improvements in using this.

What are you referring to?

> I will do some more performance tests during my GSoC and will then address the issue and evaluate some alternatives. For now, the BitSet version is not slower than the version I posted in that patch.

We are not asking you to find alternatives; the code you submitted is fine for the most part. There are just minor issues we would like you to fix, so the patch can be applied and this ticket closed:

  • My first point about the return: please confirm that this is a typo and upload a new version. If this is not an error and is intended like this, please explain why it is correct.
  • We still need Javadoc.

> The long[] is used because a BitSet does not allow an offset.

Okay.

> It is faster for MapCSS files that have many rules with the same priority for the same key (like [name~=...]).

Please provide a sample style where it is in fact faster. (Compared to the version where calls to RuleBitset.optimize are commented out.) Your hybrid solution is fine if there is a gain. Otherwise we could drop a lot of the more technical code and the patch could be simplified significantly. This would be easier to read for fellow developers and (probably) less error prone. :)

comment:19 by Don-vip, 10 years ago

Summary: Improve MapCSS speed by using BitSets → [Patch] Improve MapCSS speed by using BitSets

comment:20 by Don-vip, 10 years ago

bastiK, is it ok to delay this to the next release?

comment:21 by bastiK, 10 years ago

Yes, it is already in r8339 (no complaints so far), so it should be okay to postpone for one more release.

comment:22 by Don-vip, 10 years ago

Milestone: 15.05 → 15.06

Ok let's go then :)

comment:23 by Don-vip, 9 years ago

Milestone: 15.06 → 15.07

skip milestone 15.06

comment:24 by Don-vip, 9 years ago

status of this?

comment:25 by Don-vip, 9 years ago

Milestone: 15.07 → 15.08

Milestone renamed

comment:26 by Don-vip, 9 years ago

Milestone: 15.08 → 15.09

comment:27 by simon04, 9 years ago

Milestone: 15.09 → 15.10

Postpone enhancements to next release.

comment:28 by simon04, 9 years ago

Owner: changed from team to michael2402
Status: reopened → needinfo

What is the status of this ticket/patch?

comment:29 by simon04, 9 years ago

Milestone: 15.10

Unschedule tickets without progress

comment:30 by michael2402, 9 years ago

My current problem is that I was not able to accelerate this further.

Currently, the main computing time is spent on the following (measured using MapCSSStyleSourceFilterTest#testKeyValueRules):

  • Traversing the list of rules (15%)
  • Hash table lookups for key and value (approx. 25% each, 50% in total). This can be reduced by using an IdentityHashMap; we would need to enforce interning of all entries in the OsmPrimitive#keys array for this. Changing this should make this part 5 times faster than it is now (see the sketch after this list).
  • The actual apply()/matches()/... calls (15%)
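
A minimal sketch of the interning idea from the second bullet (an assumption, not actual JOSM code): if every key string stored on the primitives is interned, an IdentityHashMap can rely on reference equality for the lookup.

    import java.util.IdentityHashMap;
    import java.util.Map;

    // Assumes every key stored on the primitives is String.intern()ed, so reference
    // equality is enough and the map never has to compare string contents.
    class InternedKeyIndex<V> {
        private final Map<String, V> index = new IdentityHashMap<>();

        void put(String key, V value) {
            index.put(key.intern(), value);   // only interned instances are used as map keys
        }

        V get(String internedKey) {
            return index.get(internedKey);    // caller must pass the interned instance
        }
    }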

What I did achieve is lowering the upper memory bound to O(n) (instead of O(n*n)) at runtime, but it is still O(n*n) for creating the index (in theory, this can also be lowered).

comment:31 by GerdP, 8 years ago

If I got that right, this patch was applied with r8297?

comment:32 by michael2402, 8 years ago

Yes, the patch has been in core for about one year now. The only remaining issue is that the theoretical memory bound is now O(n*n). In practice, you won't notice it, since n is relatively small and we are talking about bits here ;-). It could be reduced by using integer lists instead of bit sets.

comment:33 by stoecker, 7 years ago

Resolution: fixed
Status: needinfo → closed
