Opened 10 years ago
Closed 7 years ago
#11382 closed enhancement (fixed)
[Patch] Improve MapCSS speed by using BitSets
Reported by: | michael2402 | Owned by: | michael2402 |
---|---|---|---|
Priority: | normal | Milestone: | |
Component: | Core mappaint | Version: | |
Keywords: | mapcss performance | Cc: | Klumbumbus |
Description
MapCSS is slowed down a lot because it needs to sort the list of possible rules for every element. We can improve this by sorting that list once (using a big list) and then using a BitSet to mask rules that could be applied to the current element.
Masking accuracy is improved by not only looking for key/value pairs, but also for required keys.
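The core of the approach, as a minimal sketch (hypothetical class and field names, not the code from the attached patch): all rules are sorted once into a single global list, every key/value entry of the index holds a BitSet over the positions of potentially matching rules, and for each element the BitSets of its tags are OR-ed together. Iterating the set bits in ascending order then yields the candidates already in rule order, so no per-element sort is needed.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of the BitSet masking idea; not code from the patch.
class RuleIndexSketch {
    // All rules, sorted once (e.g. by declaration order / priority).
    private final List<String> sortedRules = new ArrayList<>();
    // "key=value" -> positions of candidate rules in sortedRules.
    private final Map<String, BitSet> candidatesByTag = new HashMap<>();

    void addRule(String rule, String... tags) {
        int position = sortedRules.size();
        sortedRules.add(rule);
        for (String tag : tags) {
            candidatesByTag.computeIfAbsent(tag, t -> new BitSet()).set(position);
        }
    }

    // Candidate rules for one element: OR the masks of its tags,
    // then walk the set bits in ascending order (= sorted rule order).
    List<String> candidatesFor(Iterable<String> elementTags) {
        BitSet mask = new BitSet();
        for (String tag : elementTags) {
            BitSet rules = candidatesByTag.get(tag);
            if (rules != null) {
                mask.or(rules);
            }
        }
        List<String> result = new ArrayList<>();
        for (int i = mask.nextSetBit(0); i >= 0; i = mask.nextSetBit(i + 1)) {
            result.add(sortedRules.get(i));
        }
        return result;
    }
}
```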
Attachments (4)
Change History (37)
by , 10 years ago
Attachment: | mapcss-use-bitmap-and-keycache.patch added |
---|
comment:1 by , 10 years ago
Have you run some benchmarks to measure the performance gain?
comment:2 by , 10 years ago
Yes. I added some more styles to the MapCSSPerformanceTest (to get a reasonable balance between key/value styles, key-only styles and some other selectors) and ran it. The styles added are:
- Address Tags Validator
- Coloured Streets
- Cycleways
- Lit Objects
- Maxspeed
- Parking lanes
- Public Transport
- Sidewalks
- Surface - Data Entry
- Surface
- Tree-Crown_Diameter
I just copied the rules into one big file.
The time to run the test decreased from 4850 ms to 4300 ms.
For the default style only, the time decreased from 3300 ms to 3000 ms.
comment:3 by , 10 years ago
Keywords: | mapcss performance added |
---|---|
Milestone: | → 15.05 |
Nice :) Can you please add some Javadoc for new methods? This is a great enhancement!
by , 10 years ago
Attachment: | mapcss-use-bitmap-and-keycache-doc.patch added |
---|
comment:4 by , 10 years ago
I added the new patch. I also fixed the old comment about how the index works (which was not correct any more).
comment:6 by , 10 years ago
Resolution: | fixed |
---|---|
Status: | closed → reopened |
Sorry for responding late, I had to find some time for testing first. Looking only at phase 1 for the default style, I got an improvement of about 20 % for the key cache and another 20 % for the bitmap. This is very impressive!
The key cache is a clear enhancement; it fits very well with the current code.
Regarding the bitmap, I'm a little concerned about scalability. The current index should be O(1) in time and O(N) in space (with respect to the number of rules N). The bitmap solution is O(N) in time and O(N*N) in space.
This means for example, that the index for a style sheet with 50000 rules would take 50000*50000 bits = 312 MB of memory. Admittedly, 50000 rules is a lot. But we are getting there: The default style has 1400 rules and for example the "Lane details" style has 3900 rules.
comment:7 by , 10 years ago
Cc: | added |
---|
comment:8 by , 10 years ago
Hi.
You are right, the upper bound for the bitmap size is pretty high.
The current index uses O(n log n) time (pulling n entries from the queue), with n being the number of rules.
I did some more comparisons of the number of bits used. I did not count the hash map entries, since they are used in both implementations.
I used the following object sizes (bytes):
BitSet => 24 (object) + 24 (array) + <highest used word> * 8
ArrayList => 24 + 24 + entries * 8
optimized BitSet => 28 + 24 + (<highest used word> - <lowest used word>) * 8
These sizes are optimal (you always get some empty elements at the end, but those could be trimmed), so the real size is higher by a constant/linear factor. We could optimize that size by skipping the leading words of the bitset that are zero and not storing them; that is the third option above.
This is the result of summing up all required lists (mixed for all primitive types):
JOSM default (MapCSS) - resource://styles/standard/elemstyles.mapcss
The index has 91 keys with 710 cases
Old size: 39120, new size: 64992, new size optimized: 37388

JOSM default (MapCSS) - resource://styles/standard/elemstyles.mapcss
The index has 50 keys with 602 cases
Old size: 31760, new size: 58800, new size optimized: 31612

JOSM default (MapCSS) - resource://styles/standard/elemstyles.mapcss
The index has 34 keys with 199 cases
Old size: 11712, new size: 13256, new size optimized: 10480

JOSM default (MapCSS) - resource://styles/standard/elemstyles.mapcss
The index has 3 keys with 16 cases
Old size: 816, new size: 920, new size optimized: 884

JOSM default (MapCSS) - resource://styles/standard/elemstyles.mapcss
The index has 33 keys with 452 cases
Old size: 22448, new size: 36920, new size optimized: 23636

JOSM default (MapCSS) - resource://styles/standard/elemstyles.mapcss
The index has 0 keys with 0 cases
Old size: 112, new size: 56, new size optimized: 60
The Mapnik style is the worst I found:
Mapnik (true) - https://github.com/bastik/mapcss-tools/raw/osm/mapnik2mapcss/osm-results/mapnik.zip
The index has 24 keys with 161 cases
Old size: 7904, new size: 11568, new size optimized: 8440

Mapnik (true) - https://github.com/bastik/mapcss-tools/raw/osm/mapnik2mapcss/osm-results/mapnik.zip
The index has 37 keys with 359 cases
Old size: 20480, new size: 64464, new size optimized: 18904

Mapnik (true) - https://github.com/bastik/mapcss-tools/raw/osm/mapnik2mapcss/osm-results/mapnik.zip
The index has 26 keys with 159 cases
Old size: 10560, new size: 27328, new size optimized: 8496

Mapnik (true) - https://github.com/bastik/mapcss-tools/raw/osm/mapnik2mapcss/osm-results/mapnik.zip
The index has 0 keys with 0 cases
Old size: 112, new size: 56, new size optimized: 60

Mapnik (true) - https://github.com/bastik/mapcss-tools/raw/osm/mapnik2mapcss/osm-results/mapnik.zip
The index has 24 keys with 233 cases
Old size: 11616, new size: 16368, new size optimized: 12200

Mapnik (true) - https://github.com/bastik/mapcss-tools/raw/osm/mapnik2mapcss/osm-results/mapnik.zip
The index has 0 keys with 0 cases
Old size: 112, new size: 56, new size optimized: 60
Lane features is a pretty bad example for size comparison:
Lane details, right-hand traffic, arrow version - https://josm.openstreetmap.de/josmfile?page=Styles/Lane_features&zip=1
The index has 0 keys with 0 cases
Old size: 48, new size: 48, new size optimized: 52

Lane details, right-hand traffic, arrow version - https://josm.openstreetmap.de/josmfile?page=Styles/Lane_features&zip=1
The index has 22 keys with 54 cases
Old size: 249488, new size: 4168, new size optimized: 3916

Lane details, right-hand traffic, arrow version - https://josm.openstreetmap.de/josmfile?page=Styles/Lane_features&zip=1
The index has 22 keys with 54 cases
Old size: 249488, new size: 4168, new size optimized: 3916

Lane details, right-hand traffic, arrow version - https://josm.openstreetmap.de/josmfile?page=Styles/Lane_features&zip=1
The index has 0 keys with 0 cases
Old size: 48, new size: 48, new size optimized: 52

Lane details, right-hand traffic, arrow version - https://josm.openstreetmap.de/josmfile?page=Styles/Lane_features&zip=1
The index has 0 keys with 0 cases
Old size: 48, new size: 48, new size optimized: 52

Lane details, right-hand traffic, arrow version - https://josm.openstreetmap.de/josmfile?page=Styles/Lane_features&zip=1
The index has 0 keys with 0 cases
Old size: 48, new size: 48, new size optimized: 52
The reason is that the size increases with the number of key/value pairs or keys used, not with the number of rules. So as long as your rules all use the same keys, your overall size will even decrease. For the lane style, only 22 keys were identified; for those there were 32 values (the rest is the default case). So all in all, you only need 54 BitSets for the cache.
All in all, it is a trade-off: some more memory for more speed. We could keep both implementations (since they have almost the same signature) and fall back to the old one in such worst-case scenarios. Or we could deal with this once someone actually writes such a worst-case style (e.g. by only splitting simple key=value rules when that key is used often, and keeping just the key bitset otherwise). We might also consider using the better (optimized) BitSet.
comment:9 by , 10 years ago
A first optimization would be to not access the rules array directly in RuleCandidatesIterator, but to prepare a separate list for each index type. This way you are not wasting bit slots that aren't used anyway.
Of course this doesn't really help with the worst case scenario. (Which isn't really worst-case, but something I would expect to happen for normal use.)
One fix could be a "chunked index": the (meta-)index accepts rules until the rule number reaches, say, L=512, then saves this as the first chunk in a list. It then starts a new index, saving rule number r at bit position r-L. When this index is full (rule number 2*L is reached), it starts a third chunk, and so on.
As far as I can tell, there should be no significant performance loss in time and it solves the explosion in memory use.
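A rough sketch of such a chunked index (hypothetical names and chunk handling, only meant to show the shape of the data structure): rule number r lands in chunk r/L at bit position r%L, and only chunks that actually contain bits are stored, so an entry with k set bits needs at most k chunks of L bits.

```java
import java.util.BitSet;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.IntConsumer;

// Hypothetical sketch of the chunked index idea; not an actual implementation.
class ChunkedRuleSet {
    private static final int CHUNK_SIZE = 512; // the "L" from the comment above
    private final TreeMap<Integer, BitSet> chunks = new TreeMap<>();

    void set(int ruleNumber) {
        chunks.computeIfAbsent(ruleNumber / CHUNK_SIZE, c -> new BitSet(CHUNK_SIZE))
              .set(ruleNumber % CHUNK_SIZE);
    }

    void or(ChunkedRuleSet other) {
        for (Map.Entry<Integer, BitSet> e : other.chunks.entrySet()) {
            chunks.computeIfAbsent(e.getKey(), c -> new BitSet(CHUNK_SIZE)).or(e.getValue());
        }
    }

    // Visit rule numbers in ascending order; the TreeMap keeps the chunks sorted.
    void forEachRule(IntConsumer consumer) {
        for (Map.Entry<Integer, BitSet> e : chunks.entrySet()) {
            int base = e.getKey() * CHUNK_SIZE;
            BitSet chunk = e.getValue();
            for (int i = chunk.nextSetBit(0); i >= 0; i = chunk.nextSetBit(i + 1)) {
                consumer.accept(base + i);
            }
        }
    }
}
```

Note that each merge then needs one map lookup per stored chunk, which is the per-chunk lookup cost discussed later in this thread.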
PS: It would help to have some code along with the command line output, as I sometimes cannot tell what exactly you are printing.
PPS: Still, this is brilliant work so far. I would not have guessed that a bitmap would be the best solution here. :)
follow-up: 11 comment:10 by , 10 years ago
Hi
The code is basically:
int cases = 0;
int sizeUsedNew = 0;
int sizeUsedOld = 0;
int sizeUsedOpt = 0;
for (MapCSSKeyRules v : index.values()) {
    cases += 1 + v.specialRules.size();
    for (BitSet r : v.specialRules.values()) {
        sizeUsedNew += size(r);
        BitSet bs = new BitSet();
        bs.andNot(v.generalRules);
        sizeUsedOpt += sizeOpt(bs);
        sizeUsedOld += sizeAsList(bs);
    }
    sizeUsedNew += size(v.generalRules);
    sizeUsedOpt += sizeOpt(v.generalRules);
    sizeUsedOld += sizeAsList(v.generalRules);
}
sizeUsedNew += size(remaining);
sizeUsedOld += sizeAsList(remaining);
sizeUsedOpt += sizeOpt(remaining);
System.out.println(title + " - " + url);
System.out.println("The index has " + index.size() + " keys with " + cases + " cases");
System.out.println("Old size: " + sizeUsedOld + ", new size: " + sizeUsedNew + ", new size optimized: " + sizeUsedOpt);

private int size(BitSet bs) {
    return 24 + 24 + ((bs.length() + 63) / 64) * 8;
}

private int sizeOpt(BitSet bs) {
    return 24 + 28 + ((bs.length() + 63) / 64 - bs.nextSetBit(0) / 64) * 8;
}

private int sizeAsList(BitSet bs) {
    return 24 + 24 + bs.cardinality() * 64;
}
There are multiple ways to address the memory size issue. The rules array is already per index. To address the size issue, we would need to examine the structure of the bitsets in more detail, which is not possible in a general case.
One solution would be to store an array of ints per entry (instead of a bitmap). That way, we only need as many entries as we have active bits. Merging can be done in O(n) by simply using BitSets.
We could also use a more generic approach (e.g. storing only indexes as long as we only have <= 3 values). Most key=value pairs only have few rules (icon definitions, ...). That way, we would not need a bitset for them but still have the good performance for the key=value pairs with many rules.
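A hedged sketch of such a hybrid entry (hypothetical names and threshold, not code from any of the patches): a small int array is used as long as the entry holds only a few rules, and it is promoted to a BitSet once it grows.

```java
import java.util.Arrays;
import java.util.BitSet;

// Hypothetical sketch of a hybrid index entry; not code from the patch.
class HybridRuleSet {
    private static final int PROMOTE_THRESHOLD = 3;
    private int[] positions = new int[0]; // used while the entry is small
    private BitSet bits;                  // used once the entry has grown

    void add(int ruleNumber) {
        if (bits != null) {
            bits.set(ruleNumber);
        } else if (positions.length < PROMOTE_THRESHOLD) {
            positions = Arrays.copyOf(positions, positions.length + 1);
            positions[positions.length - 1] = ruleNumber;
        } else {
            bits = new BitSet();
            for (int p : positions) {
                bits.set(p);
            }
            bits.set(ruleNumber);
            positions = null;
        }
    }

    // Merge this entry into the candidate mask of the current element.
    void addTo(BitSet candidates) {
        if (bits != null) {
            candidates.or(bits);
        } else {
            for (int p : positions) {
                candidates.set(p);
            }
        }
    }
}
```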
Your chunked index has the disadvantage that one table lookup per chunk is required. Since that is a pretty expensive part of the rule search, I would like to keep the chunks as big as possible. But it is a change that is relatively easy to implement.
The most important thing is to avoid sorting, since this involves a lot of compareTo() calls and pointer movement.
I also tried a k-way merge, but I did not get it as fast as the bitmaps (though at least faster than basic sorting). Have a look at it:
https://github.com/michaelzangl/josm-playground/blob/ea5aec8103e1b28905245cbbcc99646aed4f157f/src/org/openstreetmap/josm/gui/mappaint/mapcss/MapCSSStyleSource.java
My advice would be to accept the possibility of a memory explosion for now and keep in mind that it might happen at some point and how we can solve it. For current styles, the memory footprint would even decrease in many cases, and the additional memory required is less than 100 KiB even when the user has activated several styles.
comment:11 by , 10 years ago
Replying to michael2402:
Hi
The code is basically:
...
Thanks!
The rules array is already per index.
Yes, sorry, I got confused by the inner class.
To address the size issue, we would need to examine the structure of the bitsets in more detail, which is not possible in a general case.
One solution would be to store an array of ints per entry (instead of a bitmap). That way, we only need as many entries as we have active bits. Merging can be done in O(n) by simply using BitSets.
Sounds good.
We could also use a more generic approach (e.g. storing only indexes as long as we only have <= 3 values). Most key=value pairs only have few rules (icon definitions, ...). That way, we would not need a bitset for them but still have the good performance for the key=value pairs with many rules.
I'm not sure I understand this idea. Do you mean different data structure depending on the number of rules for a single key or key=value?
Your chunked index has the disadvantage that one table lookup per chunk is required. Since that is a pretty expensive part of the rule search, I would like to keep the chunks as big as possible.
I doubt that a HashMap lookup is expensive. In fact it should be extremely cheap. Try to put the lookup 10 times in a row; you won't see a measurable difference.
But it is a change that is relatively easy to implement.
With a larger chunk size, this will be practically no change for most of the existing styles. But it fixes the problems with very large style sheets, even if it comes (as you think) with a runtime cost for these particular styles.
I also tried a k-way merge, but I did not get it as fast as the bitmaps (though at least faster than basic sorting). Have a look at it:
https://github.com/michaelzangl/josm-playground/blob/ea5aec8103e1b28905245cbbcc99646aed4f157f/src/org/openstreetmap/josm/gui/mappaint/mapcss/MapCSSStyleSource.java
I would go with the fastest solution that doesn't have memory issues.
My advice would be to accept the possibility of a memory explosion for now and keep in mind that it might happen at some point and how we can solve it. For current styles, the memory footprint would even decrease in many cases, and the additional memory required is less than 100 KiB even when the user has activated several styles.
Nope, this is a regression and should be fixed. People come up with crazy stuff all the time... I'm sure we'll have a style with ~50000 rules pretty soon if it isn't used already in private.
comment:12 by , 10 years ago
Summary: | [Patch] Improve MapCSS speed by using BitSets → Improve MapCSS speed by using BitSets |
---|
by , 9 years ago
Attachment: | mapcss-use-bitmap-and-keycache-size.patch added |
---|
by , 9 years ago
Attachment: | mapcss-use-bitmap-and-keycache-size.2.patch added |
---|
comment:13 by , 9 years ago
Hi,
I took some time to write a patch that does not have the memory issues you discussed. I did not use a segmentation approach; instead I used a mixture of bitsets and plain integer lists whose size is bounded by O(<number of set bits>). I changed the key/value rules to no longer contain the key rules. This requires one additional OR when merging, but it ensures that every rule produces only a single set bit in any of the rule bitsets.
I did not notice any real performance changes (less than noise); in theory there should be improvements for styles with many rules (since bitset merging is faster).
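A minimal illustration of the merge this describes (hypothetical names; the actual patch code may differ): because the key=value entry no longer contains the key-only rules, both bitsets are OR-ed into the candidate set, which is the one additional OR mentioned above.

```java
import java.util.BitSet;

// Hypothetical illustration of the extra OR described above; not patch code.
final class CandidateMerge {
    private CandidateMerge() {
    }

    // valueRules: rules selecting this exact key=value; keyRules: rules selecting the key only.
    static void addCandidates(BitSet candidates, BitSet valueRules, BitSet keyRules) {
        if (valueRules != null) {
            candidates.or(valueRules);
        }
        if (keyRules != null) {
            candidates.or(keyRules); // the additional OR for the key-only rules
        }
    }
}
```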
comment:14 by , 9 years ago
Looks nice!
- RuleBitset.setBitsIn: I would expect a return after the first loop.
- Even after adding the return, I cannot measure an improvement coming from RuleBitset.optimize. Could you check and possibly simplify the code if it isn't needed?
- Why are you using long[] instead of java.util.BitSet?
- Please add Javadoc for every new method (without @Override) and class. I know the existing JOSM code is not fully documented, but we are trying to get there. :)
comment:16 by , 9 years ago
Michael, can you please provide a new version taking into account Paul's remarks? :)
follow-up: 18 comment:17 by , 9 years ago
I did not find any performance improvements in using this.
I will do some more performance tests during my GSoC and will then address the issue and evaluate some alternatives. For now, the BitSet version is not slower than the version I posted in that patch.
The long[] is used because a BitSet does not allow an offset. It is faster for MapCSS files that have many rules with the same priority for the same key (like [name~=...]). For the default style, there should be no performance gain, and we could just use int[]s of the rules that are included for this key.
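A minimal sketch of what such an offset-based bit container could look like (hypothetical, not the code from the patch): only the 64-bit words between the first and the last used bit are stored, which java.util.BitSet cannot do because it always starts at bit 0.

```java
// Hypothetical sketch of an offset-based bit container; not code from the patch.
class OffsetBitSet {
    private final int firstWord; // index of the first stored 64-bit word
    private final long[] words;  // words firstWord .. firstWord + words.length - 1

    OffsetBitSet(int lowestBit, int highestBit) {
        this.firstWord = lowestBit / 64;
        this.words = new long[highestBit / 64 - firstWord + 1];
    }

    void set(int bit) {
        words[bit / 64 - firstWord] |= 1L << (bit % 64);
    }

    boolean get(int bit) {
        int word = bit / 64 - firstWord;
        return word >= 0 && word < words.length && (words[word] & (1L << (bit % 64))) != 0;
    }
}
```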
comment:18 by , 9 years ago
Replying to michael2402:
I did not find any performance improvements in using this.
What are you referring to?
I will do some more performance tests during my GSoC and will then address the issue and evaluate some alternatives. For now, the BitSet version is not slower than the version I posted in that patch.
We don't ask you to find alternatives, the code you submitted is fine for the most part. There are just minor issues, we would like you to fix, so the patch can be applied and this ticket closed:
- My first point about return: Please confirm that this is a typo and upload a new version. In case this is not an error and intended like this, please explain why it is correct.
- We still need Javadoc.
The long[] is used because a BitSet does not allow an offset.
Okay.
It is faster for MapCSS files that have many rules with the same priority for the same key (like [name~=...]).
Please provide a sample style where it is in fact faster (compared to the version where the calls to RuleBitset.optimize are commented out). Your hybrid solution is fine if there is a gain. Otherwise we could drop a lot of the more technical code and the patch could be simplified significantly. This would be easier to read for fellow developers and (probably) less error-prone. :)
comment:19 by , 9 years ago
Summary: | Improve MapCSS speed by using BitSets → [Patch] Improve MapCSS speed by using BitSets |
---|
comment:21 by , 9 years ago
Yes, it is already in r8339 (no complaints so far), so it should be okay to postpone for one more release.
comment:26 by , 9 years ago
Milestone: | 15.08 → 15.09 |
---|
comment:28 by , 9 years ago
Owner: | changed from | to
---|---|
Status: | reopened → needinfo |
What is the status of this ticket/patch?
comment:30 by , 9 years ago
My current problem is that I was not able to accelerate this further.
Currently, the main computing time is spent on (measured with MapCSSStyleSourceFilterTest#testKeyValueRules):
- Traversing the list of rules (15%)
- Hashtable lookups for key and value (approx. 25% each, 50% in total). This can be reduced by using an identity-based hash map (IdentityHashMap); we would need to enforce interning of all entries in the OsmPrimitive#keys array for this. Changing this should make this part 5 times faster than it is now (see the sketch at the end of this comment).
- The real apply/matches/..-calls (15%)
What I did achieve is lowering the upper memory bound at runtime to O(n) (instead of O(n*n)); it is still O(n*n) while creating the index (in theory, this can also be lowered).
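Regarding the hash table lookups listed above, a hedged sketch of the interning idea (hypothetical; it assumes every key in OsmPrimitive#keys has been interned, which is exactly the prerequisite named in the list): with interned keys, an IdentityHashMap can serve the index lookup using reference comparison only.

```java
import java.util.IdentityHashMap;
import java.util.Map;

// Hypothetical sketch of an identity-based key lookup; not code from JOSM core.
class InternedKeyIndex<V> {
    private final Map<String, V> byKey = new IdentityHashMap<>();

    void put(String key, V value) {
        byKey.put(key.intern(), value); // store only interned references
    }

    V get(String key) {
        // Correct only if the caller passes an interned string as well,
        // e.g. because interning is enforced when primitive keys are loaded.
        return byKey.get(key);
    }
}
```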
comment:32 by , 8 years ago
Yes, the patch has been in core for about one year now. The only remaining issue is that the theoretical memory bound is now O(n*n). In practice you won't notice it, since n is relatively small and we are talking about bits here ;-). It could be reduced by using integer lists instead of bit sets.
comment:33 by , 7 years ago
Resolution: | → fixed |
---|---|
Status: | needinfo → closed |