Make sure an XML parser fully supporting UTF-16 is used by JOSM
|Reported by:||Gubaer||Owned by:||team|
Description (last modified by verdy_p)
See the discussion on dev and josm-dev:
- There is a problem with XML parsers that don't handle UTF-16 correctly. Apparently, they insert duplicates of surrogate code points into OSM keys or values while parsing, so after a couple of I/O operations even small OSM files or fragments can become very large. In OSM the problem was spotted because of Gothic code points in name:got tags.
- Xerces-J 2.6.2 seems to be affected
- Xerces-J 2.9.1 seems to be OK
- Woodstox StAX XML parser seems to be OK
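For readers unfamiliar with surrogates, here is a small illustration of why Gothic text exercises this code path. Gothic letters lie outside the Basic Multilingual Plane, so in UTF-16 (and in a Java `String`) each one takes two code units, a high and a low surrogate. The class name `SurrogateDemo` is made up for this sketch; it only shows the encoding and does not reproduce the parser bug.

```java
class SurrogateDemo {
    // U+10330 GOTHIC LETTER AHSA, a code point above U+FFFF.
    static String gothicAhsa() {
        return new String(Character.toChars(0x10330));
    }

    public static void main(String[] args) {
        String s = gothicAhsa();
        // One code point, but two UTF-16 code units (a surrogate pair).
        System.out.println("code units:  " + s.length());
        System.out.println("code points: " + s.codePointCount(0, s.length()));
        System.out.printf("high=%04X low=%04X%n", (int) s.charAt(0), (int) s.charAt(1));
    }
}
```

A parser that mishandles either half of such a pair, for example when it falls on an internal buffer boundary, can emit extra surrogates, which would match the growth described above.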
JOSM should either ship a compliant parser with its distribution or check/enforce on startup that a known compliant parser is on the classpath.
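A minimal sketch of what such a startup check might look like, using only the standard JAXP API. The class and method names (`XmlParserCheck`, `parserHandlesSurrogates`) are hypothetical and not part of JOSM. It parses a tiny fragment containing Gothic letters and verifies that the attribute value comes back unchanged. This is an assumption-laden probe: a single short parse may not trigger the bug in every affected parser version, so passing it would not prove compliance.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class XmlParserCheck {
    // Two Gothic letters (U+10330, U+10331) around an ASCII character;
    // each Gothic letter is a surrogate pair in UTF-16.
    static final String PROBE =
            new String(Character.toChars(0x10330)) + "x" + new String(Character.toChars(0x10331));

    // Returns true if the JAXP parser found on the classpath returns the
    // probe value unchanged (hypothetical helper, not a JOSM API).
    static boolean parserHandlesSurrogates() {
        try {
            String xml = "<tag k=\"name:got\" v=\"" + PROBE + "\"/>";
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            String parsed = doc.getDocumentElement().getAttribute("v");
            return PROBE.equals(parsed);
        } catch (Exception e) {
            // Any configuration or parse failure is treated as "not compliant".
            return false;
        }
    }

    public static void main(String[] args) {
        // Report which parser implementation JAXP selected, then the probe result.
        System.out.println("Parser factory: " + DocumentBuilderFactory.newInstance().getClass().getName());
        System.out.println("Handles surrogates: " + parserHandlesSurrogates());
    }
}
```

On startup, JOSM could call something like this and warn or refuse to continue if it returns false. Shipping a known-good parser with the distribution would avoid depending on whatever happens to be on the user's classpath.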