id	summary	reporter	owner	description	type	status	priority	milestone	component	version	resolution	keywords	cc
3290	Make sure an XML parser fully supporting UTF-16 is used by JOSM	Gubaer	team	"See the discussion on dev and josm-dev:

 * There is a problem with XML parsers that don't handle [http://en.wikipedia.org/wiki/UTF-16 UTF-16] correctly. Apparently, they insert duplicate ''surrogate code points'' into OSM keys or values while parsing, so after a few read/write cycles even small OSM files or fragments can become very large. In OSM the problem was spotted because of [http://www.alanwood.net/unicode/gothic.html Gothic] code points in {{{name:got}}} tags; Gothic letters lie outside the Basic Multilingual Plane, so UTF-16 encodes each of them as a surrogate pair.

 * Xerces-J 2.6.2 seems to be affected
 * Xerces-J 2.9.1 seems to be OK
 * [http://woodstox.codehaus.org/ Woodstox StAX XML parser] seems to be OK 

JOSM should either ship a compliant parser with its distribution or check/enforce on startup that a known compliant parser is on the classpath.
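
A minimal sketch of the first half of such a startup check, assuming the parser is obtained through JAXP (the ticket does not say how JOSM instantiates it). It reports which SAX implementation is resolved from the classpath and, where the jar manifest records one, its version. The names {{{ParserProbe}}}, {{{activeSaxParserClass}}} and {{{activeSaxParserVersion}}} are hypothetical and only serve the illustration.

```java
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

// Hypothetical helper, not part of JOSM: reports the SAX parser chosen by JAXP.
final class ParserProbe {

    // Creates a parser through the standard JAXP lookup (assumption: JOSM does the same).
    private static SAXParser newParser() {
        try {
            return SAXParserFactory.newInstance().newSAXParser();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Returns the class name of the SAX parser resolved on this classpath. */
    static String activeSaxParserClass() {
        return newParser().getClass().getName();
    }

    /** Returns the version recorded in the jar manifest, or null if there is none. */
    static String activeSaxParserVersion() {
        Package pkg = newParser().getClass().getPackage();
        return pkg == null ? null : pkg.getImplementationVersion();
    }

    public static void main(String[] args) {
        // Prints the class name and the version (or null), separated by a space.
        System.out.println(activeSaxParserClass() + ' ' + activeSaxParserVersion());
    }
}
```

Comparing the reported class and version against a list of known-good parsers (for example Xerces-J 2.9.1) is left out here. The manifest version may be absent, in which case the second method returns null and the check would have to fall back to a functional test.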

"	defect	closed	major		Core		othersoftware	javabug 9 xml unicode stax	
