Opened 12 years ago
Closed 12 years ago
#8354 closed defect (duplicate)
Supplementary characters out of the Unicode BMP
Reported by: | verdy_p | Owned by: | team |
---|---|---|---|
Priority: | normal | Milestone: | |
Component: | Core | Version: | |
Keywords: | Cc: |
Description
I have found some intriguing data on the node for Paris, France: it contains a name:got entry giving the name in the Gothic script and language.
However, this script can only be written with characters outside the Unicode BMP; they are all in the SMP (Supplementary Multilingual Plane, plane 1).
JOSM then fails to validate this data and treats it as an overlong string even though it is short (𐍀𐌰𐍂𐌹𐍃, according to the name of the Wikipedia article).
Admittedly, JOSM cannot display this name because it currently lacks support for rendering this script.
But its validator should not report the key value as overlong (?).
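To illustrate why a length check might misfire on such a value, here is a minimal Java sketch. It assumes, without having confirmed it in the JOSM source, that the length is measured in UTF-16 code units or in bytes rather than in code points. The class and method names are hypothetical and are not JOSM APIs.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of JOSM: compares three ways of measuring a string.
final class SupplementaryLength {
    // The Gothic name from this ticket, written with escapes so the source stays ASCII:
    // U+10340 U+10330 U+10342 U+10339 U+10343
    static final String PARIS_GOTHIC =
            "\uD800\uDF40\uD800\uDF30\uD800\uDF42\uD800\uDF39\uD800\uDF43";

    // Number of UTF-16 code units (what String.length() returns).
    static int utf16Units(String s) {
        return s.length();
    }

    // Number of Unicode code points (what a user perceives as characters here).
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    // Number of bytes once the string is encoded as UTF-8.
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        System.out.println("code points:  " + codePoints(PARIS_GOTHIC)); // 5
        System.out.println("UTF-16 units: " + utf16Units(PARIS_GOTHIC)); // 10
        System.out.println("UTF-8 bytes:  " + utf8Bytes(PARIS_GOTHIC));  // 20
    }
}
```

The same five-letter name measures 5, 10 or 20 depending on the unit, so any limit applied to the wrong unit would be reached two to four times too early.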
In fact, I cannot even copy-paste the name to replace it and fix the reported issue: JOSM rejects any surrogates (in D800..DFFF), even if they are correctly paired in the data to paste.
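As a sketch of what accepting "correctly paired" surrogates could look like, the following Java check rejects only unpaired surrogates instead of every code unit in D800..DFFF. This is an illustration under that assumption, not JOSM's actual paste filter; the class and method names are hypothetical.

```java
// Hypothetical helper, not part of JOSM: accepts text whose surrogates are well paired.
final class SurrogateCheck {
    // Returns true if every high surrogate (D800..DBFF) is immediately followed
    // by a low surrogate (DC00..DFFF) and no low surrogate appears on its own.
    static boolean hasOnlyValidSurrogatePairs(String s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (Character.isHighSurrogate(c)) {
                if (i + 1 >= s.length() || !Character.isLowSurrogate(s.charAt(i + 1))) {
                    return false; // high surrogate at end of text or not followed by a low one
                }
                i++; // skip the low surrogate that completes the pair
            } else if (Character.isLowSurrogate(c)) {
                return false; // low surrogate without a preceding high surrogate
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // First letter of the Gothic name (U+10340) as a valid pair, then broken variants.
        System.out.println(hasOnlyValidSurrogatePairs("\uD800\uDF40")); // true
        System.out.println(hasOnlyValidSurrogatePairs("\uD800"));       // false
        System.out.println(hasOnlyValidSurrogatePairs("\uDF40\uD800")); // false
    }
}
```

With a check of this kind, the Gothic name would pass while truly malformed input would still be refused.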
OSM node id: 17807753
Change History (5)
comment:1 by , 12 years ago
comment:2 by , 12 years ago
Reporter: | changed from | to
---|
comment:4 by , 12 years ago
Yes, indeed, because I found this on a node name in Paris where validation tools detected overlong strings containing garbage, created or modified by JOSM not correctly understanding the Gothic name.
Note that JOSM also incorrectly complains about the language code "got" in name:got=* (or in wikipedia=got:* and wikipedia:got=*), which is perfectly valid.
comment:5 by , 12 years ago
Resolution: | → duplicate |
---|---|
Status: | new → closed |
Closed as duplicate of #3290.
Note: For now I am considering deleting the Gothic name, as it causes trouble and is not needed. But the issue will be more significant in China, where toponyms DO require characters in the Supplementary Ideographic Plane for actual modern use!
JOSM should therefore support the input of data containing valid surrogate pairs (in Java, strings are encoded internally as UTF-16, even though the OSM protocols and XML file formats use UTF-8 externally and thus contain no surrogates). It should support them as long as the surrogate pairs are valid: