#12549 closed defect (fixed)
Regex fails with accented letters
Reported by: | naoliv | Owned by: | team |
---|---|---|---|
Priority: | normal | Milestone: | 16.02 |
Component: | Core validator | Version: | |
Keywords: | mapcss | Cc: |
Description
Using this example MapCSS rule:
*[name =~ /^[a-z ]+$/] { throwWarning: tr("name is all lowercase"); }
It should match both nodes from the attached example file, but it matches only the first one, with name=aa (it fails for name=aá).
JOSM:
Build-Date:2016-02-21 00:01:27
Revision:9844
Is-Local-Build:true
Identification: JOSM/1.5 (9844 SVN pt_BR) Linux Debian GNU/Linux unstable (sid)
Memory Usage: 577 MB / 3641 MB (196 MB allocated, but free)
Java version: 1.8.0_72-internal-b15, Oracle Corporation, OpenJDK 64-Bit Server VM
VM arguments: [-Dawt.useSystemAAFontSettings=on]
Dataset consistency test: No problems found
Plugins:
- AddrInterpolation (31772)
- Create_grid_of_ways (31772)
- FastDraw (31895)
- FixAddresses (31772)
- OpeningHoursEditor (31772)
- PicLayer (31895)
- SimplifyArea (31895)
- apache-commons (31895)
- buildings_tools (31895)
- download_along (31772)
- editgpx (31772)
- ejml (31895)
- geotools (31895)
- graphview (31895)
- jts (31772)
- kendzi3d (1.0.189)
- kendzi3d-jogl (41)
- kendzi3d-resources (0.0.1)
- log4j (31895)
- measurement (31895)
- merge-overlap (31967)
- opendata (32071)
- pdfimport (32019)
- photo_geotagging (31895)
- poly (31772)
- reverter (32005)
- tagging-preset-tester (31895)
- todo (29154)
- turnrestrictions (31895)
- undelete (31895)
- utilsplugin2 (32018)
Attachments (1)
Change History (10)
Changed 8 years ago by
Attachment: | example.osm added |
---|
comment:1 Changed 8 years ago by
comment:2 Changed 8 years ago by
Didn't know. Thanks for pointing this out.
And how should I use it?
With *[name =~ /^[\p{javaLowerCase} ]+$/]
there is a parsing exception.
With *[name =~ /^[\\p{javaLowerCase} ]+$/]
it doesn't match anything.
comment:3 Changed 8 years ago by
My regex foo suggests something like:
*[name =~ /^(\p{javaLowerCase}| )+$/]
You can't use \p{...} character classes within [] AFAIK
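For reference, the alternation form above can be tried against plain `java.util.regex`, outside JOSM. This is only a sketch: the class and method names are hypothetical, and it says nothing about how JOSM's MapCSS parser tokenizes the expression before it reaches the regex engine. In plain Java the bracketed form also compiles, so it is included for comparison.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of JOSM: tries both spellings in plain Java.
class LowercaseAlternation {
    // Alternation form suggested in the comment above.
    static final Pattern ALTERNATION = Pattern.compile("^(\\p{javaLowerCase}| )+$");
    // Bracketed form; java.util.regex itself accepts \p{...} inside [].
    static final Pattern BRACKETED = Pattern.compile("^[\\p{javaLowerCase} ]+$");

    static boolean matchesAlternation(String name) {
        return ALTERNATION.matcher(name).matches();
    }

    static boolean matchesBracketed(String name) {
        return BRACKETED.matcher(name).matches();
    }

    public static void main(String[] args) {
        String accented = "a\u00e1"; // 'a' followed by a-acute
        System.out.println(matchesAlternation(accented)); // true
        System.out.println(matchesBracketed(accented));   // true
        System.out.println(matchesAlternation("Aa"));     // false (uppercase A)
    }
}
```

If the bracketed form fails inside a MapCSS rule while working here, the problem is more likely in the rule parsing than in the regex engine itself.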
comment:5 follow-up: 6 Changed 8 years ago by
It seems that it also doesn't work with (?i):
For example:
*[name =~ /^(?i)fóo$/]
won't match name=fÓo
comment:6 Changed 8 years ago by
Replying to naoliv:
*[name =~ /^(?i)fóo$/]
won't match name=fÓo
You need to use [name =~ /^(?i)(?u)fóo$/], see https://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html#UNICODE_CASE
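The effect of adding `(?u)` can be reproduced in plain Java. The sketch below uses a hypothetical class name and compares the two patterns from this exchange; by default, `(?i)` folds case only for US-ASCII characters, and `(?u)` (UNICODE_CASE) extends the folding to Unicode.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not JOSM code.
class CaseInsensitiveAccent {
    // (?i) alone: case-insensitive for ASCII letters only.
    static final Pattern ASCII_FOLD = Pattern.compile("^(?i)f\u00f3o$");
    // (?i)(?u): Unicode-aware case folding.
    static final Pattern UNICODE_FOLD = Pattern.compile("^(?i)(?u)f\u00f3o$");

    static boolean matchesAsciiFold(String name) {
        return ASCII_FOLD.matcher(name).matches();
    }

    static boolean matchesUnicodeFold(String name) {
        return UNICODE_FOLD.matcher(name).matches();
    }

    public static void main(String[] args) {
        String upperAccent = "f\u00d3o"; // 'f', O-acute (uppercase), 'o'
        System.out.println(matchesAsciiFold(upperAccent));   // false
        System.out.println(matchesUnicodeFold(upperAccent)); // true
    }
}
```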
comment:8 Changed 8 years ago by
Milestone: | → 16.02 |
---|
Replying to naoliv:
It fails with parsing error too.
Fixed in r9857. Use [name =~ /^(?U)(\p{Lower})+$/]; (?U) is compulsory, see https://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html#UNICODE_CHARACTER_CLASS
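Why `(?U)` matters can be seen in plain Java: without it, `\p{Lower}` is the POSIX class and covers only `[a-z]`; with it, the class follows the Unicode lowercase property. A minimal sketch, with a hypothetical class name and assuming Java 7 or later (where the flag exists):

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not JOSM code.
class UnicodeLowerFlag {
    // Without (?U): \p{Lower} is ASCII-only.
    static final Pattern POSIX_LOWER = Pattern.compile("^(\\p{Lower})+$");
    // With (?U): \p{Lower} uses the Unicode lowercase property.
    static final Pattern UNICODE_LOWER = Pattern.compile("^(?U)(\\p{Lower})+$");

    static boolean matchesPosix(String name) {
        return POSIX_LOWER.matcher(name).matches();
    }

    static boolean matchesUnicode(String name) {
        return UNICODE_LOWER.matcher(name).matches();
    }

    public static void main(String[] args) {
        String accented = "a\u00e1"; // 'a' followed by a-acute
        System.out.println(matchesPosix(accented));   // false
        System.out.println(matchesUnicode(accented)); // true
    }
}
```

Note that the pattern as given has no space in it, so a multi-word name would need the space added to the expression.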
As far as I checked, [a-z] in Java doesn't cover national characters, and neither does the \w character class (though the last sentence in POSIX bracket expressions suggests that it should).
The class that covers all lowercase letters in Java is:
\p{javaLowerCase}
(See Pattern javadoc)
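A small plain-Java sketch of the three classes mentioned here, run against the name from the ticket. The class and method names are hypothetical; only `java.util.regex.Pattern` is real API.

```java
import java.util.regex.Pattern;

// Hypothetical demo class comparing the character classes discussed above.
class LowerClassComparison {
    static final Pattern A_TO_Z = Pattern.compile("^[a-z]+$");
    static final Pattern WORD = Pattern.compile("^\\w+$");
    static final Pattern JAVA_LOWER = Pattern.compile("^\\p{javaLowerCase}+$");

    static boolean matchesAToZ(String s) {
        return A_TO_Z.matcher(s).matches();
    }

    static boolean matchesWord(String s) {
        return WORD.matcher(s).matches();
    }

    static boolean matchesJavaLower(String s) {
        return JAVA_LOWER.matcher(s).matches();
    }

    public static void main(String[] args) {
        String accented = "a\u00e1"; // 'a' followed by a-acute
        System.out.println(matchesAToZ(accented));      // false
        System.out.println(matchesWord(accented));      // false (\w is ASCII by default)
        System.out.println(matchesJavaLower(accented)); // true
    }
}
```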