Opened 8 years ago

Closed 8 years ago

Last modified 8 years ago

#12549 closed defect (fixed)

Regex fails with accented letters

Reported by: naoliv
Owned by: team
Priority: normal
Milestone: 16.02
Component: Core validator
Version:
Keywords: mapcss
Cc:

Description

Using this example MapCSS rule:

*[name =~ /^[a-z ]+$/] {
        throwWarning: tr("name is all lowercase");
}

This should match both nodes in the attached example file, but it matches only the first one (name=aa) and fails on the second (name=aá).

JOSM:

Build-Date:2016-02-21 00:01:27
Revision:9844
Is-Local-Build:true

Identification: JOSM/1.5 (9844 SVN pt_BR) Linux Debian GNU/Linux unstable (sid)
Memory Usage: 577 MB / 3641 MB (196 MB allocated, but free)
Java version: 1.8.0_72-internal-b15, Oracle Corporation, OpenJDK 64-Bit Server VM
VM arguments: [-Dawt.useSystemAAFontSettings=on]
Dataset consistency test: No problems found

Plugins:
- AddrInterpolation (31772)
- Create_grid_of_ways (31772)
- FastDraw (31895)
- FixAddresses (31772)
- OpeningHoursEditor (31772)
- PicLayer (31895)
- SimplifyArea (31895)
- apache-commons (31895)
- buildings_tools (31895)
- download_along (31772)
- editgpx (31772)
- ejml (31895)
- geotools (31895)
- graphview (31895)
- jts (31772)
- kendzi3d (1.0.189)
- kendzi3d-jogl (41)
- kendzi3d-resources (0.0.1)
- log4j (31895)
- measurement (31895)
- merge-overlap (31967)
- opendata (32071)
- pdfimport (32019)
- photo_geotagging (31895)
- poly (31772)
- reverter (32005)
- tagging-preset-tester (31895)
- todo (29154)
- turnrestrictions (31895)
- undelete (31895)
- utilsplugin2 (32018)

Attachments (1)

example.osm (365 bytes) - added by naoliv 8 years ago.

Change History (10)

Changed 8 years ago by naoliv

Attachment: example.osm added

comment:1 Changed 8 years ago by wiktorn

As far as I can tell, [a-z] in Java doesn't cover national characters, and neither does the \w character class (though the last sentence in POSIX bracket expressions suggests that it should).

The class that covers all lowercase letters in Java is:
\p{javaLowerCase}

(See Pattern javadoc)
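
A minimal Java sketch of the behaviour described in this comment, run against plain java.util.regex rather than through JOSM's MapCSS parser. The class name CharClassCheck and the helper matches are hypothetical and exist only for this illustration.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; not part of JOSM.
class CharClassCheck {
    // Returns true when the whole string s matches the given regex.
    static boolean matches(String regex, String s) {
        return Pattern.matches(regex, s);
    }

    public static void main(String[] args) {
        String aAcute = "\u00e1"; // the precomposed letter a-acute

        // [a-z] is an ASCII range, so the accented letter is outside it.
        System.out.println(matches("[a-z]", aAcute));              // false
        // Without extra flags, \w is [a-zA-Z_0-9] in Java.
        System.out.println(matches("\\w", aAcute));                // false
        // \p{javaLowerCase} delegates to Character.isLowerCase.
        System.out.println(matches("\\p{javaLowerCase}", aAcute)); // true
    }
}
```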

comment:2 Changed 8 years ago by naoliv

Didn't know. Thanks for pointing this out.

And how should I use it?

With *[name =~ /^[\p{javaLowerCase} ]+$/] there is a parsing exception.
With *[name =~ /^[\\p{javaLowerCase} ]+$/] it doesn't match anything.

comment:3 Changed 8 years ago by wiktorn

My regex-fu suggests something like:
*[name =~ /^(\p{javaLowerCase}| )+$/]

You can't use \p{...} character classes within [] AFAIK
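
For comparison, a small sketch of how the two forms behave in plain java.util.regex, outside MapCSS. There, \p{...} is accepted inside a bracket expression as well as in an alternation, which suggests the parsing exception reported above came from the MapCSS parser rather than from the regex engine (consistent with the later fix message in r9857). The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; not part of JOSM.
class BracketVsAlternation {
    // Bracket form: lowercase letters or a space.
    static final Pattern BRACKET =
            Pattern.compile("^[\\p{javaLowerCase} ]+$");
    // Alternation form from this comment.
    static final Pattern ALTERNATION =
            Pattern.compile("^(\\p{javaLowerCase}| )+$");

    static boolean bracket(String name) {
        return BRACKET.matcher(name).matches();
    }

    static boolean alternation(String name) {
        return ALTERNATION.matcher(name).matches();
    }

    public static void main(String[] args) {
        // "a\u00e1" is the name of the second node in the report.
        for (String name : new String[] {"aa", "a\u00e1", "a \u00e1", "A\u00e1"}) {
            System.out.println(name + ": bracket=" + bracket(name)
                    + ", alternation=" + alternation(name));
        }
    }
}
```

Both patterns accept the first three names and reject the last one, which starts with an uppercase letter.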

comment:4 Changed 8 years ago by naoliv

It fails with a parsing error too.

comment:5 Changed 8 years ago by naoliv

It seems that it also doesn't work with (?i):
For example:

*[name =~ /^(?i)fóo$/] won't match name=fÓo

comment:6 in reply to:  5 Changed 8 years ago by simon04

Replying to naoliv:

*[name =~ /^(?i)fóo$/] won't match name=fÓo

You need to use [name =~ /^(?i)(?u)fóo$/], see https://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html#UNICODE_CASE
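
A short Java sketch of why (?u) is needed, again using java.util.regex directly. By default, (?i) folds case only for US-ASCII characters; adding (?u) (the UNICODE_CASE flag) makes the folding Unicode-aware. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; not part of JOSM.
class UnicodeCaseCheck {
    // (?i) alone: case-insensitive for US-ASCII only.
    static final Pattern ASCII_ONLY = Pattern.compile("^(?i)f\u00f3o$");
    // (?i)(?u): Unicode-aware case-insensitive matching.
    static final Pattern UNICODE = Pattern.compile("^(?i)(?u)f\u00f3o$");
    // The same thing expressed with compile flags instead of inline flags.
    static final Pattern UNICODE_FLAGS = Pattern.compile(
            "^f\u00f3o$", Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);

    static boolean matches(Pattern p, String s) {
        return p.matcher(s).matches();
    }

    public static void main(String[] args) {
        String upperAccent = "f\u00d3o"; // f, capital O-acute, o

        System.out.println(matches(ASCII_ONLY, upperAccent));    // false
        System.out.println(matches(UNICODE, upperAccent));       // true
        System.out.println(matches(UNICODE_FLAGS, upperAccent)); // true
        // ASCII letters are still folded by (?i) alone.
        System.out.println(matches(ASCII_ONLY, "F\u00f3O"));     // true
    }
}
```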

comment:7 Changed 8 years ago by simon04

Resolution: fixed
Status: new → closed

In 9857/josm:

fix #12549 - MapCSS: permit using character classes in regexp: \p{...}

comment:8 in reply to:  4 Changed 8 years ago by simon04

Milestone: 16.02

Replying to naoliv:

It fails with a parsing error too.

Fixed in r9857. Use [name =~ /^(?U)(\p{Lower})+$/]; the (?U) flag is compulsory, see https://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html#UNICODE_CHARACTER_CLASS
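
A minimal Java sketch of why (?U) matters for \p{Lower}, using java.util.regex directly rather than MapCSS. Without the flag, \p{Lower} is the POSIX class and covers only a to z; with (?U) (UNICODE_CHARACTER_CLASS, available since Java 7) it follows the Unicode Lowercase property. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; not part of JOSM.
class UnicodeLowerCheck {
    // POSIX interpretation: \p{Lower} is [a-z].
    static final Pattern POSIX_LOWER = Pattern.compile("^(\\p{Lower})+$");
    // Unicode interpretation, as in the rule suggested in this comment.
    static final Pattern UNICODE_LOWER = Pattern.compile("^(?U)(\\p{Lower})+$");

    static boolean matches(Pattern p, String s) {
        return p.matcher(s).matches();
    }

    public static void main(String[] args) {
        String accented = "a\u00e1"; // name of the second node in the report

        System.out.println(matches(POSIX_LOWER, "aa"));         // true
        System.out.println(matches(POSIX_LOWER, accented));     // false
        System.out.println(matches(UNICODE_LOWER, accented));   // true
        // Uppercase letters are still rejected with (?U).
        System.out.println(matches(UNICODE_LOWER, "A\u00e1"));  // false
    }
}
```

Note that the pattern from this comment contains no space, so a name such as "a á" would need the space added to the expression explicitly.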

comment:9 Changed 8 years ago by naoliv

Right. Thank you!
