#12549 closed defect (fixed)

Regex fails with accented letters

Reported by:	naoliv	Owned by:	team
Priority:	normal	Milestone:	16.02
Component:	Core validator	Version:
Keywords:	mapcss	Cc:

Description

Using this example MapCSS rule:

*[name =~ /^[a-z ]+$/] {
        throwWarning: tr("name is all lowercase");
}

This rule should match both nodes in the attached example file, but it matches only the first one (name=aa) and fails on the second (name=aá).

JOSM:

Build-Date:2016-02-21 00:01:27
Revision:9844
Is-Local-Build:true

Identification: JOSM/1.5 (9844 SVN pt_BR) Linux Debian GNU/Linux unstable (sid)
Memory Usage: 577 MB / 3641 MB (196 MB allocated, but free)
Java version: 1.8.0_72-internal-b15, Oracle Corporation, OpenJDK 64-Bit Server VM
VM arguments: [-Dawt.useSystemAAFontSettings=on]
Dataset consistency test: No problems found

Plugins:
- AddrInterpolation (31772)
- Create_grid_of_ways (31772)
- FastDraw (31895)
- FixAddresses (31772)
- OpeningHoursEditor (31772)
- PicLayer (31895)
- SimplifyArea (31895)
- apache-commons (31895)
- buildings_tools (31895)
- download_along (31772)
- editgpx (31772)
- ejml (31895)
- geotools (31895)
- graphview (31895)
- jts (31772)
- kendzi3d (1.0.189)
- kendzi3d-jogl (41)
- kendzi3d-resources (0.0.1)
- log4j (31895)
- measurement (31895)
- merge-overlap (31967)
- opendata (32071)
- pdfimport (32019)
- photo_geotagging (31895)
- poly (31772)
- reverter (32005)
- tagging-preset-tester (31895)
- todo (29154)
- turnrestrictions (31895)
- undelete (31895)
- utilsplugin2 (32018)

Attachments (1)

example.osm (365 bytes) - added by naoliv 9 years ago.

Change History (10)

by naoliv, 9 years ago

Attachment:	example.osm added

comment:1 by wiktorn, 9 years ago

As far as I can tell, [a-z] in Java doesn't cover national (accented) characters, and neither does the \w character class (although the last sentence under "POSIX bracket expressions" suggests that it should).

The class that covers all lowercase letters in Java is:
\p{javaLowerCase}

(See Pattern javadoc)
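
The difference can be reproduced outside JOSM with plain java.util.regex. The following is a minimal sketch, not JOSM code: the class name LowerCaseDemo and the helper matches are made up for illustration, and the accented letter is written as a Unicode escape so the result does not depend on source-file encoding.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not part of JOSM.
class LowerCaseDemo {
    // Returns true if the whole input matches the given regex.
    static boolean matches(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        String ascii = "aa";
        String accented = "a\u00e1"; // "aá"

        // [a-z] is a plain code point range, so U+00E1 falls outside it.
        System.out.println(matches("^[a-z ]+$", ascii));    // true
        System.out.println(matches("^[a-z ]+$", accented)); // false

        // \w is ASCII-only by default as well.
        System.out.println(matches("^\\w+$", accented));    // false

        // \p{javaLowerCase} is defined via Character.isLowerCase.
        System.out.println(matches("^(\\p{javaLowerCase}| )+$", accented)); // true
    }
}
```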

comment:2 by naoliv, 9 years ago

Didn't know. Thanks for pointing this out.

And how should I use it?

With *[name =~ /^[\p{javaLowerCase} ]+$/] there is a parsing exception.
With *[name =~ /^[\\p{javaLowerCase} ]+$/] it doesn't match anything.

comment:3 by wiktorn, 9 years ago

My regex-fu suggests something like:
*[name =~ /^(\p{javaLowerCase}| )+$/]

You can't use \p{...} character classes within [] AFAIK
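
For what it is worth, java.util.regex itself does accept \p{...} inside a bracket expression, so the parsing errors reported in this ticket seem to come from the MapCSS parser rather than from the regex engine. That is an inference from the r9857 commit message, not something checked against the JOSM source. A minimal sketch with a made-up class name (BracketClassDemo):

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not part of JOSM.
class BracketClassDemo {
    // Returns true if the whole input matches the given regex.
    static boolean matches(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        String name = "a\u00e1 b"; // "aá b"

        // \p{...} used inside a bracket expression.
        System.out.println(matches("^[\\p{javaLowerCase} ]+$", name));  // true

        // Equivalent alternation form suggested in the comment above.
        System.out.println(matches("^(\\p{javaLowerCase}| )+$", name)); // true
    }
}
```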

follow-up: 8 comment:4 by naoliv, 9 years ago

It fails with parsing error too.

follow-up: 6 comment:5 by naoliv, 9 years ago

Case-insensitive matching with (?i) doesn't seem to work with accented letters either. For example:

*[name =~ /^(?i)fóo$/] won't match name=fÓo

in reply to: 5 comment:6 by simon04, 9 years ago

Replying to naoliv:

*[name =~ /^(?i)fóo$/] won't match name=fÓo

You need to use [name =~ /^(?i)(?u)fóo$/], see https://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html#UNICODE_CASE
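
The effect of (?u) can be checked with plain java.util.regex. This is a small sketch under the assumption that MapCSS passes the pattern to Java unchanged; the class name UnicodeCaseDemo is hypothetical, and the accented letters are written as Unicode escapes.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not part of JOSM.
class UnicodeCaseDemo {
    // Returns true if the whole input matches the given regex.
    static boolean matches(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        String pattern = "f\u00f3o"; // "fóo"
        String value = "f\u00d3o";   // "fÓo"

        // (?i) alone only folds case for US-ASCII characters.
        System.out.println(matches("^(?i)" + pattern + "$", value));     // false

        // Adding (?u) enables Unicode-aware case folding.
        System.out.println(matches("^(?i)(?u)" + pattern + "$", value)); // true
    }
}
```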

comment:7 by simon04, 9 years ago

Resolution:	→ fixed
Status:	new → closed

In 9857/josm:

fix #12549 - MapCSS: permit using character classes in regexp: \p{...}

in reply to: 4 comment:8 by simon04, 9 years ago

Milestone:	→ 16.02

Replying to naoliv:

It fails with parsing error too.

Fixed in r9857. Use [name =~ /^(?U)(\p{Lower})+$/]; the (?U) flag is compulsory, see https://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html#UNICODE_CHARACTER_CLASS
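
Why (?U) is needed can be seen in plain java.util.regex: without it, \p{Lower} is the POSIX class and covers only ASCII a-z. The sketch below uses a made-up class name (UnicodeClassDemo); the variant with a space inside the brackets is an illustration matching the original rule, not something taken from r9857.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not part of JOSM.
class UnicodeClassDemo {
    // Returns true if the whole input matches the given regex.
    static boolean matches(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        String accented = "a\u00e1"; // "aá"

        // Without (?U), \p{Lower} is ASCII-only.
        System.out.println(matches("^(\\p{Lower})+$", accented));        // false

        // With (?U), \p{Lower} follows the Unicode lowercase property.
        System.out.println(matches("^(?U)(\\p{Lower})+$", accented));    // true

        // Variant that also allows spaces, as in the original rule.
        System.out.println(matches("^(?U)[\\p{Lower} ]+$", "a\u00e1 b")); // true
    }
}
```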

comment:9 by naoliv, 9 years ago

Right. Thank you!
