Opened 15 years ago

Closed 13 years ago

#3664 closed defect (othersoftware)

regexp search spits weird errors on some expressions

Reported by: bilbo
Owned by: team
Priority: major
Milestone:
Component: Core
Version: latest
Keywords: javabug
Cc: jttt

Description

I searched for the expression:

name:[ěš]

(name containing either "ě" or "š")

and I got this error:

The regex "(?:[]|ě|ě|š|š)" had a parse error at offset 15

I remember these expressions worked a few hundred revisions ago, so I guess something in the search got broken.

Case sensitive and regular expression checkboxes are ticked.

Attachments (0)

Change History (9)

comment:1 by stoecker, 14 years ago

Cc: jttt added
Keywords: javabug added

The situation is this: we use CANON_EQ, and it seems to convert the different codes of a character into an alternation of the form (?:c1|c2), so ěš becomes (?:ě|ě|š|š). The problem is that the now-empty [] is not removed correctly. This is clearly a bug in Java's regex parsing code.

name:"(?:ě|š)" and name:"(?|[ěš])" fail for the same reason.

You can only work around this by adding something inside the [] so it won't end up empty, or by using name:"[(?|(ěš)]" (i.e. adding the (?|...) inside the [] instead of outside, as CANON_EQ does).

@Jiri:
Any comments on this topic? Should we report it as a bug to Java?
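
For anyone who wants to check whether a given runtime is affected, here is a minimal sketch that compiles a few of the expressions from this ticket with Pattern.CANON_EQ and reports any parse error. The class and method names (CanonEqCheck, compileError) are made up for illustration and are not JOSM code. The result depends on the Java version in use, so no output is shown.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical helper, not part of JOSM.
class CanonEqCheck {

    /** Returns null if the regex compiles with CANON_EQ, otherwise the parser's error message. */
    static String compileError(String regex) {
        try {
            Pattern.compile(regex, Pattern.CANON_EQ);
            return null;
        } catch (PatternSyntaxException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // U+011B is e with caron, U+0161 is s with caron (written as Unicode escapes
        // so the source file encoding does not matter).
        String[] candidates = {
            "[\u011b\u0161]",      // the expression from the report
            "(?:\u011b|\u0161)",   // alternation instead of a character class
            "[a\u011b\u0161]"      // workaround idea: keep a plain character inside the []
        };
        for (String regex : candidates) {
            String error = compileError(regex);
            System.out.println(regex + " -> " + (error == null ? "compiles" : "fails: " + error));
        }
    }
}
```

If the first line reports a failure and the third one compiles, the runtime shows the behaviour described above.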

comment:2 by anonymous, 14 years ago

But if it is a Java bug, how is it possible that it worked before?

Is CANON_EQ needed? If it is not, can we either remove it or make it optional in the preferences?

comment:3 by stoecker, 14 years ago

We did not use CANON_EQ before :-)

CANON_EQ ensures that different Unicode representations of the same text (e.g. a precomposed character versus a base letter plus combining mark) are all found when searching. This is an important feature.

I suggest using the above workaround (or adding more characters inside the [] that have only a single representation, so the brackets don't end up empty).
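
To illustrate what the flag buys us, here is a small sketch based on the example given in the java.util.regex.Pattern documentation: the pattern "a" followed by U+030A (combining ring above) only matches the precomposed U+00E5 when CANON_EQ is set. The class name CanonEqDemo and its matches helper are hypothetical, for illustration only.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not part of JOSM.
class CanonEqDemo {

    /** Compiles the regex with the given flags and tests it against the whole text. */
    static boolean matches(String regex, String text, int flags) {
        return Pattern.compile(regex, flags).matcher(text).matches();
    }

    public static void main(String[] args) {
        String decomposed = "a\u030A";  // letter a followed by combining ring above
        String precomposed = "\u00E5";  // single code point: a with ring above

        // Without the flag the two forms are compared code point by code point.
        System.out.println("no flags: " + matches(decomposed, precomposed, 0));
        // With the flag canonically equivalent forms are treated as the same.
        System.out.println("CANON_EQ: " + matches(decomposed, precomposed, Pattern.CANON_EQ));
    }
}
```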

comment:4 by stoecker, 14 years ago

Resolution: othersoftware
Status: new → closed

I tried to enter a bug report for Java, but it seems this did not succeed. As I have no time to care for bugs where the report is ignored, I'll close this. There is nothing we can do about it. Use the given workaround until Java is fixed.

comment:5 by bastiK, 14 years ago

Resolution: othersoftware
Status: closed → reopened

Do we need the CANON_EQ flag if both the pattern and the string are normalized with java.text.Normalizer?
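
A rough sketch of what I mean, assuming NFC as the normalization form: both the expression and the searched text are normalized first, and the pattern is then compiled without CANON_EQ. NormalizedSearch and find are hypothetical names, not existing JOSM code, and normalizing a raw regex string could in principle alter unusual expressions (for example a combining mark placed right after a metacharacter), so this is only a starting point for testing.

```java
import java.text.Normalizer;
import java.util.regex.Pattern;

// Hypothetical sketch, not existing JOSM code.
class NormalizedSearch {

    /** Normalizes regex and text to NFC, then searches without CANON_EQ. */
    static boolean find(String regex, String text) {
        String normalizedRegex = Normalizer.normalize(regex, Normalizer.Form.NFC);
        String normalizedText = Normalizer.normalize(text, Normalizer.Form.NFC);
        return Pattern.compile(normalizedRegex).matcher(normalizedText).find();
    }

    public static void main(String[] args) {
        // Precomposed class from the report, text with a decomposed e + caron.
        System.out.println(find("[\u011b\u0161]", "Me\u030csto"));
        // Decomposed e + caron in the class, precomposed character in the text.
        System.out.println(find("[e\u030c\u0161]", "M\u011bsto"));
        // Text without any of the searched characters.
        System.out.println(find("[\u011b\u0161]", "Praha"));
    }
}
```

Since the character class contains only precomposed characters after normalization, the empty [] from this report should not arise on this path.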

comment:6 by stoecker, 14 years ago

You can try with the examples in this report to find out.

comment:7 by bastiK, 14 years ago

How? I cannot find different representations of an equivalent Unicode string in this report.

comment:8 by stoecker, 14 years ago

Hmm, it seems copy and paste removed the differences. You can try [(?|(ěš)] with an old JOSM and CANON_EQ and get the text from the error message :-)

comment:9 by stoecker, 13 years ago

Resolution: othersoftware
Status: reopened → closed
