Ticket #3664 (closed defect: othersoftware)

Opened 2 years ago

Last modified 5 months ago

regexp search spits weird errors on some expressions

Reported by: singularita@… Owned by: team
Priority: major Component: Core
Version: latest Keywords: javabug
Cc: jttt

Description

I searched for expression:

name:[ěš]

(name containing either "ě" or "š")

and I got error

The regex "(?:[]|ě|ě|š|š)" had a parse error at offset 15

I remember these expressions working a few hundred revisions ago, so I guess something in the search got broken.

Case sensitive and regular expression checkboxes are ticked.

Change History

comment:1 Changed 20 months ago by stoecker

  • Cc jttt added
  • Keywords javabug added

The situation is this: we use CANON_EQ, and it seems to convert each character that has several encodings into an alternation of the form (?:c1|c2), so ěš becomes (?:ě|ě|š|š). The problem is that the now-empty [] is not removed correctly. This is clearly a bug in Java's regex parsing code.

name:"(?:ě|š)" and name:"(?|[ěš])" fail for the same reason.

You can work around this only by adding something inside the [], so it won't become empty. Or you can use name:"[(?|(ěš)]" (i.e. add the (?|...) inside the [] instead of outside, as CANON_EQ does).

@Jiri: Any comments on this topic? Should we report it as a bug to Java?
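
To check whether a given Java runtime is affected, a small probe along the following lines might help. This is only a sketch: the class name CanonEqProbe and the method probe are made up for illustration and are not part of JOSM, and the outcome depends on the Java version in use, so no result is claimed here. The first expression is the one from the report; the second adds "a" inside the [] as described in the workaround above.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class CanonEqProbe {
    // Hypothetical helper: compiles the regex with CANON_EQ and reports what happened.
    static String probe(String regex) {
        try {
            Pattern.compile(regex, Pattern.CANON_EQ);
            return "compiled";
        } catch (PatternSyntaxException e) {
            // getDescription() and getIndex() give the parser's own error details.
            return "parse error: " + e.getDescription() + " at index " + e.getIndex();
        }
    }

    public static void main(String[] args) {
        // Character class from the report, written with precomposed code points.
        System.out.println(probe("[\u011B\u0161]"));
        // Workaround variant: an extra character with a single representation.
        System.out.println(probe("[a\u011B\u0161]"));
    }
}
```

If the first line prints a parse error and the second prints "compiled", the runtime shows the behaviour described in this ticket.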

comment:2 Changed 20 months ago by anonymous

But if it is a Java bug, how is it possible that it worked before?

Is CANON_EQ needed? If it was not used, can we either remove it or make it optional in preferences?

comment:3 Changed 20 months ago by stoecker

We did not use CANON_EQ before :-)

CANON_EQ ensures that all the different Unicode representations of the same text are found when searching. This is an important feature.

I suggest using the above workaround (or adding more characters inside the [] that have only a single representation, so the brackets don't end up empty).
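
For anyone wondering what the flag actually buys us, here is a minimal sketch. The class CanonEqDemo and its find helper are illustrative names only, not JOSM code. It uses the case documented for Pattern.CANON_EQ: a pattern written as base letter plus combining mark should match the precomposed character only when the flag is set.

```java
import java.util.regex.Pattern;

public class CanonEqDemo {
    // Hypothetical helper: true if the regex matches somewhere in the input.
    static boolean find(String regex, String input, int flags) {
        return Pattern.compile(regex, flags).matcher(input).find();
    }

    public static void main(String[] args) {
        String decomposed = "e\u030C"; // "e" followed by COMBINING CARON
        String composed = "\u011B";    // LATIN SMALL LETTER E WITH CARON

        // Without the flag the two spellings are different character sequences.
        System.out.println(find(decomposed, composed, 0));
        // With CANON_EQ, canonically equivalent sequences are treated as equal.
        System.out.println(find(decomposed, composed, Pattern.CANON_EQ));
    }
}
```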

comment:4 Changed 17 months ago by stoecker

  • Status changed from new to closed
  • Resolution set to othersoftware

I tried to enter a bug report for Java, but it seems this did not succeed. As I have no time to care for bugs where reporting is ignored, I'll close this. Nothing we can do about this. Use given workaround until they fix Java.

comment:5 Changed 17 months ago by bastiK

  • Status changed from closed to reopened
  • Resolution othersoftware deleted

Do we need the CANON_EQ flag if both the pattern and the string are normalized with java.text.Normalizer?
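
A rough sketch of what I mean, assuming NFC as the common form. NormalizedSearch and find are hypothetical names, not existing JOSM code. Both the expression and the searched value are normalized first, and the pattern is then compiled without CANON_EQ, so the character class is never rewritten.

```java
import java.text.Normalizer;
import java.util.regex.Pattern;

public class NormalizedSearch {
    // Hypothetical helper: normalize both sides to NFC, then search without CANON_EQ.
    static boolean find(String regex, String value) {
        String nRegex = Normalizer.normalize(regex, Normalizer.Form.NFC);
        String nValue = Normalizer.normalize(value, Normalizer.Form.NFC);
        return Pattern.compile(nRegex).matcher(nValue).find();
    }

    public static void main(String[] args) {
        // Class typed with combining marks, value stored precomposed.
        System.out.println(find("[e\u030Cs\u030C]", "\u011B"));
        // Class typed precomposed, value stored with combining marks.
        System.out.println(find("[\u011B\u0161]", "Be\u030Cla\u0301"));
    }
}
```

One limitation of this approach: characters that have no precomposed form stay as base letter plus combining mark after NFC, and inside [] those would still be treated as separate code points.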

comment:6 Changed 17 months ago by stoecker

You can try with the examples in this report to find out.

comment:7 Changed 17 months ago by bastiK

How? I cannot find different representations of an equivalent Unicode string in this report.

comment:8 Changed 17 months ago by stoecker

Hmm, it seems copy and paste removed the differences. You can try [(?|(ěš)] with an old JOSM and CANON_EQ and get the text from the error message :-)
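
Alternatively, the two representations can be produced directly with java.text.Normalizer, which avoids the copy-and-paste problem. A small sketch follows; UnicodeForms and codePoints are made-up names for illustration. It prints the code points of the precomposed (NFC) and decomposed (NFD) forms of "ěš".

```java
import java.text.Normalizer;

public class UnicodeForms {
    // Hypothetical helper: renders a string as a list of U+XXXX code points.
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> sb.append(String.format("U+%04X ", cp)));
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String nfc = Normalizer.normalize("\u011B\u0161", Normalizer.Form.NFC);
        String nfd = Normalizer.normalize(nfc, Normalizer.Form.NFD);
        System.out.println(codePoints(nfc)); // U+011B U+0161
        System.out.println(codePoints(nfd)); // U+0065 U+030C U+0073 U+030C
    }
}
```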

comment:9 Changed 5 months ago by stoecker

  • Status changed from reopened to closed
  • Resolution set to othersoftware