Opened 16 years ago
Closed 14 years ago
#3664 closed defect (othersoftware)
regexp search spits weird errors on some expressions
Reported by: | bilbo | Owned by: | team |
---|---|---|---|
Priority: | major | Milestone: | |
Component: | Core | Version: | latest |
Keywords: | javabug | Cc: | jttt |
Description
I searched for expression:
name:[ěš]
(name containing either "ě" or "š")
and I got this error:
The regex "(?:[]|ě|ě|š|š)" had a parse error at offset 15
I remember these expressions worked a few hundred revisions ago, so I guess something in the search got broken.
Case sensitive and regular expression checkboxes are ticked.
Change History (9)
comment:1 by , 15 years ago
Cc: | added |
---|---|
Keywords: | javabug added |
comment:2 by , 15 years ago
But if it is a Java bug, how is it possible that it worked before?
Is CANON_EQ needed? If it was not used, can we either remove it or make it optional in preferences?
comment:3 by , 15 years ago
We did not use CANON_EQ before :-)
CANON_EQ ensures that canonically equivalent Unicode representations of the same text (for example, a precomposed character and its base letter plus combining mark) are all found when searching. This is an important feature.
I suggest using the above workaround (or adding more characters inside the [] that have only a single Unicode representation, so the brackets don't end up empty).
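As a rough illustration of what the flag does, the sketch below compares a decomposed pattern against a precomposed string with and without `Pattern.CANON_EQ`, using the example from the `java.util.regex.Pattern` documentation. It is not JOSM's search code; the class name `CanonEqDemo` and the helper `matches` are made up for this example.

```java
import java.util.regex.Pattern;

public class CanonEqDemo {
    // Hypothetical helper: true if the whole input matches the regex under the given flags.
    static boolean matches(String regex, int flags, String input) {
        return Pattern.compile(regex, flags).matcher(input).matches();
    }

    public static void main(String[] args) {
        String decomposed = "a\u030A";  // 'a' followed by COMBINING RING ABOVE
        String precomposed = "\u00E5";  // LATIN SMALL LETTER A WITH RING ABOVE

        // Without the flag the two forms are compared code point by code point.
        System.out.println("plain:    " + matches(decomposed, 0, precomposed));
        // With CANON_EQ, canonically equivalent forms are treated as equal.
        System.out.println("CANON_EQ: " + matches(decomposed, Pattern.CANON_EQ, precomposed));
    }
}
```

How the flag behaves on character classes such as the one in this ticket depends on the Java version, so this only shows the documented basic case.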
comment:4 by , 15 years ago
Resolution: | → othersoftware |
---|---|
Status: | new → closed |
I tried to enter a bug report for Java, but it seems this did not succeed. As I have no time to care for bugs where reporting is ignored, I'll close this. There is nothing we can do about this. Use the given workaround until they fix Java.
comment:5 by , 15 years ago
Resolution: | othersoftware |
---|---|
Status: | closed → reopened |
Do we need the CANON_EQ flag if both the pattern and the string are normalized with java.text.Normalizer?
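A minimal sketch of that idea, assuming NFC is an acceptable normal form and that normalizing the regex text does not change its syntax (neither has been checked against JOSM's search code). The class `NormalizedSearch` and its methods are hypothetical names used only here:

```java
import java.text.Normalizer;
import java.util.regex.Pattern;

public class NormalizedSearch {
    // Bring a string into Unicode Normalization Form C (precomposed characters).
    static String nfc(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFC);
    }

    // Hypothetical search helper: normalize both sides, then match WITHOUT CANON_EQ.
    static boolean contains(String regex, String value) {
        return Pattern.compile(nfc(regex)).matcher(nfc(value)).find();
    }

    public static void main(String[] args) {
        String regex = "[\u011B\u0161]";   // [ěš], precomposed
        String decomposed = "Dve\u030C";   // "Dvě" written as 'e' + COMBINING CARON
        System.out.println(contains(regex, decomposed));
        System.out.println(contains(regex, "abc"));
    }
}
```

With both sides in the same normal form, a character class only ever sees one representation of each character, so the pattern never needs rewriting into alternatives.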
comment:7 by , 15 years ago
How? I cannot find different representations of equivalent Unicode strings in this report.
comment:8 by , 15 years ago
Hmm, it seems copy and paste removed the differences. You can try [(?|(ěš)] with an old JOSM and CANON_EQ and get the text from the error message :-)
comment:9 by , 14 years ago
Resolution: | → othersoftware |
---|---|
Status: | reopened → closed |
The situation is this: we use CANON_EQ, and it seems this converts each character that has several equivalent encodings into an alternation of the form
(?:c1|c2), so ěš becomes (?:ě|ě|š|š). The problem is that the [] left empty by this rewrite is not removed correctly. This is clearly a bug in Java's parsing code.
name:"(?:ě|š)" and name:"(?|[ěš])" fail for the same reason.
You can work around this only by adding something inside the [] so it won't end up empty. Or you can use name:"[(?|(ěš)]" (i.e. add the (?|...) inside the [] instead of outside, as CANON_EQ does).
@Jiri:
Any comments on this topic? Should we report it as a bug to Java?