Ticket #3664 (closed defect: othersoftware)
regexp search spits weird errors on some expressions
| Reported by: | singularita@… | Owned by: | team |
|---|---|---|---|
| Priority: | major | Component: | Core |
| Version: | latest | Keywords: | javabug |
| Cc: | jttt | | |
Description
I searched for expression:
name:[ěš]
(name containing either "ě" or "š")
and I got error
The regex "(?:[]|ě|ě|š|š)" had a parse error at offset 15
I remember these expressions working a few hundred revisions ago, so I guess something in the search code got broken.
The "Case sensitive" and "Regular expression" checkboxes are ticked.
Change History
comment:2 Changed 20 months ago by anonymous
But if it is a Java bug, how is it possible that it worked before?
Is CANON_EQ needed? If it is not, can we either remove it or make it optional in the preferences?
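A minimal sketch of what "optional in the preferences" could look like: the regex flags are assembled from booleans, so CANON_EQ is added only when a setting asks for it. The parameter `canonicalEquivalence` stands for a hypothetical preference (no such setting exists in the ticket), `caseSensitive` mirrors the "Case sensitive" checkbox, and adding `UNICODE_CASE` next to `CASE_INSENSITIVE` is an assumption of this sketch, not something the ticket states.

```java
import java.util.regex.Pattern;

public class SearchFlags {
    // Hypothetical: caseSensitive mirrors the "Case sensitive" checkbox,
    // canonicalEquivalence stands for an assumed preference toggle.
    static int searchFlags(boolean caseSensitive, boolean canonicalEquivalence) {
        int flags = 0;
        if (!caseSensitive) {
            // UNICODE_CASE extends case folding to non-ASCII letters such as e-caron.
            flags |= Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE;
        }
        if (canonicalEquivalence) {
            flags |= Pattern.CANON_EQ;
        }
        return flags;
    }

    public static void main(String[] args) {
        // Case-insensitive search without CANON_EQ: the class of U+011B and U+0161
        // also finds the capital letter U+011A.
        Pattern p = Pattern.compile("[\u011B\u0161]", searchFlags(false, false));
        System.out.println(p.matcher("\u011A").find()); // true
    }
}
```

With the toggle off, the class in the original report compiles as an ordinary character class; the trade-off is that canonically equivalent spellings are no longer matched.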
comment:3 Changed 20 months ago by stoecker
We did not use CANON_EQ before :-)
CANON_EQ ensures that canonically equivalent Unicode representations of the same text (e.g. a precomposed "ě" versus "e" followed by a combining caron) are all found when searching. This is an important feature.
I suggest using the workaround described in this ticket (or adding more characters inside the [] that have only a single Unicode representation, so the brackets don't end up empty).
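To make the point about CANON_EQ concrete, here is a small sketch using only `java.util.regex`: a pattern written in decomposed form ("e" plus U+030C COMBINING CARON) is matched against the precomposed "ě" (U+011B), once without and once with the flag. The class and method names are illustrative only.

```java
import java.util.regex.Pattern;

public class CanonEqDemo {
    // True if the regex matches the whole input when compiled with the given flags.
    static boolean fullMatch(String regex, String input, int flags) {
        return Pattern.compile(regex, flags).matcher(input).matches();
    }

    public static void main(String[] args) {
        String decomposed = "e\u030C";  // "e" followed by U+030C COMBINING CARON
        String precomposed = "\u011B";  // U+011B LATIN SMALL LETTER E WITH CARON

        // Without CANON_EQ the two-char pattern cannot match the one-char string.
        System.out.println(fullMatch(decomposed, precomposed, 0));                // false
        // With CANON_EQ canonically equivalent forms match each other.
        System.out.println(fullMatch(decomposed, precomposed, Pattern.CANON_EQ)); // true
    }
}
```

Both strings render as the same letter, so without the flag a user would see a name on screen that the search refuses to find.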
comment:4 Changed 17 months ago by stoecker
- Status changed from new to closed
- Resolution set to othersoftware
I tried to file a bug report with Java, but it seems this did not succeed. As I have no time to care about bugs whose reports are ignored, I'll close this. There is nothing we can do about it. Use the given workaround until Java is fixed.
comment:5 Changed 17 months ago by bastiK
- Status changed from closed to reopened
- Resolution othersoftware deleted
Do we need the CANON_EQ flag if both the pattern and the string are normalized with java.text.Normalizer?
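A minimal sketch of the suggested alternative, assuming NFC as the common form: both the pattern and the text are normalized with `java.text.Normalizer`, and the pattern is compiled without CANON_EQ. The helper `normalizedFind` is hypothetical. Without the normalization, a class written with decomposed letters such as `[e\u030Cs\u030C]` would just be the set {e, combining caron, s} and would match a plain "e".

```java
import java.text.Normalizer;
import java.util.regex.Pattern;

public class NormalizedSearch {
    // Hypothetical alternative to CANON_EQ: bring pattern and text into NFC,
    // then run an ordinary regex search on the normalized strings.
    static boolean normalizedFind(String regex, String text) {
        String nfcRegex = Normalizer.normalize(regex, Normalizer.Form.NFC);
        String nfcText = Normalizer.normalize(text, Normalizer.Form.NFC);
        return Pattern.compile(nfcRegex).matcher(nfcText).find();
    }

    public static void main(String[] args) {
        // Class written with decomposed letters (e + caron, s + caron);
        // NFC turns it into a class of the two precomposed letters.
        String regex = "[e\u030Cs\u030C]";
        System.out.println(normalizedFind(regex, "\u011B")); // true
        System.out.println(normalizedFind(regex, "e"));      // false
    }
}
```

Two caveats of this approach: only literal characters are normalized (an escape such as `\\u030C` written inside the pattern is left alone), and match positions refer to the normalized text rather than the original string.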
comment:6 Changed 17 months ago by stoecker
You can try with the examples in this report to find out.
comment:7 Changed 17 months ago by bastiK
How? I cannot find different representations of equivalent Unicode strings in this report.
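One way to produce such test data is to write both forms with Java escapes and convert between them with `java.text.Normalizer`. This sketch only shows that the two spellings of "ě" differ as strings but agree after normalization; the helper names `compose` and `decompose` are illustrative.

```java
import java.text.Normalizer;

public class EquivalentForms {
    // NFC: canonical decomposition followed by canonical composition.
    static String compose(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFC);
    }

    // NFD: canonical decomposition (base letter followed by combining marks).
    static String decompose(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD);
    }

    public static void main(String[] args) {
        String precomposed = "\u011B";  // U+011B LATIN SMALL LETTER E WITH CARON
        String decomposed = "e\u030C";  // "e" + U+030C COMBINING CARON, renders the same

        System.out.println(precomposed.equals(decomposed));                       // false
        System.out.println(precomposed.length() + " vs " + decomposed.length());  // 1 vs 2
        System.out.println(compose(decomposed).equals(precomposed));              // true
        System.out.println(decompose(precomposed).equals(decomposed));            // true
    }
}
```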



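The situation is this: we use CANON_EQ, and it seems to rewrite every character that has several equivalent encodings into an alternation (?:c1|c2), so [ěš] becomes (?:[]|ě|ě|š|š), with each letter appearing once precomposed and once decomposed. The problem is that the now-empty [] is not removed correctly. This is clearly a bug in Java's regex parsing code.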
name:"(?:ě|š)" and name:"(?:[ěš])" fail for the same reason.
The only way to work around this is to add something inside the [] so that it won't end up empty. For example, use name:"[(?:ěš)]", i.e. put the (?:...) inside the [] instead of outside, where CANON_EQ puts it. Note that the class then also matches the literal characters (, ?, : and ).
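To illustrate why the plain class breaks and why extra characters inside the brackets help, here is a deliberately simplified, hypothetical model of the rewrite described above (it is not the JDK source): every class member whose decomposed form differs from itself is moved out of the brackets into an alternation. A small `compiles` helper checks whether the running JVM accepts a pattern. Whether `[ěš]` with CANON_EQ actually fails depends on the Java version, so no output is claimed for that line.

```java
import java.text.Normalizer;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class CanonEqClassDemo {
    // Simplified model (hypothetical): decomposable members leave the class and
    // become alternatives, paired with their NFD form.
    static String modelCanonEqClass(String members) {
        StringBuilder kept = new StringBuilder();
        StringBuilder alternatives = new StringBuilder();
        for (int i = 0; i < members.length(); ) {
            int cp = members.codePointAt(i);
            String ch = new String(Character.toChars(cp));
            String nfd = Normalizer.normalize(ch, Normalizer.Form.NFD);
            if (nfd.equals(ch)) {
                kept.append(ch); // single representation: stays inside []
            } else {
                alternatives.append('|').append(ch).append('|').append(nfd);
            }
            i += Character.charCount(cp);
        }
        return "(?:[" + kept + "]" + alternatives + ")";
    }

    // True if the JVM running this code accepts the regex with the given flags.
    static boolean compiles(String regex, int flags) {
        try {
            Pattern.compile(regex, flags);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Both letters are decomposable, so the model leaves an empty "[]" behind.
        System.out.println(modelCanonEqClass("\u011B\u0161"));
        // Workaround: "(", "?", ":" and ")" stay inside, so the class is never empty.
        System.out.println(modelCanonEqClass("(?:\u011B\u0161)"));

        // Version-dependent results, printed rather than assumed.
        System.out.println(compiles("[\u011B\u0161]", Pattern.CANON_EQ));
        System.out.println(compiles("[(?:\u011B\u0161)]", Pattern.CANON_EQ));
    }
}
```

The first model output has the same shape as the pattern in the reported error message, which is what makes the empty [] the likely culprit.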
@Jiri: Any comments on this topic? Should we report it as a bug to Java?