Opened 4 years ago

Last modified 4 years ago

#17609 new enhancement

Detect invalid usage of ZWNJ/ZWJ characters

Reported by: Don-vip
Owned by: team
Priority: normal
Component: Core validator
Keywords: unicode persian arabic


Follow-up to #17595. We should detect incorrect uses of zero-width joiner (ZWJ, U+200D) and zero-width non-joiner (ZWNJ, U+200C) characters in OSM tags (see wikipedia:Zero-width_joiner and wikipedia:Zero-width_non-joiner):

> In practice, ZWNJ is like a space but with zero width: it prevents adjacent characters from joining each other. Although it is a valid character in Persian, there are cases where it appears at an invalid position in a word, so it would be nice to keep warnings for those invalid cases.
> As an example, think of "aa*aaa" as a Persian word containing a ZWNJ, where the asterisk stands for the ZWNJ.
> The only valid case is aa*aaa (ZWNJ between two adjacent letters that can join each other).
> Common invalid cases:
> * doubled (or repeated) ZWNJ, like a doubled space: aa**aaa
> * at the start or end of a word: *aa*aaa or aa*aaa*
> * immediately before or after a space: aa* aaa or aa *aaa (this can happen within a word because ZWNJ is normally typed with Shift+Space)
> * a trickier one:
>   * Persian has seven letters (و, ژ, ز, ر, ذ, د, ا) that do not connect to a following letter, so writing a ZWNJ after them is useless and unnecessary. If b is one of them, ab*aaa is invalid. (Similar cases could occur in other languages with similar, but not identical, letters.)
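The first three cases above are purely positional, so they can be checked without knowing anything about the letters involved. Below is a minimal Python sketch, assuming a tag value is available as a plain string; the function name `find_zwnj_problems` and its messages are hypothetical and not existing JOSM validator code. It covers only ZWNJ (U+200C) and not the letter-based case.

```python
ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER


def find_zwnj_problems(value: str) -> list[tuple[int, str]]:
    """Return (index, reason) for each ZWNJ at a positionally invalid spot.

    Hypothetical sketch of the first three rules from the ticket:
    repeated ZWNJ, ZWNJ at the start/end of the value, and ZWNJ
    directly next to whitespace.
    """
    problems = []
    for i, ch in enumerate(value):
        if ch != ZWNJ:
            continue
        prev = value[i - 1] if i > 0 else None
        nxt = value[i + 1] if i + 1 < len(value) else None
        if prev == ZWNJ or nxt == ZWNJ:
            # aa**aaa: every ZWNJ in the run is reported
            problems.append((i, "repeated ZWNJ"))
        elif prev is None or nxt is None:
            # *aa*aaa or aa*aaa* at the edges of the whole value
            problems.append((i, "ZWNJ at start or end of value"))
        elif prev.isspace() or nxt.isspace():
            # aa* aaa or aa *aaa, which also covers words inside the value
            problems.append((i, "ZWNJ next to whitespace"))
    return problems
```

For example, `find_zwnj_problems("aa\u200c aaa")` reports index 2 as "ZWNJ next to whitespace", while `"aa\u200caaa"` yields no problems. Treating whitespace neighbours separately from the value edges means words in the middle of a multi-word name are still checked.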

Attachments (0)

Change History (3)

comment:1 Changed 4 years ago by iman

oh thanks!

comment:2 Changed 4 years ago by Don-vip

@iman: can the seven letters connect to a previous letter? With your example, is aa*baa valid?

comment:3 Changed 4 years ago by iman

Yes, they can connect to a previous letter, either directly or with a ZWNJ before them. aa*baa is valid.
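Given this answer, the letter-based rule only needs to look at the character immediately before each ZWNJ: a ZWNJ after one of the seven letters is flagged, while a ZWNJ before them (aa*baa) is accepted. Below is a minimal Python sketch under that assumption. The names `NON_JOINING_AFTER` and `zwnj_after_non_joiner` are hypothetical, and the set holds only the seven Persian letters listed in the ticket. These letters are classified as right-joining (Joining_Type R) in Unicode's ArabicShaping.txt, so a validator for other languages would likely need a wider set derived from that data.

```python
ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER

# The seven Persian letters from the ticket that never join a following letter:
# ALEF, DAL, THAL, REH, ZAIN, JEH, WAW (written as escapes to avoid RTL confusion).
NON_JOINING_AFTER = set("\u0627\u062f\u0630\u0631\u0632\u0698\u0648")


def zwnj_after_non_joiner(word: str) -> list[int]:
    """Return indices of ZWNJ characters that directly follow a non-joining letter.

    Only the preceding character is checked, because a ZWNJ *before*
    one of these letters is valid (aa*baa).
    """
    return [
        i
        for i, ch in enumerate(word)
        if ch == ZWNJ and i > 0 and word[i - 1] in NON_JOINING_AFTER
    ]
```

With MEEM (U+0645) as a joining letter "a" and DAL (U+062F) as "b", `"\u0645\u0645\u200c\u062f\u0645\u0645"` (aa*baa) yields no indices, while `"\u0645\u062f\u200c\u0645\u0645\u0645"` (ab*aaa) yields index 2.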
