
Opened 6 years ago

Last modified 5 years ago

#17609 new enhancement

Detect invalid usage of ZWNJ/ZWJ characters

Reported by: Don-vip
Owned by: team
Priority: normal
Milestone:
Component: Core validator
Version:
Keywords: unicode persian arabic
Cc:

Description

Follow-up to #17595. We should detect incorrect uses of wikipedia:Zero-width_joiner (ZWJ) and wikipedia:Zero-width_non-joiner (ZWNJ) characters in OSM tags:

> In practice, ZWNJ is like a space but with zero width, and it prevents adjacent characters from joining each other. Although it is a valid character in Persian, there are cases where it appears at an invalid position in a word. So it would be nice if we could keep warnings for these invalid cases.
>
> As an example, consider "aa*aaa" as a Persian word with a ZWNJ included, where the asterisk stands for the ZWNJ.
>
> The only valid case is aa*aaa
> (between two adjacent letters that can join each other).
>
> More common invalid cases:
>
> * Doubled (or more) ZWNJ, like a doubled space: aa**aaa
> * At the start or end of a word: *aa*aaa or aa*aaa*
> * Immediately before or after a space character: aa* aaa or aa *aaa (this can happen within a word, because ZWNJ is normally typed with Shift+Space)
> * A trickier one:
>   * We have seven letters (و, ژ, ز, ر, ذ, د, ا) that do not connect to a following letter, so writing a ZWNJ after them is useless and unnecessary. Assume b is one of them; then this is invalid: ab*aaa (this case could also occur in other languages with similar, but not identical, letters)
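
The invalid cases quoted above map fairly directly onto a single scan over a tag value. The following is a minimal sketch in Java, under several assumptions: the class `ZwnjCheck`, its `check` method, and the message texts are hypothetical and not part of any existing validator API; the non-joining letters are exactly the seven listed in the ticket; `Character.isWhitespace` is taken as the definition of "space" (it does not treat a no-break space as whitespace); and only the start and end of the whole value count as word boundaries, since boundaries inside the value are caught by the space rule.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical validator sketch: scans a tag value and reports each
 * ZWNJ (U+200C) that sits in one of the invalid positions listed in the ticket.
 */
public class ZwnjCheck {
    static final char ZWNJ = '\u200C';

    // The seven letters from the ticket that never join to a following letter:
    // U+0648 waw, U+0698 jeh, U+0632 zain, U+0631 reh,
    // U+0630 thal, U+062F dal, U+0627 alef.
    static final Set<Character> NON_JOINING_AFTER = Set.of(
        '\u0648', '\u0698', '\u0632', '\u0631', '\u0630', '\u062F', '\u0627');

    /** Returns one message per misplaced ZWNJ, prefixed with its char index. */
    public static List<String> check(String value) {
        List<String> problems = new ArrayList<>();
        int last = value.length() - 1;
        for (int i = 0; i <= last; i++) {
            if (value.charAt(i) != ZWNJ) {
                continue;
            }
            // i is the index of a ZWNJ; the first matching rule wins.
            if (i > 0 && value.charAt(i - 1) == ZWNJ) {
                problems.add(i + ": doubled ZWNJ");
            } else if (i == 0) {
                problems.add(i + ": ZWNJ at start of value");
            } else if (i == last) {
                problems.add(i + ": ZWNJ at end of value");
            } else if (Character.isWhitespace(value.charAt(i - 1))
                    || Character.isWhitespace(value.charAt(i + 1))) {
                problems.add(i + ": ZWNJ next to a space");
            } else if (NON_JOINING_AFTER.contains(value.charAt(i - 1))) {
                problems.add(i + ": ZWNJ after a letter that never joins forward");
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        // Placeholders as in the ticket: "a" is a joining letter, "*" is ZWNJ.
        String[] samples = {
            "aa\u200Caaa",        // aa*aaa: valid
            "aa\u200C\u200Caaa",  // aa**aaa: doubled
            "\u200Caa\u200Caaa",  // *aa*aaa: ZWNJ at start
            "aa\u200C aaa",       // aa* aaa: ZWNJ before a space
            "a\u0631\u200Caaa",   // ab*aaa with b = U+0631 reh
        };
        for (String s : samples) {
            System.out.println(check(s));
        }
    }
}
```

Reporting one message per ZWNJ (rather than one per rule) keeps a run such as aa**aaa from producing duplicate warnings for the same character; whether a real check should report once per value instead is a design choice this sketch leaves open.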


Change History (3)

comment:1 by iman, 6 years ago

Oh, thanks!

comment:2 by Don-vip, 5 years ago

@iman: Can the seven letters connect to a preceding letter? In your example, is aa*baa valid?

comment:3 by iman, 5 years ago

Yes, they can connect directly, or with a ZWNJ before them. aa*baa is valid.
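
This reply establishes an asymmetry: a ZWNJ after one of the seven letters is redundant, but a ZWNJ before one of them (aa*baa) is valid. A regular expression that matches only the letter-then-ZWNJ order captures exactly that. The sketch below is written in Java as an assumption; the class `NonJoiningZwnj` and the method `firstRedundantZwnj` are hypothetical, and the pattern covers only this one rule for the seven letters listed in the ticket.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical helper: a ZWNJ is redundant AFTER one of the seven
 * non-joining letters, but a ZWNJ BEFORE one of them is valid.
 */
public class NonJoiningZwnj {
    // One of U+0648, U+0698, U+0632, U+0631, U+0630, U+062F, U+0627
    // immediately followed by U+200C (ZWNJ). The reverse order is not matched.
    private static final Pattern REDUNDANT_ZWNJ =
        Pattern.compile("[\u0648\u0698\u0632\u0631\u0630\u062F\u0627]\u200C");

    /** Returns the index of the first redundant ZWNJ, or -1 if there is none. */
    public static int firstRedundantZwnj(String value) {
        Matcher m = REDUNDANT_ZWNJ.matcher(value);
        // The match starts at the letter, so the ZWNJ is one char later.
        return m.find() ? m.start() + 1 : -1;
    }

    public static void main(String[] args) {
        String valid = "aa\u200C\u0631aa";   // aa*baa with b = U+0631 reh
        String invalid = "a\u0631\u200Caaa"; // ab*aaa with b = U+0631 reh
        System.out.println(firstRedundantZwnj(valid));
        System.out.println(firstRedundantZwnj(invalid));
    }
}
```

Because the character class sits only on the left of the ZWNJ, the valid aa*baa yields -1 while ab*aaa reports index 2.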
