Support embedded directions #21
Hey there, coming from reddit :) Some suggestions for an algorithm to solve this issue:
Does this sound reasonable? Since I cannot think of any sane way to detect that "私" (a Japanese character) "should" be LTR, the user has to decide by himself what to do with BIDI text.
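Here is one way to read the "let the user decide" suggestion, as a minimal Python sketch. It classifies each character with the standard `unicodedata.bidirectional()` function. When only one strong direction (L versus R/AL) occurs, it answers directly. Otherwise it defers to a policy supplied by the caller. `guess_direction` and `on_ambiguous` are hypothetical names that do not come from the thread, and the LTR fallback is an assumption.

```python
import unicodedata
from typing import Callable

STRONG_RTL = {"R", "AL"}  # Hebrew-style and Arabic-style strong right-to-left classes


def guess_direction(text: str,
                    on_ambiguous: Callable[[str], str] = lambda s: "ltr") -> str:
    """Return 'ltr' or 'rtl'; defer to on_ambiguous for mixed or directionless text.

    Hypothetical helper: the fallback policy is an assumption, not from the thread.
    """
    has_ltr = False
    has_rtl = False
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi == "L":
            has_ltr = True
        elif bidi in STRONG_RTL:
            has_rtl = True
    if has_ltr and not has_rtl:
        return "ltr"
    if has_rtl and not has_ltr:
        return "rtl"
    # Both strong directions, or none at all (e.g. only digits and punctuation):
    # the caller decides, as suggested above.
    return on_ambiguous(text)
```

Note that the Unicode Character Database assigns 私 the strong class L, so a per-character lookup already covers the Japanese example. The hard cases are text that mixes strong directions and text that contains none.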
@boast It sounds reasonable, yes. I didn't check how other implementations deal with it. Any PR :-)?
As for reference implementations: https://github.com/waiting-for-dev/string-direction, or http://en.wikipedia.org/wiki/Bi-directional_text on that topic (notice the table with the classifications). I'll work on it tonight 👍 However, I will probably need to refactor some methods into protected helper methods to make the checks more granular.
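The classification table mentioned above sorts the Unicode bidi classes into four groups: strong, weak, neutral, and explicit formatting. The Python sketch below shows that grouping. It assumes `unicodedata` as the source of the classes, and `bidi_category` is a hypothetical helper name.

```python
import unicodedata

# Bidi_Class values grouped as in the "Bidirectional Character Types" table of UAX #9.
BIDI_CATEGORIES = {
    "strong": {"L", "R", "AL"},
    "weak": {"EN", "ES", "ET", "AN", "CS", "NSM", "BN"},
    "neutral": {"B", "S", "WS", "ON"},
    "explicit": {"LRE", "LRO", "RLE", "RLO", "PDF", "LRI", "RLI", "FSI", "PDI"},
}


def bidi_category(ch: str) -> str:
    """Return 'strong', 'weak', 'neutral' or 'explicit' for a single character."""
    bidi = unicodedata.bidirectional(ch)
    for category, classes in BIDI_CATEGORIES.items():
        if bidi in classes:
            return category
    # unicodedata returns '' for code points without a recorded class (e.g. unassigned ones).
    return "unknown"
```

With a helper like this, a direction check can act only on "strong" characters and leave digits (EN) and punctuation (CS, ON) to later rules.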
@boast Thank you! :-)
I tried my best to adapt to the coding style. No tests are broken (or rather: some tests already failed on my Ubuntu dev machine before I changed anything; it seems the collator and normalizer tests are broken, especially when those are not available?), and I added a new test that more or less follows the spec described above.
ping?
Hey there, thanks for the ping. I was busy this past half-year finishing my bachelor's degree in CS. ;) We should agree together on a definitive approach to this problem, and then I / we can work out the implementation. My knowledge about the problem comes specifically from these sources: IMHO, we should first decide on the actual "goal" and "use case" of this method. Why and when is the information "which direction is this text going" needed? Because one can go crazy with "strong", "weak" and "neutral" characters and contexts...
So far, we use
PS: How is your bachelor's going 😉?
Another use case:
Hey Ivan, thanks for asking. My bachelor's is done now, so I think I will find some time to contribute. I will try to implement direction detection according to the Unicode Bidirectional Algorithm (UAX #9). In particular, the table of Bidirectional Character Types looks very interesting and is exactly what is missing right now: "weak" characters such as numbers and punctuation are not handled correctly by our algorithm.
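To find the paragraph direction, UAX #9 looks only at strong characters (rules P2 and P3). Weak characters such as digits are skipped at that step. The minimal Python sketch below assumes `unicodedata` as the source of the classes. `paragraph_level` is a hypothetical name, and the sketch covers only P2 and P3, not the full algorithm.

```python
import unicodedata

ISOLATE_INITIATORS = {"LRI", "RLI", "FSI"}


def paragraph_level(text: str, default: int = 0) -> int:
    """Sketch of UAX #9 rules P2-P3: 0 (LTR) or 1 (RTL) from the first strong character.

    Characters between an isolate initiator and its matching PDI are skipped
    (to the end of the text if the PDI is missing). Embedding controls such as
    LRE/RLE are not L, R or AL, so they are ignored here as P2 requires.
    `default` is 0 per P3; a higher-level protocol may choose otherwise.
    """
    depth = 0  # how many isolates we are currently nested inside
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi in ISOLATE_INITIATORS:
            depth += 1
        elif bidi == "PDI":
            if depth > 0:  # an unmatched PDI is simply ignored
                depth -= 1
        elif depth == 0:
            if bidi == "L":
                return 0
            if bidi in ("R", "AL"):
                return 1
    return default
```

For example, `"123 שלום"` yields 1: the leading digits are weak and never decide the direction.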
Excellent news! |
Just a short update: I wrote a small script that parses the official bidi classes from the Unicode Consortium (http://www.unicode.org/Public/UCD/latest/ucd/extracted/DerivedBidiClass.txt). It generates an optimized regex (not working at the moment, I'm missing something XD). The regex gets quite large, though more optimizations may be possible. The script is a small console app (in the Bin folder), so the regex is easy to regenerate if the spec changes. Once my regex works, I will implement the Unicode Bidirectional Algorithm from http://www.unicode.org/reports/tr9/.
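Below is a Python sketch of the parse-and-generate step. It assumes the `start..end ; class # comment` line format of DerivedBidiClass.txt. The sample lines are hand-written in that format and are not copied from the real file. `parse_bidi_classes` and `to_regex` are hypothetical names. Merging overlapping and adjacent ranges is one simple way to keep the generated character classes small.

```python
import re
from collections import defaultdict


def parse_bidi_classes(lines):
    """Map each Bidi_Class value to a sorted, merged list of (start, end) code point ranges."""
    ranges = defaultdict(list)
    for line in lines:
        # Everything after '#' is a comment, so default-value ('@missing') comment
        # lines are ignored as well; code points not listed get no class here.
        data = line.split("#", 1)[0].strip()
        if not data:
            continue
        cps, bidi_class = (field.strip() for field in data.split(";"))
        start, _, end = cps.partition("..")
        ranges[bidi_class].append((int(start, 16), int(end or start, 16)))
    merged = {}
    for bidi_class, spans in ranges.items():
        spans.sort()
        out = [spans[0]]
        for start, end in spans[1:]:
            last_start, last_end = out[-1]
            if start <= last_end + 1:  # overlapping or adjacent: extend the previous span
                out[-1] = (last_start, max(last_end, end))
            else:
                out.append((start, end))
        merged[bidi_class] = out
    return merged


def to_regex(spans):
    """Compile one character class, e.g. [\\U00000030-\\U00000039], from merged spans."""
    parts = []
    for start, end in spans:
        if start == end:
            parts.append(f"\\U{start:08X}")
        else:
            parts.append(f"\\U{start:08X}-\\U{end:08X}")
    return re.compile("[" + "".join(parts) + "]")


# Hand-written sample lines in the DerivedBidiClass.txt format.
SAMPLE = """\
# sample data
0030..0039    ; EN # Nd  [10] DIGIT ZERO..DIGIT NINE
0041..005A    ; L  # L&  [26] LATIN CAPITAL LETTER A..LATIN CAPITAL LETTER Z
0061..007A    ; L  # L&  [26] LATIN SMALL LETTER A..LATIN SMALL LETTER Z
05D0..05EA    ; R  # Lo  [27] HEBREW LETTER ALEF..HEBREW LETTER TAV
""".splitlines()

classes = parse_bidi_classes(SAMPLE)
rtl_letters = to_regex(classes["R"])
```

The generated patterns use `\U` escapes, so they stay plain ASCII. That is convenient when the script writes the regex into a source file.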
Why do we need such regular expressions?
We need to distinguish between the different types of bidirectional characters. (Reply to Ivan Enderlin, Wed, 14 Oct 2015 13:39.)
Ok :-). |
A string can contain both left-to-right and right-to-left text. We need a better algorithm to guess the current direction of a text :-).
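When a string mixes both directions, a caller may want the directional runs rather than a single answer. The following Python sketch is deliberately simplified. Weak and neutral characters just inherit the direction of the preceding strong character, or a base direction at the start of the string. This is not how UAX #9 resolves them. `direction_runs` and `base` are hypothetical names.

```python
import unicodedata
from itertools import groupby
from typing import List, Tuple


def direction_runs(text: str, base: str = "ltr") -> List[Tuple[str, str]]:
    """Split text into (direction, substring) runs.

    Simplification (not UAX #9): weak and neutral characters take the direction
    of the closest preceding strong character, or `base` at the start.
    """
    current = base
    tagged = []
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi == "L":
            current = "ltr"
        elif bidi in ("R", "AL"):
            current = "rtl"
        tagged.append((current, ch))
    # Consecutive characters with the same direction form one run.
    return [(direction, "".join(ch for _, ch in group))
            for direction, group in groupby(tagged, key=lambda pair: pair[0])]


print(direction_runs("abc שלום def"))
# [('ltr', 'abc '), ('rtl', 'שלום '), ('ltr', 'def')]
```

A full implementation would resolve weak and neutral types with rules W1 to W7 and N0 to N2 of UAX #9. For instance, the sketch puts the space after "שלום" in the RTL run, whereas rule N2 would give that space the embedding direction.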