Cursor mispositioning when using a custom font with unicode text #1813
Which browser (including version) and OS are you using? With Chrome and Firefox on Linux, everything seems to work as intended. This is the HTML code I'm using:
|
From what we can tell, Chrome and Opera on Windows and Chrome on OSX (FF and IE do not seem affected) |
Unable to reproduce on Ubuntu 13.04 Chrome stable 32 bit |
Both OSes tested, and it breaks on both. OS X Mavericks (10.9), Chrome version 29.0.1547.65 |
Tested again with Mac OS X 10.8 and Windows 8 on Chrome stable 64 bit, and it breaks on both with that Thai text. |
Still no luck reproducing it. Here's what I've been doing:
|
As a clean from-file immediate load, I can't reliably reproduce it, but as a "start up codemirror, then set new content after initial load", like the original STR, I see it happen virtually all the time. |
Following the original steps also doesn't reproduce it for me. But those steps don't make any sense for me -- there are no style tags in the head of demo/complete.html, and if I add one using dev tools and add those rules to it, the browser doesn't seem to load the font. If you can be bothered to create a test case that reliably reproduces the issue and put it online (maybe jsbin.com), I'll take another look. If not, I'm out of patience. |
So I was going to do what @marijnh was asking and found out that jsbin also has the same issue (and I believe jsbin.com uses CodeMirror as well). I have tested with Chrome stable, Canary, Opera, and Safari, and everything broke except Firefox. Below is the video to demonstrate the problem. https://dl-web.dropbox.com/spa/n5v3bx9nnjkdpzf/codemirror.mp4 |
sorry, original STR should have been "html-edit any of the elements in the head, and add a style element". I'm not trying to make you play a guessing game but on windows and OSX this is a guaranteed way to reproduce the effect, tested by multiple people on multiple computers, and it's impacting our effort to localise Mozilla's webmaker.org to things like Thai or other non-latin scripts. Which OS are you testing this on? (because it does not happen on linux, it's probably using a better fallback font with metrics that don't mess up codemirror) |
I finally managed to reproduce this on Windows. Still no luck on OS X, but that might be because I'm still on Mountain Lion (hardware too old for Apple to allow me to upgrade further). The problem is that apparently, some platforms, with some fonts, render a dashed empty circle in front of the 'ำ' character when it is in a span element on its own. That circle is showing up in the hidden element that's used to measure the position of characters, and throwing off the measurements. I haven't been able to come up with a quick workaround. I have some vague plans to overhaul the measuring system, which would address this (along with a bunch of wrapping related bugs and some of the slowness of big lines), but implementing that is a big project, and I don't know when I'll have the time and motivation to work on it yet. How much of a showstopper is this for you? You could consider just switching to courier or some other system font when you detect Thai language. That's a silly hack, but I guess it's better than having incorrect cursor placement. |
oh! that empty circle is a unicode combining mark placeholder... if it's putting that into spans in isolation then this might actually be a javascript split() problem, where a string with combining marks (in Thai that's also things like vowels) is being split up based on individual unicode code points, rather than splitting across "letter" boundaries. I wonder if there's a small JS lib that will do correct unicode splitting for us here (will have a look). In terms of severity it's tricky: we have webmaker fully localised for Thai and Russian, with Korean on its way, and we haven't released the localised version yet, but they're raring to go. We're holding off on them until we can somehow fix or work around this issue though, since the user experience is one that might actually drive users away because they can't reliably edit their content. |
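To make the suspected split() behaviour concrete, here is a minimal sketch (an illustration only, not CodeMirror's code). It assumes the Thai sample is the two code points U+0E14 U+0E33, and all variable names are hypothetical.

```javascript
// Illustration only; names here are hypothetical and not taken from CodeMirror.
const thai = "\u0E14\u0E33";   // assumed encoding of the Thai sample: U+0E14 followed by U+0E33
const accented = "e\u0301";    // 'e' followed by COMBINING ACUTE ACCENT (769)

// split("") cuts between every UTF-16 code unit, so the second
// code point of each pair ends up in an array element of its own.
const thaiParts = thai.split("");
const accentedParts = accented.split("");

console.log(thaiParts.length);               // 2
console.log(accentedParts.length);           // 2
console.log(accentedParts[1] === "\u0301");  // true: the accent is now isolated
```

If each element is then rendered in its own span, the isolated mark has no base character to attach to, which would be consistent with the dashed placeholder circle described above.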
CM is already handling combining marks for some languages (for example `COMBINING_ACUTE_ACCENT` 769), but those currently have the effect of making the two code points act as a single letter -- you can't put your cursor between the 'e' and the accent in 'é', even when it is written as two code points. Would that behavior also be appropriate in this case? From the look of it, it seems that these are more like two separate characters where the second happens to add something to the first. |
That sounds like pretty much exactly what is necessary here, too: you shouldn't be able to put the cursor between the two parts that make up ดำ, as it's "one thing" in combination (although hitting backspace should still work and turn it into ด). Would this also work for base glyphs with more than one combining mark? Vietnamese ờ for instance, which github's comment box mangles pretty badly but is "o" + 0x31B (combining horn) + 0x340 (combining grave tone mark) while still being only a single "letter" in terms of cursor positioning. |
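The cursor behaviour being asked for can be sketched as follows. This is a hypothetical helper, not CodeMirror's implementation: `EXTENDING` and `cursorStops` are made-up names, and the Thai ranges in the character class are an assumption chosen for illustration (including U+0E33 so that the sample above stays together).

```javascript
// Hypothetical sketch; EXTENDING and cursorStops are illustrative names.
// Assumption: 0300-036F is the Combining Diacritical Marks block; the Thai
// code points (0E31, 0E33-0E3A, 0E47-0E4E) are an illustrative guess.
const EXTENDING = /[\u0300-\u036f\u0e31\u0e33-\u0e3a\u0e47-\u0e4e]/;

// Returns the string offsets where a cursor may rest. An offset is skipped
// when the character right after it extends the previous character, so a
// base character and all of its marks behave as a single unit.
function cursorStops(text) {
  const stops = [0];
  for (let i = 1; i <= text.length; i++) {
    if (i === text.length || !EXTENDING.test(text.charAt(i))) {
      stops.push(i);
    }
  }
  return stops;
}

console.log(cursorStops("\u0E14\u0E33"));     // [ 0, 2 ]
console.log(cursorStops("o\u031B\u0340x"));   // [ 0, 3, 4 ]
```

A base character followed by several marks, like the Vietnamese example, collapses to one stop because every mark is skipped in turn. Backspace could still delete a single code point, since the underlying string is untouched.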
So, JavaScript's support for this stuff is pretty much zero. I've been adding regexp ranges to detect combining characters in various scripts as I find that users are having trouble with them. Can you give me a range or set of ranges that correspond to combining characters in the Thai script? Do I see correctly that there are also prefix combining characters? (Where a combining code point following a non-combining one still leads to a composite glyph.) |
I see that the aforementioned regexp ranges already include the Combining Diacritical Marks (0300–036F, which by the way includes 0x31B and 0x340) plus a few others. Would it make sense to also match the following ranges?
Source: https://en.wikipedia.org/wiki/Combining_character#Unicode_ranges |
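For anyone wanting to experiment with extra ranges, here is one possible way to build such a character class from a list of code point ranges. It is a sketch only: `buildExtendingRegex` is a hypothetical helper, it handles BMP code points only, and the single range used below is the 0300–036F block mentioned above (the proposed additional ranges are not reproduced here).

```javascript
// Hypothetical helper; not part of CodeMirror. Handles BMP code points only.
function buildExtendingRegex(ranges) {
  // Format a code point as a \uXXXX escape for use inside a character class.
  const esc = (cp) => "\\u" + cp.toString(16).padStart(4, "0");
  const body = ranges
    .map(([from, to]) => (from === to ? esc(from) : esc(from) + "-" + esc(to)))
    .join("");
  return new RegExp("[" + body + "]");
}

// Only the block named in the comment above; append further [from, to] pairs to extend it.
const combining = buildExtendingRegex([[0x0300, 0x036f]]);

console.log(combining.test("\u031b")); // true  (0x31B, combining horn)
console.log(combining.test("\u0340")); // true  (0x340, combining grave tone mark)
console.log(combining.test("o"));      // false
```

Keeping the ranges as data rather than a hand-written regexp literal makes it easier to add a script's ranges as users report problems with them.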
@jankeromnes It sounds like that would make sense. See 66a5cd6 |
This is the same issue as #2115 |
Same issue here |
We were running into some weird behaviour on thimble.webmaker.org when using languages that require unicode with combining diacritics and custom fonts, where the cursor positions and actual text don't line up. STR are thankfully easy:
* { font-family: ubuntu!important; }
All elements on the page should now be using the Ubuntu Mono Regular font.
This is just some nonsense Thai text, but it demonstrates the problem immediately: click anywhere in the text, then hit enter, and notice that the line gets broken in a completely different place. Placing the cursor at the start of the line and using the right cursor key to walk through it also shows the cursor moving far more than the rendered text suggests it should.