Cursor mispositioning when using a custom font with unicode text #1813

Pomax · 2013-09-10T15:54:29Z

We were running into some weird behaviour on thimble.webmaker.org when using languages that require unicode with combining diacritics and custom fonts, where the cursor positions and actual text don't line up. STR are thankfully easy:

load up http://codemirror.net/demo/complete.html
open the web console, "edit as html" one of the style elements in the head and add this to the end of it:

<style>
@font-face {
  font-family: ubuntu;
  src: url('https://thimble.webmaker.org/friendlycode/css/ubuntumono/ubuntumono-r.woff');
}
</style>

add a new CSS rule * { font-family: ubuntu!important; }

All elements on the page should now be using the Ubuntu Mono Regular font.

clear the codemirror area and put in this text instead:

ดำดำดำดำดำดำดำดำดำ
กำกำกำกำกำกำกำกำกำ

This is just some nonsense Thai text, but demonstrates the problem immediately: click anywhere in the text, then hit enter, and notice that line gets broken in a completely different place. Placing the cursor at the start of the line and using the right cursor key to walk through it also shows the cursor moving far more than the rendered text suggests it should.


marijnh · 2013-09-11T13:56:41Z

Which browser (including version) and OS are you using? With Chrome and Firefox on Linux, everything seems to work as intended.

This is the HTML code I'm using:

<!doctype html>
<meta charset="utf-8"/>
<link rel="stylesheet" href="lib/codemirror.css">
<script src="lib/codemirror.js"></script>
<style type="text/css">
 @font-face {
  font-family: ubuntu;
  src: url('https://thimble.webmaker.org/friendlycode/css/ubuntumono/ubuntumono-r.woff');
 }
 .CodeMirror { font-family: ubuntu; }
</style>

<textarea id="code" name="code">ดำดำดำดำดำดำดำดำดำ
กำกำกำกำกำกำกำกำกำ
</textarea>

<script>var editor = CodeMirror.fromTextArea(document.getElementById("code"));</script>

Pomax · 2013-09-11T15:26:47Z

From what we can tell, Chrome and Opera on Windows and Chrome on OSX (FF and IE do not seem affected)

peterkroon · 2013-09-16T10:53:03Z

Unable to reproduce on Ubuntu 13.04 Chrome stable 32 bit

alicoding · 2013-09-16T14:42:55Z

Tested on both of these OS versions, and it breaks on both:

OS X Mavericks (10.9)
Mountain Lion (10.8)

Chrome Version 29.0.1547.65
It also breaks in Safari and Opera.

alicoding · 2013-09-22T12:57:54Z

Tested again on Mac OS X 10.8 and Windows 8 with Chrome stable 64-bit, and both break on that Thai text.

marijnh · 2013-09-23T10:53:17Z

Still no luck reproducing it. Here's what I've been doing:

I put the HTML test I gave above into a file
Open that file in Chrome 29 (in all of OS X, Windows, and Linux)
Click the top line of Thai characters somewhere near the middle
Cursor appears where I clicked
Press enter
Line is broken where the cursor was showing

Pomax · 2013-09-23T16:00:03Z

As a clean from-file immediate load, I can't reliably reproduce it, but as a "start up codemirror, then set new content after initial load", like the original STR, I see it happen virtually all the time.

marijnh · 2013-09-24T10:07:10Z

Following the original steps also doesn't reproduce it for me. But those steps don't make any sense to me -- there are no style tags in the head of demo/complete.html, and if I add one using dev tools and add those rules to it, the browser doesn't seem to load the font.

If you can be bothered to create a test case that reliably reproduces the issue and put it online (maybe jsbin.com), I'll take another look. If not, I'm out of patience.

alicoding · 2013-09-24T13:12:21Z

So I was going to do what @marijnh was asking and found out that jsbin also has the same issue (and I believe jsbin.com uses CodeMirror as well).

I have tested with Chrome stable, Canary, Opera, Safari and all broke except Firefox.

Below is the video to demonstrate the problem.

https://dl-web.dropbox.com/spa/n5v3bx9nnjkdpzf/codemirror.mp4

Pomax · 2013-09-24T14:23:39Z

sorry, original STR should have been "html-edit any of the elements in the head, and add a style element". I'm not trying to make you play a guessing game but on windows and OSX this is a guaranteed way to reproduce the effect, tested by multiple people on multiple computers, and it's impacting our effort to localise Mozilla's webmaker.org to things like Thai or other non-latin scripts. Which OS are you testing this on? (because it does not happen on linux, it's probably using a better fallback font with metrics that don't mess up codemirror)

marijnh · 2013-09-24T14:52:42Z

I finally managed to reproduce this on Windows. Still no luck on OS X, but that might be because I'm still on Mountain Lion (hardware too old for Apple to allow me to upgrade further).

The problem is that apparently, some platforms, with some fonts, render a dashed empty circle in front of the 'ำ' character when it is in a span element on its own. That circle is showing up in the hidden element that's used to measure the position of characters, and throwing off the measurements.

I haven't been able to come up with a quick workaround. I have some vague plans to overhaul the measuring system, which would address this (along with a bunch of wrapping related bugs and some of the slowness of big lines), but implementing that is a big project, and I don't know when I'll have the time and motivation to work on it yet.

How much of a showstopper is this for you? You could consider just switching to courier or some other system font when you detect Thai language. That's a silly hack, but I guess it's better than having incorrect cursor placement.
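
A minimal sketch of that font fallback, assuming Thai text is detected by testing for the Unicode Thai block (U+0E00 to U+0E7F). The helper name `pickEditorFont` and the exact fallback font list are illustrative assumptions, not part of CodeMirror or Thimble.

```javascript
// Unicode Thai block: U+0E00 to U+0E7F.
const THAI = /[\u0E00-\u0E7F]/;

// Hypothetical helper: return a font-family value for the editor.
// Falls back to a system monospace font as soon as any Thai
// character is present, otherwise keeps the custom font.
function pickEditorFont(text, customFont) {
  return THAI.test(text) ? "Courier, monospace" : customFont + ", monospace";
}

console.log(pickEditorFont("\u0E14\u0E33", "ubuntu")); // Thai sample text
console.log(pickEditorFont("var x = 1;", "ubuntu"));   // plain ASCII
```

In a page with an editor instance, the result could be applied with something like `editor.getWrapperElement().style.fontFamily = pickEditorFont(editor.getValue(), "ubuntu")` followed by `editor.refresh()` so that character measurements are redone.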

Pomax · 2013-09-24T14:59:19Z

oh! that empty circle is a unicode combining mark placeholder... if it's putting that into spans in isolation then this might actually be a javascript split() problem, where a string with combining marks (in Thai that's also things like vowels) is being split up based on individual unicode code points, rather than splitting across "letter" boundaries. I wonder if there's a small JS lib that will do correct unicode splitting for us here (will have a look).
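
A small sketch of what that split looks like in plain JavaScript. The strings are written with escapes so the code points are unambiguous; the variable names are only for illustration.

```javascript
// "ดำ" is two code points: U+0E14 followed by U+0E33.
const thai = "\u0E14\u0E33";
// "é" written as a base letter plus U+0301 COMBINING ACUTE ACCENT.
const accented = "e\u0301";

// split("") cuts between UTF-16 code units, so each visual
// "letter" falls apart into its base and its trailing mark.
const thaiParts = thai.split("");
const accentedParts = accented.split("");

console.log(thai.length, thaiParts.length);         // 2 2
console.log(accented.length, accentedParts.length); // 2 2
```

Any code that wraps each element of such an array in its own span ends up with the trailing mark isolated from its base character.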

In terms of severity it's tricky: we have webmaker fully localised for Thai and Russian, with Korean on its way, and we haven't released the localised version yet, but they're raring to go. We're holding off on them until we can somehow fix or work around this issue though, since the user experience is one that might actually drive users away because they can't reliably edit their content.

marijnh · 2013-09-24T15:10:09Z

CM is already handling combining marks for some languages (for example COMBINING_ACUTE_ACCENT, 769), but those currently have the effect of making the two code points act as a single letter -- you can't put your cursor between the 'e' and the accent in 'é', even when it is written as two code points.

Would that behavior also be appropriate in this case? From the look of it, it seems that these are more like two separate characters where the second happens to add something to the first.

Pomax · 2013-09-24T15:32:22Z

That sounds pretty much exactly what is necessary here, too: you shouldn't be able to put the cursor between the two parts that make up ดำ, as it's "one thing" in combination (although hitting backspace should still work and turn it into ด). Would this also work for base glyphs with more than one combining mark? Vietnamese ờ for instance, which github's comment box mangles pretty badly but is "o" + 0x31B (combining horn) + 0x340 (combining grave tone mark) while still being only a single "letter" in terms of cursor positioning.
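
A rough sketch of that behaviour, assuming only the Combining Diacritical Marks block (U+0300 to U+036F) counts as combining. `nextCursorPos` and `backspace` are hypothetical helpers, not CodeMirror functions, and Thai would need its own ranges, which this sketch does not attempt to supply.

```javascript
// Assumption: only U+0300-U+036F is treated as combining here.
const COMBINING = /[\u0300-\u036F]/;

// Move the cursor one "letter" to the right: step over the base
// character, then over every combining mark that follows it.
function nextCursorPos(text, pos) {
  if (pos >= text.length) return text.length;
  let next = pos + 1;
  while (next < text.length && COMBINING.test(text.charAt(next))) next++;
  return next;
}

// Backspace still removes a single code unit, so the last mark
// can be deleted without deleting the base character.
function backspace(text, pos) {
  if (pos <= 0) return text;
  return text.slice(0, pos - 1) + text.slice(pos);
}

const word = "o\u031B\u0340x"; // "o" + horn + grave tone mark, then "x"
console.log(nextCursorPos(word, 0)); // 3
console.log(backspace(word, 3) === "o\u031Bx"); // true
```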

marijnh · 2013-09-24T15:52:42Z

So, JavaScript's support for this stuff is pretty much zero. I've been adding regexp ranges to detect combining characters in various scripts as I find that users are having trouble with them. Can you give me a range or set of ranges that correspond to combining characters in the Thai script? Do I see correctly that there are also prefix combining characters? (Where a combining code point following a non-combining one still leads to a composite glyph.)

jankeromnes · 2013-11-07T04:21:28Z

I see that the aforementioned regexp ranges already include the Combining Diacritical Marks (0300–036F, which by the way includes 0x31B and 0x340) plus a few others. Would it make sense to also match the following ranges?

Combining Diacritical Marks Supplement (1DC0–1DFF)
Combining Diacritical Marks for Symbols (20D0–20FF)
Combining Half Marks (FE20–FE2F)

Source: https://en.wikipedia.org/wiki/Combining_character#Unicode_ranges
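
As a sketch, the four blocks named above can be written as one character class. This is not CodeMirror's actual regexp (which covers more scripts); `isCombining` and `splitClusters` are hypothetical names, and all ranges here are in the Basic Multilingual Plane, so testing one UTF-16 code unit at a time is enough.

```javascript
// Combining Diacritical Marks, its Supplement, the Symbols block,
// and Combining Half Marks, as listed in this comment.
const COMBINING_RANGES = /[\u0300-\u036F\u1DC0-\u1DFF\u20D0-\u20FF\uFE20-\uFE2F]/;

function isCombining(ch) {
  return COMBINING_RANGES.test(ch);
}

// Group a string into base-plus-marks clusters, so that each
// cluster can be measured or stepped over as one unit.
function splitClusters(text) {
  const clusters = [];
  for (const unit of text.split("")) {
    if (clusters.length > 0 && isCombining(unit)) {
      clusters[clusters.length - 1] += unit; // attach mark to its base
    } else {
      clusters.push(unit);
    }
  }
  return clusters;
}

console.log(splitClusters("o\u031B\u0340x").length); // 2
```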

marijnh · 2013-11-11T07:25:22Z

@jankeromnes It sounds like that would make sense. See 66a5cd6

marijnh · 2014-01-27T17:20:36Z

This is the same issue as #2115

niftylettuce · 2016-01-14T06:16:18Z

Same issue here

marijnh closed this as completed Jan 27, 2014

minrk mentioned this issue May 16, 2017

incorrect cursor position for bold math text #4750

Closed
