Cursor mispositioning when using a custom font with unicode text #1813

Closed
Pomax opened this issue Sep 10, 2013 · 19 comments

Comments

@Pomax

Pomax commented Sep 10, 2013

We were running into some weird behaviour on thimble.webmaker.org when using languages that require unicode with combining diacritics and custom fonts, where the cursor positions and actual text don't line up. STR are thankfully easy:

  1. load up http://codemirror.net/demo/complete.html
  2. open the web console, "edit as html" one of the style elements in the head and add this to the end of it:
<style>
@font-face {
  font-family: ubuntu;
  src: url('https://thimble.webmaker.org/friendlycode/css/ubuntumono/ubuntumono-r.woff');
}
</style>
  3. add a new CSS rule * { font-family: ubuntu!important; }

All elements on the page should now be using the Ubuntu Mono Regular font.

  4. clear the codemirror area and put in this text instead:
ดำดำดำดำดำดำดำดำดำ
กำกำกำกำกำกำกำกำกำ

This is just some nonsense Thai text, but it demonstrates the problem immediately: click anywhere in the text, then hit enter, and notice that the line gets broken in a completely different place. Placing the cursor at the start of the line and using the right cursor key to walk through it also shows the cursor moving far more than the rendered text suggests it should.

@marijnh
Member

marijnh commented Sep 11, 2013

Which browser (including version) and OS are you using? With Chrome and Firefox on Linux, everything seems to work as intended.

This is the HTML code I'm using:

<!doctype html>
<meta charset="utf-8"/>
<link rel="stylesheet" href="lib/codemirror.css">
<script src="lib/codemirror.js"></script>
<style type="text/css">
 @font-face {
  font-family: ubuntu;
  src: url('https://thimble.webmaker.org/friendlycode/css/ubuntumono/ubuntumono-r.woff');
 }
 .CodeMirror { font-family: ubuntu; }
</style>

<textarea id="code" name="code">ดำดำดำดำดำดำดำดำดำ
กำกำกำกำกำกำกำกำกำ
</textarea>

<script>var editor = CodeMirror.fromTextArea(document.getElementById("code"));</script>

@Pomax
Author

Pomax commented Sep 11, 2013

From what we can tell, it affects Chrome and Opera on Windows and Chrome on OS X (Firefox and IE do not seem affected).

@peterkroon
Contributor

Unable to reproduce on Ubuntu 13.04 Chrome stable 32 bit

@alicoding

Tested on both of these OS versions, and it breaks on both:

OS X Mavericks (10.9)
Mountain Lion (10.8)

Chrome Version 29.0.1547.65
It also breaks in Safari and Opera.

@alicoding

Tested again with Mac OS X 10.8 and Windows 8 on Chrome stable 64 bit, and it breaks on that Thai text on both.

@marijnh
Member

marijnh commented Sep 23, 2013

Still no luck reproducing it. Here's what I've been doing:

  • I put the HTML test I gave above into a file
  • Open that file in Chrome 29 (in all of OS X, Windows, and Linux)
  • Click the top line of Thai characters somewhere near the middle
  • Cursor appears where I clicked
  • Press enter
  • Line is broken where the cursor was showing

@Pomax
Author

Pomax commented Sep 23, 2013

As a clean from-file immediate load, I can't reliably reproduce it, but as a "start up codemirror, then set new content after initial load", like the original STR, I see it happen virtually all the time.

@marijnh
Member

marijnh commented Sep 24, 2013

Following the original steps also doesn't reproduce it for me. But those steps don't make any sense to me -- there are no style tags in the head of demo/complete.html, and if I add one using dev tools and add those rules to it, the browser doesn't seem to load the font.

If you can be bothered to create a test case that reliably reproduces the issue and put it online (maybe jsbin.com), I'll take another look. If not, I'm out of patience.

@alicoding

So I was going to do what @marijnh was asking and found out that jsbin also has the same issue (and I believe jsbin.com uses CodeMirror as well).

I have tested with Chrome stable, Canary, Opera, Safari and all broke except Firefox.

Below is the video to demonstrate the problem.

https://dl-web.dropbox.com/spa/n5v3bx9nnjkdpzf/codemirror.mp4

@Pomax
Author

Pomax commented Sep 24, 2013

sorry, original STR should have been "html-edit any of the elements in the head, and add a style element". I'm not trying to make you play a guessing game but on windows and OSX this is a guaranteed way to reproduce the effect, tested by multiple people on multiple computers, and it's impacting our effort to localise Mozilla's webmaker.org to things like Thai or other non-latin scripts. Which OS are you testing this on? (because it does not happen on linux, it's probably using a better fallback font with metrics that don't mess up codemirror)

@marijnh
Member

marijnh commented Sep 24, 2013

I finally managed to reproduce this on Windows. Still no luck on OS X, but that might be because I'm still on Mountain Lion (hardware too old for Apple to allow me to upgrade further).

The problem is that apparently, some platforms, with some fonts, render a dashed empty circle in front of the 'ำ' character when it is in a span element on its own. That circle is showing up in the hidden element that's used to measure the position of characters, and throwing off the measurements.

I haven't been able to come up with a quick workaround. I have some vague plans to overhaul the measuring system, which would address this (along with a bunch of wrapping related bugs and some of the slowness of big lines), but implementing that is a big project, and I don't know when I'll have the time and motivation to work on it yet.

How much of a showstopper is this for you? You could consider just switching to courier or some other system font when you detect Thai language. That's a silly hack, but I guess it's better than having incorrect cursor placement.
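The font-switching hack suggested above could look roughly like the sketch below. It is only an illustration: `pickFontFamily` is a hypothetical helper, the font stacks are placeholders, and checking for any character in the Thai Unicode block (U+0E00 to U+0E7F) is an assumed stand-in for "detecting Thai language".

```javascript
// Hypothetical helper, not part of CodeMirror.
// Assumption: any character in the Thai block means "treat this as Thai text".
const THAI = /[\u0E00-\u0E7F]/;

function pickFontFamily(text) {
  // Fall back to a system font for Thai, keep the custom font otherwise.
  return THAI.test(text) ? "courier, monospace" : "ubuntu, monospace";
}

console.log(pickFontFamily("\u0E14\u0E33")); // "courier, monospace"
console.log(pickFontFamily("hello"));        // "ubuntu, monospace"

// In a page, the result would presumably be applied to the editor wrapper
// and followed by a refresh so the editor re-measures its text, e.g.:
//   editor.getWrapperElement().style.fontFamily = pickFontFamily(editor.getValue());
//   editor.refresh();
```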

@Pomax
Author

Pomax commented Sep 24, 2013

oh! that empty circle is a unicode combining mark placeholder... if it's putting that into spans in isolation then this might actually be a javascript split() problem, where a string with combining marks (in Thai that's also things like vowels) is being split up based on individual unicode code points, rather than splitting across "letter" boundaries. I wonder if there's a small JS lib that will do correct unicode splitting for us here (will have a look).
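To make the difference concrete, here is a minimal sketch of splitting per code unit versus splitting on "letter" boundaries. `splitClusters` is a hypothetical helper, and the `EXTENDING` set is an assumption picked for this example (Combining Diacritical Marks plus a few Thai vowel and tone code points), not CodeMirror's actual list.

```javascript
// Hypothetical helper, not CodeMirror code.
// Assumption: these code points attach to the character before them.
const EXTENDING = /[\u0300-\u036F\u0E31\u0E33-\u0E3A\u0E47-\u0E4E]/;

function splitClusters(text) {
  const clusters = [];
  for (const ch of text) { // iterate by code point
    if (clusters.length > 0 && EXTENDING.test(ch)) {
      clusters[clusters.length - 1] += ch; // glue onto the previous base character
    } else {
      clusters.push(ch); // start a new "letter"
    }
  }
  return clusters;
}

const sample = "\u0E14\u0E33\u0E14\u0E33"; // "ดำดำ"
console.log(sample.split("").length);      // 4: each part is split off on its own
console.log(splitClusters(sample).length); // 2: one entry per visible "letter"
```

With the naive split, 'ำ' ends up alone in its own string, which matches the situation where it gets rendered in a span by itself.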

In terms of severity it's tricky: we have webmaker fully localised for Thai and Russian, with Korean on its way, and we haven't released the localised version yet, but they're raring to go. We're holding off on them until we can somehow fix or work around this issue though, since the user experience is one that might actually drive users away because they can't reliably edit their content.

@marijnh
Member

marijnh commented Sep 24, 2013

CM is already handling combining marks for some languages (for example COMBINING ACUTE ACCENT, code point 769), but those currently have the effect of making the two code points act as a single letter -- you can't put your cursor between the 'e' and the accent in 'é', even when it is written as two code points.

Would that behavior also be appropriate in this case? From the look of it, it seems that these are more like two separate characters where the second happens to add something to the first.
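The "two code points act as a single letter" behaviour can be sketched as a cursor step that skips over combining marks. This is a simplified illustration, not CodeMirror's implementation: `moveRight` is a hypothetical function, and `COMBINING` is assumed to cover only the Combining Diacritical Marks block.

```javascript
// Hypothetical cursor-movement sketch, not CodeMirror's code.
// Assumption: only U+0300-U+036F are treated as combining here.
const COMBINING = /[\u0300-\u036F]/;

// Returns the next cursor offset to the right of `pos`.
function moveRight(text, pos) {
  if (pos >= text.length) return text.length;
  let next = pos + 1;
  // Never stop between a base character and the marks that follow it.
  while (next < text.length && COMBINING.test(text.charAt(next))) next++;
  return next;
}

const word = "cafe\u0301s";      // "cafés" with 'é' written as two code points
console.log(moveRight(word, 2)); // 3: a plain one-character step over 'f'
console.log(moveRight(word, 3)); // 5: 'e' and its accent are crossed together
```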

@Pomax
Author

Pomax commented Sep 24, 2013

That sounds pretty much exactly what is necessary here, too: you shouldn't be able to put the cursor between the two parts that make up ดำ, as it's "one thing" in combination (although hitting backspace should still work and turn it into ด). Would this also work for base glyphs with more than one combining mark? Vietnamese ờ for instance, which github's comment box mangles pretty badly but is "o" + 0x31B (combining horn) + 0x340 (combining grave tone mark) while still being only a single "letter" in terms of cursor positioning.
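A rough sketch of those two requirements (one cursor stop for a base with several marks, backspace still removing one part at a time) might look like this. The helpers `moveLeft` and `backspace` are hypothetical, and treating U+0E33 as an extending character alongside U+0300 to U+036F is an assumption made for the example.

```javascript
// Illustrative only: these helpers are hypothetical, not CodeMirror functions.
// Assumption: U+0300-U+036F plus Thai SARA AM (U+0E33) count as "extending".
const MARKS = /[\u0300-\u036F\u0E33]/;

// Offset of the nearest cursor stop to the left of `pos`.
function moveLeft(text, pos) {
  let prev = pos - 1;
  // Step back over every mark so the cursor lands before the base character.
  while (prev > 0 && MARKS.test(text.charAt(prev))) prev--;
  return Math.max(prev, 0);
}

// Backspace deletes a single code unit, so the parts come off one at a time.
function backspace(text, pos) {
  if (pos <= 0) return text;
  return text.slice(0, pos - 1) + text.slice(pos);
}

const horn = "o\u031B\u0340";   // "o" + combining horn + combining grave tone mark
console.log(moveLeft(horn, 3)); // 0: one cursor stop for three code points

const thai = "\u0E14\u0E33";    // "ดำ"
console.log(backspace(thai, 2) === "\u0E14"); // true: only the second part is removed
```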

@marijnh
Member

marijnh commented Sep 24, 2013

So, JavaScript's support for this stuff is pretty much zero. I've been adding regexp ranges to detect combining characters in various scripts as I find that users are having trouble with them. Can you give me a range or set of ranges that correspond to combining characters in the Thai script? Do I see correctly that there are also prefix combining characters? (Where a combining code point following a non-combining one still leads to a composite glyph.)

@jankeromnes
Contributor

I see that aforementioned regexp ranges already include the Combining Diacritical Marks (0300–036F, which by the way includes 0x31B and 0x340) plus a few others. Would it make sense to also match the following ranges?

  • Combining Diacritical Marks Supplement (1DC0–1DFF)
  • Combining Diacritical Marks for Symbols (20D0–20FF)
  • Combining Half Marks (FE20–FE2F)

Source: https://en.wikipedia.org/wiki/Combining_character#Unicode_ranges
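As a hedged sketch, the existing Combining Diacritical Marks range and the three proposed blocks could be folded into one character class like this. `PROPOSED` and `isCombining` are names made up for the example and do not reflect CodeMirror's actual regexp.

```javascript
// Hypothetical regexp built only from the ranges listed above.
const PROPOSED = new RegExp(
  "[" +
  "\\u0300-\\u036F" + // Combining Diacritical Marks (already matched)
  "\\u1DC0-\\u1DFF" + // Combining Diacritical Marks Supplement
  "\\u20D0-\\u20FF" + // Combining Diacritical Marks for Symbols
  "\\uFE20-\\uFE2F" + // Combining Half Marks
  "]"
);

function isCombining(ch) {
  return PROPOSED.test(ch);
}

console.log(isCombining("\u031B")); // true: combining horn
console.log(isCombining("\u20D7")); // true: inside the "for Symbols" block
console.log(isCombining("a"));      // false
console.log(isCombining("\u0E33")); // false: Thai 'ำ' lies outside all four ranges
```

Note that the Thai vowel from the original report is not covered by any of these blocks, so Thai would need its own ranges.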

@marijnh
Member

marijnh commented Nov 11, 2013

@jankeromnes It sounds like that would make sense. See 66a5cd6

@marijnh
Member

marijnh commented Jan 27, 2014

This is the same issue as #2115

@marijnh marijnh closed this as completed Jan 27, 2014
@niftylettuce

Same issue here
