Add support for RTL languages #784
https://terminal-wg.pages.freedesktop.org/bidi/ has some excellent notes on how to model bidi in terminal emulators. To make progress, I need to better understand:
Could Pango be an option? Although it has some GTK dependencies, it uses HarfBuzz and FriBidi; that's probably how gnome-terminal supports RTL. For testing, let me know if I can help by providing text content. Also, if you are using HarfBuzz, shouldn't Arabic script characters get combined? Currently they don't.
I forgot to say that (I think) @behdad (creator of HarfBuzz) is responsive, if you have questions.
There is also Servo's unicode-bidi, mentioned in alacritty/alacritty#663.
There's discussion on kas-gui/kas-text#20 about bidi implementations for Rust. My impression right now is that the state of bidi in Rust is young, and that the easiest path will result in a relatively slow bidi implementation, which isn't ideal: shaping already carries a performance cost in wezterm today. Putting more work into the promising alternative mentioned in that thread will likely lead to a better end state, but it will take more effort, and that work shouldn't be owned by wezterm. The main constraint I have right now is time: if someone has time and wants to drive this forward, I'm very receptive to seeing wezterm support bidi and to helping that person figure out how to integrate it into wezterm.
We don't do anything with these; this is just teaching the parser how to recognize these codes. refs: #784
In order to support RTL/BIDI, wezterm needs a bidi implementation. I don't think a well-conforming Rust implementation exists today; the implementations I found didn't pass 100% of the conformance tests. So I decided to port "bidiref", the reference implementation of the UBA described in http://unicode.org/reports/tr9/, to Rust.

This implementation focuses on conformance: no special measures have been taken to optimize it so far. My focus has been on ensuring that all of the approximately 780,000 test cases in the Unicode 14 data pass. Having the tests pass 100% allows for making performance improvements with confidence in the future.

The API isn't fully baked. Until I get to hooking it up to wezterm's shaper, I'm not 100% sure exactly what I'll need. There's a good discussion on API in open-i18n/rust-unic#273 that suggests omitting "legacy" operations such as reordering. I suspect that wezterm may actually need that function to support monospace text layout in some terminal scenarios, but regardless: reordering is part of the conformance test suite, so it remains a part of the API.

That said, the API does model the major operations as separate phases, so you should be able to pay for just what you use:

* Resolving the embedding levels from a paragraph
* Returning paragraph runs of those levels (and their directions)
* Returning the whitespace-level-reset runs for a line-slice within the paragraph
* Returning the reordered indices + levels for a line-slice within the paragraph

refs: #784
refs: kas-gui/kas-text#20
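To make the "paragraph runs of those levels (and their directions)" phase concrete, here is a minimal Rust sketch. It assumes the embedding levels have already been resolved (one `u8` per character), which is the expensive part of the UBA and is not shown here. `Direction`, `direction_of` and `level_runs` are illustrative names, not the API of the port described above; the only rule taken from UAX #9 is that even levels are LTR and odd levels are RTL.

```rust
use std::ops::Range;

/// Direction implied by a resolved embedding level.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Direction {
    LeftToRight,
    RightToLeft,
}

/// UAX #9: even embedding levels are LTR, odd levels are RTL.
fn direction_of(level: u8) -> Direction {
    if level % 2 == 0 {
        Direction::LeftToRight
    } else {
        Direction::RightToLeft
    }
}

/// Group consecutive characters that share an embedding level into runs,
/// returning each run's character range, level and direction.
fn level_runs(levels: &[u8]) -> Vec<(Range<usize>, u8, Direction)> {
    let mut runs = Vec::new();
    let mut start = 0;
    for i in 1..=levels.len() {
        // Close the current run at the end of the input or on a level change.
        if i == levels.len() || levels[i] != levels[start] {
            let level = levels[start];
            runs.push((start..i, level, direction_of(level)));
            start = i;
        }
    }
    runs
}
```

For example, an LTR paragraph ending in three RTL characters has levels `[0, 0, 0, 0, 1, 1, 1]`, which yields an LTR run `0..4` at level 0 followed by an RTL run `4..7` at level 1.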
This commit is larger than it appears due to fanout from threading through bidi parameters.

The main changes are:

* When clustering cells, add an additional phase to resolve embedding levels and further sub-divide a cluster based on the resolved bidi runs; this is where we get the direction for a run, and this needs to be passed through to the shaper.
* When doing bidi, the forced cluster boundary hack that we use to de-ligature when cursoring through text needs to be disabled; otherwise the cursor appears to push/rotate the text in that cluster when moving through it! We'll need to find a different way to handle shading the cursor that eliminates the original cursor/ligature/black issue.
* In the shaper, the logic for coalescing unresolved runs for font fallback assumed LTR and needed to be adjusted to cluster RTL. That meant also computing a little index of codepoint lengths.
* Added an `experimental_bidi` boolean option that defaults to false. When enabled, it activates the bidi processing phase in clustering with a strong hint that the paragraph is LTR.

This implementation is incomplete and/or wrong for a number of cases:

* The config option should probably allow specifying the paragraph direction hint to use by default.
* https://terminal-wg.pages.freedesktop.org/bidi/recommendation/paragraphs.html recommends that bidi be applied to logical lines, not to physical lines (or really, ranges within physical lines) as we're doing at the moment.
* The paragraph direction hint should be overridden by cell attributes and other escapes; see 85a6b17 and probably others.

However, as of this commit, if you set `experimental_bidi=true` and then run

```
echo This is RTL -> عربي فارسی bidi
```

(that text was sourced from: microsoft/terminal#538 (comment)), wezterm will display the text in the same order as it renders in Chrome for that GitHub comment.
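The first change above, sub-dividing a cluster at bidi run boundaries, can be sketched roughly as follows. This is a simplified illustration rather than wezterm's clustering code: it assumes a cluster is just a range of cell indices and that there is exactly one resolved embedding level per cell. `split_clusters` is a hypothetical name.

```rust
use std::ops::Range;

/// Split each cluster (a range of cell indices) wherever the resolved
/// embedding level changes inside it, so that every piece has one level
/// and therefore one direction to pass to the shaper.
fn split_clusters(clusters: &[Range<usize>], levels: &[u8]) -> Vec<(Range<usize>, u8)> {
    let mut out = Vec::new();
    for cluster in clusters {
        let mut start = cluster.start;
        for i in cluster.start + 1..=cluster.end {
            // Close the current piece at the cluster end or on a level change.
            if i == cluster.end || levels[i] != levels[start] {
                out.push((start..i, levels[start]));
                start = i;
            }
        }
    }
    out
}
```

With clusters `0..3` and `3..6` over levels `[0, 0, 1, 1, 1, 0]`, the pieces are `0..2` (level 0), `2..3` (level 1), `3..5` (level 1) and `5..6` (level 0): a cluster boundary is never merged away, but a level boundary always introduces a split.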
```
; ./target/debug/wezterm --config experimental_bidi=false ls-fonts --text "عربي فارسی ->"
LeftToRight
 0 ع \u{639} x_adv=8 glyph=300 wezterm.font(".Geeza Pro Interface", {weight="Regular", stretch="Normal", italic=false})
      /System/Library/Fonts/GeezaPro.ttc index=2 variation=0, CoreText
 2 ر \u{631} x_adv=3.78125 glyph=273 wezterm.font(".Geeza Pro Interface", {weight="Regular", stretch="Normal", italic=false})
      /System/Library/Fonts/GeezaPro.ttc index=2 variation=0, CoreText
 4 ب \u{628} x_adv=4 glyph=244 wezterm.font(".Geeza Pro Interface", {weight="Regular", stretch="Normal", italic=false})
      /System/Library/Fonts/GeezaPro.ttc index=2 variation=0, CoreText
 6 ي \u{64a} x_adv=4 glyph=363 wezterm.font(".Geeza Pro Interface", {weight="Regular", stretch="Normal", italic=false})
      /System/Library/Fonts/GeezaPro.ttc index=2 variation=0, CoreText
 8   \u{20} x_adv=8 glyph=2 wezterm.font("Operator Mono SSm Lig", {weight="DemiLight", stretch="Normal", italic=false})
      /Users/wez/.fonts/OperatorMonoSSmLig-Medium.otf, FontDirs
 9 ف \u{641} x_adv=11 glyph=328 wezterm.font(".Geeza Pro Interface", {weight="Regular", stretch="Normal", italic=false})
      /System/Library/Fonts/GeezaPro.ttc index=2 variation=0, CoreText
11 ا \u{627} x_adv=4 glyph=240 wezterm.font(".Geeza Pro Interface", {weight="Regular", stretch="Normal", italic=false})
      /System/Library/Fonts/GeezaPro.ttc index=2 variation=0, CoreText
13 ر \u{631} x_adv=3.78125 glyph=273 wezterm.font(".Geeza Pro Interface", {weight="Regular", stretch="Normal", italic=false})
      /System/Library/Fonts/GeezaPro.ttc index=2 variation=0, CoreText
15 س \u{633} x_adv=10 glyph=278 wezterm.font(".Geeza Pro Interface", {weight="Regular", stretch="Normal", italic=false})
      /System/Library/Fonts/GeezaPro.ttc index=2 variation=0, CoreText
17 ی \u{6cc} x_adv=4 glyph=664 wezterm.font(".Geeza Pro Interface", {weight="Regular", stretch="Normal", italic=false})
      /System/Library/Fonts/GeezaPro.ttc index=2 variation=0, CoreText
19   \u{20} x_adv=8 glyph=2 wezterm.font("Operator Mono SSm Lig", {weight="DemiLight", stretch="Normal", italic=false})
      /Users/wez/.fonts/OperatorMonoSSmLig-Medium.otf, FontDirs
20 - \u{2d} x_adv=8 glyph=276 wezterm.font("Operator Mono SSm Lig", {weight="DemiLight", stretch="Normal", italic=false})
      /Users/wez/.fonts/OperatorMonoSSmLig-Medium.otf, FontDirs
21 > \u{3e} x_adv=8 glyph=338 wezterm.font("Operator Mono SSm Lig", {weight="DemiLight", stretch="Normal", italic=false})
      /Users/wez/.fonts/OperatorMonoSSmLig-Medium.otf, FontDirs
```

```
; ./target/debug/wezterm --config experimental_bidi=true ls-fonts --text "عربي فارسی ->"
RightToLeft
17 ی \u{6cc} x_adv=9 glyph=906 wezterm.font(".Geeza Pro Interface", {weight="Regular", stretch="Normal", italic=false})
      /System/Library/Fonts/GeezaPro.ttc index=2 variation=0, CoreText
15 س \u{633} x_adv=10 glyph=277 wezterm.font(".Geeza Pro Interface", {weight="Regular", stretch="Normal", italic=false})
      /System/Library/Fonts/GeezaPro.ttc index=2 variation=0, CoreText
13 ر \u{631} x_adv=4.78125 glyph=272 wezterm.font(".Geeza Pro Interface", {weight="Regular", stretch="Normal", italic=false})
      /System/Library/Fonts/GeezaPro.ttc index=2 variation=0, CoreText
11 ا \u{627} x_adv=4 glyph=241 wezterm.font(".Geeza Pro Interface", {weight="Regular", stretch="Normal", italic=false})
      /System/Library/Fonts/GeezaPro.ttc index=2 variation=0, CoreText
 9 ف \u{641} x_adv=5 glyph=329 wezterm.font(".Geeza Pro Interface", {weight="Regular", stretch="Normal", italic=false})
      /System/Library/Fonts/GeezaPro.ttc index=2 variation=0, CoreText
 8   \u{20} x_adv=8 glyph=2 wezterm.font("Operator Mono SSm Lig", {weight="DemiLight", stretch="Normal", italic=false})
      /Users/wez/.fonts/OperatorMonoSSmLig-Medium.otf, FontDirs
 6 ي \u{64a} x_adv=9 glyph=904 wezterm.font(".Geeza Pro Interface", {weight="Regular", stretch="Normal", italic=false})
      /System/Library/Fonts/GeezaPro.ttc index=2 variation=0, CoreText
 4 ب \u{628} x_adv=4 glyph=243 wezterm.font(".Geeza Pro Interface", {weight="Regular", stretch="Normal", italic=false})
      /System/Library/Fonts/GeezaPro.ttc index=2 variation=0, CoreText
 2 ر \u{631} x_adv=5 glyph=273 wezterm.font(".Geeza Pro Interface", {weight="Regular", stretch="Normal", italic=false})
      /System/Library/Fonts/GeezaPro.ttc index=2 variation=0, CoreText
 0 ع \u{639} x_adv=6 glyph=301 wezterm.font(".Geeza Pro Interface", {weight="Regular", stretch="Normal", italic=false})
      /System/Library/Fonts/GeezaPro.ttc index=2 variation=0, CoreText
LeftToRight
 0   \u{20} x_adv=8 glyph=2 wezterm.font("Operator Mono SSm Lig", {weight="DemiLight", stretch="Normal", italic=false})
      /Users/wez/.fonts/OperatorMonoSSmLig-Medium.otf, FontDirs
 1 - \u{2d} x_adv=8 glyph=480 wezterm.font("Operator Mono SSm Lig", {weight="DemiLight", stretch="Normal", italic=false})
      /Users/wez/.fonts/OperatorMonoSSmLig-Medium.otf, FontDirs
 2 > \u{3e} x_adv=8 glyph=470 wezterm.font("Operator Mono SSm Lig", {weight="DemiLight", stretch="Normal", italic=false})
      /Users/wez/.fonts/OperatorMonoSSmLig-Medium.otf, FontDirs
;
```

refs: #784
I've pushed a commit with what is probably the bare minimum level of support. I'm sure it's wrong in a number of cases, but here is my test case (borrowed from microsoft/terminal#538 (comment)). Start wezterm like this to use the default config, make the font bigger, and turn on bidi mode:
That's equivalent to running with this config:

```lua
return {
  font_size = 36,
  initial_rows = 5,
  initial_cols = 30,
  experimental_bidi = true, -- this is the bit you want to use to try this out
}
```

Pasting:

TODO:
Note that the terminal-wg bidi document, while giving the impression of being well researched, makes no mention of the DEC RTL sequences from the VT5xx terminals (e.g. DECRLM) and related modes supported by Hebrew terminal emulators like Hterm. IMO those existing modes were much more useful for anyone doing serious RTL development than any of the modern proposals.
Thanks James; I'll queue up some more reading/research!
@behdad I don't mean to pounce, but I wonder if you have suggestions specifically on handling the narrower glyphs in This is how that same sequence renders in Terminal.app: Even if I use the same font (which I think is the SF Arabic font), I still have gaps in my presentation. It feels like something in Terminal.app knows to stretch those ligatures, and I wonder if HarfBuzz has some way to express that? Or is this just deep magic in Apple's shaper/typography implementation? (Maybe sort of related: #1333 is a feature request for Devanagari support, which also has some challenging glyph widths for a terminal. Would love to hear your thoughts on that as well!) I'd also love to hear if you have other recommendations on bidi/RTL support in the context of a terminal?
Currently I can report 2 issues:
The other one you mentioned yourself: a space between glyphs that should be combined. I see this problem in VTE-based terminals as well when a non-monospace font is used. If I use a monospace font (DejaVu Sans Mono), it shows correctly (in wezterm and in VTE-based terminals).
Yes.
HarfBuzz doesn't know that. I haven't checked Terminal.app. It might be a geometric stretch. Are you sure it's using the same font?
I didn't find exactly the font that Terminal.app is using, but I found that updating my local copy of Cascadia Code and using that looked better: I'll stop chasing that particular dragon :)
Could you run:
I haven't done anything about cursor positioning or input so far. I don't know how to type this script into the terminal; could you run through how you do that? I'm assuming that you have a particular keyboard/IME configured. Could you walk me through typing a short bit of text (a couple of letters/glyphs) that mixes LTR and RTL so that I can try this for myself and not produce nonsense?
I think part of the docs to write up around this will be to suggest a good monospace font.
Only letters (متن = text, فارسی = Persian, a.k.a. Farsi), which works fine:
There are the Vazir Code fonts; the
Thanks for this: it gives me something to play with and reason about!
Thank you for working on this.
@behdad At the moment, I use the UBA to produce runs of the various embedding levels (to determine the direction) and feed each of those to harfbuzz without any bidi reordering. https://harfbuzz.github.io/what-harfbuzz-doesnt-do.html doesn't explicitly say which parts of the bidi algorithm should be applied pre/post shaping. Do you have recommendations about this? I'm trying to figure out what I'm doing wrong for this example; the first grouping results in the space being reordered to the left and the last grouping has it reordered to the right. When wezterm renders these, it will render them starting from x=0 in the order they are listed below, incrementing x by the x_advance. The result is that there is no space between these runs, only around the edges.
I'm too far removed from the bidi algorithm right now to know what the expected output is.
The Arabic text in that screenshot is set in Courier New.
You need to reorder the runs, but without reversing the characters in RTL runs.
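A minimal sketch of that advice, assuming level runs are already available in logical order: rule L2 of UAX #9 (from the highest level down to the lowest odd level, reverse every maximal sequence at that level or higher) is applied to the list of runs rather than to individual characters. Each RTL run keeps its characters in logical order so it can be handed to HarfBuzz with an RTL direction, which emits the glyphs in visual order (as the `ls-fonts` output above shows, with cluster indices 17, 15, ..., 0). `Run` and `reorder_runs` are illustrative names, and the trailing-whitespace reset from rule L1 is not handled here.

```rust
use std::ops::Range;

/// A level run in logical order: a range of character indices plus its
/// resolved embedding level.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Run {
    range: Range<usize>,
    level: u8,
}

/// Apply UAX #9 rule L2 to whole runs; characters inside a run are untouched.
fn reorder_runs(mut runs: Vec<Run>) -> Vec<Run> {
    if runs.is_empty() {
        return runs;
    }
    let highest = runs.iter().map(|r| r.level).max().unwrap();
    // Lowest odd level on the line, counting levels not actually present:
    // an even minimum rounds up to the next odd level.
    let lowest_odd = runs.iter().map(|r| r.level).min().unwrap() | 1;

    // From the highest level down to the lowest odd level, reverse every
    // maximal sequence of runs whose level is at least `level`.
    for level in (lowest_odd..=highest).rev() {
        let mut i = 0;
        while i < runs.len() {
            if runs[i].level < level {
                i += 1;
                continue;
            }
            let start = i;
            while i < runs.len() && runs[i].level >= level {
                i += 1;
            }
            runs[start..i].reverse();
        }
    }
    runs
}
```

For an LTR line whose logical runs start at 0, 4, 8 and 11 with levels 0, 1, 2, 1 (say, Latin text followed by Arabic text containing a number), the visual run order comes out as 0, 11, 8, 4.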
Two problems:

* Need a `reordered_runs` method to populate ranges based on the reordered levels!
* Use the reordered runs to get the *logical* bounds of those runs and pass those to harfbuzz.

Now the text is ordered correctly, but the rendering advances by the wrong amount for the reordered clusters and looks bad unless `experimental_pixel_positioning=true`.

refs: #784
We were using the physical cell position to place the glyphs, but we need to use the visual cell position (post-RTL-reordering). refs: #784
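As a rough illustration of the difference between physical and visual cell positions, the following sketch maps each logical cell to the visual column it ends up in, given runs that have already been reordered into display order. It assumes every cell is one column wide (no double-width characters) and that runs with an odd level are displayed right-to-left; `visual_columns` is a hypothetical helper, not wezterm's renderer code.

```rust
use std::ops::Range;

/// Map each logical cell to its visual column.
/// `runs` must already be in visual (left-to-right display) order and
/// together cover cells `0..num_cells`; odd levels are displayed RTL.
fn visual_columns(runs: &[(Range<usize>, u8)], num_cells: usize) -> Vec<usize> {
    let mut visual = vec![0; num_cells];
    let mut x = 0;
    for (range, level) in runs {
        let len = range.end - range.start;
        for (offset, logical) in range.clone().enumerate() {
            visual[logical] = if *level % 2 == 1 {
                // RTL run: the first logical cell lands in the rightmost column.
                x + len - 1 - offset
            } else {
                x + offset
            };
        }
        x += len;
    }
    visual
}
```

For example, with an LTR run `0..3` followed by an RTL run `3..6`, logical cells 3, 4 and 5 are drawn at columns 5, 4 and 3; placing them at their physical positions instead would draw the RTL text backwards.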
This commit refines bidi property handling:

* `experimental_bidi` has been split into two new configuration settings: `bidi_enabled` (which controls whether the terminal performs implicit bidi processing) and `bidi_direction` (which specifies the base direction and whether auto-detection is enabled).
* The `Line` type can now store those bidi properties (they are actually split across 3 bits representing enabled, auto-detection and direction).
* The terminal now has a concept of active bidi properties and default bidi properties.
* The default properties are pulled from the wezterm configuration.
* Active bidi properties are potentially set via escape sequences: BDSM (which sets `bidi_enabled`) and SCP (which sets `bidi_direction`). We don't support the 2501 temporary DEC private mode suggested by the BIDI recommendation doc at this time.
* When creating new `Line`s or clearing from the start of a `Line`, the effective bidi properties are computed (from the active props, falling back to the default props) and applied to the `Line`.
* When rendering the line, we now look at its bidi properties instead of just the global config.

The default bidi properties are `bidi_enabled: false` and `bidi_direction: LeftToRight`, which corresponds to the typical bidi-unaware mode of most terminals.

It is possible to live reload the config to change the effective defaults, but note that, by design, they apply to new lines being processed through the terminal. That means existing output is left unaffected by a config reload, but subsequently printed lines will respect it. Pressing CTRL-L or otherwise contriving to have the running application refresh its display should cause the refreshed display to update and apply the new bidi mode.

refs: #784
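One way the three per-line bidi bits and the active/default fallback described above could be modeled is sketched below. The bit layout and the names `LineBidi` and `effective_bidi` are assumptions for illustration only; the commit does not say how the bits are arranged, and this sketch falls back to the defaults as a whole rather than per field. With every bit clear, the value corresponds to the default `bidi_enabled: false`, `bidi_direction: LeftToRight`.

```rust
/// Hypothetical packing of a line's bidi properties into three bits.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
struct LineBidi(u8);

impl LineBidi {
    const ENABLED: u8 = 1 << 0;
    const AUTO_DETECT: u8 = 1 << 1;
    const RTL: u8 = 1 << 2;

    fn new(enabled: bool, auto_detect: bool, rtl: bool) -> Self {
        let mut bits = 0;
        if enabled {
            bits |= Self::ENABLED;
        }
        if auto_detect {
            bits |= Self::AUTO_DETECT;
        }
        if rtl {
            bits |= Self::RTL;
        }
        LineBidi(bits)
    }

    fn enabled(self) -> bool {
        self.0 & Self::ENABLED != 0
    }

    fn auto_detect(self) -> bool {
        self.0 & Self::AUTO_DETECT != 0
    }

    fn is_rtl(self) -> bool {
        self.0 & Self::RTL != 0
    }
}

/// Properties stamped onto a new `Line`: the active properties (set via
/// escape sequences) if any, otherwise the configured defaults.
fn effective_bidi(active: Option<LineBidi>, default: LineBidi) -> LineBidi {
    active.unwrap_or(default)
}
```

Because the properties are copied onto each `Line` when it is created, changing the defaults later only affects lines created afterwards, which matches the live-reload behavior described in the commit.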
with:

```
bidi_enabled = false,
bidi_direction = "RightToLeft",
```

lines are now rendered right-justified in the terminal. I think there's still work to do on this, because the cluster order seems weird to me, but it's hard for me to intuit how this should really look.

refs: #784
Current state of Config options:
Escape sequences: These are primarily for bidi-aware applications to cooperate with the terminal.
Stuff that still needs work:
My recommendation if anyone wanted to try this stuff in the nightly would be to run with
I tried Also But yeah
This puts #478 to final rest: ligatured glyphs that span cells would render portions of the glyph with the wrong fg color, where "wrong" was usually the bg color, causing the glyph to turn invisible when cursoring through the ligature.

The approach used here is to divide the glyph into 7 discrete strips, where each strip either intersects with the cursor, the selection, or neither. That allows us to render each strip with the appropriate foreground color.

This change simplifies some of the logic and allows some other code to be removed, so that feels good!

As is tradition with these renderer changes, there's a good chance that I overlooked something in testing and that the metrics or alignment might be slightly off for some font/text combo. Let's see how this goes!

refs: #784
refs: #478
refs: #1617
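The strip idea can be sketched like this: split the glyph's horizontal extent at every cursor or selection edge that falls inside it, then pick a foreground color per strip. This is an illustrative sketch only, assuming pixel spans as half-open `Range<i32>` values and a cursor-over-selection priority; `Fg` and `strips` are hypothetical names, and the sketch does not try to reproduce wezterm's exact strip layout.

```rust
use std::ops::Range;

/// Which foreground color a strip of the glyph should use.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Fg {
    Normal,
    Selection,
    Cursor,
}

/// Split `glyph` (a horizontal pixel span) at every cursor or selection edge
/// that lies inside it, and choose a foreground color for each strip.
fn strips(glyph: Range<i32>, cursor: Range<i32>, selection: Range<i32>) -> Vec<(Range<i32>, Fg)> {
    let mut edges = vec![glyph.start, glyph.end];
    for &x in [cursor.start, cursor.end, selection.start, selection.end].iter() {
        if x > glyph.start && x < glyph.end {
            edges.push(x);
        }
    }
    edges.sort_unstable();
    edges.dedup();

    edges
        .windows(2)
        .map(|w| {
            let strip = w[0]..w[1];
            // Each strip lies entirely inside or outside each span, so testing
            // its first pixel is enough. The cursor takes priority.
            let fg = if cursor.contains(&strip.start) {
                Fg::Cursor
            } else if selection.contains(&strip.start) {
                Fg::Selection
            } else {
                Fg::Normal
            };
            (strip, fg)
        })
        .collect()
}
```

For a ligature glyph spanning pixels `0..30` with the cursor over `10..20` and a selection over `0..15`, this yields `0..10` in the selection color, `10..15` and `15..20` in the cursor color, and `20..30` in the normal color, so no part of the glyph is drawn in the background color.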
Just wanted to say that the work you're doing here is really awesome.
@wez Example:
I don't quite understand what you mean when you say "not in view". Can you expand on what you're trying and what you're seeing?
Yes, of course. Oddly enough, when I didn't use a fullscreen terminal (Using (And also, when you have a big file with only Persian content in it, about 14 KB, terminal/vim gets really slow. But it's not important right now.)
@wez It's not a problem for me, but something I observed: when a tab title contains RTL text, the text is not shaped and is not rendered as BiDi. You probably already know that, since you probably did not apply BiDi rendering there.
I think the last time, I couldn't fully understand the problem.
FYI, there is a new Unicode Working Group for Terminal Complex Script Support (TCSS). The initial proposal for the creation of the WG can be found at https://gist.github.com/XVilka/a0e49e1c65370ba11c17?permalink_comment_id=4615679#gistcomment-4615679
Hebrew looks great, BTW. Conjoined RTL alphabets are more complicated.
cosmic-term uses cosmic-text (which, I think, currently uses the rustybuzz, swash and unicode-bidi crates) for its text shaping, rendering, and RTL/bidirectional support. I don't know the details or how good it is, but thought it wouldn't hurt to mention it.
With large language models able to parse 20+ human languages, I think this support is becoming more important than ever before. I read the thread, but I couldn't really understand which solutions are supported so far. I tried
@thisismygitrepo try
I tried that, but there are still some problems; for example, it doesn't work in vim/neovim:
@sajadspeed See vim/vim#7932. You can conceal it in Vim.
I found a bug when editing big files containing Persian text using Vim. The first time you encounter RTL text by scrolling down, it's still LTR until you press Ctrl+L. |
You should probably specify whether you're talking about shaping or alignment. This may help developers identify the problem.
This is a Vim issue, unrelated to the terminal. See: vim/vim#14115
Vim doesn't support bidi and relies on the terminal for this behavior. In GNOME Terminal it is somewhat better: at least it shows the cursor where it really is.
There's a very good document about RTL challenges in terms of terminal display. |
Is your feature request related to a problem? Please describe.
wezterm cannot display right-to-left languages correctly. RTL text is not rendered right-to-left, and characters that need to be combined are not.
Describe the solution you'd like
Support RTL text. Probably needs bidirectional text handling and text shaping.
Related projects: harfbuzz, FriBidi
Describe alternatives you've considered
Konsole and gnome-terminal support RTL languages.
Additional context
(Image) Although the image compares Konsole with Alacritty, wezterm behaves just like Alacritty.