Add support for RTL languages #784

Open
CIAvash opened this issue May 11, 2021 · 58 comments
Labels
enhancement New feature or request

Comments

@CIAvash

CIAvash commented May 11, 2021

Is your feature request related to a problem? Please describe.
wezterm cannot display right-to-left languages correctly. RTL text is not laid out right to left, and characters that need to be joined are not.

Describe the solution you'd like
Support RTL text. Probably needs bidirectional text handling and text shaping.

Related projects: harfbuzz, FriBidi

Describe alternatives you've considered
Konsole and gnome-terminal support RTL languages.

Additional context
Image: the screenshot compares Konsole with Alacritty, but wezterm behaves just like Alacritty.

@wez
Owner

wez commented May 12, 2021

https://terminal-wg.pages.freedesktop.org/bidi/ has some excellent notes on how to model bidi in terminal emulators.

To make progress, I need to better understand:

  • what is the bare minimum set of features/level of conformance required to make bidi useful in a terminal emulator?
  • how I can reasonably test support when I don't understand the RTL scripts
  • what the feature gaps are between the current set of Rust bidi crates and eg: fribidi (which doesn't currently have Rust bindings, and whose viral LGPL license is probably fine, but potentially fraught with legal concerns in a statically linked application)

@CIAvash
Author

CIAvash commented May 12, 2021

Can pango be an option? Although it has some gtk dependencies, it uses harfbuzz and fribidi. That's probably how gnome-terminal supports RTL.

For testing, let me know if I can help by providing text content.

Also, if you are using harfbuzz, shouldn't Arabic script characters get combined? Currently they don't.

@CIAvash
Author

CIAvash commented May 12, 2021

I forgot to say that (I think) @behdad (creator of harfbuzz) is responsive, if you have questions.

@CIAvash
Author

CIAvash commented May 18, 2021

There is also servo's unicode-bidi, mentioned in alacritty/alacritty#663.

@wez
Owner

wez commented May 19, 2021

There's discussion on kas-gui/kas-text#20 about bidi implementations for Rust.

My impression right now is that the state of bidi in Rust is young and that the easiest path will result in a relatively slow bidi implementation, which isn't ideal: shaping already costs perf in wezterm today. Putting in more work on the promising alternative mentioned in that thread will likely be a better end-state, but will take more effort and that shouldn't be owned by wezterm.

The main constraint I have right now is time: if someone has time and wants to drive this forward, I'm very receptive to seeing wezterm support bidi and helping that person figure out how to integrate it into wezterm.

wez added a commit that referenced this issue Jan 20, 2022
We don't do anything with these; this is just teaching
the parser how to recognize these codes.

refs: #784
wez added a commit that referenced this issue Jan 25, 2022
In order to support RTL/BIDI, wezterm needs a bidi implementation.  I
don't think a well-conforming rust implementation exists today; what I
found were implementations that didn't pass 100% of the conformance
tests.

So I decided to port "bidiref", the reference implementation of the UBA
described in http://unicode.org/reports/tr9/ to Rust.

This implementation focuses on conformance: no special measures have
been taken to optimize it so far, with my focus having been to ensure
that all of the approx 780,000 test cases in the unicode data for
unicode 14 pass.  Having the tests passing 100% allows for making
performance improvements with confidence in the future.

The API isn't completely designed/fully baked.  Until I get to hooking
it up to wezterm's shaper, I'm not 100% sure exactly what I'll need.
There's a good discussion on API in
open-i18n/rust-unic#273 that suggests omitting
"legacy" operations such as reordering. I suspect that wezterm may
actually need that function to support monospace text layout in some
terminal scenarios, but regardless: reordering is part of the
conformance test suite so it remains a part of the API.

That said: the API does model the major operations as separate
phases, so you should be able to pay for just what you use:

* Resolving the embedding levels from a paragraph
* Returning paragraph runs of those levels (and their directions)
* Returning the whitespace-level-reset runs for a line-slice within the
  paragraph
* Returning the reordered indices + levels for a line-slice within the
  paragraph.

refs: #784
refs: kas-gui/kas-text#20
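
The last phase listed in this commit message, reordering a line from its resolved levels, corresponds to rule L2 of UAX #9. Below is a minimal Python sketch of that rule alone, assuming the embedding levels have already been resolved. `reorder_visual` is a hypothetical name used for illustration; it is not the API of the Rust port described here.

```python
def reorder_visual(levels):
    """Return logical indices in visual (left-to-right) order, per UAX #9 rule L2.

    `levels` holds one already-resolved embedding level per character of a line.
    """
    order = list(range(len(levels)))
    odd_levels = [lvl for lvl in levels if lvl % 2 == 1]
    if not odd_levels:
        # No right-to-left content on this line: logical order is visual order.
        return order
    highest = max(levels)
    lowest_odd = min(odd_levels)
    # From the highest level down to the lowest odd level, reverse every
    # maximal contiguous sequence of characters at that level or higher.
    for level in range(highest, lowest_odd - 1, -1):
        i = 0
        while i < len(order):
            if levels[order[i]] >= level:
                j = i
                while j < len(order) and levels[order[j]] >= level:
                    j += 1
                order[i:j] = reversed(order[i:j])
                i = j
            else:
                i += 1
    return order


# Two LTR characters, three RTL characters, one LTR character.
print(reorder_visual([0, 0, 1, 1, 1, 0]))
# An RTL line with a two-digit number (level 2) in the middle.
print(reorder_visual([1, 1, 2, 2, 1]))
```

In the second call the whole line is reversed while the two digits keep their relative order, which is the behavior expected for numbers inside RTL text.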
wez added a commit that referenced this issue Jan 25, 2022
This commit is larger than it appears due to fanout from threading
through bidi parameters.  The main changes are:

* When clustering cells, add an additional phase to resolve embedding
  levels and further sub-divide a cluster based on the resolved bidi
  runs; this is where we get the direction for a run and this needs
  to be passed through to the shaper.
* When doing bidi, the forced cluster boundary hack that we use to
  de-ligature when cursoring through text needs to be disabled,
  otherwise the cursor appears to push/rotate the text in that
  cluster when moving through it! We'll need to find a different
  way to handle shading the cursor that eliminates the original
  cursor/ligature/black issue.
* In the shaper, the logic for coalescing unresolved runs for font
  fallback assumed LTR and needed to be adjusted to cluster RTL.
  That meant also computing a little index of codepoint lengths.
* Added `experimental_bidi` boolean option that defaults to false.
  When enabled, it activates the bidi processing phase in clustering
  with a strong hint that the paragraph is LTR.

This implementation is incomplete and/or wrong for a number of cases:

* The config option should probably allow specifying the paragraph
  direction hint to use by default.
* https://terminal-wg.pages.freedesktop.org/bidi/recommendation/paragraphs.html
  recommends that bidi be applied to logical lines, not the physical
  lines (or really: ranges within physical lines) that we're using
  at the moment
* The paragraph direction hint should be overridden by cell attributes
  and other escapes; see 85a6b17

and probably others.
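
The clustering change described in this commit can be pictured as splitting a line wherever the resolved embedding level changes, with each piece's direction given by the parity of its level. Here is a minimal Python sketch under that assumption. `level_runs` is a hypothetical helper, the levels are supplied by hand rather than resolved by a bidi implementation, and this is not wezterm's code.

```python
from itertools import groupby


def level_runs(levels):
    """Group consecutive equal embedding levels into (start, end, direction) runs.

    Odd levels are right-to-left and even levels are left-to-right (UAX #9).
    `end` is exclusive.
    """
    runs = []
    start = 0
    for level, group in groupby(levels):
        length = len(list(group))
        direction = "RightToLeft" if level % 2 else "LeftToRight"
        runs.append((start, start + length, direction))
        start += length
    return runs


# Hand-written levels for an RTL word, a digit at level 2, then another RTL word.
print(level_runs([1, 1, 1, 1, 2, 1, 1, 1]))
```

Each run produced this way would then be handed to the shaper together with its direction.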

However, as of this commit, if you set `experimental_bidi=true` and run

```
echo This is RTL -> عربي فارسی bidi
```

(that text was sourced from:
microsoft/terminal#538 (comment))

then wezterm will display the text in the same order as the text
renders in Chrome for that github comment.

```
; ./target/debug/wezterm --config experimental_bidi=false ls-fonts --text "عربي فارسی ->"
LeftToRight
 0 ع    \u{639}      x_adv=8  glyph=300  wezterm.font(".Geeza Pro Interface", {weight="Regular", stretch="Normal", italic=false})
                                      /System/Library/Fonts/GeezaPro.ttc index=2 variation=0, CoreText
 2 ر    \u{631}      x_adv=3.78125 glyph=273  wezterm.font(".Geeza Pro Interface", {weight="Regular", stretch="Normal", italic=false})
                                      /System/Library/Fonts/GeezaPro.ttc index=2 variation=0, CoreText
 4 ب    \u{628}      x_adv=4  glyph=244  wezterm.font(".Geeza Pro Interface", {weight="Regular", stretch="Normal", italic=false})
                                      /System/Library/Fonts/GeezaPro.ttc index=2 variation=0, CoreText
 6 ي    \u{64a}      x_adv=4  glyph=363  wezterm.font(".Geeza Pro Interface", {weight="Regular", stretch="Normal", italic=false})
                                      /System/Library/Fonts/GeezaPro.ttc index=2 variation=0, CoreText
 8      \u{20}       x_adv=8  glyph=2    wezterm.font("Operator Mono SSm Lig", {weight="DemiLight", stretch="Normal", italic=false})
                                      /Users/wez/.fonts/OperatorMonoSSmLig-Medium.otf, FontDirs
 9 ف    \u{641}      x_adv=11 glyph=328  wezterm.font(".Geeza Pro Interface", {weight="Regular", stretch="Normal", italic=false})
                                      /System/Library/Fonts/GeezaPro.ttc index=2 variation=0, CoreText
11 ا    \u{627}      x_adv=4  glyph=240  wezterm.font(".Geeza Pro Interface", {weight="Regular", stretch="Normal", italic=false})
                                      /System/Library/Fonts/GeezaPro.ttc index=2 variation=0, CoreText
13 ر    \u{631}      x_adv=3.78125 glyph=273  wezterm.font(".Geeza Pro Interface", {weight="Regular", stretch="Normal", italic=false})
                                      /System/Library/Fonts/GeezaPro.ttc index=2 variation=0, CoreText
15 س    \u{633}      x_adv=10 glyph=278  wezterm.font(".Geeza Pro Interface", {weight="Regular", stretch="Normal", italic=false})
                                      /System/Library/Fonts/GeezaPro.ttc index=2 variation=0, CoreText
17 ی    \u{6cc}      x_adv=4  glyph=664  wezterm.font(".Geeza Pro Interface", {weight="Regular", stretch="Normal", italic=false})
                                      /System/Library/Fonts/GeezaPro.ttc index=2 variation=0, CoreText
19      \u{20}       x_adv=8  glyph=2    wezterm.font("Operator Mono SSm Lig", {weight="DemiLight", stretch="Normal", italic=false})
                                      /Users/wez/.fonts/OperatorMonoSSmLig-Medium.otf, FontDirs
20 -    \u{2d}       x_adv=8  glyph=276  wezterm.font("Operator Mono SSm Lig", {weight="DemiLight", stretch="Normal", italic=false})
                                      /Users/wez/.fonts/OperatorMonoSSmLig-Medium.otf, FontDirs
21 >    \u{3e}       x_adv=8  glyph=338  wezterm.font("Operator Mono SSm Lig", {weight="DemiLight", stretch="Normal", italic=false})
                                      /Users/wez/.fonts/OperatorMonoSSmLig-Medium.otf, FontDirs
```

```
; ./target/debug/wezterm --config experimental_bidi=true ls-fonts --text "عربي فارسی ->"
RightToLeft
17 ی    \u{6cc}      x_adv=9  glyph=906  wezterm.font(".Geeza Pro Interface", {weight="Regular", stretch="Normal", italic=false})
                                      /System/Library/Fonts/GeezaPro.ttc index=2 variation=0, CoreText
15 س    \u{633}      x_adv=10 glyph=277  wezterm.font(".Geeza Pro Interface", {weight="Regular", stretch="Normal", italic=false})
                                      /System/Library/Fonts/GeezaPro.ttc index=2 variation=0, CoreText
13 ر    \u{631}      x_adv=4.78125 glyph=272  wezterm.font(".Geeza Pro Interface", {weight="Regular", stretch="Normal", italic=false})
                                      /System/Library/Fonts/GeezaPro.ttc index=2 variation=0, CoreText
11 ا    \u{627}      x_adv=4  glyph=241  wezterm.font(".Geeza Pro Interface", {weight="Regular", stretch="Normal", italic=false})
                                      /System/Library/Fonts/GeezaPro.ttc index=2 variation=0, CoreText
 9 ف    \u{641}      x_adv=5  glyph=329  wezterm.font(".Geeza Pro Interface", {weight="Regular", stretch="Normal", italic=false})
                                      /System/Library/Fonts/GeezaPro.ttc index=2 variation=0, CoreText
 8      \u{20}       x_adv=8  glyph=2    wezterm.font("Operator Mono SSm Lig", {weight="DemiLight", stretch="Normal", italic=false})
                                      /Users/wez/.fonts/OperatorMonoSSmLig-Medium.otf, FontDirs
 6 ي    \u{64a}      x_adv=9  glyph=904  wezterm.font(".Geeza Pro Interface", {weight="Regular", stretch="Normal", italic=false})
                                      /System/Library/Fonts/GeezaPro.ttc index=2 variation=0, CoreText
 4 ب    \u{628}      x_adv=4  glyph=243  wezterm.font(".Geeza Pro Interface", {weight="Regular", stretch="Normal", italic=false})
                                      /System/Library/Fonts/GeezaPro.ttc index=2 variation=0, CoreText
 2 ر    \u{631}      x_adv=5  glyph=273  wezterm.font(".Geeza Pro Interface", {weight="Regular", stretch="Normal", italic=false})
                                      /System/Library/Fonts/GeezaPro.ttc index=2 variation=0, CoreText
 0 ع    \u{639}      x_adv=6  glyph=301  wezterm.font(".Geeza Pro Interface", {weight="Regular", stretch="Normal", italic=false})
                                      /System/Library/Fonts/GeezaPro.ttc index=2 variation=0, CoreText
LeftToRight
 0      \u{20}       x_adv=8  glyph=2    wezterm.font("Operator Mono SSm Lig", {weight="DemiLight", stretch="Normal", italic=false})
                                      /Users/wez/.fonts/OperatorMonoSSmLig-Medium.otf, FontDirs
 1 -    \u{2d}       x_adv=8  glyph=480  wezterm.font("Operator Mono SSm Lig", {weight="DemiLight", stretch="Normal", italic=false})
                                      /Users/wez/.fonts/OperatorMonoSSmLig-Medium.otf, FontDirs
 2 >    \u{3e}       x_adv=8  glyph=470  wezterm.font("Operator Mono SSm Lig", {weight="DemiLight", stretch="Normal", italic=false})
                                      /Users/wez/.fonts/OperatorMonoSSmLig-Medium.otf, FontDirs
;
```

refs: #784
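
The leading numbers in the listings above appear to be UTF-8 byte offsets of each character within its run, which would explain why they advance by 2 across the Arabic letters and by 1 across ASCII. That reading is an inference from the output, not something the commit message states. A small Python sketch that computes such offsets for the same text (`byte_offsets` is a hypothetical helper):

```python
def byte_offsets(text):
    """Pair each character with the UTF-8 byte offset at which it starts."""
    offsets = []
    pos = 0
    for ch in text:
        offsets.append((pos, ch))
        pos += len(ch.encode("utf-8"))
    return offsets


# The sample passed to `ls-fonts --text`, written with escapes so the
# codepoints match the \u{...} values shown in the listing.
sample = "\u0639\u0631\u0628\u064a \u0641\u0627\u0631\u0633\u06cc ->"
print([offset for offset, _ in byte_offsets(sample)])
# prints [0, 2, 4, 6, 8, 9, 11, 13, 15, 17, 19, 20, 21]
```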
wez added a commit that referenced this issue Jan 25, 2022
@wez
Owner

wez commented Jan 25, 2022

I've pushed a commit with what is probably the bare minimum level of support. I'm sure it's wrong in a number of cases, but it handles my test case (borrowed from microsoft/terminal#538 (comment)).

Start wezterm like this to use the default config, make the font bigger, and turn on bidi mode:

wezterm -n --config font_size=36 --config initial_rows=5 --config initial_cols=30 \
    --config experimental_bidi=true

that's equivalent to running with this config:

return {
   font_size = 36,
   initial_rows = 5,
   initial_cols = 30,
   experimental_bidi = true, -- this is the bit you want to use to try this out
}

Pasting: This is RTL -> عربي فارسی into the terminal:

image

TODO:

  • Get feedback from people that actually work with RTL languages in the terminal to get a sense of what is actually good/desirable and/or counter-examples to see what to avoid
  • Config option should probably not be a boolean, but instead allow user to specify Disabled, LTR, RTL or Auto-LTR bidi resolution modes
  • Respect the bidi related escape sequences as discussed in https://terminal-wg.pages.freedesktop.org/bidi/recommendation/escape-sequences.html
  • DECRLM and other modes; take a look at Hterm as was suggested by j4james below
  • Consider carefully the default mode for bidi. https://terminal-wg.pages.freedesktop.org/bidi/recommendation/basic-modes.html suggests a default mode that isn't compatible with terminal hardware as pointed out by @j4james in the linked issue. Need to find a good balance between compatibility and things working as users expect.
  • Currently, wezterm resolves bidi on attributed text runs within a single physical (post-wrapped) line. https://terminal-wg.pages.freedesktop.org/bidi/recommendation/paragraphs.html recommends that it be carried out on the logical line prior to wrapping. That also needs some thought. Right now the context in which bidi is computed in wezterm doesn't have access to that paragraph information. This will be easier to reconcile if someone can provide a good example of how things should look.
  • Turning on bidi mode effectively regresses #478 ("Whole ligature turns black when cursor is on one end, based on the current font"), so we'll need to revisit how to resolve that
  • In the example above, فا renders with a gap between the ligature, whereas what renders in the browser doesn't have a gap. I believe that is because the first portion of that sequence has half-width and since we're monospace the other half is a gap. I don't know what the best way to handle this is.
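
The TODO item about replacing the boolean with Disabled, LTR, RTL or Auto-LTR modes could look roughly like the Python sketch below. Everything here is an assumption made for illustration: the `BidiMode` enum, the `paragraph_level` function, and the reading of Auto-LTR as "use the first strong character, fall back to LTR" (rules P2 and P3 of UAX #9, ignoring isolates) are not taken from wezterm.

```python
import unicodedata
from enum import Enum


class BidiMode(Enum):
    # Hypothetical names mirroring the four modes listed in the TODO.
    DISABLED = "Disabled"
    LTR = "LTR"
    RTL = "RTL"
    AUTO_LTR = "Auto-LTR"


def paragraph_level(text, mode):
    """Return the base level (0 = LTR, 1 = RTL), or None when bidi is disabled."""
    if mode is BidiMode.DISABLED:
        return None
    if mode is BidiMode.LTR:
        return 0
    if mode is BidiMode.RTL:
        return 1
    # AUTO_LTR: the first strong character decides; default to LTR if none.
    # Simplification: characters inside isolates are not skipped here.
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return 0
        if bidi_class in ("R", "AL"):
            return 1
    return 0


print(paragraph_level("This is RTL -> \u0639\u0631\u0628\u064a", BidiMode.AUTO_LTR))
print(paragraph_level("\u0645\u062a\u0646 RTL", BidiMode.AUTO_LTR))
```

With this reading, the test sentence from this thread resolves to an LTR paragraph because it starts with Latin letters, while a line starting with Persian text resolves to RTL.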

@j4james

j4james commented Jan 25, 2022

Note that the terminal-wg bidi document, while giving the impression of being well researched, makes no mention of the DEC RTL sequences from the VT5xx terminals (e.g. DECRLM) and related modes supported by Hebrew terminal emulators like Hterm. IMO those existing modes were much more useful for anyone doing serious RTL development than any of the modern proposals.

@wez
Owner

wez commented Jan 25, 2022

Thanks James; I'll queue up some more reading/research!

@wez
Owner

wez commented Jan 26, 2022

@behdad I don't mean to pounce, but I wonder if you have suggestions specifically on handling the narrower glyphs in فا in a monospace/terminal context; the x_advance in this case is approx. half the monospace cell width. wezterm uses harfbuzz under the covers, but has some logic to override x_advance to make cells line up. Is this particular case best solved simply by using a different font that has wider versions of these glyphs? Or are there some recommended flags/modes for harfbuzz that I should consider?

This is how that same sequence renders in Terminal.app:
image

Even if I use the same font (which I think is the SF Arabic font), I still have gaps in my presentation. It feels like something in Terminal.app knows to stretch those ligatures and I wonder if harfbuzz has some way to express that? Or is this just deep magic in Apple's shaper/typography implementation?

(Maybe sort of related: #1333 is a feature request for Devanagari support, which also has some challenging glyph widths for a terminal. Would love to hear your thoughts on that as well!)

I'd also love to hear if you have other recommendations on bidi/rtl support in the context of a terminal?
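
To put numbers on the gap described above: in the `experimental_bidi=true` listing earlier in this thread, ف has `x_adv=5` and ا has `x_adv=4`, while the space and ASCII glyphs have `x_adv=8`. The Python sketch below assumes the cell width is 8 on that basis and computes the leftover space per cell. `cell_gap` is a hypothetical helper for the arithmetic only; it is not the logic wezterm uses to override `x_advance`.

```python
def cell_gap(x_advance, cell_width, cells=1):
    """Horizontal space left unused when a glyph is placed on a fixed cell grid."""
    return cells * cell_width - x_advance


CELL_WIDTH = 8  # assumption: inferred from x_adv=8 for the space and ASCII glyphs

for name, advance in [("FEH", 5), ("ALEF", 4)]:
    print(name, cell_gap(advance, CELL_WIDTH))
```

A font whose Arabic glyphs have advances equal to the cell width would leave a gap of zero.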

@CIAvash
Author

CIAvash commented Jan 26, 2022

Currently I can report 2 issues:

  • If you put a number or LTR letter after an RTL letter (with or without a space), it becomes LTR. On VTE-based terminals, numbers work fine, but if you put an LTR letter, it becomes LTR.
  • Moving the cursor doesn't follow the RTL letter positions, so you can't tell where you're typing (or changing) a letter.

The other one you mentioned yourself: a space between glyphs that are combined together. I see this problem in VTE-based terminals as well if a non-monospace font is used. If I use a monospace font (DejaVu Sans Mono), it shows correctly (in wezterm and VTE-based terminals).

@behdad

behdad commented Jan 26, 2022

> Is this particular case best solved simply by using a different font that has wider versions of these glyphs?

Yes.

> It feels like something in Terminal.app knows to stretch those ligatures and I wonder if harfbuzz has some way to express that? Or is this just deep magic in Apple's shaper/typography implementation?

HarfBuzz doesn't know that. I haven't checked Terminal.app. It might be a geometric stretch. You sure it's using the same font?

@behdad

behdad commented Jan 26, 2022

> This is how that same sequence renders in Terminal.app:
> image

Looks obviously a different font.

@wez
Owner

wez commented Jan 26, 2022

I didn't find exactly the font that Terminal.app is using, but I found that updating my local copy of Cascadia Code and using that looked better: I'll stop chasing that particular dragon :)

@wez
Owner

wez commented Jan 26, 2022

> Currently I can report 2 issues:
>
>   • If you put a number or LTR letter after an RTL letter (with or without a space), it becomes LTR. On VTE-based terminals, numbers work fine, but if you put an LTR letter, it becomes LTR.

Could you run: wezterm ls-fonts --text "EXAMPLE" where EXAMPLE is the text sequence you're trying, so that I can see exactly what sequence you mean and also what wezterm thinks it is doing?

>   • Moving the cursor doesn't follow the RTL letter positions, so you can't tell where you're typing (or changing) a letter.

I haven't done anything about cursor positioning or input so far. I don't know how to type this script into the terminal; could you run through how you do that? I'm assuming that you have a particular keyboard/IME configured. Could you walk me through typing a short bit of text (a couple of letters/glyphs) that mixes LTR and RTL so that I can try this for myself and not produce nonsense?

> The other one you mentioned yourself: a space between glyphs that are combined together. I see this problem in VTE-based terminals as well if a non-monospace font is used. If I use a monospace font (DejaVu Sans Mono), it shows correctly (in wezterm and VTE-based terminals).

I think part of the docs to write up around this will be to suggest a good monospace font. Cascadia Code is another option that at least is monospace, but I am not equipped to comment on its legibility/usability vs. other Arabic fonts!

@CIAvash
Author

CIAvash commented Jan 26, 2022

> Could you run: wezterm ls-fonts --text "EXAMPLE" where EXAMPLE is the text sequence you're trying, so that I can see exactly what sequence you mean and also what wezterm thinks it is doing?

Only letters (متن=text, فارسی=Persian=Farsi), which works fine:

متن فارسی

wezterm ls-fonts --text "متن فارسی"
RightToLeft
15 ی    \u{6cc}      x_adv=10 glyph=3113 wezterm.font("DejaVu Sans Mono", {weight="Regular", stretch="Normal", italic=false})
                                      /usr/share/fonts/TTF/DejaVuSansMono.ttf, FontConfig
13 س    \u{633}      x_adv=10 glyph=3182 wezterm.font("DejaVu Sans Mono", {weight="Regular", stretch="Normal", italic=false})
                                      /usr/share/fonts/TTF/DejaVuSansMono.ttf, FontConfig
11 ر    \u{631}      x_adv=10 glyph=1127 wezterm.font("DejaVu Sans Mono", {weight="Regular", stretch="Normal", italic=false})
                                      /usr/share/fonts/TTF/DejaVuSansMono.ttf, FontConfig
 9 ا    \u{627}      x_adv=10 glyph=3145 wezterm.font("DejaVu Sans Mono", {weight="Regular", stretch="Normal", italic=false})
                                      /usr/share/fonts/TTF/DejaVuSansMono.ttf, FontConfig
 7 ف    \u{641}      x_adv=10 glyph=3214 wezterm.font("DejaVu Sans Mono", {weight="Regular", stretch="Normal", italic=false})
                                      /usr/share/fonts/TTF/DejaVuSansMono.ttf, FontConfig
 6      \u{20}       x_adv=10 glyph=1    wezterm.font("Roboto Mono", {weight="Regular", stretch="Normal", italic=false})
                                      /usr/share/fonts/TTF/RobotoMono-Regular.ttf, FontConfig
 4 ن    \u{646}      x_adv=10 glyph=3233 wezterm.font("DejaVu Sans Mono", {weight="Regular", stretch="Normal", italic=false})
                                      /usr/share/fonts/TTF/DejaVuSansMono.ttf, FontConfig
 2 ت    \u{62a}      x_adv=10 glyph=3155 wezterm.font("DejaVu Sans Mono", {weight="Regular", stretch="Normal", italic=false})
                                      /usr/share/fonts/TTF/DejaVuSansMono.ttf, FontConfig
 0 م    \u{645}      x_adv=10 glyph=3230 wezterm.font("DejaVu Sans Mono", {weight="Regular", stretch="Normal", italic=false})
                                      /usr/share/fonts/TTF/DejaVuSansMono.ttf, FontConfig

Same text with spaces and a Persian digit (۲=2) between the words:

متن ۲ فارسی

wezterm ls-fonts --text "متن ۲ فارسی"
RightToLeft
 6      \u{20}       x_adv=10 glyph=1    wezterm.font("Roboto Mono", {weight="Regular", stretch="Normal", italic=false})
                                      /usr/share/fonts/TTF/RobotoMono-Regular.ttf, FontConfig
 4 ن    \u{646}      x_adv=10 glyph=3233 wezterm.font("DejaVu Sans Mono", {weight="Regular", stretch="Normal", italic=false})
                                      /usr/share/fonts/TTF/DejaVuSansMono.ttf, FontConfig
 2 ت    \u{62a}      x_adv=10 glyph=3155 wezterm.font("DejaVu Sans Mono", {weight="Regular", stretch="Normal", italic=false})
                                      /usr/share/fonts/TTF/DejaVuSansMono.ttf, FontConfig
 0 م    \u{645}      x_adv=10 glyph=3230 wezterm.font("DejaVu Sans Mono", {weight="Regular", stretch="Normal", italic=false})
                                      /usr/share/fonts/TTF/DejaVuSansMono.ttf, FontConfig
LeftToRight
 0 ۲    \u{6f2}      x_adv=10 glyph=1194 wezterm.font("DejaVu Sans Mono", {weight="Regular", stretch="Normal", italic=false})
                                      /usr/share/fonts/TTF/DejaVuSansMono.ttf, FontConfig
RightToLeft
 9 ی    \u{6cc}      x_adv=10 glyph=3113 wezterm.font("DejaVu Sans Mono", {weight="Regular", stretch="Normal", italic=false})
                                      /usr/share/fonts/TTF/DejaVuSansMono.ttf, FontConfig
 7 س    \u{633}      x_adv=10 glyph=3182 wezterm.font("DejaVu Sans Mono", {weight="Regular", stretch="Normal", italic=false})
                                      /usr/share/fonts/TTF/DejaVuSansMono.ttf, FontConfig
 5 ر    \u{631}      x_adv=10 glyph=1127 wezterm.font("DejaVu Sans Mono", {weight="Regular", stretch="Normal", italic=false})
                                      /usr/share/fonts/TTF/DejaVuSansMono.ttf, FontConfig
 3 ا    \u{627}      x_adv=10 glyph=3145 wezterm.font("DejaVu Sans Mono", {weight="Regular", stretch="Normal", italic=false})
                                      /usr/share/fonts/TTF/DejaVuSansMono.ttf, FontConfig
 1 ف    \u{641}      x_adv=10 glyph=3214 wezterm.font("DejaVu Sans Mono", {weight="Regular", stretch="Normal", italic=false})
                                      /usr/share/fonts/TTF/DejaVuSansMono.ttf, FontConfig
 0      \u{20}       x_adv=10 glyph=1    wezterm.font("Roboto Mono", {weight="Regular", stretch="Normal", italic=false})
                                      /usr/share/fonts/TTF/RobotoMono-Regular.ttf, FontConfig

Same text with spaces and an ASCII digit (2) between the words:

متن 2 فارسی

wezterm ls-fonts --text "متن 2 فارسی"
RightToLeft
 6      \u{20}       x_adv=10 glyph=1    wezterm.font("Roboto Mono", {weight="Regular", stretch="Normal", italic=false})
                                      /usr/share/fonts/TTF/RobotoMono-Regular.ttf, FontConfig
 4 ن    \u{646}      x_adv=10 glyph=3233 wezterm.font("DejaVu Sans Mono", {weight="Regular", stretch="Normal", italic=false})
                                      /usr/share/fonts/TTF/DejaVuSansMono.ttf, FontConfig
 2 ت    \u{62a}      x_adv=10 glyph=3155 wezterm.font("DejaVu Sans Mono", {weight="Regular", stretch="Normal", italic=false})
                                      /usr/share/fonts/TTF/DejaVuSansMono.ttf, FontConfig
 0 م    \u{645}      x_adv=10 glyph=3230 wezterm.font("DejaVu Sans Mono", {weight="Regular", stretch="Normal", italic=false})
                                      /usr/share/fonts/TTF/DejaVuSansMono.ttf, FontConfig
LeftToRight
 0 2    \u{32}       x_adv=10 glyph=56   wezterm.font("Roboto Mono", {weight="Regular", stretch="Normal", italic=false})
                                      /usr/share/fonts/TTF/RobotoMono-Regular.ttf, FontConfig
RightToLeft
 9 ی    \u{6cc}      x_adv=10 glyph=3113 wezterm.font("DejaVu Sans Mono", {weight="Regular", stretch="Normal", italic=false})
                                      /usr/share/fonts/TTF/DejaVuSansMono.ttf, FontConfig
 7 س    \u{633}      x_adv=10 glyph=3182 wezterm.font("DejaVu Sans Mono", {weight="Regular", stretch="Normal", italic=false})
                                      /usr/share/fonts/TTF/DejaVuSansMono.ttf, FontConfig
 5 ر    \u{631}      x_adv=10 glyph=1127 wezterm.font("DejaVu Sans Mono", {weight="Regular", stretch="Normal", italic=false})
                                      /usr/share/fonts/TTF/DejaVuSansMono.ttf, FontConfig
 3 ا    \u{627}      x_adv=10 glyph=3145 wezterm.font("DejaVu Sans Mono", {weight="Regular", stretch="Normal", italic=false})
                                      /usr/share/fonts/TTF/DejaVuSansMono.ttf, FontConfig
 1 ف    \u{641}      x_adv=10 glyph=3214 wezterm.font("DejaVu Sans Mono", {weight="Regular", stretch="Normal", italic=false})
                                      /usr/share/fonts/TTF/DejaVuSansMono.ttf, FontConfig
 0      \u{20}       x_adv=10 glyph=1    wezterm.font("Roboto Mono", {weight="Regular", stretch="Normal", italic=false})
                                      /usr/share/fonts/TTF/RobotoMono-Regular.ttf, FontConfig

With double quotes the spaces are also misplaced.

Without double quotes: wezterm ls-fonts --text (echo -n متن 2 فارسی)

متن 2 فارسی
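
One way to reason about the digit samples above is to look at the Unicode bidi class of each character. This is a minimal Python sketch using the standard `unicodedata` module. It only reports classes and does not run the bidi algorithm; `bidi_classes` is a hypothetical helper name.

```python
import unicodedata


def bidi_classes(text):
    """Return (codepoint, bidi class) pairs for each character in `text`."""
    return [(f"U+{ord(ch):04X}", unicodedata.bidirectional(ch)) for ch in text]


# First word of the sample followed by the Persian digit, then the ASCII digit.
print(bidi_classes("\u0645\u062a\u0646 \u06f2"))
print(bidi_classes("\u0645\u062a\u0646 2"))
```

Both digits report class EN (European number) rather than a strong RTL class, so where they end up depends on the number-handling rules of the algorithm (for example W2 in UAX #9, which treats an EN that follows an Arabic letter as an Arabic number) and not on the digit alone.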

> I haven't done anything about cursor positioning or input so far. I don't know how to type this script into the terminal; could you run through how you do that? I'm assuming that you have a particular keyboard/IME configured. Could you walk me through typing a short bit of text (a couple of letters/glyphs) that mixes LTR and RTL so that I can try this for myself and not produce nonsense?

I set keyboard layouts in Sway window manager like this:

input * {
    xkb_layout "us,ir"
    xkb_options "grp:shifts_toggle,compose:caps"
}

And toggle between English and Persian.

In X, I think it's with this command: setxkbmap -layout us,ir -option grp:shifts_toggle
or xorg config:

    Option "XkbLayout" "us,ir"
    Option "XkbOptions" "grp:shifts_toggle"

You can use online virtual keyboards:
https://www.branah.com/farsi - With this you can switch between Persian and English
https://www.lexilogos.com/keyboard/persian.htm - This one has the pronunciation of letters

So for typing "متن ۱ فارسی" in Persian keyboard layout:
You would hit these keys: l j k SPACE 1 SPACE t h v s d
For "متن RTL و متن LTR": l j k SPACE R T L SPACE , SPACE l j k SPACE L T R
The last text on its own (beginning with RTL letters):

متن RTL و متن LTR

Some random text samples:
From Persian alphabet:

الفبای فارسی یا الفبای فارسی-عربی شاملِ ۳۲ حرف است که از الفبای عربی اقتباس‌شده است.

From English language:

اِنگلیسی (به انگلیسی: English، ‎/ˈɪŋɡlɪʃ/‎) یک زبان طبیعی از خانواده زبانی زبان‌های هندواروپایی از شاخه زبان‌های ژرمنی غربی است که اولین بار در انگلستان در عهد آنگلوساکسون‌ها مورد تکلم قرار گرفت و انگلیسی باستان شکل گرفت.

From Persian language

There are several letters generally only used in Arabic loanwords. These letters are pronounced the same as similar Persian letters. For example, there are four functionally identical letters for /z/ (ز ذ ض ظ), three letters for /s/ (س ص ث), two letters for /t/ (ط ت), two letters for /h/ (ح ه). On the other hand, there are four letters that don't exist in Arabic پ چ ژ گ.

> I think part of the docs to write up around this will be to suggest a good monospace font. Cascadia Code is another option that at least is monospace, but I am not equipped to comment on its legibility/usability vs. other Arabic fonts!

I took a look at Cascadia Code; it seems to be the font Microsoft uses for Windows Terminal. In my opinion it doesn't look good: letters get stretched and are sometimes hard to read. There may be better fonts, but I haven't searched for one.

@CIAvash
Author

CIAvash commented Jan 27, 2022

There are also the Vazir Code fonts; Vazir Code Hack seems to look better.

@wez
Owner

wez commented Jan 27, 2022

Thanks for this: it gives me something to play with and reason about!

@CIAvash
Author

CIAvash commented Jan 27, 2022

Thank you for working on this.

@wez
Owner

wez commented Jan 27, 2022

@behdad At the moment, I use the UBA to produce runs of the various embedding levels (to determine the direction) and feed each of those to harfbuzz without any bidi reordering. https://harfbuzz.github.io/what-harfbuzz-doesnt-do.html doesn't explicitly say which parts of the bidi algorithm should be applied pre/post shaping. Do you have recommendations about this?

I'm trying to figure out what I'm doing wrong for this example; the first grouping results in the space being reordered to the left and the last grouping has it reordered to the right. When wezterm renders these, it will render them starting from x=0 in the order they are listed below, incrementing x by the x_advance. The result is that there is no space between these runs, only around the edges.

; wezterm ls-fonts --text "متن ۲ فارسی"
RightToLeft
 6      \u{20}       x_adv=10 glyph=1    wezterm.font("Roboto Mono", {weight="Regular", stretch="Normal", italic=false})
                                      /usr/share/fonts/TTF/RobotoMono-Regular.ttf, FontConfig
 4 ن    \u{646}      x_adv=10 glyph=3233 wezterm.font("DejaVu Sans Mono", {weight="Regular", stretch="Normal", italic=false})
                                      /usr/share/fonts/TTF/DejaVuSansMono.ttf, FontConfig
 2 ت    \u{62a}      x_adv=10 glyph=3155 wezterm.font("DejaVu Sans Mono", {weight="Regular", stretch="Normal", italic=false})
                                      /usr/share/fonts/TTF/DejaVuSansMono.ttf, FontConfig
 0 م    \u{645}      x_adv=10 glyph=3230 wezterm.font("DejaVu Sans Mono", {weight="Regular", stretch="Normal", italic=false})
                                      /usr/share/fonts/TTF/DejaVuSansMono.ttf, FontConfig
LeftToRight
 0 ۲    \u{6f2}      x_adv=10 glyph=1194 wezterm.font("DejaVu Sans Mono", {weight="Regular", stretch="Normal", italic=false})
                                      /usr/share/fonts/TTF/DejaVuSansMono.ttf, FontConfig
RightToLeft
 9 ی    \u{6cc}      x_adv=10 glyph=3113 wezterm.font("DejaVu Sans Mono", {weight="Regular", stretch="Normal", italic=false})
                                      /usr/share/fonts/TTF/DejaVuSansMono.ttf, FontConfig
 7 س    \u{633}      x_adv=10 glyph=3182 wezterm.font("DejaVu Sans Mono", {weight="Regular", stretch="Normal", italic=false})
                                      /usr/share/fonts/TTF/DejaVuSansMono.ttf, FontConfig
 5 ر    \u{631}      x_adv=10 glyph=1127 wezterm.font("DejaVu Sans Mono", {weight="Regular", stretch="Normal", italic=false})
                                      /usr/share/fonts/TTF/DejaVuSansMono.ttf, FontConfig
 3 ا    \u{627}      x_adv=10 glyph=3145 wezterm.font("DejaVu Sans Mono", {weight="Regular", stretch="Normal", italic=false})
                                      /usr/share/fonts/TTF/DejaVuSansMono.ttf, FontConfig
 1 ف    \u{641}      x_adv=10 glyph=3214 wezterm.font("DejaVu Sans Mono", {weight="Regular", stretch="Normal", italic=false})
                                      /usr/share/fonts/TTF/DejaVuSansMono.ttf, FontConfig
 0      \u{20}       x_adv=10 glyph=1    wezterm.font("Roboto Mono", {weight="Regular", stretch="Normal", italic=false})
                                      /usr/share/fonts/TTF/RobotoMono-Regular.ttf, FontConfig

@behdad

behdad commented Jan 27, 2022

I'm too far removed from the bidi algorithm right now to know what the expected output is.

@khaledhosny

> I didn't find exactly the font that Terminal.app is using,

The Arabic text in that screenshot is set in Courier New.

@khaledhosny

> without any bidi reordering

You need to reorder the runs, but without reversing the characters in RTL runs.
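To make that advice concrete, here is a minimal sketch of run-level reordering (rule L2 of the UBA applied to whole runs). It is not wezterm's code: the function names are hypothetical, and the embedding levels in the usage note are hand-assigned for the "متن ۲ فارسی" example under the assumption of an LTR base direction. Each run keeps its logical character range, which is what would be handed to the shaper along with the run's direction.

```python
def level_runs(levels):
    """Split per-character embedding levels into (start, end, level) runs.

    Ranges are half-open and expressed in logical (memory) order.
    """
    runs = []
    start = 0
    for i in range(1, len(levels) + 1):
        if i == len(levels) or levels[i] != levels[start]:
            runs.append((start, i, levels[start]))
            start = i
    return runs


def visual_runs(levels):
    """Return the runs in visual (left-to-right) order.

    Only the order of the runs changes; the characters inside each run
    stay in logical order so the shaper can handle the RTL layout.
    """
    runs = level_runs(levels)
    odd_levels = [lvl for _, _, lvl in runs if lvl % 2 == 1]
    if not odd_levels:
        return runs  # nothing is right-to-left, so nothing moves
    highest = max(lvl for _, _, lvl in runs)
    # From the highest level down to the lowest odd level, reverse every
    # maximal sequence of runs whose level is at least the current one.
    for level in range(highest, min(odd_levels) - 1, -1):
        i = 0
        while i < len(runs):
            if runs[i][2] >= level:
                j = i
                while j < len(runs) and runs[j][2] >= level:
                    j += 1
                runs[i:j] = runs[i:j][::-1]
                i = j
            else:
                i += 1
    return runs
```

With levels `[1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1]` (one per code point, not per byte as in the `ls-fonts` output above), the run holding the second word moves to the front, the digit stays in the middle, and the run holding the first word ends up last.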

wez added a commit that referenced this issue Jan 29, 2022
Two problems:

* Need reordered_runs method to populate ranges based on
  the reordered levels!
* Use reordered runs to get the *logical* bounds of those
  runs and pass those to harfbuzz.

Now the text is ordered correctly, but the rendering advances
by the wrong amount for the reordered clusters and looks bad
unless experimental_pixel_positioning=true.

refs: #784
wez added a commit that referenced this issue Jan 29, 2022
We were using the physical cell position to place the glyphs,
but we need to use the visual cell position (post-RTL-reordering).

refs: #784
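The difference between the physical and the visual cell position can be illustrated with a small sketch. This is not taken from the wezterm source; it assumes one cell per character and hand-assigned embedding levels, and both function names are hypothetical.

```python
def visual_order(levels):
    """Return logical indices in visual order (UBA rule L2, per character)."""
    order = list(range(len(levels)))
    odd_levels = [lvl for lvl in levels if lvl % 2 == 1]
    if not odd_levels:
        return order
    for level in range(max(levels), min(odd_levels) - 1, -1):
        i = 0
        while i < len(order):
            if levels[order[i]] >= level:
                j = i
                while j < len(order) and levels[order[j]] >= level:
                    j += 1
                order[i:j] = order[i:j][::-1]  # reverse this stretch in place
                i = j
            else:
                i += 1
    return order


def visual_columns(levels):
    """Map each logical cell index to the column where it should be drawn."""
    columns = [0] * len(levels)
    for column, logical in enumerate(visual_order(levels)):
        columns[logical] = column
    return columns
```

For levels `[0, 0, 1, 1, 1, 0]`, the three RTL cells at logical positions 2, 3, 4 are drawn at columns 4, 3, 2, while the LTR cells keep their positions.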
wez added a commit that referenced this issue Jan 30, 2022
This commit refines bidi property handling:

* experimental_bidi has been split into two new configuration settings;
  `bidi_enabled` (which controls whether the terminal performs implicit
  bidi processing) and `bidi_direction` which specifies the base
  direction and whether auto detection is enabled.
* The `Line` type can now store those bidi properties (they are actually
  split across 3 bits representing enabled, auto-detection and
  direction)
* The terminal now has a concept of active bidi properties and default
  bidi properties
* The default properties are pulled from the wezterm configuration
* active bidi properties are potentially set via escape sequences,
  BDSM (which sets bidi_enabled) and SCP (which sets bidi_direction).
  We don't support the 2501 temporary dec private mode suggested by
  the BIDI recommendation doc at this time.
* When creating new `Line`s or clearing from the start of a `Line`, the
  effective bidi properties are computed (from the active props,
  falling back to default props) and applied to the `Line`.
* When rendering the line, we now look at its bidi properties instead
  of just the global config.

The default bidi properties are `bidi_enabled: false` and
`bidi_direction: LeftToRight` which corresponds to the typical
bidi-unaware mode of most terminals.

It is possible to live reload the config to change the effective
defaults, but note that they apply, by design, to new lines being
processed through the terminal.  That means existing output is
left unaffected by a config reload, but subsequently printed lines
will respect it.  Pressing CTRL-L or otherwise contriving to have
the running application refresh its display should cause the
refreshed display to update and apply the new bidi mode.

refs: #784
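As a rough illustration of the commit message above, the three bits and the active-over-default fallback could be modelled as follows. This is a sketch under assumptions, not the actual Rust implementation: the names `BidiProps` and `TerminalBidiState`, the bit layout, and the per-field fallback are all hypothetical.

```python
from dataclasses import dataclass
from enum import IntFlag
from typing import Optional


class BidiProps(IntFlag):
    """Three bits per line: enabled, auto-detection, direction (assumed layout)."""
    ENABLED = 0b001
    AUTO_DETECT = 0b010
    RTL = 0b100


DIRECTION_MASK = BidiProps.AUTO_DETECT | BidiProps.RTL


@dataclass
class TerminalBidiState:
    default: BidiProps                            # pulled from the config
    active_enabled: Optional[bool] = None         # set by BDSM, if ever sent
    active_direction: Optional[BidiProps] = None  # set by SCP, if ever sent

    def effective(self) -> BidiProps:
        """Properties to stamp onto a newly created or cleared line."""
        props = self.default
        if self.active_enabled is not None:
            props = props & DIRECTION_MASK
            if self.active_enabled:
                props |= BidiProps.ENABLED
        if self.active_direction is not None:
            props = (props & BidiProps.ENABLED) | self.active_direction
        return props
```

Because the properties are stored per line at creation time, changing `default` later only affects lines created afterwards, which matches the live-reload behaviour described in the commit message.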
wez added a commit that referenced this issue Jan 31, 2022
with:

```
  bidi_enabled = false,
  bidi_direction = "RightToLeft",
```

lines are now rendered right-justified in the terminal.
I think there's still work to do on this, because the cluster
order seems weird to me, but it's hard for me to intuit how
this should really look.

refs: #784
@wez
Owner

wez commented Jan 31, 2022

Current state of main:

Config options:

  • bidi_enabled = false. If set to true, wezterm will apply the bidi algorithm to lines at render time. Otherwise, wezterm will assume that the application(s) running in the terminal will emit output that has already been bidi-reordered. The default is false.
  • bidi_direction = "LeftToRight". Possible values: "LeftToRight", "RightToLeft", "AutoLeftToRight", "AutoRightToLeft". Specifies the line direction. The Auto versions will attempt to auto-detect based on the first strong character in the line, but otherwise fall back to the direction specified. When the direction is RightToLeft or AutoRightToLeft, wezterm will try to show the text right justified.
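The auto-detection described for the Auto variants can be sketched roughly as below. This is an approximation using Python's `unicodedata` module rather than wezterm's code; `detect_direction` is a hypothetical name, and the full UBA rules (P2/P3) additionally skip characters inside isolates, which this sketch ignores.

```python
import unicodedata


def detect_direction(line, fallback="LeftToRight"):
    """Pick a line direction from the first strong character.

    Strong characters have bidi class L (left-to-right) or R/AL
    (right-to-left). Digits, spaces and punctuation are skipped.
    If no strong character is found, the fallback is used.
    """
    for ch in line:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return "LeftToRight"
        if bidi_class in ("R", "AL"):
            return "RightToLeft"
    return fallback
```

Under this rule, a line that begins with Arabic or Hebrew text resolves to RightToLeft even when the fallback is LeftToRight, whereas a line containing only digits and spaces keeps the fallback.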

Escape sequences:

These are primarily for bidi-aware applications to cooperate with the terminal.
These are defined by ECMA-48 and adopted by VTE and mintty.

  • ^[[8h overrides the bidi_enabled config setting and sets it to true for subsequently output lines.
  • ^[[8l overrides the bidi_enabled config setting and sets it to false for subsequently output lines.
  • ^[[1 k overrides the bidi_direction config and sets it to LeftToRight for subsequently output lines.
  • ^[[2 k overrides the bidi_direction config and sets it to RightToLeft for subsequently output lines.
  • ^[[0 k restores the bidi_direction value to that specified in the config
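For anyone who wants to try the sequences listed above from a script, here is a small sketch that builds them. The byte sequences come from the list; the helper names `bdsm` and `scp` are just labels chosen for this example.

```python
import sys

ESC = "\x1b"


def bdsm(enable):
    """BDSM: ESC [ 8 h turns implicit bidi on, ESC [ 8 l turns it off."""
    return f"{ESC}[8{'h' if enable else 'l'}"


def scp(direction):
    """SCP: ESC [ <n> SP k, with 0 = config default, 1 = LTR, 2 = RTL."""
    codes = {"default": 0, "LeftToRight": 1, "RightToLeft": 2}
    return f"{ESC}[{codes[direction]} k"


if __name__ == "__main__":
    # Print one line with implicit bidi on and an RTL base direction,
    # then restore the direction from the config.
    sys.stdout.write(bdsm(True) + scp("RightToLeft"))
    sys.stdout.write("متن ۲ فارسی\n")
    sys.stdout.write(scp("default"))
    sys.stdout.flush()
```

Note the literal space before the final `k` in the SCP sequences; it is part of the sequence, not a typo.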

Stuff that still needs work:

  • text cursor positioning especially when moving through RTL sequences. Not sure if it makes sense.
  • copy/paste copies the logical substring of text rather than the visual bidi-reordered text
  • right-justified rendering seems wonky to me. I think something in there needs to be iterated in a different order, but I haven't nailed down quite what that is.

My recommendation if anyone wanted to try this stuff in the nightly would be to run with bidi_enabled = true and just leave bidi_direction at its default LeftToRight value.

@CIAvash
Author

CIAvash commented Jan 31, 2022

> right-justified rendering seems wonky to me. I think something in there needs to be iterated in a different order, but I haven't nailed down quite what that is.

I tried AutoRightToLeft, but it made everything (the prompt as well) RTL and right-justified, or sometimes only part of it, even though there was no RTL text.

Also AutoLeftToRight had some misplaced spaces.

But yeah LeftToRight is working properly.

wez added a commit that referenced this issue Feb 5, 2022
This puts to final rest #478, wherein ligatured glyphs that span
cells would render portions of the glyph with the wrong fg color,
where wrong was usually the bg color and cause the glyph to turn
invisible when cursoring through the ligature.

The approach used here is to divide the glyph into 7 discrete strips
where each strip either intersects with the cursor, the selection, or
neither. That allows us to render each strip with the appropriate
foreground color.

This change simplifies some of the logic and allows some other code
to be removed, so that feels good!

As is tradition with these renderer changes, there's a good chance that
I overlooked something in testing and that the metrics or alignment
might be slightly off for some font/text combo.  Let's see how this
goes!

refs: #784
refs: #478
refs: #1617
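The strip idea from the commit message above can be sketched in a simplified form. This is not the renderer's code and does not reproduce its exact strip count: it only shows how a glyph's horizontal extent might be cut at cursor and selection boundaries so that each piece gets a single foreground color. All names are hypothetical, and ranges are half-open `(start, end)` x coordinates.

```python
def _covers(rng, a, b):
    """True if the half-open range rng fully contains the strip [a, b)."""
    return rng is not None and rng[0] <= a and b <= rng[1]


def split_into_strips(glyph, cursor=None, selection=None):
    """Cut a glyph's x extent into strips tagged cursor/selection/normal."""
    g0, g1 = glyph
    cuts = {g0, g1}
    for rng in (cursor, selection):
        if rng is not None:
            # Only boundaries that fall strictly inside the glyph split it.
            cuts.update(x for x in rng if g0 < x < g1)
    edges = sorted(cuts)
    strips = []
    for a, b in zip(edges, edges[1:]):
        if _covers(cursor, a, b):
            kind = "cursor"      # cursor color wins over selection
        elif _covers(selection, a, b):
            kind = "selection"
        else:
            kind = "normal"
        strips.append((a, b, kind))
    return strips
```

For a three-cell ligature spanning x = 0..30 with the cursor on the middle cell, this yields a normal strip, a cursor strip, and another normal strip, so only the middle third is drawn with the cursor's foreground color.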
@mostafaqanbaryan

Just wanted to say that the work you're doing here is really awesome.
RTL support in wezterm is now much better than in lots of other terminals.
Thank you.

@mostafaqanbaryan

@wez
Besides the cursor problem (which you are already aware of), there is something else as well.
When I open a file in Vim that has RTL lines in it, only the lines that are visible have correct formatting.
RTL correction doesn't work for the other lines in the file that are not in view.
You have to reload the wezterm config (restart the terminal in some way, or bring up those lines and open the file again) to correct it.

Example:
When I open a file in vim, lines 1 to 30 are visible, but lines 30 to EOF that have RTL content look like this:
image
And when using tail or cat, all the lines are like this too.

@wez
Owner

wez commented May 23, 2022

I don't quite understand what you mean when you say "not in view". Can you expand on what you're trying and what you're seeing?

@mostafaqanbaryan

mostafaqanbaryan commented May 24, 2022

> I don't quite understand what you mean when you say "not in view". Can you expand on what you're trying and what you're seeing?

Yes, of course.
I have a (test) file with 11 lines in it (you can generate Persian content with this site).
When I open vim, this is my terminal window:
Screenshot from 2022-05-24 08-08-06
But when I go down to see other lines:
Screenshot from 2022-05-24 08-08-40
Lines below the view (after line #6) are messed up.

Oddly enough, when I don't use a fullscreen terminal (via ToggleFullScreen), this problem doesn't occur:
Screenshot from 2022-05-24 08-10-30
So now I think that RTL rendering isn't triggered when I'm in fullscreen mode.

(Also, when you have a big file with only Persian content in it, about 14 KB, the terminal/vim gets really slow. But that's not important right now.)

@CIAvash
Author

CIAvash commented May 24, 2022

@wez It's not a problem for me, but something I observed; when a tab title contains RTL text, the text is not shaped and is not rendered as BiDi. But you probably already know that 'cause you probably did not apply BiDi rendering there.

@mostafaqanbaryan

I think I didn't fully understand the problem last time.
The problem is that when I scroll in vim, the terminal doesn't re-render, so any new text that scrolls into the visible part of the screen is messed up.
The text that was already on screen has no problem.
If I press F11 and toggle fullscreen twice (go to fullscreen and back to floating mode), the newly visible text is fixed as well.

@ninjalj
Contributor

ninjalj commented Jul 2, 2023

FYI, there is a new Unicode Working Group for Terminal Complex Script Support (TCSS). The initial proposal for the creation of the WG can be found at https://gist.github.com/XVilka/a0e49e1c65370ba11c17?permalink_comment_id=4615679#gistcomment-4615679

@yarons

yarons commented Jul 21, 2023

Hebrew looks great, BTW. Conjoined RTL alphabets are more complicated.

@anonimo0-0

Rendering RTL languages is pretty nice right now with bidi_enabled = true; however, bidi_direction = "AutoLeftToRight" isn't complete yet, I think. I assume it defaults to the LTR direction unless a character of an RTL language is detected before any other characters?
But it doesn't seem to be working. For example, this is from nano:
image
I expected the second line after Lorem ipsum to have a right-to-left direction, but it didn't.

Thanks a lot for your work, dealing with bidi stuff must be a headache!

@anonimo0-0

To be clearer, here is an attached image of how mlterm does it:
image
When the line starts with a character that belongs to an RTL language, the line begins from the right side.

For those wondering why this matters, consider the following scenarios of typing some words, and pay attention to the order in which the words were typed:

Scenario 1

  1. start
  2. test
  3. نهاية
  4. الإختبار

Scenario 2

  1. نهاية
  2. الإختبار
  3. start
  4. test

If you look here, you will notice that wezterm renders these two lines in exactly the same way, although they were typed in a different order. The first line is correct, as it starts with English, the first words inserted into nano in this case. The second line is wrong: it should start with the Arabic words, since they were typed before the English ones. If the second line began from the right side (as is the case in mlterm above), this issue would be fixed.
image

@CIAvash
Author

CIAvash commented Feb 15, 2024

cosmic-term is using cosmic-text (which uses the rustybuzz, swash and unicode-bidi crates ATM, I think) for its text shaping and rendering, including RTL and bidirectional support.

I don't know the details or how good it is, but thought it wouldn't hurt to mention it.

@thisismygitrepo

thisismygitrepo commented May 19, 2024

With large language models able to parse 20+ human languages, I think this support is becoming more important than ever before. I read the thread and I couldn't really understand which solutions are supported so far. I tried --config experimental_bidi=true @wez but that gave me an error saying it's an invalid config.

@CIAvash
Author

CIAvash commented May 19, 2024

@thisismygitrepo try wezterm --config bidi_enabled=true

@sajadspeed

sajadspeed commented Jul 23, 2024

I tried --config bidi_enabled=true with the Vazir Code font and it displays correctly:

Screenshot_20240723_114719

But there are still some problems; for example, it doesn't work in vim/neovim:
image

@MoSal

MoSal commented Jul 23, 2024

> But there are still some problems like doesn't work in vim/neovim:

:set noarabicshape

@sajadspeed

> :set noarabicshape

Yes, it worked, thank you.

There is only one more problem, with the ZERO WIDTH NON-JOINER character (U+200C). In some places, like bash, when I press Shift+Space, it doesn't insert the character at all:
image

But in zsh:
image

In vim:
image

And in neovim:
image

I tried with every font and the problem was still there.

I don't think the problem lies with the programs themselves, such as zsh or vim, because they behave differently with the same font in Konsole.

It works fine in bash with Konsole:
image

And in Vim:
image

@avidseeker

@sajadspeed See vim/vim#7932. You can conceal it in Vim.

@MahdiGMK

MahdiGMK commented Oct 4, 2024

I found a bug when editing big files containing Persian text using Vim. The first time you encounter RTL text by scrolling down, it's still LTR until you press Ctrl+L.

@MoSal

MoSal commented Oct 4, 2024

@MahdiGMK

> I found a bug when editing big files containing Persian text using Vim. The first time you encounter RTL text by scrolling down, it's still LTR until you press Ctrl+L.

You should probably specify whether you're talking about shaping or alignment. This may help developers identify the problem.

@MahdiGMK

MahdiGMK commented Oct 5, 2024

After some scrolling, this is the formatting:
image
Formatting after a refresh or Ctrl+L:
image

@MahdiGMK

Other than that, the cursor position is so confusing in Neovim—it shows you visually somewhere, but it's physically elsewhere.
image
After some random input:
image

@avidseeker

> it shows you visually somewhere, but it's physically elsewhere

This is a Vim issue, unrelated to the terminal.

See: vim/vim#14115

@MahdiGMK

Vim doesn't support Bidi and relies upon the terminal for this behavior. In the Gnome terminal, it is somewhat better—at least it shows the cursor where it really is.

@yarons

yarons commented Oct 20, 2024

There's a very good document about the challenges of RTL in terminal display:
https://terminal-wg.pages.freedesktop.org/bidi/bidi-intro/think-rtl.html
