-
-
Notifications
You must be signed in to change notification settings - Fork 34
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[DESIGN] Update bidi design document to show proposed design #871
Merged
Merged
Changes from all commits
Commits
Show all changes
2 commits
Select commit
Hold shift + click to select a range
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why allow for the
bidi
after:
if we're not allowing it to show up elsewhere after a starting sigil?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your example
$_مصر
can be made to render correctly using LRI/PDI around the identifier (which the syntax allows and the strict form encourages). The[bidi]
production allows starting or ending strongly directional marks to prevent spillover in the middle of a namespaced identifier, e.g.$_م1صر:_م2صر
I'll admit that it's generally better practice to put the strongly directional character at the end (before the
:
separator), but I didn't want to make the syntax ultra-fussy: whatever stew of strongly directional characters and a colon are not part of either thenamespace
or thename
.The separator is different from sigils because the sigils are all at token-start, whereas the namespace separator is embedded into the word-token.
code points for the first example:
\u2066$_\u0645\u0635\u0631\u200e\u2069
try itcode points for the second example:
\u2066$_\u06451\u0635\u0631:\u200e_\u06452\u0635\u0631\u200e\u2069
try itThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's rendering for me with the
_
next to the$
, which is wrong? The_
here has neutral directionality, and it's the first character of a name which as a whole is RTL. So we ought to have some way to render it as the right-most character, but that's not possible without some bidi control after the$
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair enough, although a display like
$_مصر
is in itself super-weird (and it's namespaced friend$_م1صر:_م2صر
is even weirder) from the point of view that it's an RTL isolated run inside an LTR isolated run (or, in the latter case, two RTL runs inside an LTR run). This makes the name tokens display correctly RTL while still forcing the overall token (including the sigil) to display LTR. That is, this sequence:... with 6 of the 18 code points being bidi isolates.
There's a similar problem with options, where we're trying to force them to be LTR-ordered (
option = value
/م1صر=م2صر
) when RTL really really wants that to be displayed asم1صر=م2صر
.So you're right: my proposal does compromise some aspects of RTL display as part of our insistence that messages are intended to work in an LTR editing environment with LTR syntax. Either way, logical order is all that matters and users should be careful about using mixed direction for non-literal values. We give them the tools to make their bidi literals and patterns work RTL-normally as well as the tools to let non-RTL speakers read placeholders LTR (like most developers debugging messages). It is not perfect. Do we need that last bit of cruft to allow the sigil and identifier to be separate? (Not asking rhetorically, btw. What do people think? Where can we get developer/translator feedback?)
Note: It took a minute of fiddling to get the namespaced example to not get entangled with the markdown in this comment--all in service of getting the underscore on the right side.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the proposed solution solves the 99% case, and trying to solve the remaining 1% will be tricky. Moreover, the people who really have to deal with messages are the translators, and they will need tooling to effectively do their jobs at scale — and that tooling has far more freedom to deal with bidi issues than we have in plain text. So I don't think we need to go any further down this path.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I know. So do all of the sigils (with some of them being enclosing punctuation).
I guess my argument would be that the sigil is, from a user perspective, "part of the name", e.g.
$foo
, not$
withfoo
being separate. In an RTL display context, the sigil, being a neutral, will naturally reverse sides to be at the start, (e.g.$_مصر
). For us to maintain an LTR bias while making only the name token isolated is a lot more work--for tools and users.The counterargument would be that
identifier
andname
are their own things (and many are generated from the user's environment). The various sigils and such in MF2 are thus "not part of the name" and should be treated separately. Allowing users to insert these characters is not the same as obligating them to use them and I could see us making the affordance.In the screenshot below, the display is set to RTL. The placeholders have an LRI/PDI inside the
{
/}
.name
and LRI/PDI around the sigil+identifier.Here's the data used to produce the screenshot. Note that I do use an U+200E next to the
:
separator.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would be my position. In particular for variables, AFAIK all current implementations would require parameters like
{ foo: 42 }
in order to resolve a$foo
, and not e.g.{ $foo: 42 }
.In case it matters, here's a slightly shorter way to get the same results as in the last example (The LR isolation around the sigil+identifier isn't required as the placeholder already isolates its contents, and the LRM next to the
:
has no effect):There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your formulation is correct. Note that the LRM is necessary if we don't permit isolates inside the
identifier
production (which is not what you are proposing), because the:
is a neutral and wants to extend whatever run is to either side of it.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think something got damaged somewhere in the several copy/paste operations.
To make things readable I replaced:
And the resulting strings above are:
Taking out the bidi control characters we are left with
That is not valid MF2 syntax.
So I don't know what this is trying to fix.
But yesterday I played a bit and put together a web page that one can use to interactively play with this.
https://mihai-nita.net/tmp/mf2bidi.html
And I argue that:
$
and the name proper, even when there is an_
there.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Um... how is it not valid?
$_A1BC
is a valid operand.:_A2BC
is a valid namespace.:_A3BC
is a valid function name._A4BC
is a valid option name._A5BC
is a valid unquoted literal.It's even a valid message, in that a simple message might consist solely of a placeholder.
What am I missing?