Add bidi support and address UAX31/UTS55 requirements #884

Merged
merged 28 commits into from
Sep 16, 2024
Changes from 6 commits
82fcef3
Add bidi support and address UAX31/UTS55 requirements
aphillips Sep 11, 2024
c5baba6
Update syntax.md including text from previous PR
aphillips Sep 11, 2024
ca63819
Repair the guidance on strongly directional marks
aphillips Sep 11, 2024
1e172fd
Fix formatting of the "important"
aphillips Sep 11, 2024
afd5ef0
Add bidi characters to description of whitespace.
aphillips Sep 11, 2024
c7a41fc
Permit bidi in a few more places
aphillips Sep 11, 2024
b0cd0a5
Update syntax.md ABNF
aphillips Sep 11, 2024
cacc5e9
Update formatting.md
aphillips Sep 11, 2024
1fb0f92
Address comment about name/identifier
aphillips Sep 11, 2024
a79fb8d
Address comments related to bidi in `name`
aphillips Sep 11, 2024
86a20f8
Fix variable's location
aphillips Sep 11, 2024
768a8a8
Address comment about the list of LRI/PDI targets
aphillips Sep 11, 2024
fd9fc57
One character typo :-P
aphillips Sep 11, 2024
734ef49
Update spec/syntax.md
aphillips Sep 12, 2024
4541758
Address comments about rule R3a-1
aphillips Sep 12, 2024
d751181
Update spec/syntax.md
aphillips Sep 12, 2024
cbd0457
Address comment about U+061C
aphillips Sep 12, 2024
0df963e
Change [o]wsp => `o` or `s`
aphillips Sep 12, 2024
be8fa43
Match syntax spec to abnf
aphillips Sep 12, 2024
f110af7
Remove *
aphillips Sep 12, 2024
d8c6d0f
Update syntax.md
aphillips Sep 12, 2024
d5fb3bb
Update spec/syntax.md
aphillips Sep 12, 2024
82af41f
Update spec/message.abnf
aphillips Sep 12, 2024
d9d79bc
Update spec/message.abnf
aphillips Sep 12, 2024
7858961
Update syntax.md
aphillips Sep 12, 2024
e7aa24c
Update spec/message.abnf
aphillips Sep 12, 2024
86fc1d4
Update spec/syntax.md
aphillips Sep 12, 2024
d5303c2
Update spec/syntax.md
aphillips Sep 12, 2024
11 changes: 10 additions & 1 deletion spec/formatting.md
@@ -768,7 +768,16 @@ That is, the text can can consist of a mixture of left-to-right and right-to-lef
The display of bidirectional text is defined by the
[Unicode Bidirectional Algorithm](http://www.unicode.org/reports/tr9/) [UAX9].

-The directionality of the message as a whole is provided by the _formatting context_.
+The directionality of the formatted _message_ as a whole is provided by the _formatting context_.
+
+> [!NOTE]
+> Keep in mind the difference between the formatted output of a _message_,
+> which is the topic of this section,
+> and the syntax of _message_ prior to formatting.
+> The processing of a _message_ depends on the logical sequence of Unicode code points,
+> not on the presentation of the _message_.
+> Affordances to allow users appropriate control over the appearance of the
+> _message_'s syntax have been provided.

When a _message_ is formatted, _placeholders_ are replaced
with their formatted representation.
6 changes: 3 additions & 3 deletions spec/message.abnf
@@ -52,13 +52,13 @@ match = %s".match"

; Names and identifiers
; identifier matches https://www.w3.org/TR/REC-xml-names/#NT-QName
-; name matches https://www.w3.org/TR/REC-xml-names/#NT-NCName but excludes U+FFFD
+; name matches https://www.w3.org/TR/REC-xml-names/#NT-NCName but excludes U+FFFD and U+061C
identifier = [namespace ":"] name
namespace = name
-name = name-start *name-char
+name = [bidi] name-start *name-char [bidi]
name-start = ALPHA / "_"
/ %xC0-D6 / %xD8-F6 / %xF8-2FF
-/ %x370-37D / %x37F-1FFF / %x200C-200D
+/ %x370-37D / %x37F-61B / %x61D-1FFF / %x200C-200D
/ %x2070-218F / %x2C00-2FEF / %x3001-D7FF
/ %xF900-FDCF / %xFDF0-FFFC / %x10000-EFFFF
name-char = name-start / DIGIT / "-" / "."
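
To illustrate the effect of splitting `%x37F-1FFF` into `%x37F-61B / %x61D-1FFF`, the following Python sketch checks a code point against the updated `name-start` ranges. It is not part of the PR: `NAME_START_RANGES` and `is_name_start` are hypothetical names, and only the `name-start` alternatives visible in this hunk are covered.

```python
# Ranges transcribed from the updated name-start rule shown above.
# U+061C ARABIC LETTER MARK falls in the gap between 0x61B and 0x61D.
NAME_START_RANGES = [
    (0x41, 0x5A), (0x61, 0x7A),                  # ALPHA
    (0x5F, 0x5F),                                # "_"
    (0xC0, 0xD6), (0xD8, 0xF6), (0xF8, 0x2FF),
    (0x370, 0x37D), (0x37F, 0x61B), (0x61D, 0x1FFF), (0x200C, 0x200D),
    (0x2070, 0x218F), (0x2C00, 0x2FEF), (0x3001, 0xD7FF),
    (0xF900, 0xFDCF), (0xFDF0, 0xFFFC), (0x10000, 0xEFFFF),
]


def is_name_start(ch: str) -> bool:
    """Return True if the single character ch matches name-start."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in NAME_START_RANGES)
```

With these ranges, U+061B and U+061D are still accepted as the first character of a name, while U+061C and U+FFFD are rejected.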
22 changes: 14 additions & 8 deletions spec/syntax.md
@@ -733,7 +733,13 @@ A **_<dfn>name</dfn>_** is a character sequence used in an _identifier_
or as the name for a _variable_
or the value of an _unquoted literal_.

-_Variable_ names are prefixed with `$`.
+A _name_ can be preceded or followed by bidirectional marks or isolating controls
+to aid in presenting names that contain right-to-left or neutral characters.
+These characters are **not** part of the _name_ and MUST be treated as if they were not present
+when matching _name_ or _identifier_ strings or _unquoted literal_ values.
+Implementations MAY remove these characters from a _message_.
+
+_Variable_ _names_ are prefixed with `$`.

Valid content for _names_ is based on <cite>Namespaces in XML 1.0</cite>'s
[NCName](https://www.w3.org/TR/xml-names/#NT-NCName).
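
The requirement that bidi characters around a _name_ be ignored when matching could be approximated as below. This is a minimal sketch rather than the spec's algorithm: `BIDI_CHARS`, `strip_bidi`, and `same_name` are hypothetical names, and the character set is an assumption (the marks and isolates named in this PR plus U+2067 and U+2068), because the `bidi` rule itself does not appear in this diff.

```python
# Assumption: the characters matched by the ABNF `bidi` rule.
# U+061C ALM, U+200E LRM, U+200F RLM, U+2066 LRI, U+2067 RLI, U+2068 FSI, U+2069 PDI.
BIDI_CHARS = "\u061C\u200E\u200F\u2066\u2067\u2068\u2069"


def strip_bidi(raw: str) -> str:
    """Remove bidi marks and isolates that precede or follow a name.

    str.strip removes any run of the listed characters at either end,
    which is more lenient than the single optional [bidi] in the ABNF.
    """
    return raw.strip(BIDI_CHARS)


def same_name(a: str, b: str) -> bool:
    """Compare two names as if surrounding bidi characters were absent."""
    return strip_bidi(a) == strip_bidi(b)
```

Under these assumptions, `same_name("\u2066name\u2069", "name")` is true, so a variable written with isolates in one place still resolves to the same declaration written without them elsewhere.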
@@ -773,10 +779,10 @@ option = identifier owsp "=" owsp (literal / variable)

identifier = [namespace ":"] name
namespace = name
-name = name-start *name-char
+name = [bidi] name-start *name-char [bidi]
name-start = ALPHA / "_"
/ %xC0-D6 / %xD8-F6 / %xF8-2FF
-/ %x370-37D / %x37F-1FFF / %x200C-200D
+/ %x370-37D / %x37F-61B / %x61D-1FFF / %x200C-200D
/ %x2070-218F / %x2C00-2FEF / %x3001-D7FF
/ %xF900-FDCF / %xFDF0-FFFC / %x10000-EFFFF
name-char = name-start / DIGIT / "-" / "."
@@ -834,14 +840,14 @@ _option values_, and _keys_)
_Messages_ that contain right-to-left (aka RTL) characters SHOULD use one of the
following mechanisms to make messages display intelligibly in plain-text editors:

Review comment (Collaborator):
Before getting much deeper into discussing the mechanisms, could you clarify if this is intended to be the canonical/recommended way of isolating or marking messages with RTL contents that we've discussed, or is that something that'll be provided separately?

Reply (Member Author):
No, that's separate.

-1. Use paired isolating bidi controls `U+2066 LEFT-TO-RIGHT ISOLATE`
-and `U+2069 POP DIRECTIONAL ISOLATE` as permitted by the ABNF around
+1. Use paired isolating bidi controls `U+2066 LEFT-TO-RIGHT ISOLATE` ("LRI")
+and `U+2069 POP DIRECTIONAL ISOLATE` ("PDI") as permitted by the ABNF around
parts of any _message_ containing RTL characters:
- _inside_ of _placeholder_ markers `{` and `}`
- _outside_ _quoted-pattern_ markers `{{` and `}}`
-- _identifiers_
-- _literals_ (This is especially important for individual _keys_ in a _variant_)
-- _option_ values
+- _outside_ of _literals_, paying particular attention to _keys_ in a _variant_
+- _outside_ of _variable_, _function_, _markup_, or _attribute_ _names_/_identifiers_,
+including the identifying sigil (e.g. `<LRI>$var</PDI>` or `<LRI>:ns:name</PDI>`)
2. Use the 'local-effect' bidi marks
`U+061C ARABIC LETTER MARK`, `U+200E LEFT-TO-RIGHT MARK` or
`U+200F RIGHT-TO-LEFT MARK` as permitted by the ABNF before or after _identifiers_,
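
The first mechanism in the list above (paired LRI and PDI around a variable, including its `$` sigil, inside the placeholder braces) can be sketched as follows. This assumes a tool that generates message source text; `isolate` and `isolated_placeholder` are hypothetical helpers, not functions from any MessageFormat implementation.

```python
LRI = "\u2066"  # U+2066 LEFT-TO-RIGHT ISOLATE
PDI = "\u2069"  # U+2069 POP DIRECTIONAL ISOLATE


def isolate(token: str) -> str:
    """Wrap a syntax token in a paired LRI ... PDI."""
    return f"{LRI}{token}{PDI}"


def isolated_placeholder(variable_name: str) -> str:
    """Build {<LRI>$name<PDI>}: the isolate sits inside the braces
    and outside the variable, sigil included."""
    return "{" + isolate("$" + variable_name) + "}"
```

Keeping the sigil inside the isolate follows the `<LRI>$var</PDI>` example in the list, so that `$` stays visually attached to a right-to-left name in a plain-text editor.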