Update design doc for message pattern quoting #503
…of whitespace handling rules for template languages
…y vs. importance vs. i18n best practices
Looking good. Thank you both for working on this (especially during "maximum crunch time"). I've made a number of suggestions below. Please have a look. They are mostly editorial in nature.
exploration/text-vs-code.md
Outdated
Rarely do messages that need to include leading or trailing whitespace do so due to
how they will be concatenated with other text,
I think this change made the text far less clear.
To be honest, I would have guessed that you would have replaced the somewhat biased word "Rarely" with something more neutral such as "Some messages need..."
- After the resource file gets parsed as XML, the Android string resource format requiring
+ After the resource file gets parsed as XML, the Android resource compiler requires
> somewhat biased word "Rarely" with something more neutral such as "Some messages need..."

In all fairness, 0.3% is indeed rare.
True, that number comes from an HTML oriented corpus, but I don't have access to much code using Windows / MacOS native formats.
exploration/text-vs-code.md
Outdated
Also importantly, we cannot make assumptions about the validity of leading or trailing whitespace in a message,
especially since their usage may be entirely unrelated to internationalization issues (ex: sentence agreement disruption by concatenation).
This doesn't seem clear to me?
I think what you might be trying to say is:
- Also importantly, we cannot make assumptions about the validity of leading or trailing whitespace in a message,
- especially since their usage may be entirely unrelated to internationalization issues (ex: sentence agreement disruption by concatenation).
+ Also, importantly, whether the intentional inclusion of whitespace by a
+ message author might be considered "desirable" or might be interpreted
+ as "an internationalization bug",
+ we need to provide the ability of an author to control the content of a given pattern without ambiguity.
I would change this whole point a bit. Something along these lines:
All common OSes (Windows, MacOS, Linux, iOS, Android) have "plain text" widgets and "rich formatting widgets" (usually Web).
The "Web widgets" usually drag with them a whole HTML engine. That is slow and memory-consuming.
So the most commonly used widgets are plain text.
And when that is all you have, spaces and newlines are used to create "fake" formatting.
Things like paragraphs, indents, lists (bulleted or numeric).
Some examples (pick and choose):
- https://petri-media.s3.amazonaws.com/2021/05/Figure10-3.png
- https://docs.oracle.com/cd/E19957-01/817-4220/images/SetupWizWelcome2.gif
- https://www.manageengine.com/products/support-center/help/installationguide/images/installwizard.jpg
Even in HTML there are sometimes reasons to force the space preserving.
TLDR: trailing spaces are not necessarily an i18n bug, so it is not the job of MF2 to discourage them, or to get in the way.
- Messages themselves are "simple strings" and must be considered to be a single
- line of text. In many containing formats, newlines will be represented as the local
- equivalent of `\n`.
+ Messages themselves are "simple strings" and must be considered to be WYSIWYG.
This is incorrect and the change greatly reduces the impact of the document IMO. The thing that is WYSIWYG is the pattern. In the case of simple messages, this is the whole message. But in the case of complex messages, what you see (`{#input $foo :number minimumFractionDigits=11}`) is not exactly what you get 😉
The most important part of the original statement here is removed, which reminds readers that this message:
myMessage = {{
{#input $var :number}
{{You have {$var} message(s)}}
}}
Is actually this message in many storage formats:
myMessage = {{\n {#input $var :number}\n {{You have {$var} message(s)}}\n}}
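
As a minimal sketch of the point above, here is how the multi-line message ends up as a single string with `\n` escapes when stored. JSON is only an assumed stand-in for "a containing format"; other formats use their own escapes.

```python
import json

# The multi-line message as the author sees it in an editor.
message = (
    "{{\n"
    "  {#input $var :number}\n"
    "  {{You have {$var} message(s)}}\n"
    "}}"
)

# Assumption: the containing format is JSON. The stored form is one line,
# with each newline written as the two characters backslash + n.
stored = json.dumps({"myMessage": message})
print(stored)

# Loading the stored form gives back the original string, newlines included.
loaded = json.loads(stored)["myMessage"]
print(loaded == message)
```

The leading whitespace on each line is part of the string too, so whatever the message parser does with it is decided by the message syntax, not by the storage format.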
True. But the storage format might remove spaces / newlines, and not only from the beginning.
For example, I can do this in a properties file:
myMessage = {{\
input {$var :number}\
{{You have {$var} message(s)}}\
}}
What MF2 sees (once loaded from the file) is a single line, with leading spaces trimmed (from each line) and newlines removed:
{{input {$var :number}{{You have {$var} message(s)}}}}
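
A rough sketch of that loading step, assuming a simplified model of `.properties` line continuation: a trailing backslash joins the next line, and leading whitespace on the continuation line is dropped. The helper `join_continuations` is hypothetical and is not the real `java.util.Properties` parser (it ignores comments, escapes, and doubled backslashes).

```python
def join_continuations(text: str) -> list[str]:
    """Join physical lines ending in a backslash into logical lines."""
    logical: list[str] = []
    current = ""
    continuing = False
    for raw in text.splitlines():
        # Leading whitespace is dropped only on continuation lines.
        line = raw.lstrip(" \t\f") if continuing else raw
        if line.endswith("\\"):
            current += line[:-1]  # drop the backslash, keep joining
            continuing = True
        else:
            logical.append(current + line)
            current, continuing = "", False
    if continuing:
        logical.append(current)
    return logical


source = "\n".join([
    "myMessage = {{\\",
    "    input {$var :number}\\",
    "    {{You have {$var} message(s)}}\\",
    "}}",
])

(entry,) = join_continuations(source)
key, _, value = entry.partition(" = ")
print(value)
```

Under these assumptions the value handed to the message parser contains no newlines and no indentation at all.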
It sounds like there are 2 points here, and that we can and want them both:
- patterns are WYSIWYG and have no restrictions on newline or most other characters within
- messages are treated as just a string ("simple strings") in the containing format
Our text for point 1 needs to be corrected to say "pattern" in cases where it incorrectly said "message".
We got to writing what we wrote because of the incorrect detail in the original text that said that messages "must be considered to be a single line of text". That phrase should not be preserved.
Messages themselves are "simple strings" and must be considered to be WYSIWYG.
The WYSIWYG nature of representing a message pattern is independent of whether the message is a single line or contains multiple lines.

There is no restriction that a message must only contain a single line (that is, not contain any newline characters),
I would s/message/pattern/g this section (carefully because my suggestion is not always true). Only talk about the message when you intend to include the code.
exploration/text-vs-code.md
Outdated
when there is 1+ declarations in a `match` (selection) message,
or when there are 2+ declarations in a non-`match` complex message.

Cons:
Consider adding that the message closing pattern characters add no value.
About closing brackets, pros and cons:
- Closing brackets are ingrained in developers.
  `{{ something something` feels broken because of the missing `}}`
- The closing brackets might assist in some storage formats (maybe to be designed), especially some that might be minimized by tools.
Example:
msg1 = {{ some complex message }}
when = Type your name here.
Minimized: msg1={{ some complex message }}when=Type your name here.
- One can use the space after closing to add comments, metadata for linters or other tools.
{{
... when ...
}} lint_rules: { maxlen:"80 chars" } ref : { screenshot: "https://example.com/foo.jpg", glossaryId: 1234 }
Not strong arguments. But there is some value.
Even if all it does is prevent the "what the heck is this" reaction.
Addition to the pros (new bullet or changing an existing one): visibility (?)
"No trimming / Always delimit" also makes it clear what spaces are rendered.
Example:
{#when one} This is a message (when one) condition
vs
{{
when one { This is a message (when one) condition }
}}
In the second case it is clear what is rendered: leading spaces, no matter what kind.
In the first case it is not clear.
"We trim the ASCII spaces" rule does not help, since the spaces might be non-breaking spaces, or em-space, en-space, ideographic space, and all the other characters that look like space on screen, but are not ASCII space.
So visually I don't know where the message starts / stops.
Even in edit mode (when I translate) I don't know.
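
To make the concern concrete, here is a small sketch assuming a trimming rule that covers only space, tab, CR, and LF. The `trim_ascii` helper is hypothetical, not a function from any MF2 implementation.

```python
# Assumption: trimming removes only SP, HTAB, CR, and LF.
ASCII_WS = " \t\r\n"


def trim_ascii(pattern: str) -> str:
    """Strip only ASCII whitespace from both ends of a pattern."""
    return pattern.strip(ASCII_WS)


samples = {
    "ASCII space": " Hello",
    "no-break space": "\u00a0Hello",
    "em space": "\u2003Hello",
    "ideographic space": "\u3000Hello",
}

for name, text in samples.items():
    trimmed = trim_ascii(text)
    # repr() makes the surviving invisible characters visible as escapes.
    print(f"{name}: {trimmed!r} (changed: {trimmed != text})")
```

All four samples look nearly identical on screen, but only the first one is changed by the trim, which is why a visual anchor such as an explicit delimiter helps.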
A bit worse: as a translator you often get strings with no context.
There is no way to know what kind of API the string will go through.
So if I get this string "Bill Gates did something" and I have to translate, and I want to put the "honorific space" in front of the name, I don't know how to do it.
Is the message going through MF2 with trimming? Then I have to wrap it in { Bill ....}
But maybe it is not going through MF2.
Then if I wrap it, the `{ ... }` will render on screen "as is", which is not what you want.
We've seen this:
"The bee" => "L'abeille" : if the message goes through MF1, the apostrophe needs to be escaped
"I'm 1 in 100" => "Eu sunt 1%" : if the string goes through a printf-like API then I need to escape the %
And as a translator I have no way to know what API the dev uses, and I'm not familiar with the escape rules for myriads of APIs.
I am in fact faced with more APIs than a developer.
A dev might do "java + html + js".
A translator often works on many projects, from many companies, and so is exposed to strings consumed by native Windows apps, PHP, some Ruby stuff, C#, SQL, and others, and may even switch between them several times per day.
These days you don't pay the bills as a translator if you only handle one single format for one product from one company.
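
The two escaping cases above can be sketched as follows. The helper names are hypothetical, Python's `%` operator stands in for "a printf-like API", and the MF1 helper covers only the apostrophe (real MF1 quoting has more rules).

```python
def escape_for_printf(text: str) -> str:
    """Double percent signs so a printf-like formatter emits them literally."""
    return text.replace("%", "%%")


def escape_for_mf1(text: str) -> str:
    """Double apostrophes, which MF1 reads as one literal apostrophe."""
    return text.replace("'", "''")


translation = "Eu sunt 1%"

# Unescaped, the trailing percent sign is an incomplete format specifier.
try:
    translation % ()
    failed = False
except ValueError:
    failed = True
print("unescaped string rejected:", failed)

# Escaped, the formatter produces the text the translator intended.
print(escape_for_printf(translation) % ())
print(escape_for_mf1("L'abeille"))
```

The same translated string needs a different escape depending on which API consumes it, and that is exactly the information the translator usually does not have.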
+1 to @mihnita 's point about the possibility "to add comments, metadata for linters or other tools."
We can discuss as a workgroup whether we leave the space after the closing delimiter as a free-for-all, or we reserve it for us as a standard for future extensions.
Note: we are losing the possibility to do the same for simple messages because we are moving away from our current syntax.
I know you said this wasn't ready for reviews yet, so please consider this my early feedback on an early draft, which I'm sharing now because the day here is soon over :)
—Rico Mariani, MS Research MindSwap Oct 2003. (<a href="https://learn.microsoft.com/en-us/archive/blogs/brada/the-pit-of-success">restated by Brad Adams</a>, MS CLR and .Net team cofounder)
</blockquote>
</details>

Developers and translators should be able to read and write the syntax easily in a text editor.

Translators (and their tools) are not software engineers, so we want our syntax
I think it would be good to prioritize the requirements. For example, while I agree that we should, in general terms, make the syntax simple and robust, I would also suggest that the primary consumers of the message syntax are developers. Translators will oftentimes work with just the pattern syntax, through CAT tools.
I agree. See the "Evaluation" portion of the "Proposed Design" section below.
However, prioritizing requirements to me is tantamount to defining our value system to evaluate. We don't yet have alignment as a group on what our requirements are, let alone our prioritization of those requirements (value system). This is another reason why I wanted to keep the values (prioritization of requirements) only alongside the area explaining how we chose to propose an option via evaluation.
Before we can do anything further towards implementing your suggestion, you/we have to get the group to be self-aware and precise on their values, and then maybe after that, alignment. :-)
exploration/text-vs-code.md
Outdated
Within a complex message, patterns are always quoted with `{{...}}` or other choice of delimiter.

The entire complex message is also wrapped with `{{...}}` or other choice of delimiter.
This allows interior "code mode" of message to have flexible whitespace in between tokens
What do you think should happen to whitespace outside the entire code block? Should we specify it here as well?
We think it should be specified, if not already.
(will probably be covered anyway when we get to update the ebnf)
My choice would be to say:
- nothing before `{#`. If there is something, then we are in simple text mode, and `{#` will be an error
- after closing `#}` we have several options:
  - no closing, we drop it as a requirement
  - closing is optional. If as a developer you are bothered to see unclosed brackets, feel free to close it. Does nothing
  - closing is mandatory, and the message ends there
  - nothing allowed after it => unnecessarily rigid?
  - allowed only spaces / newlines after it
  - reserve it for us. We can extend the standard later to add comments, lint directives, links to images, etc
  - allow for developers to do what they want, "free for all". The message ended, we ignore the rest. They can add comments, lint directives, links to images, whatever
On the WG to discuss and decide.
I like the idea to reserve it for us.
It is a non-breaking change.
I would like to initially say anything outside the complex message-wide delimiters is an error (invalid). In the future, if we want to relax requirements and say annotations & message description notes are allowed in message syntax, you have the freedom to do so (after the closing delimiter).
In general with API design, you can always relax requirements and narrow outputs, but the reverse is not possible (causes breaking changes).
If there is anything "outside" the complex message-wide delimiters, then:
- It's whitespace that we trim
  a. Or produces an error (worth discussion)
- or the message is actually a simple message that produces a lot of errors
This {{ match {$var} when * {{{$var}}}}} is an interesting message.
This evaluates, I think, as:
This {�} is an interesting message.
Or possibly (with `$var==123`) as:
This {�}{�} match 123 when * 123{�}{�} is an interesting message.
(both emit a syntax error)
exploration/text-vs-code.md
Outdated
* The rule about the whether leading and trailing whitespace is included is simple and unambiguous.
* This matches the WYSIWIG behavior that simple messages preserve.
* The patterns can be detected within the pattern more easily due to the delimiters serving as a visual anchor.
* Requiring all patterns to be quoted minimizes the number of characters that need to be escaped within a pattern to 3:
I think this also holds for at least some of the other proposals.
In fact, if we always quote variant patterns, we must make the closing delimiter special. Theoretically, in other proposals, we could only special-case the opening delimiter, because both code and placeholders would be wrapped in the same delimiter. (Although that would require agreeing to a different way of preserving whitespace than the currently agreed `{{ ... }}`.)
> In fact, if we always quote variant patterns, we must make the closing delimiter special.

But we have to do it for all the other proposals too.
Because all allow for optional wrapping.
> I think this also holds for at least some of the other proposals.

You're right, it's true for 1a and 2a. Not for 3a. Since it's not a universal aspect of the proposals, it's not a wash (redundant), and thus a point worth mentioning.
I'm open to suggestions for rephrasing "Requiring all patterns" to whatever it is that results in the minimal possible number of characters needing escaping.
exploration/text-vs-code.md
Outdated
while complex messages use the aforementioned delimiter to quote patterns (ex: `{{...}}`).
* Another potential drawback, specifically in the case of non-`match` complex messages with exactly 1 declaration,
is that this option adds 2 extra delimiters compared to an alternative syntax that doesn't require quoted patterns
and is designed to minimize delimiter usage only to code mode introducers.
We should add some of the other previously discussed drawbacks:
- If the code block is delimited from both sides, users may be tempted to insert text around it.
- If the code block is delimited from both sides, it may be easy to forget the `}}` closing the entire block.
- If we use curlies for patterns and for placeholders, then they serve double duty, which may make the syntax harder to understand, and also harder to make the pattern out visually.
adding text after closing is not necessarily a drawback (I have a comment somewhere)
> If the code block is delimited from both sides, it may be easy to forget the `}}` closing the entire block.

Fair enough.
Maybe make it optional?
I think that `{{` is to make this intentionally ugly. I would probably go with `{#` to enter code mode, and `{` to wrap the patterns.
And yes, it is double duty. But choosing anything else means that we escape one extra thing.
If we use `<<<` and I want (for some reason) `<<< Hello world` as simple text, then we need to escape it.
So we add another escaping rule.
Pros and cons :-)
> - If the code block is delimited from both sides, users may be tempted to insert text around it.

Not a moving argument to me. As you pointed out in a different comment, the audience of message authoring is developers. If we say it's not possible to add extraneous text, and they do so and the MF2 implementation rejects the message, they'll figure it out quickly.
> - If the code block is delimited from both sides, it may be easy to forget the `}}` closing the entire block.

Forgetting to type syntax can happen in any alternative option, so this feels like a weaker argument than the previous. Also, even though developers learn instincts early on to always balance delimiters, we create linters & other tooling to double-check.
(Some languages that use delimiters in a simple and regular way have the ability to evolve powerful tools that always keep everything balanced while being easy to use -> easy to do the right thing, impossible to do the wrong thing. But I digress...)
> - If we use curlies for patterns and for placeholders, then they serve double duty, which may make the syntax harder to understand, and also harder to make the pattern out visually.

Sure. Option 3a solves for that with the tradeoff of taking on other costs as a result. The next question then becomes how do we prioritize our requirements in order to create a value system that we use to evaluate the tradeoffs (choose)?
> > If the code block is delimited from both sides, it may be easy to forget the `}}` closing the entire block.
>
> Forgetting to type syntax can happen in any alternative option, so this feels like a weaker argument than the previous.

I disagree a bit. Any enclosing syntax is prone to errors, but the argument here is that the more levels of nesting (and the further apart the enclosure bits), the more opportunities for error exist because the user is keeping track of more things.
Note that one of the proposals for "2a" was to have just a starter sigil. This would eliminate the enclosure for the message.
exploration/text-vs-code.md
Outdated

Cons:

* This comes at the cost of an inconsistency in the WYSIWYG patterns are quoted between simple and complex messages.
We should discuss here the risk of "two syntaxes in a trench coat." For someone who's only ever seen simple messages, the only syntax rule they can infer is that `{}` is used for placeholders. A whole separate complex-message mode cannot be easily "guessed".
(But see also my comment about considering developers the primary audience for the complex message syntax, so perhaps this is acceptable.)
> "two syntaxes in a trench coat."

The expression is designed to sound ugly :-)
But there are already templating systems doing this, and that didn't prevent adoption, or trigger many complaints.
Heck, when I write HTML with stylesheets and code I have 3 syntaxes in a trench coat :-)
Or if you write C/C++/C# or something else, you have one syntax in the main code, another syntax in strings, and another in printf strings, etc.
int foo = 10%3; // syntax one, math, result is 1
puts("10%3"); // syntax 2, in string, output, result is "10%3"
printf("10%3"); // syntax 3, in string, error, need to double the `%%` to get "10%3" in output
exploration/text-vs-code.md
Outdated
our value system places to the requirements met by the pro aspects compared to the con aspects. Namely:

* [high] Unsurprising WYSIWYG behavior from patterns
* [high] Easy recognition of patterns, even for non-developers
To be fair, I've never found ICU MF patterns easy to spot, specifically because they use `{}`.
Fair point.
But I've never seen any questions asked for single selections.
(on StackOverflow or internal sites, here and in previous companies)
It gets ugly, even for developers, when you have multiple selections (plural-in-plural, select in plural in ???)
exploration/text-vs-code.md
Outdated
...
<?php
if (true) {
echo '<p>Hello World</p>';
Just a nit, but I think both PHP and Freemarker would actually prefer the second method:
<?php if (true): ?>
<p>Hello, world!</p>
<?php endif ?>
I didn't even know that is possible.
If I check https://www.php.net/manual/en/control-structures.if.php and https://www.w3schools.com/php/php_if_else.asp the option is not even mentioned.
It is mentioned in the "User Contributed Notes" of the manual.
But I kind of doubt that something that is not even mentioned in the official manual is the preferred way.
If anything this makes the point that having more than one way to do things, some more recommended than others, is not a good thing.
exploration/text-vs-code.md
Outdated
("Simple messages" refers to messages consisting solely of a pattern, and thus are not complex messages.)

Because the simple message pattern consists of the entire message,
the pattern includes any leading or trailing whitespace.
and newlines
Already included.
s = 1*( SP / HTAB / CR / LF )
exploration/text-vs-code.md
Outdated
when there is 1+ declarations in a `match` (selection) message, | ||
or when there are 2+ declarations in a non-`match` complex message. | ||
|
||
Cons: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
About closing brackets, pros and cons:
- Closing brackets are ingrained in developers.
{{ something something
feels broken because of the missing}}
- The closing brackets might assist in some storage formats (maybe to be designed), especially some that might be minimized by tools.
Example:
msg1 = {{ some complex message }}
when = Type your name here.
Minimized: msg1={{ some complex message }}when=Type your name here.
- One can use the space after closing to add comments, metadata for linters or other tools.
{{
... when ...
}} lint_rules: { maxlen:"80 chars" } ref : { screenshot: "https://example.com/foo.jpg", glossaryId: 1234 }
Not strong arguments. But there is some value.
Even if all it does is prevent the "what the heck is this" reaction.
exploration/text-vs-code.md
Outdated
when there is 1+ declarations in a `match` (selection) message, | ||
or when there are 2+ declarations in a non-`match` complex message. | ||
|
||
Cons: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addition to the pros (new bullet of changing an existing one): visibility (?)
"No trimming / Always delimit" also makes it clear what spaces are rendered.
Example:
{#when one} This is a message (when one) condition
vs
{{
when one { This is a message (when one) condition }
}}
In the second case it is clear what is rendered: leading spaces, no matter what kind.
In the first case it is not clear.
"We trim the ASCII spaces" rule does not help, since the spaces might be non-breaking spaces, or em-space, en-space, ideographic space, and all the other characters that look like space on screen, but are not ASCII space.
So visually I don't know where the message starts / stops.
Even in edit mode (when I translate) I don't know.
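A small sketch of the point above, assuming a hypothetical trimming rule that strips only ASCII space, tab, CR and LF (the helper name `trim_ascii` is invented for this example). Leading characters that merely look like a space survive the trim, so the rendered result differs even though the source strings look alike on screen.

```python
# Hypothetical trimming rule: strip only ASCII space, tab, CR and LF.
ASCII_WS = " \t\r\n"


def trim_ascii(pattern: str) -> str:
    """Remove leading and trailing ASCII whitespace only."""
    return pattern.strip(ASCII_WS)


samples = {
    "ASCII space": " Hello",
    "no-break space (U+00A0)": "\u00a0Hello",
    "em space (U+2003)": "\u2003Hello",
    "ideographic space (U+3000)": "\u3000Hello",
}

for label, text in samples.items():
    # repr() makes the otherwise invisible leading character visible.
    print(f"{label}: {text!r} -> {trim_ascii(text)!r}")
```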
exploration/text-vs-code.md
Outdated
when there is 1+ declarations in a `match` (selection) message, | ||
or when there are 2+ declarations in a non-`match` complex message. | ||
|
||
Cons: |
A bit worse: as a translator you often get strings with no context.
There is no way to know what kind of API the string will go through.
So if I get this string "Bill Gates did something" and I have to translate, and I want to put the "honorific space" in front of the name, I don't know how to do it.
Is the message going through MF2 with trimming? Then I have to wrap it in `{ Bill ....}`.
But maybe it is not going through MF2.
Then if I wrap it, the `{ ... }` will render on screen "as is", which is not what you want.
We've seen this:
"The bee" => "L'abeille" : if the message goes through MF1, the apostrophe needs to be escaped
"I'm 1 in 100" => "Eu sunt 1%" : if the string goes through a printf-like API then I need to escape the %
And as a translator I have no way to know what API the dev uses, and I'm not familiar with the escape rules for the myriad APIs out there.
I am in fact faced with more APIs than a developer.
A dev might do "java + html + js".
A translator often works on many projects, for many companies, and so is exposed to strings consumed by native Windows apps, PHP, some Ruby stuff, C#, SQL, and others, sometimes switching between them several times per day.
These days you don't pay the bills as a translator if you only handle a single format for one product from one company.
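To make the `%` case concrete, here is a minimal sketch that uses Python's `%` string formatting as a stand-in for "a printf-like API"; that choice, and the helper name `render_printf`, are assumptions made for this illustration. The same translated string either fails or renders correctly depending on an escaping rule the translator cannot see.

```python
def render_printf(template: str, *args: object) -> str:
    """Render `template` with printf-style %-formatting."""
    return template % args


# Unescaped: the trailing '%' is read as the start of a format specifier,
# so formatting fails.
try:
    render_printf("Eu sunt 1%")
    unescaped_ok = True
except ValueError:
    unescaped_ok = False

# Escaped: '%%' renders as a single literal '%'.
escaped = render_printf("Eu sunt 1%%")

print(unescaped_ok, escaped)
```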
Co-authored-by: Addison Phillips <[email protected]>
exploration/text-vs-code.md
Outdated
Within a complex message, patterns are always quoted with `{{...}}` or other choice of delimiter. | ||
|
||
The entire complex message is also wrapped with `{{...}}` or other choice of delimiter. | ||
This allows interior "code mode" of message to have flexible whitespace in between tokens |
I would like to initially say anything outside the complex message-wide delimiters is an error (invalid). In the future, if we want to relax requirements and say annotations & message description notes are allowed in message syntax, you have the freedom to do so (after the closing delimiter).
In general with API design, you can always relax requirements and narrow outputs, but the reverse is not possible (causes breaking changes).
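The strict rule proposed here could be checked along the following lines. This is only a sketch under stated assumptions: the message starts with `{{`, a backslash escapes the next character, and the function names are hypothetical rather than taken from any MF2 implementation.

```python
MF2_WS = " \t\r\n"


def trailing_content(message: str) -> str:
    """Return whatever follows the delimiter that closes the opening `{{`.

    Raises ValueError if the message does not start with `{{` or if the
    braces never balance. A backslash escapes the next character.
    """
    if not message.startswith("{{"):
        raise ValueError("complex message must start with '{{'")
    depth = 0
    i = 0
    while i < len(message):
        ch = message[i]
        if ch == "\\":
            i += 2  # skip the backslash and the escaped character
            continue
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return message[i + 1:]
        i += 1
    raise ValueError("unbalanced delimiters")


def is_valid_strict(message: str) -> bool:
    """Strict rule: only whitespace may follow the closing delimiter."""
    return trailing_content(message).strip(MF2_WS) == ""
```

Relaxing the rule later would only mean accepting more values from `trailing_content`, which is why starting strict keeps that option open without breaking existing messages.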
exploration/text-vs-code.md
Outdated
* The rule about the whether leading and trailing whitespace is included is simple and unambiguous. | ||
* This matches the WYSIWIG behavior that simple messages preserve. | ||
* The patterns can be detected within the pattern more easily due to the delimiters serving as a visual anchor. | ||
* Requiring all patterns to be quoted minimizes the number of characters that need to be escaped within a pattern to 3: |
I think this also holds for at least some of the other proposals.
You're right, it's true for 1a and 2a. Not for 3a. Since it's not a universal aspect of the proposals, it's not a wash (redundant), and thus a point worth mentioning.
I'm open to suggestions for rephrasing "Requiring all patterns" to whatever it is that results in the minimum possible number of characters needing escaping.
exploration/text-vs-code.md
Outdated
when there is 1+ declarations in a `match` (selection) message, | ||
or when there are 2+ declarations in a non-`match` complex message. | ||
|
||
Cons: |
+1 to @mihnita 's point about the possibility "to add comments, metadata for linters or other tools."
We can discuss as a workgroup whether we leave the space after the closing delimiter as a free-for-all, or reserve it for ourselves as a standard for future extensions.
Note: we are losing the possibility to do the same for simple messages because we are moving away from our current syntax.
exploration/text-vs-code.md
Outdated
while complex messages use the aforementioned delimiter to quote patterns (ex: `{{...}}`). | ||
* Another potential drawback, specifically in the case of non-`match` complex messages with exactly 1 declaration, | ||
is that this option adds 2 extra delimiters compared to an alternative syntax that doesn't require quoted patterns | ||
and is designed to minimize delimiter usage only to code mode introducers. |
- If the code block is delimited from both sides, users may be tempted to insert text around it.
Not a moving argument to me. As you pointed out in a different comment, the audience of message authoring is developers. If we say it's not possible to add extraneous text, and they do so and the MF2 implementation rejects the message, they'll figure it out quickly.
- If the code block is delimited from both sides, it may be easy to forget the `}}` closing the entire block.
Forgetting to type syntax can happen in any alternative option, so this feels like a weaker argument than the previous. Also, even though developers learn instincts early on to always balance delimiters, we create linters & other tooling to double-check.
(Some languages that use delimiters in a simple and regular way have the ability to evolve powerful tools that always keep everything balanced while being easy to use -> easy to do the right thing, impossible to do the wrong thing. But I digress...)
- If we use curlies for patterns and for placeholders, then they serve double duty, which may make the syntax harder to understand, and also harder to make the pattern out visually.
Sure. Option 3a solves for that with the tradeoff of taking on other costs as a result. The next question then becomes how do we prioritize our requirements in order to create a value system that we use to evaluate the tradeoffs (choose)?
On my part, I've spent much more time than I'd wish over the past few days and weeks thinking about and looking into messages with external whitespace. In the absence of any better place to put down some of my thoughts on this, these are the aspects and arguments that I find important to account for:
Localizable external whitespace is really rare, while mistakes are common
Sometimes a leading or trailing space could be localizable, but you can't necessarily tell without looking through the code which is using the message. So I did that. This is what I'd mentioned previously via email:
As a next step, I filtered all of the above to the 41 potentially localizable strings which are currently in production (looks like a sentence, has one leading or trailing space), and found where they're coming from in code, and how they're used: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40.
Of the above, only the 0th message actually contains a localizable space; the other 40 are all bugs. So that's exactly one localizable external space in about 66k messages currently in production. This space was incorrectly dropped in 15 of the 35 locales to which it's translated; I've now submitted corrections for all of them.
Real localizable external whitespace is so rare that it gets drowned by the noise. All of the above bugs are in formats that explicitly delimit patterns, and thereby make it too easy to include leading or trailing whitespace. About 28% of the messages currently in production use Fluent, which does not quote patterns. None of these were similarly buggy.
From this I would conclude that using a syntax which requires external whitespace to be explicitly intentional would make it much less likely for it to be ignored, and would lead to better localizations.
We're not actually talking about quoting patterns
To be precise, we are talking about delimiting patterns with {braces}, which rather explicitly are not 'quotes' or "quotes". The distinction here matters, because we are looking to assign a novel meaning to a pair of characters that no other syntax than ICU MessageFormat uses to delimit localizable text. Every other syntax which uses braces in text uses them to delimit code. Which MF2 is also doing.
When we talk of the syntax being "WYSIWYG", we are asking for its readers to not see the {braces}, a symbol so prevalent in our syntax that we've jokingly incorporated it into our logo {�}. Humans are not trained for that the way they are with "quotes", or with empty spaces acting as content separators. In other words, if I see the braces but I don't see the empty space, how is the syntax "WYSIWYG"?
We are also asking for MF2 authors and editors to somehow know that the spaces within the { braces } are significant -- but only if they're delimiting patterns. Within expressions, MF2 syntax ignores whitespace, so
The overlap of external whitespace and variants is really truly tiny
In a very real sense, the discussion of whether messages with variants should always be delimited is a discussion asking if this string is ok being represented as
or if that is sufficiently problematic that we need every pattern of every message with variants to be {{delimited}}.
I have been actively seeking for examples of messages with leading or trailing whitespace for the last month, and the above is the one actual, current message with variants and external whitespace that has been identified. We are talking about such a rare situation it should not be driving our whole syntax.
We should choose to do with patterns what we're doing with literals, where for common values we allow them to be delimited by whitespace, but also permit |
@eemeli Thanks for the long comment.
I agree that we are talking about delimiting patterns and that this is what our technical decision is about. Quoting would be one mechanism for delimiting patterns, but is not the only one. I would call out that quoting does not require the use of "quote" characters. I think that referring to
I think this isn't quite on the nose. The real problem we're dealing with here is intentionality. There has to be a way for users to intentionally include various kinds of character sequence into their pattern. This includes invisible Unicode whitespace that is not MF2 whitespace. For example, non-breaking space or NNBSP or ZWNJ or what have you. There are a lot of characters that have no "ink" but which a user might intend to be part of the message. Some of those characters will be MF2 whitespace. MF2 intentionally does not include a general purpose character escaping mechanism (because we expect the host environment or file format to include one and we are avoiding the double-escaping mess). If the boundary between "pattern" and "not pattern" is between invisible characters, that's pretty difficult to work with. I think there are four audiences that need to be served:
What I think is interesting is that pattern delimiters are probably syntax. If pattern delimiters are optional, it might be unclear to translators whether a given pattern is already quoted (delimited) or to tools as to whether delimiters are needed. It's hard for machines to guess people's intentions. I agree that PEWS is rare, which is why we need to be especially clear about how to handle the rare cases. |
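The "no ink" characters mentioned above can be made visible with a small helper like the one below. This is a sketch only: the helper name `describe` is invented for this example, and `MF2_WS` assumes MF2 whitespace is exactly space, tab, CR and LF, per the `s` rule quoted earlier in this thread.

```python
import unicodedata

MF2_WS = " \t\r\n"  # assumed: the characters allowed by the `s` rule


def describe(ch: str) -> str:
    """Describe one character: code point, Unicode name, MF2-whitespace flag."""
    name = unicodedata.name(ch, "<unnamed>")
    kind = "MF2 whitespace" if ch in MF2_WS else "not MF2 whitespace"
    return f"U+{ord(ch):04X} {name} ({kind})"


# Space, NBSP, NNBSP and ZWNJ: all invisible, only the first is MF2 whitespace.
for ch in [" ", "\u00a0", "\u202f", "\u200c"]:
    print(describe(ch))
```

A tool that surfaces this kind of description next to a pattern boundary would let a translator or developer confirm which invisible characters are intentionally part of the message.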
I don't want to insist on pedantry, as long as we recognise that the pattern "quote" characters we're considering are rather explicitly not generally used as quote characters. For anyone coming to MF2 not via MF1, this will be an additional weird thing to learn. Very much comparable to
Yes, and this will be supported no matter which way we decide to go, by the optional |
Ready for review now. I made edits mostly based on #504, with some additions, and also responded to comments that came in before this was ready. Previous info from the doc (background, use cases, stats, examples) is kept pre-hidden at the bottom, like appendices. |
>{{ | ||
> match {$var} | ||
> {when *} This pattern has a space in front (it's between \} and This) | ||
> {when other} | ||
> This pattern has a newline and six spaces in front of it | ||
> {when moo}This pattern has no spaces in front of it, but an invisible space at the end | ||
>}} |
>{{ | |
> match {$var} | |
> {when *} This pattern has a space in front (it's between \} and This) | |
> {when other} | |
> This pattern has a newline and six spaces in front of it | |
> {when moo}This pattern has no spaces in front of it, but an invisible space at the end | |
>}} | |
>{match {$var}} | |
>{when *} This pattern has a space in front (it's between } and This) | |
>{when other} | |
> This pattern has a newline and six spaces in front of it | |
>{when moo}This pattern has no spaces in front of it, but an invisible space at the end |
|
||
Pros: | ||
- WYSIWYG (on steroids) | ||
|
- Avoids as many escape sequences as possible, | |
as `}` does not need escaping in patterns. | |
- Probably not a serious alternative: the example | ||
includes any number of obvious footguns that have to be addressed |
This seems really rather opinionated. In many ways, this is the same as the "Always quote" solution, except that the pattern delimiters are `}…{` instead of `{{…}}`. So the unnamed footguns probably apply to that alternative as well.
It is opinionated. When I wrote it, I was being lazy by not enumerating the issues.
this is the same as the "Always quote" solution, except that the pattern delimiters are }…{ instead of {{…}}. So the unnamed footguns probably apply to that alternative as well.
No, this is incorrect.
`{{...}}` encloses all and only the whitespace that is intentional in the pattern, with `{{` and `}}` forming the pattern boundary. These boundary characters are visible.
`}...{` makes all whitespace in the variant block meaningful. It effectively prohibits a multiline representation of a message, because the newlines are always meaningful. It also means that trailing spaces (which are invisible) have meaning.
To make a message multiline, you have to put the whitespace inside the key:
{{
match {$var}
{when 0
}This has no newline or space.{
when one}
This has a newline at the start.{
when *} This has a space at the start and six spaces and a newline at the end.
}}
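To show what "all whitespace in the variant block is meaningful" implies, the example above can be run through a toy extractor. This is a sketch of the hypothetical `}…{` alternative only, not of any real MF2 parser; the regular expression, the names, and the six trailing spaces written explicitly into the last variant are assumptions made for this illustration.

```python
import re

# The example message, written out so that trailing spaces are explicit.
MESSAGE = (
    "{{\n"
    "match {$var}\n"
    "{when 0\n"
    "}This has no newline or space.{\n"
    "when one}\n"
    "This has a newline at the start.{\n"
    "when *} This has a space at the start and six spaces and a newline at the end.      \n"
    "}}"
)

# A key is `{ when <key> }` with free whitespace inside the braces; the
# pattern is everything after the key's `}` up to the next `{` or `}`.
VARIANT = re.compile(r"\{\s*when\s+([^}]*?)\s*\}([^{}]*)")


def extract_variants(message: str) -> dict[str, str]:
    """Map each variant key to its pattern, keeping every whitespace character."""
    return {m.group(1): m.group(2) for m in VARIANT.finditer(message)}


variants = extract_variants(MESSAGE)
for key, pattern in variants.items():
    # repr() exposes the newlines and trailing spaces that belong to the pattern.
    print(f"{key}: {pattern!r}")
```

Printing with `repr()` shows the leading newline of the `one` variant and the trailing spaces and newline of the `*` variant, none of which are visible in the source layout.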
If we take your example above, and after the `match` replace each `}` with `{{` and each `{` with `}}`, we get this:
{{
match {$var}
}}when 0
{{This has no newline or space.}}
when one{{
This has a newline at the start.}}
when *{{ This has a space at the start and six spaces and a newline at the end.
}}
Ignoring the specifics of what's happening with the preamble, that seems pretty similar to me. It's just that we're conditioned to look at the `{` and `}` a certain way.
Cons: | ||
- Requires one of the alternate syntaxes | ||
- Has two ways to represent a pattern. | ||
- May be difficult for translators to add quotes when needed. |
As far as I've been able to determine, there are exactly three scenarios in which a translator may need to add leading or trailing spaces to a pattern that starts out without them:
- When translating a whole-sentence message from a CJK script to a non-CJK script, such that the sentences are concatenated into a single paragraph and need spaces between them. As with all other string concatenations, I would expect for this to be explicitly called out to the translator, so that they may know whether to add the space at the start or end of the pattern.
- When translating a pattern to Chinese which ends up requiring a leading honorific space. As far as I can tell, this is really rare in dynamic message strings.
- When the message is expected to be output using a monospace font and fakes either centering or right-alignment by using in-message spaces for indentation, and the first line of the pattern happens to be exactly the maximum length, and so does not need leading spaces. This is sufficiently rare that I'm pretty sure this is only a theoretical possibility, and in any case I'd expect it to be rather clearly called out to the translator.
Given that each of the above only has an impact on the pattern delimiting if the message also has multiple variants and if the translator is not using any tooling that'd take care of the delimiting and if the developer has not pre-emptively delimited the pattern, I would be ok accepting this negative, especially as the downside would be a single missing space in the translation.
- Easy to use (best of both worlds?) | ||
|
||
Cons: | ||
- Requires one of the alternate syntaxes |
Given that we just ran a "beauty contest" in which these "alternate syntaxes" were preferred over the current `main` syntax by an absolute majority of the participants, this could also be listed as one of the "Pros".
Pros: | ||
- Code is special, whitespace is not. | ||
- Makes PEWS into a "special event", alerting developers to the non-I18N aspects of it? | ||
|
- Avoids as many escape sequences as possible, | |
as `}` does not need escaping in patterns. | |
It needs escaping if it is used as the ending of the "wrapping" of the string.
Co-authored-by: Eemeli Aro <[email protected]>
Co-authored-by: Eemeli Aro <[email protected]>
Remember our previous discussions from last year about this. This MF2 group chose curly braces precisely because they are the least likely to occur in other syntaxes and in message patterns themselves. |
For the purposes of reading the document in our 2023-10-30 call, I'm merging this work now. This does not make the comments above right/wrong, relevant/irrelevant, or anything else. It's just to enable "easy reading". |
- Rename the design doc. - Cross out rejected options 2 and 5 - Add notes to 2 and 5 calling this out (other changes may be added from the previous thread in #503 and WG call notes from 2023-10-30)
* Prepare design doc ahead of balloting
- Rename the design doc.
- Cross out rejected options 2 and 5
- Add notes to 2 and 5 calling this out
(other changes may be added from the previous thread in #503 and WG call notes from 2023-10-30)
* Prepare balloting instructions
* Update exploration/delimiting-variant-patterns.md
Co-authored-by: Eemeli Aro <[email protected]>
* Apply suggestions from code review
Co-authored-by: Eemeli Aro <[email protected]>
---------
Co-authored-by: Eemeli Aro <[email protected]>
An update to the design doc, specifically on the topic of: "Do we allow unquoted variant patterns?"
Very much a WIP while in draft mode (not suitable for "drive-by reviews" until officially ready for review).
@mihnita please take an initial look and provide suggestions.