Should we really be using {{pattern}} and |literal| delimiters? #602

Open
eemeli opened this issue Jan 15, 2024 · 7 comments
Labels
LDML 47 LDML 47 Release (Stable) question Further information is requested Seek-Feedback-in-Preview Issue should be something we seek feedback on in the tech preview period syntax Issues related with MF Syntax

Comments

@eemeli
Collaborator

eemeli commented Jan 15, 2024

The syntax of MessageFormat 2 is the result of a long chain of discussions, arguments, compromises, and the balancing of multiple different stakeholders and concerns. While it is quite capable of fulfilling the demands put upon it, it is literally a design by committee.

While I strongly support our work and our results, I remain concerned about the design decisions we've made specifically about our {{pattern}} and |literal| delimiters, and about how weird they are. We have, quite explicitly, ended up choosing string delimiters that are not commonly used as string delimiters, so that embedding MF2 strings within programming languages or JSON does not require internal escapes, and to reduce the frequency of message contents needing to include escapes.

To rationalise our decisions, we have multiple overlapping design documents tracing our path to where we are now; documents that we've argued about and sometimes voted on to unblock our progress. As far as I know, we do not have a single succinct document explaining why these delimiters are the way they are.

As we are now approaching a complete definition of the language and publishing it as a tech preview, I think the delimiters are a specific concern that we ought to be ready to accept some criticism about, and to potentially reconsider for our final release. The base assumptions that I believe we may have mis-estimated include:

  • How common it is to include MF2 messages in a programming language or other context where specific delimiters are required of strings, and alternatives are not available. Pattern delimiters in particular are almost always only needed for strings for which a multi-line presentation is not necessary, but is useful. This reduces the frequency with which conflicts with e.g. " would arise. Many programming languages only support multi-line strings with delimiters like ` and """ that we could specifically avoid.
  • How difficult it is to manually escape string delimiter characters, when they do occur in MF2 source and are not avoidable by using a different string delimiter.
  • How much of an impediment to adoption using unusual delimiters might be. As we've discussed on multiple occasions, we do not really expect anyone to become a "MessageFormat 2 developer". The result of our work is an auxiliary message formatting language that users will only deal with on occasion. Within that context, I think greater weight should be put on not deviating from contextual assumptions, such as "how to quote stuff".
  • When less technical users interact with MF2 source, what is the surrounding context in which this happens, and what restrictions does that format impose on their work? In other words, if e.g. MF2 messages are embedded in a .properties file that a translator is manually working with, that format does not impose any quoting requirements on message values. In that context, would the user be better served by more common delimiters that may need escaping when they occur within a message body, or by our current {{braces}} and |bars|?
  • What is the appropriate lesson to take from ICU MessageFormat's choices to use ' as an escape character, and to support multiple different "apostrophe modes"? Is it that the needs for escaping should be minimised, or that the rules and practices of escaping should be regularised? With MF2 we've clearly aimed for the former (e.g. limiting which characters may be \ escaped in pattern text vs. literal text), but is that really the only lesson to take here? Could we also consider choosing surprising syntax to be a source of potential errors that we ought to avoid?

Finally, to illustrate what this is all about, consider this MF2 message, using our current syntax:

.input {$count :number}
.local $kind = {|"Granny Smith"|}
.match $count
0 {{no {$kind} apples}}
one {{{$count} {$kind} apple}}
* {{{$count} {$kind} apples}}

If we were to allow for more normal pattern and literal delimiters, this same message could read as:

.input {$count :number}
.local $kind = {'"Granny Smith"'}
.match $count
0: "no {$kind} apples"
one: "{$count} {$kind} apple"
*: "{$count} {$kind} apples"
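One of the costs in question can be made concrete with a minimal sketch. It assumes both variants are stored as JSON string values and simply counts how many double quotes the JSON encoder has to backslash-escape. The helper name `json_quote_escapes` is hypothetical, and the count says nothing about readability or about other host formats.

```python
import json

# Current MF2 syntax, copied from the first example above.
current = '''.input {$count :number}
.local $kind = {|"Granny Smith"|}
.match $count
0 {{no {$kind} apples}}
one {{{$count} {$kind} apple}}
* {{{$count} {$kind} apples}}'''

# Proposed alternative syntax, copied from the second example above.
alternative = '''.input {$count :number}
.local $kind = {'"Granny Smith"'}
.match $count
0: "no {$kind} apples"
one: "{$count} {$kind} apple"
*: "{$count} {$kind} apples"'''


def json_quote_escapes(message: str) -> int:
    # Hypothetical helper: count the backslash-quote pairs that appear
    # once the message is encoded as a JSON string value.
    return json.dumps(message).count('\\"')


print(json_quote_escapes(current))      # 2
print(json_quote_escapes(alternative))  # 8
```

In both variants the two quotes around Granny Smith need escaping in JSON; the alternative adds six more, one for each pattern delimiter.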

While I appreciate that the alternative syntax would carry some costs, I believe that its benefits in readability and lack of weirdness outweigh the negatives. Therefore, I ask that we be open to discussing these choices further during the tech review phase.

Edit: Updated syntax examples to match on variables, and include a trailing : after keys to allow for keys to also be 'quoted' or "quoted".

@eemeli eemeli added question Further information is requested syntax Issues related with MF Syntax Future Deferred for future standardization labels Jan 15, 2024
@aphillips
Member

(chair hat on)

@eemeli was asked to file this issue following discussion in the 2024-01-15 teleconference. In that call, we explicitly discussed that this is out-of-scope for LDML45. The MFWG will not consider any further normative preferential changes to the ABNF or syntax in this release. Only editorial ("cleanup") or technical errors ("bugs") within the current design will be considered in this release.

This comment is strictly to document that fact. It is neither an endorsement nor a rejection of this issue.

@mihnita
Collaborator

mihnita commented Jan 20, 2024

My take on this


How common it is to include MF2 messages in a programming language or other context where specific delimiters are required of strings, and alternatives are not available

Very often.
And it is not only about programming languages, but also file formats.

There are many formats that delimit their own messages with ", or require " to be escaped.

So 4 of the most common file formats explicitly designed for localization use ", with no alternative.


How difficult it is to manually escape string delimiter characters

Let's take this:

.match ($button :string)
subscribe {{Click "Subscribe" to stop receiving emails}}
unsubscribe {{Click "Subscribe" to ...}}

If we replace {{...}} with quotes in our syntax now this becomes

.match ($button :string)
subscribe "Click \"Subscribe\" to stop receiving emails"
unsubscribe "Click \"Subscribe\" to ..."

And next we store the message in code / json / etc:

{
"msg": ".match ($button :string) subscribe \"Click \\\"Subscribe\\\" to stop receiving emails\" unsubscribe \"Click \\\"Subscribe\\\" to ...\""
}
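The layering of escapes can be reproduced with a short sketch. It assumes a hypothetical quoted-pattern syntax in which inner double quotes are backslash-escaped (the helper `quote_pattern` is illustrative only, not part of any MF2 implementation), and then encodes the result as a JSON string value.

```python
import json

# Pattern text from the example above, containing literal double quotes.
pattern_text = 'Click "Subscribe" to stop receiving emails'


def quote_pattern(text: str) -> str:
    # Hypothetical quoted syntax: wrap the pattern in double quotes and
    # backslash-escape any backslash or double quote inside it.
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


def brace_pattern(text: str) -> str:
    # Current syntax: wrap in {{ }}. Braces or backslashes in the text
    # would need escaping, which this sketch does not handle.
    return "{{" + text + "}}"


quoted_json = json.dumps(quote_pattern(pattern_text))
braced_json = json.dumps(brace_pattern(pattern_text))

print(quoted_json)  # "\"Click \\\"Subscribe\\\" to stop receiving emails\""
print(braced_json)  # "{{Click \"Subscribe\" to stop receiving emails}}"
```

With the quoted syntax, each inner quote ends up behind three backslashes after JSON encoding; with the brace syntax, only the single JSON-level escape remains.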

What is the appropriate lesson to take from ICU MessageFormat's choices to use ' as an escape character

That it is a bad idea to require escaping for characters commonly used in the body of localized messages, and that WYSIWYG is best.

@aphillips aphillips added LDML46.1 MF2.0 Draft Candidate and removed Future Deferred for future standardization labels Sep 10, 2024
@aphillips
Member

I think this is now out of scope for 2.0. @eemeli okay to close?

@aphillips aphillips added the resolve-candidate This issue appears to have been answered or resolved, and may be closed soon. label Oct 7, 2024
@bearfriend
Contributor

bearfriend commented Oct 7, 2024

I think it's still worth discussing (in a new issue?) alternatives for a slightly different reason. In MF1, the use of braces to delimit multiple things has caused readability problems for both authors and translators. While tools obviously exist to parse messages, in my experience many people at all levels of the translation process will instead opt to use what is already known and/or available to them (usually a too-simplistic regex) to try to identify or block out sections of a message for unit tests or to hand off to a translator.

Example: /{(?<!\w+,\s?(plural|select|selectordinal),)[^{}]+?}/

{a, plural, one {message} other {{arg}}}

This results in both {message} and {arg} being matched. Frankly even this is higher effort than a lot of attempts. You can argue, as I would, that this shouldn't be done in the first place, but it does happen. I think it's worth considering a more distinct delimiter at least for patterns.

The $ for variables helps with this, but I can still imagine {{pattern}}, {{{$patternVar}}} and {$var} being mixed up in various ways by authors, translators, and tools trying to identify different parts of a message.
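The MF1 behaviour described above can be checked with a small sketch. It uses a simplified stand-in for the quoted expression (the original relies on a variable-width lookbehind, which Python's `re` module does not accept), and the brace-depth helper is a hypothetical addition that ignores MF1 apostrophe quoting.

```python
import re

# Simplified stand-in for the kind of regex quoted above: an opening brace,
# some non-brace characters, a closing brace. Not the exact expression.
NAIVE_BRACES = re.compile(r"\{[^{}]+?\}")

mf1_message = "{a, plural, one {message} other {{arg}}}"


def brace_depth(text: str, index: int) -> int:
    # Hypothetical helper: number of unclosed "{" before the given index.
    prefix = text[:index]
    return prefix.count("{") - prefix.count("}")


matches = [m.group(0) for m in NAIVE_BRACES.finditer(mf1_message)]
depths = [brace_depth(mf1_message, m.start())
          for m in NAIVE_BRACES.finditer(mf1_message)]

print(matches)  # ['{message}', '{arg}']
print(depths)   # [1, 2]
```

The regex returns a sub-message and an argument placeholder as if they were the same kind of thing; only the surrounding nesting, which the regex does not see, tells them apart.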

@eemeli
Collaborator Author

eemeli commented Oct 8, 2024

I don't think this should be closed yet, as we have not heard answers from outside our WG to any of the questions I pose above. It could well be that these questions could be answered by the reviews and feedback we hope to get after locking down the syntax in November.

@eemeli eemeli removed resolve-candidate This issue appears to have been answered or resolved, and may be closed soon. LDML46.1 MF2.0 Draft Candidate labels Oct 8, 2024
@macchiati
Member

macchiati commented Oct 8, 2024 via email

@aphillips
Member

My reasoning here is that this is not something we will change in 2.0. We have design docs for literals and pattern delimiters, the beauty contest, and have had numerous discussions about it. @eemeli I know you're interested in revisiting this, but I'd rather close all of the issues pertaining to our release that are actually resolved in our release. Let the feedback process and external reviews revisit it as needed.

@bearfriend MF1's use of nested brackets is, indeed, egregious. MF2's use of brackets is much more manageable (three is the deepest nesting: the paired pattern quotes {{ and the {$placeholder} delimiter within them }}). I agree with your analysis, but, again, the group has resolved this in a specific way.

@aphillips aphillips added Seek-Feedback-in-Preview Issue should be something we seek feedback on in the tech preview period LDML 47 LDML 47 Release (Stable) labels Nov 4, 2024
5 participants