From 2ce9726c6021dfbdef0fa00606e120cda028571a Mon Sep 17 00:00:00 2001 From: Elango Cheran Date: Wed, 25 Oct 2023 21:27:34 -0700 Subject: [PATCH 01/19] typo --- spec/syntax.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/spec/syntax.md b/spec/syntax.md index 7091486f6..aeb5f694d 100644 --- a/spec/syntax.md +++ b/spec/syntax.md @@ -64,7 +64,7 @@ The syntax specification takes into account the following design restrictions: ## Messages and their Syntax -The purpose of MessageFormat is the allow content to vary at runtime. +The purpose of MessageFormat is to allow content to vary at runtime. This variation might be due to placing a value into the content or it might be due to selecting a different bit of content based on some data value or it might be due to a combination of the two. From 4c0cfc6c5063c627b82aff3fd31cd2ae3b76640c Mon Sep 17 00:00:00 2001 From: Elango Cheran Date: Wed, 25 Oct 2023 22:16:15 -0700 Subject: [PATCH 02/19] Contextualize templating libraries --- exploration/text-vs-code.md | 44 +++++++++++++++++++++++++++---------- 1 file changed, 32 insertions(+), 12 deletions(-) diff --git a/exploration/text-vs-code.md b/exploration/text-vs-code.md index 816d12d9a..608d06312 100644 --- a/exploration/text-vs-code.md +++ b/exploration/text-vs-code.md @@ -28,18 +28,38 @@ ICU MessageFormat and Fluent both support inline selectors separated from the text using `{…}` for multi-variant messages. ICU MessageFormat is the only known format that uses `{…}` to also delimit text. -[Mustache templates](https://mustache.github.io/mustache.5.html) -and related languages wrap "code" in `{{…}}`. -In addition to placeholders that are replaced by their interpolated value during formatting, -this also includes conditional blocks using `{{#…}}`/`{{/…}}` wrappers. 
- -[Handlebars](https://handlebarsjs.com/guide/) extends Mustache expressions -with operators such as `{{#if …}}` and `{{#each …}}`, -as well as custom formatting functions that become available as e.g. `{{bold …}}`. - -[Jinja templates](https://jinja.palletsprojects.com/en/3.1.x/templates/) separate -`{% statements %}` and `{{ expressions }}` from the base text. -The former may define tests that determine the inclusion of subsequent text blocks in the output. +Formatting and templating are distinct operations with similarities. +Both interpolate strings by using input values, +provided as inputs alongisde the formatting pattern string or template, +to produce a new string. +Formatting usually refers to smaller strings, usually no larger than a sentence, +whereas templating are used to produce larger strings, usually for text files of various file formats, often for HTML documents. + +There are two different styles of templating library design. +Some languages/libraries enable the interopolation of the template substrings through programmatic expressions in "code mode" that print expressions to the output stream +(ex: [PHP](https://www.php.net/), +[Freemarker](https://freemarker.apache.org/index.html)): +```php + +... + Hello World

'; + } + ?> +... + +``` +Some libraries separate string literal values from the programmatic expressions in "code mode" by defining a set of control flow constructs within delimiters, +and all text outside the delimiters is printed to the output stream, +and subject to control flow rules of their containing constructs. +(ex: [Mustache templates](https://mustache.github.io/mustache.5.html), +[Freemarker](https://freemarker.apache.org/index.html)). +``` +{{#repo}} + {{name}} +{{/repo}} +``` A cost that the message formatting and templating languages mentioned above need to rely on is some rule or behaviour that governs how to deal with whitespace at the beginning and end of a pattern, From 40ad868f80e3400b5448d379da40b6b38f087a6e Mon Sep 17 00:00:00 2001 From: Elango Cheran Date: Wed, 25 Oct 2023 23:19:10 -0700 Subject: [PATCH 03/19] Differentiate template rules from output format rules, give examples of whitespace handling rules for template languages --- exploration/text-vs-code.md | 16 ++++++++++++---- 1 file changed, 12 insertions(+), 4 deletions(-) diff --git a/exploration/text-vs-code.md b/exploration/text-vs-code.md index 608d06312..342ca3fd0 100644 --- a/exploration/text-vs-code.md +++ b/exploration/text-vs-code.md @@ -60,10 +60,18 @@ and subject to control flow rules of their containing constructs. {{name}} {{/repo}} ``` - -A cost that the message formatting and templating languages mentioned above need to rely on -is some rule or behaviour that governs how to deal with whitespace at the beginning and end of a pattern, -as statements may be separated from each other by newlines or other constructs for legibility. +Some templating libraries support both styles. + +When considering string formatting and templating libraries, +it is important to keep the rules of pattern or template handling separate from and uninfluenced by the output format's rules. 
+For example, many templating languages are designed around producing HTML output, for which consecutive whitespace characters within the output are collapsed into a single ASCII space. +However, if the templating language is not strict on preserving whitespace, +then it would be incapable of generating Python source code, +for which whitespace is significant in determining block scope via the indentation (leading whitespace on a line). + +In fact, some HTML-oriented templating libraries preserve whitespace by default in a what-you-see-is-what-you-get (WYSIWYG) manner (Mustache, [Jinja](https://jinja.palletsprojects.com/en/3.1.x/templates/#whitespace-control)), +and some perform whitespace trimming in unspecified ways ([Handlebars](https://handlebarsjs.com/guide/expressions.html#whitespace-control)). +The [whitespace behavior for Freemarker](https://freemarker.apache.org/docs/dgui_misc_whitespace.html), a general purpose templating library for multiple formats, is also WYSIWYG by default while allowing several optional trimming controls. 
Other formats supporting multiple message variants tend to rely on a surrounding resource format to define variants, such as [Rails internationalization](https://guides.rubyonrails.org/i18n.html#pluralization) in Ruby or YAML From a8c76cd78ab5c0a053fa7945ef4f21b839c48666 Mon Sep 17 00:00:00 2001 From: Elango Cheran Date: Thu, 26 Oct 2023 13:36:18 -0700 Subject: [PATCH 04/19] Update containing format escape rules interaction and assumptions about frequency --- exploration/text-vs-code.md | 19 ++++++++++++++++--- 1 file changed, 16 insertions(+), 3 deletions(-) diff --git a/exploration/text-vs-code.md b/exploration/text-vs-code.md index 342ca3fd0..09d103f64 100644 --- a/exploration/text-vs-code.md +++ b/exploration/text-vs-code.md @@ -76,13 +76,26 @@ The [whitespace behavior for Freemarker](https://freemarker.apache.org/docs/dgui Other formats supporting multiple message variants tend to rely on a surrounding resource format to define variants, such as [Rails internationalization](https://guides.rubyonrails.org/i18n.html#pluralization) in Ruby or YAML and [Android String Resources](https://developer.android.com/guide/topics/resources/string-resource.html#Plurals) in XML. -These formats rely on the resource format providing clear delineation of the beginning and end of a pattern. +A message author will need to resolve the combination of the rules of these formats and the rules of the containing resource formats in order to achieve a clear delineation of the beginning and end of a pattern. +For example, an Android resource string that includes leading whitespace in the message might look like +``` +" Section 7.a. Attribute Types" +``` +In this example above, the containing XML format will collapse consecutive whitespace characters into a single space unless you provide the attribute `xml:space="preserve"`. 
+After the resource file gets parsed as XML, the Android string resource format +[does additional whitespace collapsing and Android escaping](https://developer.android.com/guide/topics/resources/string-resource#escaping_quotes), +requiring the entire text node string to be wrapped in double quotation marks `"..."` to preserve the initial whitespace, or the inital whitespace to use Android escaping (`\u0032 \u0032 ...`). Based on available data, no more than 0.3% of all messages and no more than 0.1% of messages with variants contain leading or trailing whitespace. -No more than one third of this whitespace is localizable, -and most commonly it's due to improper segmentation or other internationalization bugs. +However, frequency of occurrence is not an indicator of the importance of leading or trailing whitespace to those authoring such messages. +For example, sometimes such messages are authored in order to achieve a semblance of formatting in contexts that lack rich text presentation styles, +such as operating system widgets. +Even though such messages are usually infrequent relative to the size of all user-facing / transalatable messages, +that is not an indicator of their significance. +Also importantly, we cannot make assumptions about the validity of leading or trailing whitespace in a message, +especially since their usage may be entirely unrelated to internationalization issues (ex: sentence agreement disruption by concatenation). ## Use-Cases From 506ce652b69919402a125a5b69af926966bf0088 Mon Sep 17 00:00:00 2001 From: Elango Cheran Date: Thu, 26 Oct 2023 14:15:03 -0700 Subject: [PATCH 05/19] Move usage stats from Background to Use Cases, contextualize frequency vs. importance vs. 
i18n best practices --- exploration/text-vs-code.md | 16 +++++++++++++--- 1 file changed, 13 insertions(+), 3 deletions(-) diff --git a/exploration/text-vs-code.md b/exploration/text-vs-code.md index 09d103f64..3976f3175 100644 --- a/exploration/text-vs-code.md +++ b/exploration/text-vs-code.md @@ -100,7 +100,6 @@ especially since their usage may be entirely unrelated to internationalization i ## Use-Cases Most messages in any localization system do not contain any expressions, statements or variants. -These should be expressible as easily as possible. Many messages include expressions that are meant to be replaced during formatting. For example, a greeting like "Hello, {$username}!" would be formatted with the variable @@ -126,9 +125,20 @@ according to its plural category So, in American English, the formatter might need to choose between formatting `You have 1 kilometer to go` and `You have 2 kilometers to go`. -Rarely, messages needs to include leading or trailing whitespace due to -e.g. how they will be concatenated with other text, +Rarely do messages that need to include leading or trailing whitespace do so due to +how they will be concatenated with other text, or as a result of being segmented from some larger volume of text. +Based on available data, +no more than 0.3% of all messages and no more than 0.1% of messages with variants +contain leading or trailing whitespace. + +However, frequency of occurrence is not an indicator of the importance of leading or trailing whitespace to those authoring such messages. +For example, sometimes such messages are authored in order to achieve a semblance of formatting in contexts that lack rich text presentation styles, +such as operating system widgets. +Even though such messages are usually infrequent relative to the size of all user-facing / transalatable messages, +that is not an indicator of their significance. 
+Also importantly, we cannot make assumptions about the validity of leading or trailing whitespace in a message, +especially since their usage may be entirely unrelated to internationalization issues (ex: sentence agreement disruption by concatenation). --- From cb16e71489930e60a6fbf5bded7633b7122f4947 Mon Sep 17 00:00:00 2001 From: Elango Cheran Date: Thu, 26 Oct 2023 14:22:51 -0700 Subject: [PATCH 06/19] De-emphasize overstated developer-only concern --- exploration/text-vs-code.md | 3 --- 1 file changed, 3 deletions(-) diff --git a/exploration/text-vs-code.md b/exploration/text-vs-code.md index 3976f3175..ccbe1c1ea 100644 --- a/exploration/text-vs-code.md +++ b/exploration/text-vs-code.md @@ -142,9 +142,6 @@ especially since their usage may be entirely unrelated to internationalization i --- -Developers editing a simple message and who wish to add an `input` or `local` annotiation -to the message do not wish to reformat the message extensively. - Developers who have messages that include leading or trailing whitespace want to ensure that this whitespace is included in the translatable text portion of the message. From 1417fd4b3be5975ce02e496c3c49ccc379236642 Mon Sep 17 00:00:00 2001 From: Elango Cheran Date: Thu, 26 Oct 2023 15:29:18 -0700 Subject: [PATCH 07/19] Fix design tenet wording using quotes from noteworthy prior art --- exploration/text-vs-code.md | 19 ++++++++++++++++++- 1 file changed, 18 insertions(+), 1 deletion(-) diff --git a/exploration/text-vs-code.md b/exploration/text-vs-code.md index ccbe1c1ea..457983042 100644 --- a/exploration/text-vs-code.md +++ b/exploration/text-vs-code.md @@ -149,7 +149,24 @@ Which whitespace characters are displayed at runtime should not be surprising. ## Requirements -Common things should be easy, uncommon things should be possible. +It should be easy to do simple things; possible to do complex things; and impossible, or at least difficult, to do wrong things. + +
+
+APIs should be easy to use and hard to misuse. It should be easy to do simple things; possible to do complex things; and impossible, or at least difficult, to do wrong things. + +—Joshua Bloch, 2008, author of Effective Java, etc. +
+
+The Pit of Success: in stark contrast to a summit, a +peak, or a journey across a desert to find victory through many trials and +surprises, we want our customers to simply fall into winning practices by using +our platform and frameworks. To +the extent that we make it easy to get into trouble we fail. + +—Rico Mariani, MS Research MindSwap Oct 2003. (restated by Brad Adams, MS CLR and .Net team cofounder) +
+
Developers and translators should be able to read and write the syntax easily in a text editor. From b7870d455dd47bf7b2de84913ec09df2cc130048 Mon Sep 17 00:00:00 2001 From: Elango Cheran Date: Thu, 26 Oct 2023 15:30:02 -0700 Subject: [PATCH 08/19] Update rest of Requirements --- exploration/text-vs-code.md | 31 ++++++++++++++++++++++--------- 1 file changed, 22 insertions(+), 9 deletions(-) diff --git a/exploration/text-vs-code.md b/exploration/text-vs-code.md index 457983042..28f67b822 100644 --- a/exploration/text-vs-code.md +++ b/exploration/text-vs-code.md @@ -171,27 +171,40 @@ the extent that we make it easy to get into trouble we fail. Developers and translators should be able to read and write the syntax easily in a text editor. Translators (and their tools) are not software engineers, so we want our syntax -to be as simple, robust, and non-fussy as possible. -Multiple levels of complex nesting should be avoided, -along with any constructs that require an excessive -level of precision on the part of non-technical authors. +to be as simple and robust as possible. + +Nesting level is not a requirement. +People are not parsers, and don't care about nesting. +What matters to them is their ability to recognize where a message pattern starts and where it ends. +In the following example, localizable text is easily recognizable (especially with syntax highlighting), +even if it occurs 3 level deep. + +```java +print "{{{This is translatable}}}" +if (foo) print "{{{This is translatable}}}" else print "{{{This is NOT translatable}}}" +if (foo) if (bar) switch (baz) case 1: print "{{{This is translatable, deep}}}" break; default: print "{{{This is NOT translatable, deep}}}" +``` As MessageFormat 2 will be at best a secondary language to all its authors and editors, it should conform to user expectations and require as little learning as possible. -The syntax should avoid footguns, -in particular as it's passed through various tools during formatting. 
+The syntax should avoid footguns, in particular as it's passed through various tools during formatting or stored in existing file formats, databases, etc. +Very importantly in this regard, +we should minimize the range of characters that need to be escaped in patterns. + ASCII-compatible syntax. While support for non-ASCII characters for variable names, values, literals, options, and the like are important, the syntax itself should be restricted to ASCII characters. This allows the message to be parsed visually by humans even when embedded in a syntax that requires escaping. -Whitespace is forgiving. -We _require_ the minimum amount of whitespace and allow -authors to format or change unimportant whitespace as much as they want. +Whitespace is forgiving, so we should be flexible with its use in the code area of a message. This avoids the need for translators or tools to be super pedantic about formatting. +However, we want WYSIWYG behavior as much as possible in patterns, meaning that there is minimal visual difference +between the pattern and its interpolated output, +and that there is minimal ambiguity. +This avoids chances for unwanted surprises between the message authoring time expectations and the actual runtime formatted results. ## Constraints From 2d8cf0a4545d92bd8c393e9f29efa1e107503837 Mon Sep 17 00:00:00 2001 From: Elango Cheran Date: Thu, 26 Oct 2023 15:30:18 -0700 Subject: [PATCH 09/19] Update Constraints --- exploration/text-vs-code.md | 20 ++++++++++++++++---- 1 file changed, 16 insertions(+), 4 deletions(-) diff --git a/exploration/text-vs-code.md b/exploration/text-vs-code.md index 28f67b822..52269c727 100644 --- a/exploration/text-vs-code.md +++ b/exploration/text-vs-code.md @@ -214,13 +214,25 @@ The current syntax includes some plain-ascii keywords: `input`, `local`, `match`, and `when`. The current syntax and active proposals include some sigil + name combinations, -such as `:number`, `$var`, `|literal|`, `+bold`, and `@attr`. 
+such as :number, $var, |literal|, +bold, -bold, and posibly @attr. The current syntax supports unquoted literal values as operands. -Messages themselves are "simple strings" and must be considered to be a single -line of text. In many containing formats, newlines will be represented as the local -equivalent of `\n`. +Messages themselves are "simple strings" and must be considered to be WYSIWYG. +The WYSIWYG nature of representing a message pattern is independent of whether the message is a single line or contains multiple lines. + +There is no restriction that a message must only contain a single line (that is, not contain any newline characters), +nor are there constraints about how newlines must be represented. As our [`spec/syntax.md`](../spec/syntax.md) states: + +> Any Unicode code point is allowed, except for surrogate code points U+D800 through U+DFFF inclusive. + +> Whitespace in _text_, including tabs, spaces, and newlines is significant and MUST be preserved during formatting. + +> ... Instead, we tolerate direct use of nearly all +characters (including line breaks, control characters, etc.) and rely upon escaping +in those outer formats to aid human comprehension (e.g., depending upon container +format, a U+000A LINE FEED might be represented as `\n`, `\012`, `\x0A`, `\u000A`, +`\U0000000A`, ` `, ` `, `%0A`, ``, or something else entirely). 
## Proposed Design From e86fa480bed642f36cf061761743c7975f0925f8 Mon Sep 17 00:00:00 2001 From: Elango Cheran Date: Thu, 26 Oct 2023 16:21:00 -0700 Subject: [PATCH 10/19] Remove duplicate section not properly deleted after copied for a move --- exploration/text-vs-code.md | 11 ----------- 1 file changed, 11 deletions(-) diff --git a/exploration/text-vs-code.md b/exploration/text-vs-code.md index 52269c727..966c6b142 100644 --- a/exploration/text-vs-code.md +++ b/exploration/text-vs-code.md @@ -86,17 +86,6 @@ After the resource file gets parsed as XML, the Android string resource format [does additional whitespace collapsing and Android escaping](https://developer.android.com/guide/topics/resources/string-resource#escaping_quotes), requiring the entire text node string to be wrapped in double quotation marks `"..."` to preserve the initial whitespace, or the inital whitespace to use Android escaping (`\u0032 \u0032 ...`). -Based on available data, -no more than 0.3% of all messages and no more than 0.1% of messages with variants -contain leading or trailing whitespace. -However, frequency of occurrence is not an indicator of the importance of leading or trailing whitespace to those authoring such messages. -For example, sometimes such messages are authored in order to achieve a semblance of formatting in contexts that lack rich text presentation styles, -such as operating system widgets. -Even though such messages are usually infrequent relative to the size of all user-facing / transalatable messages, -that is not an indicator of their significance. -Also importantly, we cannot make assumptions about the validity of leading or trailing whitespace in a message, -especially since their usage may be entirely unrelated to internationalization issues (ex: sentence agreement disruption by concatenation). - ## Use-Cases Most messages in any localization system do not contain any expressions, statements or variants. 
From 0579b4a2e76bf39e1e54ea168edf8d76ba003d7d Mon Sep 17 00:00:00 2001 From: Elango Cheran Date: Thu, 26 Oct 2023 18:47:13 -0700 Subject: [PATCH 11/19] Update Design area (proposed, alternatives, simple message consensus) --- exploration/text-vs-code.md | 84 ++++++++++++++++++++++++++++++------- 1 file changed, 69 insertions(+), 15 deletions(-) diff --git a/exploration/text-vs-code.md b/exploration/text-vs-code.md index 966c6b142..9d89d1319 100644 --- a/exploration/text-vs-code.md +++ b/exploration/text-vs-code.md @@ -223,8 +223,73 @@ in those outer formats to aid human comprehension (e.g., depending upon containe format, a U+000A LINE FEED might be represented as `\n`, `\012`, `\x0A`, `\u000A`, `\U0000000A`, ` `, ` `, `%0A`, ``, or something else entirely). +## Simple Messages + +In the Subcommittee meetings following Github discussions on Issues #493 and #499, +the general consensus that formed for simple messages +is that we allow them to be unquoted. + +("Simple messages" refers to messages consisting solely of a pattern, and thus are not complex messages.) + +Because the simple message pattern consists of the entire message, +the pattern includes any leading or trailing whitespace. + +Given simple messages already being decided at a high level, +the design decisions below for the proposed and alternative designs pertain specifically to complex messages. + ## Proposed Design +### Start in text, encapsulate message, always quote patterns + +Description: + +Since simple messages are unquoted (starting in text mode), +complex messages must also start in text mode. + +Within a complex message, patterns are always quoted with `{{...}}` or other choice of delimiter. + +The entire complex message is also wrapped with `{{...}}` or other choice of delimiter. +This allows interior "code mode" of message to have flexible whitespace in between tokens +and _around_ quoted patterns. 
+ +Pros: + +* The rule about whether leading and trailing whitespace is included is simple and unambiguous. +* This matches the WYSIWYG behavior that simple messages preserve. +* The patterns can be detected within the message more easily due to the delimiters serving as a visual anchor. +* Requiring all patterns to be quoted minimizes the number of characters that need to be escaped within a pattern to 3: +the 2 pattern delimiter characters and the escape character itself. +* Because the sum of counts of declarations + `match` statement + `when` statements is always +greater than or equal to the number of patterns, +wrapping the entire message once yields less visual noise of repetitive code mode introducer symbols +when there are 1+ declarations in a `match` (selection) message, +or when there are 2+ declarations in a non-`match` complex message. + +Cons: + +* This comes at the cost of an inconsistency in how WYSIWYG patterns are quoted between simple and complex messages. +In the case of simple messages, the containing format itself implicitly defines the beginning and end of the pattern (example: `"..."`), which is not visible at the level of MF2, +while complex messages use the aforementioned delimiter to quote patterns (ex: `{{...}}`). +* Another potential drawback, specifically in the case of non-`match` complex messages with exactly 1 declaration, +is that this option adds 2 extra delimiters compared to an alternative syntax that doesn't require quoted patterns +and is designed to minimize delimiter usage only to code mode introducers. + +Evaluation: + +The pros outweigh the cons, not just in cardinality, but far more importantly, according to the relative weight +our value system places on the requirements met by the pro aspects compared to the con aspects. 
Namely: + +* [high] Unsurprising WYSIWYG behavior from patterns +* [high] Easy recognition of patterns, even for non-developers +* [high] A minimal number of characters requiring escaping +* [high] No limitations on users with valid non-i18n concerns +* [med] Flexible whitespace outside of patterns +* [low] Number of characters typed (probably comparable with alternatives anyways) +* [low] Number of "mode levels" from a parser perspective + + +## Alternatives Considered + ### Start in text, encapsulate code, trim around statements Allow for message patterns to not be quoted. @@ -244,8 +309,6 @@ Allow for a pattern to be `{{…}}` quoted such that it preserves its leading and/or trailing whitespace even when preceded or followed by statements. -## Alternatives Considered - ### Start in code, encapsulate text This approach treats messages as something like a resource format for pattern values. @@ -262,19 +325,6 @@ so messages may appear wrapped as e.g. `"{{…}}"`. This option is not chosen due to adding an excessive quoting burden on all messages. -### Start in text, encapsulate code, re-encapsulate text within code - -As in the proposed design, simple patterns are unquoted. -Patterns in messages with statements, however, -are required to always be surrounded by `{{…}}` or some other delimiters. - -This effectively means that some syntax will "enable" code mode for a message, -and that patterns in such a message need delimiters. - -This option is not chosen due to adding an excessive -quoting burden on all multi-variant messages, -as well as introducing an unnecessary additional conceptual layer to the syntax. - ### Start in text, encapsulate code, trim minimally This is the same as the proposed design, @@ -315,3 +365,7 @@ With these changes, all whitespace would need to be explicitly within the "code" part of the syntax, and patterns could never be separated from statements without adding whitespace to the pattern. 
+ +## Scoring matrix + +TBD From 94507580cb9c3a5c5ed38ae8b946511ea8df9b73 Mon Sep 17 00:00:00 2001 From: Elango Cheran Date: Thu, 26 Oct 2023 18:47:55 -0700 Subject: [PATCH 12/19] Update title and objective to reflect focus of discussion --- exploration/text-vs-code.md | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/exploration/text-vs-code.md b/exploration/text-vs-code.md index 9d89d1319..2eef8cecf 100644 --- a/exploration/text-vs-code.md +++ b/exploration/text-vs-code.md @@ -1,4 +1,4 @@ -# Message Parse Mode +# Message Pattern Quoting Status: **Proposed** @@ -17,7 +17,9 @@ Status: **Proposed** ## Objective -Decide whether text patterns or code statements should be enclosed in MF2. +Decide whether text patterns must always be quoted, +or whether we allow them to be optionally quoted, +for non-simple messages in MF2. ## Background From 17f1d4675610bd3ce93c36d194fb1a1a0271a4d7 Mon Sep 17 00:00:00 2001 From: Elango Cheran Date: Thu, 26 Oct 2023 18:48:11 -0700 Subject: [PATCH 13/19] Wordsmithing and formatting --- exploration/text-vs-code.md | 14 ++++++++------ 1 file changed, 8 insertions(+), 6 deletions(-) diff --git a/exploration/text-vs-code.md b/exploration/text-vs-code.md index 2eef8cecf..4771663f4 100644 --- a/exploration/text-vs-code.md +++ b/exploration/text-vs-code.md @@ -66,12 +66,14 @@ Some templating libraries support both styles. When considering string formatting and templating libraries, it is important to keep the rules of pattern or template handling separate from and uninfluenced by the output format's rules. -For example, many templating languages are designed around producing HTML output, for which consecutive whitespace characters within the output are collapsed into a single ASCII space. 
-However, if the templating language is not strict on preserving whitespace, +For example, many templating languages are designed around producing HTML output, for which consecutive whitespace characters within the output are collapsed into a single ASCII space by HTML renderers. +However, if the templating language is similarly not strict on preserving whitespace, then it would be incapable of generating Python source code, for which whitespace is significant in determining block scope via the indentation (leading whitespace on a line). -In fact, some HTML-oriented templating libraries preserve whitespace by default in a what-you-see-is-what-you-get (WYSIWYG) manner (Mustache, [Jinja](https://jinja.palletsprojects.com/en/3.1.x/templates/#whitespace-control)), +In fact, many HTML-oriented templating libraries preserve whitespace by default in a what-you-see-is-what-you-get (WYSIWYG) manner +([Mustache](https://mustache.github.io/mustache.5.html#Sections), +[Jinja](https://jinja.palletsprojects.com/en/3.1.x/templates/#whitespace-control)), and some perform whitespace trimming in unspecified ways ([Handlebars](https://handlebarsjs.com/guide/expressions.html#whitespace-control)). The [whitespace behavior for Freemarker](https://freemarker.apache.org/docs/dgui_misc_whitespace.html), a general purpose templating library for multiple formats, is also WYSIWYG by default while allowing several optional trimming controls. @@ -165,8 +167,8 @@ Translators (and their tools) are not software engineers, so we want our syntax to be as simple and robust as possible. Nesting level is not a requirement. -People are not parsers, and don't care about nesting. -What matters to them is their ability to recognize where a message pattern starts and where it ends. +People are not parsers, and don't care about the nesting level as a primary concern when reading a message. +What matters to them is their ability to recognize where a message's pattern starts and where it ends. 
In the following example, localizable text is easily recognizable (especially with syntax highlighting), even if it occurs 3 level deep. @@ -205,7 +207,7 @@ The current syntax includes some plain-ascii keywords: `input`, `local`, `match`, and `when`. The current syntax and active proposals include some sigil + name combinations, -such as :number, $var, |literal|, +bold, -bold, and posibly @attr. +such as `:number`, `$var`, `|literal|`, `+bold`, `-bold`, and posibly `@attr`. The current syntax supports unquoted literal values as operands. From 0e169f8491ff6a712920034ec571c8ec0d7331db Mon Sep 17 00:00:00 2001 From: Elango Cheran Date: Thu, 26 Oct 2023 18:50:27 -0700 Subject: [PATCH 14/19] Update contributors list --- exploration/text-vs-code.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/exploration/text-vs-code.md b/exploration/text-vs-code.md index 4771663f4..3a45694c4 100644 --- a/exploration/text-vs-code.md +++ b/exploration/text-vs-code.md @@ -8,6 +8,8 @@ Status: **Proposed**
Contributors
@eemeli
@aphillips
+
@mihnita
+
@echeran
First proposed
2023-09-13
Pull Request
From 2c7f9775606eee6f1af0dd509b3d6d18515932f3 Mon Sep 17 00:00:00 2001 From: Elango Cheran Date: Fri, 27 Oct 2023 16:01:48 -0700 Subject: [PATCH 15/19] Apply suggestions from code review Co-authored-by: Addison Phillips --- exploration/text-vs-code.md | 14 ++++++-------- 1 file changed, 6 insertions(+), 8 deletions(-) diff --git a/exploration/text-vs-code.md b/exploration/text-vs-code.md index 3a45694c4..4678e1f8e 100644 --- a/exploration/text-vs-code.md +++ b/exploration/text-vs-code.md @@ -19,9 +19,9 @@ Status: **Proposed** ## Objective -Decide whether text patterns must always be quoted, -or whether we allow them to be optionally quoted, -for non-simple messages in MF2. +Decide whether to permit _patterns_ (message text) to be unquoted +when embedded in code. +Currently all _patterns_ must be quoted in non-simple messages. ## Background @@ -33,11 +33,9 @@ separated from the text using `{…}` for multi-variant messages. ICU MessageFormat is the only known format that uses `{…}` to also delimit text. Formatting and templating are distinct operations with similarities. -Both interpolate strings by using input values, -provided as inputs alongisde the formatting pattern string or template, -to produce a new string. +Both take input values to replace portions of the pattern string or template, producing a new, formatted string. Formatting usually refers to smaller strings, usually no larger than a sentence, -whereas templating are used to produce larger strings, usually for text files of various file formats, often for HTML documents. +whereas templating is typically used to produce larger strings (generally whole documents, such as an HTML file). There are two different styles of templating library design. 
Some languages/libraries enable the interopolation of the template substrings through programmatic expressions in "code mode" that print expressions to the output stream @@ -209,7 +207,7 @@ The current syntax includes some plain-ascii keywords: `input`, `local`, `match`, and `when`. The current syntax and active proposals include some sigil + name combinations, -such as `:number`, `$var`, `|literal|`, `+bold`, `-bold`, and posibly `@attr`. +such as `:number`, `$var`, `|literal|`, `+bold`, `-bold`, and possibly `@attr`. The current syntax supports unquoted literal values as operands. From 8e0b85266acc29fa564305c13fc2f0fb7df7e845 Mon Sep 17 00:00:00 2001 From: Elango Cheran Date: Fri, 27 Oct 2023 23:06:43 +0000 Subject: [PATCH 16/19] Apply suggestions from code review --- exploration/text-vs-code.md | 17 +++++++++-------- 1 file changed, 9 insertions(+), 8 deletions(-) diff --git a/exploration/text-vs-code.md b/exploration/text-vs-code.md index 4678e1f8e..cd5db63bc 100644 --- a/exploration/text-vs-code.md +++ b/exploration/text-vs-code.md @@ -86,7 +86,7 @@ For example, an Android resource string that includes leading whitespace in the " Section 7.a. Attribute Types" ``` In this example above, the containing XML format will collapse consecutive whitespace characters into a single space unless you provide the attribute `xml:space="preserve"`. -After the resource file gets parsed as XML, the Android string resource format +After the resource file gets parsed as XML, the Android resource compiler requires [does additional whitespace collapsing and Android escaping](https://developer.android.com/guide/topics/resources/string-resource#escaping_quotes), requiring the entire text node string to be wrapped in double quotation marks `"..."` to preserve the initial whitespace, or the inital whitespace to use Android escaping (`\u0032 \u0032 ...`). 
@@ -118,20 +118,20 @@ according to its plural category So, in American English, the formatter might need to choose between formatting `You have 1 kilometer to go` and `You have 2 kilometers to go`. -Rarely do messages that need to include leading or trailing whitespace do so due to -how they will be concatenated with other text, +Rarely, messages need to include leading or trailing whitespace due to +e.g. how they will be concatenated with other text, or as a result of being segmented from some larger volume of text. Based on available data, no more than 0.3% of all messages and no more than 0.1% of messages with variants contain leading or trailing whitespace. However, frequency of occurrence is not an indicator of the importance of leading or trailing whitespace to those authoring such messages. -For example, sometimes such messages are authored in order to achieve a semblance of formatting in contexts that lack rich text presentation styles, +For example, sometimes such messages are authored in order to achieve a [semblance of formatting in contexts that lack rich text presentation styles](https://docs.oracle.com/cd/E19957-01/817-4220/images/SetupWizWelcome2.gif), such as operating system widgets. Even though such messages are usually infrequent relative to the size of all user-facing / transalatable messages, that is not an indicator of their significance. -Also importantly, we cannot make assumptions about the validity of leading or trailing whitespace in a message, -especially since their usage may be entirely unrelated to internationalization issues (ex: sentence agreement disruption by concatenation). +There are valid use cases for leading or trailing whitespace in a message that are not internationalization bugs. +This means that it is not MF2's concern to enforce/discourage their usage. --- @@ -146,7 +146,7 @@ It should be easy to do simple things; possible to do complex things; and imposs
-APIs should be easy to use and hard to misuse. It should be easy to do simple things; possible to do complex things; and impossible, or at least difficult, to do wrong things. +APIs should be easy to use and hard to misuse. It should be easy to do simple things; possible to do complex things; and impossible, or at least difficult, to do wrong things. (per Joshua Bloch) —Joshua Bloch, 2008, author of Effective Java, etc.
@@ -191,7 +191,7 @@ values, literals, options, and the like are important, the syntax itself should be restricted to ASCII characters. This allows the message to be parsed visually by humans even when embedded in a syntax that requires escaping. -Whitespace is forgiving, so we should be flexible with its use in the code area of message. +We should be flexible with the use of whitespace in the code area of message. This avoids the need for translators or tools to be super pedantic about formatting. However, we want WYSIWYG behavior as much as possible in patterns, meaning that there is minimal visual difference @@ -277,6 +277,7 @@ while complex messages use the aforementioned delimiter to quote patterns (ex: ` * Another potential drawback, specifically in the case of non-`match` complex messages with exactly 1 declaration, is that this option adds 2 extra delimiters compared to an alternative syntax that doesn't require quoted patterns and is designed to minimize delimiter usage only to code mode introducers. +* If we use curlies for patterns and for placeholders, then they serve double duty, which may make the syntax harder to understand, and also harder to make the pattern out visually. Evaluation: From cdcfdd52492c01de5e49bc04b713be49cda762e5 Mon Sep 17 00:00:00 2001 From: Elango Cheran Date: Sun, 29 Oct 2023 21:02:24 -0700 Subject: [PATCH 17/19] Rewrite largely based on PR #504; keep priorities, non-reqs, prev info --- exploration/text-vs-code.md | 478 +++++++++++++++++++++++------------- 1 file changed, 314 insertions(+), 164 deletions(-) diff --git a/exploration/text-vs-code.md b/exploration/text-vs-code.md index cd5db63bc..ca0bd047c 100644 --- a/exploration/text-vs-code.md +++ b/exploration/text-vs-code.md @@ -1,4 +1,4 @@ -# Message Pattern Quoting +# Unquoted Variant Patterns Status: **Proposed** @@ -19,12 +19,281 @@ Status: **Proposed** ## Objective -Decide whether to permit _patterns_ (message text) to be unquoted -when embedded in code. 
-Currently all _patterns_ must be quoted in non-simple messages. +The current syntax requires all patterns to be "quoted" in non-simple messages. + +We need to determine if we should allow patterns to be unquoted and, if so, +how to determine the boundary between the pattern and any message code. ## Background +### Summary + +A _pattern_ is the portion of a _message_ that will be formatted to produce +output in a call to the MessageFormat API. + +Patterns include _text_ and _placeholders_. + +_Placeholders_ are the parts of the pattern that are replaced during formatting. +>For example, in the following message, `{$var}` is a placeholder: +>``` +>This has a {$var} placeholder. +>``` + +_Text_ comprises the string literal parts of a message. +Text consists of a sequence of Unicode characters. +This includes all Unicode whitespace characters. + +>[!IMPORTANT] +>When whitespace appears in (is part of) a pattern it is _always_ preserved by MessageFormat. + +There are three ways that patterns can appear in a message: +> 1. As a simple message: +> ``` +> Hello {$user}! +> ``` +> 2. As a pattern following declarations: +> ``` +> {{ +> input {$num :number} +> {{ This is the {$num} pattern }} +> }} +> ``` +> 3. As part of a variant (following a _key_): +> ``` +> {{ +> match {$num :plural} +> when 0 {{ This is the zero pattern }} <- the {{}} part is a pattern +> when one {{ This is the {$num} pattern }} +> when * {{ These are the {$num} patterns }} +> }} +> ``` + +With the current syntax, the boundary between pattern and the remainder of the message is always clear +because the pattern is either the entire string +or enclosed in the `{{...}}` quoting required by the syntax. + + +## Use-Cases + +As a user, I sometimes need to include pattern-meaningful whitespace at +the start or end of a pattern and I expect that whitespace to be preserved +through the MessageFormat process. 
+I understand that my choice of storage format (such as .properties, strings.xml, +ListResourceBundle, gettext `.po`, etc. etc.) may impact whitespace appearing +in my serialization format. +However, I need to know reliably whether whitespace in a _message_ will appear +(or not appear) in my output. + +As a translator, I need to determine the boundaries of the patterns easily. +I also need to be able to modify the whitespace that appears in a given +pattern, including at the front or end of the pattern. + +As a developer, I want to format my MF2 patterns in ways that are easy for +other developers, UX designers, or translators to use. +I want to be able to break longer messages +(particularly those with match statements) +into multiple lines for readability without negatively affecting the output. + +## Requirements + +* [high] Whitespace that is definitely inside a pattern must be preserved by MF2 formatters. +* [high] The boundary of a pattern should be as simple to define and easy to visually detect as possible. +* [high] Message syntax should avoid as many escape sequences as possible, particularly those that + might interfere with or require double-escaping in storage formats. +* [med] Whitespace that is permitted in a message that is not part of the pattern should be forgiving + and not require special effort to manage. +* [med] The message syntax should be as simple and robust as possible, but no simpler. +* [med] The message syntax should avoid as many levels of nesting or character pairing + as possible, but no more. + + +## Non-requirements + +* We should not make things harder for users unless it is to discourage well-known i18n bad practices (ex: message concatenation). + - There are valid non-i18n use cases in OSes for leading/trailing whitespace. +* We should not confuse the _frequency_ of a usage pattern with its _impact_. 
+ - The _impact_ of a user making a surprising mistake depends on the _cost to fix it_, or the value of where it is used, or other factors. The occurrence rate in a message corpus usually does not directly reflect those concerns. + - Also, instead of frequency of mistakes, we should consider how well we make it difficult to make mistakes. + > It should be easy to do simple things; possible to do complex things; and impossible, or at least difficult, to do wrong things. + > + > —Joshua Bloch, 2008, author of Effective Java, etc. +* Reducing the number of characters typed to the point it reduces clarity + - Beyond a certain point, there becomes a tradeoff between clarity and concision. (ex: Perl) + +## Constraints + +Some of the alternatives will require changes to the syntax to produce better usability. +Similarly the current syntax could benefit from improvements if we decide to keep it. + +There are a limited number of sigils available for quoting. + +## Simple Messages + +In the Subcommittee meetings following Github discussions on Issues #493 and #499, +the general consensus that formed for simple messages +is that we allow them to be unquoted. + +("Simple messages" refers to messages consisting solely of a pattern, and thus are not complex messages.) + +Because the simple message pattern consists of the entire message, +the pattern includes any leading or trailing whitespace. + +Given simple messages already being decided at a high level, +the design decisions below for the proposed and alternative designs pertain specifically to complex messages. + +## Proposed Design + +Currently the syntax uses the first alternative below. + +## Alternatives Considered + +There are five candidates for handling the boundaries between code and patterns: + +1. Always quote non-simple patterns (current design) +2. Never quote patterns (all whitespace is significant) +3. Permit non-simple patterns to be quoted and trim unquoted whitespace +4. 
Trim all unquoted whitespace, but do not permit quoting non-simple patterns +5. Selectively trim patterns (all whitespace is otherwise significant) + +### Always Quote + +Pros: +- The boundary between pattern and code is always clear. +- The quoting reduces the number of in-pattern escapes to the open/close sequence. + and the placeholder sequence sigils. +- Since the pattern is already quoted, translators never have to add pattern quotes + in order to add PEWs to a given pattern. + This also might avoid some tools forcing escaping on added quotes that are needed. + +Cons: +- Requires matching open/close quotes. + +### Never Quote Patterns + +In this alternative, all non-code whitespace is significant. +We have to use a slightly different syntax in the example, so that +the boundary between code and pattern works. +>``` +>{{ +> match {$var} +> {when *} This pattern has a space in front (it's between \} and This) +> {when other} +> This pattern has a newline and six spaces in front of it +> {when moo}This pattern has no spaces in front of it, but an invisible space at the end +>}} +>``` + +Pros: +- WYSIWYG (on steroids) + +Cons: +- Probably not a serious alternative: the example + includes any number of obvious footguns that have to be addressed + +### Permit pattern quoting + +In this alternative, non-simple patterns are trimmed, but it is +possible to use quoting to separate the pattern from code (and prevent trimming) +>``` +>{match {$var}} +>{when 0} This has no space in front of it. +>{when one} +> This has no space or newline in front of it. +>{when few} +> {{ This has one space at the start and the end. }} +>{when many} {{ This also has one space start and end. }} +>{when *}{{You can quote patterns even without whitespace.}} +>``` + +Pros: +- Code is special instead of text. +- Easy to use (best of both worlds?) + +Cons: +- Requires one of the alternate syntaxes +- Has two ways to represent a pattern. 
+- May be difficult for translators to add quotes when needed. + +### Trim All Unquoted + +In this alternative, all non-code whitespace is trimmed +and we do not allow/provide for pattern quoting. +Instead, PEWS whitespace must be individually quoted. + +> [!NOTE] +> Whitespace quoting also works in the preceding alternatives +> because it is an inherent part of the syntax. +> We don't show it in those alternatives because it is +> distracting. + +>``` +>{match {$var}} +>{when 0} This has no space in front of it. +>{when one} +> This has no space or newline in front of it. +>{when few} +> {||} This has one space at the start and the end. {||} +>{when many} {| |}This also has one space start and end.{| |} +>{when *} +> +> No amount of whitespace matters before this pattern +> but all of the whitespace at the end does. +> +> {||} +>``` + +Pros: +- Code is special, whitespace is not. +- Makes PEWS into a "special event", alerting developers to the non-I18N aspects of it? + +Cons: +- Weird and unattractive. + +### Selective Trimming + +In this alternative, only specific whitespace is automatically trimmed +and the whitespace can be omitted. +This is similar to "Never Quote Patterns" in that all whitespace +is significant **_except_** for a newline, space, or newline space +directly after code: +>``` +>{match {$var}} +>{when 0} This has no space in front of it. +>{when one} +> This has no space or newline in front of it. +>{when 1} +> This has no newline but does have one space in front of it. +>{when few} +> This has no space or newline in front of it or at the end {when many}This has no spaces or newlines. +>{when 11} +> +> This has a newline and a space at the start and a space-newline at the end +> +>>{when *}{| +>|} You can quote the newlines and spaces should you desire {| +>|} +>``` + +Pros: +- More forgiving in some circumstances? + +Cons: +- More complicated to use. +- Users may be unclear where the boundaries are. 
+ + +## Scoring matrix + +TBD + +## Extra Info + +### Extra Info: Background + +
+ +#### Formatting and templating + Existing message and template formatting languages tend to start in "text" mode, and require special syntax like `{{` or `{%` to enter "code" mode. @@ -37,6 +306,8 @@ Both take input values to replace portions of the pattern string or template, pr Formatting usually refers to smaller strings, usually no larger than a sentence, whereas templating is typically used to produce larger strings (generally whole documents, such as an HTML file) +#### Templating styles and nesting levels + There are two different styles of templating library design. Some languages/libraries enable the interopolation of the template substrings through programmatic expressions in "code mode" that print expressions to the output stream (ex: [PHP](https://www.php.net/), @@ -44,11 +315,11 @@ Some languages/libraries enable the interopolation of the template substrings th ```php ... - Hello World

'; - } - ?> +Hello World

'; + } +?> ... ``` @@ -64,6 +335,8 @@ and subject to control flow rules of their containing constructs. ``` Some templating libraries support both styles. +#### Templating whitespace handling + When considering string formatting and templating libraries, it is important to keep the rules of pattern or template handling separate from and uninfluenced by the output format's rules. For example, many templating languages are designed around producing HTML output, for which consecutive whitespace characters within the output are collapsed into a single ASCII space by HTML renderers. @@ -77,6 +350,8 @@ In fact, many HTML-oriented templating libraries preserve whitespace by default and some perform whitespace trimming in unspecified ways ([Handlebars](https://handlebarsjs.com/guide/expressions.html#whitespace-control)). The [whitespace behavior for Freemarker](https://freemarker.apache.org/docs/dgui_misc_whitespace.html), a general purpose templating library for multiple formats, is also WYSIWYG by default while allowing several optional trimming controls. +#### Containing Format Interaction with Messages + Other formats supporting multiple message variants tend to rely on a surrounding resource format to define variants, such as [Rails internationalization](https://guides.rubyonrails.org/i18n.html#pluralization) in Ruby or YAML and [Android String Resources](https://developer.android.com/guide/topics/resources/string-resource.html#Plurals) in XML. @@ -90,7 +365,13 @@ After the resource file gets parsed as XML, the Android resource compiler requir [does additional whitespace collapsing and Android escaping](https://developer.android.com/guide/topics/resources/string-resource#escaping_quotes), requiring the entire text node string to be wrapped in double quotation marks `"..."` to preserve the initial whitespace, or the inital whitespace to use Android escaping (`\u0032 \u0032 ...`). -## Use-Cases +
+ +### Extra Info: Use Cases + +
+ +#### Summary of General MF Behavior Most messages in any localization system do not contain any expressions, statements or variants. @@ -118,6 +399,8 @@ according to its plural category So, in American English, the formatter might need to choose between formatting `You have 1 kilometer to go` and `You have 2 kilometers to go`. +#### Leading and Trailing Whitespace + Rarely, messages need to include leading or trailing whitespace due to e.g. how they will be concatenated with other text, or as a result of being segmented from some larger volume of text. @@ -133,18 +416,22 @@ that is not an indicator of their significance. There are valid use cases for leading or trailing whitespace in a message that are not internationalization bugs. This means that it is not MF2's concern to enforce/discourage their usage. ---- - Developers who have messages that include leading or trailing whitespace want to ensure that this whitespace is included in the translatable text portion of the message. Which whitespace characters are displayed at runtime should not be surprising. -## Requirements +
-It should be easy to do simple things; possible to do complex things; and impossible, or at least difficult, to do wrong things. +### Extra Info: Requirements
+ +#### Design Around Simplicity/Correctness + +It should be easy to do simple things; possible to do complex things; and impossible, or at least difficult, to do wrong things. + +
APIs should be easy to use and hard to misuse. It should be easy to do simple things; possible to do complex things; and impossible, or at least difficult, to do wrong things. (per Joshua Bloch) @@ -159,7 +446,8 @@ the extent that we make it easy to get into trouble we fail. —Rico Mariani, MS Research MindSwap Oct 2003. (restated by Brad Adams, MS CLR and .Net team cofounder)
-
+ +#### Balance Between Legibility and Nesting Levels Developers and translators should be able to read and write the syntax easily in a text editor. @@ -178,6 +466,8 @@ if (foo) print "{{{This is translatable}}}" else print "{{{This is NOT translata if (foo) if (bar) switch (baz) case 1: print "{{{This is translatable, deep}}}" break; default: print "{{{This is NOT translatable, deep}}}" ``` +#### Ease, Escaping, Reserved Syntax, Whitespace + As MessageFormat 2 will be at best a secondary language to all its authors and editors, it should conform to user expectations and require as little learning as possible. @@ -198,8 +488,13 @@ However, we want WYSIWYG behavior as much as possible in patterns, meaning that between the pattern and its interpolated output, and that there is minimal ambiguity. This avoids chances for unwanted surprises between the message authoring time expectations and the actual runtime formatted results. +
-## Constraints +### Extra Info: Constraints + +
+ +#### Current Syntax Keywords & Values Limiting the range of characters that need to be escaped in plain text is important. @@ -211,6 +506,8 @@ such as `:number`, `$var`, `|literal|`, `+bold`, `-bold`, and possibly `@attr`. The current syntax supports unquoted literal values as operands. +#### Message Representation When Embedding in Container Format + Messages themselves are "simple strings" and must be considered to be WYSIWYG. The WYSIWYG nature of representing a message pattern is independent of whether the message is a single line or contains multiple lines. @@ -226,151 +523,4 @@ characters (including line breaks, control characters, etc.) and rely upon escap in those outer formats to aid human comprehension (e.g., depending upon container format, a U+000A LINE FEED might be represented as `\n`, `\012`, `\x0A`, `\u000A`, `\U0000000A`, ` `, ` `, `%0A`, ``, or something else entirely). - -## Simple Messages - -In the Subcommittee meetings following Github discussions on Issues #493 and #499, -the general consensus that formed for simple messages -is that we allow them to be unquoted. - -("Simple messages" refers to messages consisting solely of a pattern, and thus are not complex messages.) - -Because the simple message pattern consists of the entire message, -the pattern includes any leading or trailing whitespace. - -Given simple messages already being decided at a high level, -the design decisions below for the proposed and alternative designs pertain specifically to complex messages. - -## Proposed Design - -### Start in text, encapsulate message, always quote patterns - -Description: - -Since simple messages are unquoted (starting in text mode), -complex messages must also start in text mode. - -Within a complex message, patterns are always quoted with `{{...}}` or other choice of delimiter. - -The entire complex message is also wrapped with `{{...}}` or other choice of delimiter. 
-This allows interior "code mode" of message to have flexible whitespace in between tokens -and _around_ quoted patterns. - -Pros - -* The rule about the whether leading and trailing whitespace is included is simple and unambiguous. -* This matches the WYSIWIG behavior that simple messages preserve. -* The patterns can be detected within the pattern more easily due to the delimiters serving as a visual anchor. -* Requiring all patterns to be quoted minimizes the number of characters that need to be escaped within a pattern to 3: -the 2 pattern delimiter characters and the escape character itself. -* Because the sum of counts of declarations + `match` statement + `when` statements is always -greater than or equal to the number of patterns, -wrapping the entire message once yields less visual noise of repetitive code mode introducer symbols -when there is 1+ declarations in a `match` (selection) message, -or when there are 2+ declarations in a non-`match` complex message. - -Cons: - -* This comes at the cost of an inconsistency in the WYSIWYG patterns are quoted between simple and complex messages. -In the case of simple messages, the containing format itself implicitly defines the beginning and end of the pattern (example: `"..."`), which is not visible at the level of MF2, -while complex messages use the aforementioned delimiter to quote patterns (ex: `{{...}}`). -* Another potential drawback, specifically in the case of non-`match` complex messages with exactly 1 declaration, -is that this option adds 2 extra delimiters compared to an alternative syntax that doesn't require quoted patterns -and is designed to minimize delimiter usage only to code mode introducers. -* If we use curlies for patterns and for placeholders, then they serve double duty, which may make the syntax harder to understand, and also harder to make the pattern out visually. 
- -Evaluation: - -The pros outweigh the cons, not just in cardinality, but far more importantly, according to the relative weight -our value system places to the requirements met by the pro aspects compared to the con aspects. Namely: - -* [high] Unsurprising WYSIWYG behavior from patterns -* [high] Easy recognition of patterns, even for non-developers -* [high] A minimal number of characters requiring escaping -* [high] No limitations on users with valid non-i18n concerns -* [med] Flexible whitespace outside of patterns -* [low] Number of characters typed (probably comparable with alternatives anyways) -* [low] Number of "mode levels" from a parser perspective - - -## Alternatives Considered - -### Start in text, encapsulate code, trim around statements - -Allow for message patterns to not be quoted. - -Encapsulate with `{…}` or otherwise distinguishing statements from -the primarily unquoted translatable message contents. - -For messages with multiple variants, -separate the variants using `when` statements. - -Trim whitespace between and around statements such as `input` and `when`, -but do not otherwise trim any leading or trailing whitespace from a message. -This allows for whitespace such as spaces and newlines to be used outside patterns -to make a message more readable. - -Allow for a pattern to be `{{…}}` quoted -such that it preserves its leading and/or trailing whitespace -even when preceded or followed by statements. - -### Start in code, encapsulate text - -This approach treats messages as something like a resource format for pattern values. -Keywords are declared directly at the top level of a message, -and patterns are always surrounded by `{{…}}` or some other delimiters. - -Whitespace in patterns is never trimmed. - -The `{{…}}` are required for all messages, -including ones that only consist of text. -Delimiters of the resource format are required in addition to this, -so messages may appear wrapped as e.g. `"{{…}}"`. 
- -This option is not chosen due to adding an excessive -quoting burden on all messages. - -### Start in text, encapsulate code, trim minimally - -This is the same as the proposed design, -but with a different trimming rule: - -- Trim all spaces before and between declarations. -- For single-variant messages, trim one newline after the last declaration. -- For multivariant messages, - trim one space after a `when` statement and - one newline followed by any spaces before a subsequent `when` statement. - -This option is not chosen due to the quoting being too magical. -Even though this allows for all patterns with whitespace to not need quotes, -the cost in complexity is too great. - -### Start in text, encapsulate code, trim maximally - -This is the same as the proposed design, -but with a different trimming rule: - -- Trim all leading and trailing whitespace for each pattern. - -Expressing the trimming on patterns rather than statements -means that leading and trailing spaces are also trimmed from simple messages. -This option is not chosen due to this being somewhat surprising, -especially when messages are embedded in host formats that have predefined means -of escaping and/or trimming leading and trailing spaces from a value. - -### Start in text, encapsulate code, do not trim - -This is the same as the proposed design, -but with two simplifications: - -- No whitespace is ever trimmed. -- Quoting a pattern with `{{…}}` is dropped as unnecessary. - -With these changes, -all whitespace would need to be explicitly within the "code" part of the syntax, -and patterns could never be separated from statements -without adding whitespace to the pattern. - -## Scoring matrix - -TBD +
\ No newline at end of file From 5e9b4732943b0bff6644a56966cad866b57059f0 Mon Sep 17 00:00:00 2001 From: Addison Phillips Date: Mon, 30 Oct 2023 07:38:44 -0700 Subject: [PATCH 18/19] Useful addition Co-authored-by: Eemeli Aro --- exploration/text-vs-code.md | 13 +++++++++++++ 1 file changed, 13 insertions(+) diff --git a/exploration/text-vs-code.md b/exploration/text-vs-code.md index ca0bd047c..088c875cd 100644 --- a/exploration/text-vs-code.md +++ b/exploration/text-vs-code.md @@ -157,6 +157,19 @@ There are five candidates for handling the boundaries between code and patterns: ### Always Quote +``` +{{ +match {$var} +when 0 {{This has no space in front of it.}} +when one + {{This has no space or newline in front of it.}} +when few + {{ This has one space at the start and the end. }} +when many {{ This also has one space start and end. }} +when * {{You must quote all variant patterns.}} +}} +``` + Pros: - The boundary between pattern and code is always clear. - The quoting reduces the number of in-pattern escapes to the open/close sequence. From fe5852c0846762d4f95454ac9295ea27fb97d6d8 Mon Sep 17 00:00:00 2001 From: Addison Phillips Date: Mon, 30 Oct 2023 07:50:15 -0700 Subject: [PATCH 19/19] Update exploration/text-vs-code.md Co-authored-by: Eemeli Aro --- exploration/text-vs-code.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/exploration/text-vs-code.md b/exploration/text-vs-code.md index 088c875cd..d0894ed0e 100644 --- a/exploration/text-vs-code.md +++ b/exploration/text-vs-code.md @@ -282,7 +282,7 @@ directly after code: > > This has a newline and a space at the start and a space-newline at the end > ->>{when *}{| +>{when *}{| >|} You can quote the newlines and spaces should you desire {| >|} >```