Data Model Q&A #181

romulocintra · 2021-06-21T18:14:06Z

romulocintra
Jun 21, 2021
Maintainer

Q&A about Data Model

eemeli
Jul 2, 2021
Maintainer

As discussed at the last WG meeting, some disagreement exists regarding the expressive powers of the candidate data models. My assertion is that the model proposed by @zbraniecki and myself is able to represent all messages, while the model proposed by @echeran and @mihnita is not. To demonstrate that, please find below a few messages that I assert are not directly expressible in the more limited data model.

I have attempted to prune down these examples to their simplest form, and as such they could certainly be individually expressed differently to achieve their specific goals. However, my contention is that the specific message and variable shapes used here cannot be used in the "EM" data model, and would require at least one of the following:

A change in the variable shapes, e.g. in msg_2 using a { start, end } range value as the input, rather than two separate inputs, or in msg_5 including the "msg_group" and "two" parts in the $var_this_path value.
A change in the message structure, e.g. flattening some or all of the intermediate groupings used in messages 4-6, such that the keys would be of a form like "msg_group.foo.this.bar.two" rather than actually grouped, or by using a selector to artificially combine some of the messages into one "message".
Custom formatting functions that have unlimited access to the runtime context and the message resources. For instance, in msg_4 the MESSAGE() function could be passed a single input "msg_group.foo.$var_this.bar.$var_two" that it would internally parse by splitting it at . characters, and then consider values starting with $ to refer to context values.

For syntax, the examples use a Fluent-ish notation that's hopefully clear enough about its intent. A few formatting functions are used:

MESSAGE(...path) flattens its inputs into an array of strings, and determines the formatted value of the message at that path in the current resource. If required by the surrounding context, said message value is stringified.
DATETIME_RANGE(start, end, ...options) takes two datetime values and formats them as a date range. Non-datetime inputs are parsed into datetime values. In JS, this corresponds to new Intl.DateTimeFormat(locale, options).formatRange(start, end).
NOW() returns the current date.
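
To make the DATETIME_RANGE description concrete, here is a minimal JS sketch. The `toDate` helper, the explicit `locale` argument and the fixed `timeZone: 'UTC'` are assumptions added for illustration; only the `Intl.DateTimeFormat(...).formatRange(...)` call comes from the description above.

```javascript
// Hypothetical helper: coerce a non-datetime input into a Date.
// Assumption: other values are stringified and handed to the Date constructor.
function toDate(value) {
  return value instanceof Date ? value : new Date(String(value));
}

// Sketch of DATETIME_RANGE(start, end, ...options).
// The locale parameter and the UTC time zone are illustrative assumptions.
function datetimeRange(locale, start, end, options) {
  const fmt = new Intl.DateTimeFormat(locale, { timeZone: 'UTC', ...options });
  return fmt.formatRange(toDate(start), toDate(end));
}

const years = datetimeRange('en', new Date('1970-06-01'), new Date('2021-06-01'), {
  year: 'numeric'
});
const yearsFromString = datetimeRange('en', '1970', new Date('2021-06-01'), {
  year: 'numeric'
});
console.log(`Years ${years}`);
```

The exact separator and spacing in the output depend on the locale data of the runtime. An input that cannot be parsed becomes an Invalid Date, for which `formatRange` throws a RangeError.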

Apologies for the delay in getting this out, as I'm aware that our next "extended" meeting is already on this coming Monday. If it's of use, I'd be happy to also implement these with the initial implementation of the "EZ" model that's available here: https://github.com/messageformat/messageformat/tree/mf2/packages/messageformat

Presuming the following context values:

$var_start = Date(1970-06-01)
$var_end = Date(2021-06-01)

each of msg_1, msg_2, and msg_3 should format as "Years 1970 - 2021".

msg_start = 1970

msg_1 = Years { DATETIME_RANGE(MESSAGE("msg_start"), $var_end, year: "numeric") }
msg_2 = Years { DATETIME_RANGE($var_start, $var_end, year: "numeric") }
msg_3 = Years { DATETIME_RANGE(MESSAGE($var_name), NOW(), year: "numeric") }

Presuming the following context values:

$var_this = "this"
$var_two = "two"
$var_this_path = ["foo", "this", "bar"]

each of msg_4, msg_5, and msg_6 should format as "Category Two".

msg_group {
  foo {
    this {
      bar {
        one = One
        two = Two
        three = Three
      }
    }
    that {
      bar {
        one = One Other
        two = Two Other
        three = Three Other
        some = Some Other
      }
    }
  }
  default_foo = this
}

msg_4 = Category { MESSAGE("msg_group", "foo", $var_this, "bar", $var_two) }
msg_5 = Category { MESSAGE("msg_group", $var_this_path, $var_two) }
msg_6 = Category { MESSAGE("msg_group", "foo", MESSAGE("msg_group", "default_foo"), "bar", $var_two) }
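
As a rough illustration of what MESSAGE(...path) is described as doing for these three messages, the following JS sketch flattens its inputs and walks a nested resource. Plain strings stand in for message values, and the names `resource`, `message` and `ctx` are hypothetical; this is not the code of either proposal.

```javascript
// Nested resource mirroring msg_group above; strings stand in for message values.
const resource = {
  msg_group: {
    foo: {
      this: { bar: { one: 'One', two: 'Two', three: 'Three' } },
      that: {
        bar: { one: 'One Other', two: 'Two Other', three: 'Three Other', some: 'Some Other' }
      }
    },
    default_foo: 'this'
  }
};

// Sketch of MESSAGE(...path): flatten the inputs into an array of strings,
// then walk the resource one key at a time.
function message(res, ...path) {
  const keys = path.flat(Infinity).map(String);
  let node = res;
  for (const key of keys) {
    if (node === null || typeof node !== 'object' || !(key in node)) {
      throw new Error(`No message at path: ${keys.join('.')}`);
    }
    node = node[key];
  }
  if (typeof node !== 'string') {
    throw new Error(`Path points at a group, not a message: ${keys.join('.')}`);
  }
  return node;
}

// Context values as given above.
const ctx = { var_this: 'this', var_two: 'two', var_this_path: ['foo', 'this', 'bar'] };

const msg4 = `Category ${message(resource, 'msg_group', 'foo', ctx.var_this, 'bar', ctx.var_two)}`;
const msg5 = `Category ${message(resource, 'msg_group', ctx.var_this_path, ctx.var_two)}`;
const msg6 = `Category ${message(
  resource, 'msg_group', 'foo', message(resource, 'msg_group', 'default_foo'), 'bar', ctx.var_two
)}`;
console.log(msg4, '|', msg5, '|', msg6);
```

Note that msg6 only resolves because the value of default_foo happens to match a key in the group, so any change to that value changes which message is selected.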

0 replies

mihnita · 2021-07-02T18:54:55Z

mihnita
Jul 2, 2021
Maintainer

First case does not include anything "tree-like", and (most) of it is as easy to represent in the EM model as in the EZ model.

The one difference in passing arguments to functions is that the EZ model relies on the order of parameters.
The EM model uses a map. So it is technically named parameters.
Which are a lot more flexible, more informative, and easier to refactor / extend without breaking things.

DATETIME_RANGE expects start and end as its first 2 parameters in the EZ model.
What if I want to pass some kind of a Range object?

In the EM model all parameters are named.
So the function can take any of these, no problem:
{ start: $var_start, end: $var_end, ...}
{ start: $var_start, end: NOW, ...}
{ range: $someRefToSomeRange, ...}
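
A small JS sketch of that flexibility, assuming a hypothetical `datetimeRangeArgs` helper and a `range` value shaped like `{ start, end }` (neither is defined in this thread): a single named-options bag can accept either style without relying on parameter order.

```javascript
// Sketch: normalise a named-options bag into { start, end, rest }.
// The { start, end } shape of `range` is an assumption for illustration.
function datetimeRangeArgs(options) {
  const { start, end, range, ...rest } = options;
  if (range !== undefined) {
    return { start: range.start, end: range.end, rest };
  }
  if (start !== undefined && end !== undefined) {
    return { start, end, rest };
  }
  throw new TypeError('Expected either { start, end } or { range }');
}

const fromPair = datetimeRangeArgs({ start: 1, end: 2, year: 'numeric' });
const fromRange = datetimeRangeArgs({ range: { start: 1, end: 2 }, year: 'numeric' });
console.log(fromPair, fromRange);
```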

Both examples show the kind of bad practices that the EM model tries to prevent.

Freely mixing messages everywhere will break things.
And completely ignoring any kind of type safety.

In the first case the year (localizable) is used as input for the DATETIME_RANGE.
Which would normally want two kinds of date/time-like "things" (objects, long timestamps, stuff from java.time or JS Temporal)
Now, I translate msg_start to "azi" ("today" in Romanian), or "૧૯૭૦" (1970 with Gujarati digits)

What will DATETIME_RANGE do with that?

Second example makes default_foo translatable, and uses it as part of the path to resolve a MESSAGE later on.
And everybody will (of course) translate that.
What happens when default_foo = acesta (Romanian translation for "this", masculine form), or maybe this: default_foo = {item_gender, gender, feminine{aceasta} masculine{acesta} other {ace(a)sta}}
Because (of course) this smart tooling that gives flexibility to translators is a good thing.

BUT, as an answer to "EM can't do it": you can ABSOLUTELY do whatever you want with a bit of coding.

I can write and register an UNSAFE_DATETIME_RANGE taking { start: "msg:msg_start", end: "var:var_end", ... }
That function is free to load the message with the ID msg_start using a resource manager (there is no need to be in the same file) and explode (or not), the same way the DATETIME_RANGE does.

Similar for MESSAGE. The default would take a message ID that can be passed "as is" to a resource manager for loading.

But you can write and register an UNSAFE_MESSAGE with the parameter being { path: ["msg_group", "foo", "msg_group", "msg:default_foo", "bar", "var:var_two" ]... }.
Or flattened in a string: "/msg_group/foo/msg_group/msg:default_foo/bar/var:var_two"

And that can explode or succeed the same way the default one does in the EZ model.

Heck, one can write and register a JS_EVAL function that takes a {script: 'document.alert($msg:msgId)'}

But you have to explicitly write and register and use that kind of unsafe functionality.

One can argue: but Mihai, $foo is a "real variable reference", while "var:foo" is not, it is a string.

True... until you try to implement it.

Since variables are not available by name at runtime in most programming languages (not even in JS, because of obfuscation / minimization) you still end up with a map of variables with string keys containing the pre-compilation variable names.

That is what the EZ Rust implementation does. Variable and function references are in fact strings there too, used to look up the real thing in tables.
Nothing wrong with it. But we should not pretend it is real typing, and somehow safer.

I am also fine with, and open to, creating explicit VariableRef and MessageRef types, which pretend to be types, but they would only be thin wrappers around string IDs. It would be a varRef{ id: "foo" } instead of "var:foo"

That is not the core problem.

The problem is the free "mix and match" tree-like things in a path.

Why would I allow BY DESIGN a message reference (the message translated, possibly containing selectors, placeholders, what not) as part of a path to a variable, or another message?

Any developer who ever had to deal with translations knows better than to put non-translatable stuff in localization files, and then directly use the result at runtime to reference other things.

Question, to settle the "it can't be done" argument: should I write, register, and use those 2 unsafe functions to show it can be done?

Or is my description above enough?

They would do the same unsafe operations that DATETIME_RANGE and MESSAGE do in the EZ implementation, nothing special about it.

If the argument is: but writing these functions is ugly, and dangerous, basically a dangerous EVAL?
Then my answer is: isn't that kind of implementation also ugly and dangerous in the EZ model, parameters of type Any,
ordered parameters (that can change at any time, unlike named parameters)?

The only difference is that the EZ model has it available by default, you don't need to work extra to shoot yourself in the foot.

1 reply

eemeli Jul 2, 2021
Maintainer

> Question, to settle the "it can't be done" argument: should I write, register, and use those 2 unsafe functions to show it can be done?

I think what would be best would be providing an EM data model representation of the above messages, using whatever notation you feel comfortable with. If you need to change the function signatures or introduce additional functions, then I'm sure that a short description of their behaviour should be sufficient.

For example, this is what a resource holding the first three messages would look like with the EZ data model:

{
  id: 'res',
  locale: 'en',
  entries: {
    msg_start: { value: ['1970'] },
    msg_1: {
      value: [
        'Years ',
        {
          func: 'DATETIME_RANGE',
          args: [{ msg_path: ['msg_start'] }, { var_path: ['var_end'] }],
          options: { year: 'numeric' }
        }
      ]
    },
    msg_2: {
      value: [
        'Years ',
        {
          func: 'DATETIME_RANGE',
          args: [{ var_path: ['var_start'] }, { var_path: ['var_end'] }],
          options: { year: 'numeric' }
        }
      ]
    },
    msg_3: {
      value: [
        'Years ',
        {
          func: 'DATETIME_RANGE',
          args: [
            { msg_path: [{ var_path: ['var_name'] }] },
            { func: 'NOW', args: [] }
          ],
          options: { year: 'numeric' }
        }
      ]
    }
  }
}
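
To show how such a structure could be evaluated, here is a hedged JS sketch of a recursive resolver for the node shapes used above (`msg_path`, `var_path`, `func`). The `resolve` and `formatValue` names, the stubbed formatting functions and the value of `var_name` are assumptions for illustration, not the actual EZ implementation.

```javascript
// Two of the entries from the resource above, in the same shape.
const resource = {
  id: 'res',
  locale: 'en',
  entries: {
    msg_start: { value: ['1970'] },
    msg_1: {
      value: ['Years ', {
        func: 'DATETIME_RANGE',
        args: [{ msg_path: ['msg_start'] }, { var_path: ['var_end'] }],
        options: { year: 'numeric' }
      }]
    },
    msg_3: {
      value: ['Years ', {
        func: 'DATETIME_RANGE',
        args: [{ msg_path: [{ var_path: ['var_name'] }] }, { func: 'NOW', args: [] }],
        options: { year: 'numeric' }
      }]
    }
  }
};

// Resolve one node; path elements may themselves be nodes, hence the recursion.
function resolve(node, env) {
  if (typeof node === 'string') return node;
  if ('var_path' in node) {
    const path = node.var_path.map(p => resolve(p, env));
    return path.reduce((scope, key) => scope[key], env.vars);
  }
  if ('msg_path' in node) {
    const path = node.msg_path.map(p => resolve(p, env));
    const entry = path.reduce((scope, key) => scope[key], env.resource.entries);
    return formatValue(entry.value, env); // message values are stringified here
  }
  if ('func' in node) {
    const args = node.args.map(a => resolve(a, env));
    return env.functions[node.func](args, node.options || {});
  }
  throw new TypeError('Unknown node shape');
}

function formatValue(value, env) {
  return value.map(part => String(resolve(part, env))).join('');
}

const yearOf = v => new Date(v).getUTCFullYear();
const env = {
  resource,
  // Assumption: var_name holds the id of the message to use as the range start.
  vars: { var_end: new Date('2021-06-01'), var_name: 'msg_start' },
  functions: {
    // Stubs: NOW is pinned to a fixed date and the options are ignored,
    // so that the output of the sketch is deterministic.
    NOW: () => new Date('2021-06-01'),
    DATETIME_RANGE: ([start, end]) => `${yearOf(start)} - ${yearOf(end)}`
  }
};

const out1 = formatValue(resource.entries.msg_1.value, env);
const out3 = formatValue(resource.entries.msg_3.value, env);
console.log(out1, '|', out3);
```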

mihnita · 2021-07-11T01:24:32Z

mihnita
Jul 11, 2021
Maintainer

722491f

2 replies

eemeli Jul 11, 2021
Maintainer

@mihnita Please correct me if I got something wrong, but as far as I can tell this should be a pure-data representation of your response for the first three messages, included here to allow for easier comparison of the models.

{
  msg_start: { id: 'msg_start', locale: 'en', parts: ['1970'] },
  msg_1: {
    id: 'msg_1',
    locale: 'en',
    parts: [
      'Years ',
      {
        name: 'range',
        formatter_name: 'DATETIME_RANGE',
        options: { start: 'm:msg_start', end: '$var_end', skeleton: 'y' }
      }
    ]
  },
  msg_2: {
    id: 'msg_2',
    locale: 'en',
    parts: [
      'Years ',
      {
        name: 'range',
        formatter_name: 'DATETIME_RANGE',
        options: { start: '$var_start', end: '$var_end', skeleton: 'y' }
      }
    ]
  },
  msg_3: {
    id: 'msg_3',
    locale: 'en',
    parts: [
      'Years ',
      {
        name: 'range',
        formatter_name: 'DATETIME_RANGE',
        options: { start: '$var_name', end: 'f:NOW', skeleton: 'y' }
      }
    ]
  }
}
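
For comparison, a JS sketch of how a DATETIME_RANGE formatter might interpret these prefixed option strings itself. The `resolveOption` helper, the recursive re-resolution of variable values, the stubbed NOW and the value given to `var_name` are all assumptions made for illustration.

```javascript
// Hypothetical convention handler:
// '$name' is a variable, 'm:id' a message, 'f:NAME' a function call.
function resolveOption(value, env) {
  if (typeof value !== 'string') return value;
  // Assumption: a variable's value is resolved again, so it may hold a reference.
  if (value.startsWith('$')) return resolveOption(env.vars[value.slice(1)], env);
  // Assumption: the referenced message has string-only parts.
  if (value.startsWith('m:')) return env.messages[value.slice(2)].parts.join('');
  if (value.startsWith('f:')) return env.functions[value.slice(2)]();
  return value;
}

// Sketch of the formatter; only the 'y' skeleton is handled here.
function datetimeRange(options, env) {
  if (options.skeleton !== 'y') throw new Error('Only the "y" skeleton is sketched');
  const yearOf = v => new Date(v).getUTCFullYear();
  const start = resolveOption(options.start, env);
  const end = resolveOption(options.end, env);
  return `${yearOf(start)} - ${yearOf(end)}`;
}

const env = {
  messages: { msg_start: { id: 'msg_start', locale: 'en', parts: ['1970'] } },
  vars: {
    var_start: new Date('1970-06-01'),
    var_end: new Date('2021-06-01'),
    var_name: 'm:msg_start' // assumption: the variable carries a message reference
  },
  functions: { NOW: () => new Date('2021-06-01') } // stub pinned to a fixed date
};

const r1 = datetimeRange({ start: 'm:msg_start', end: '$var_end', skeleton: 'y' }, env);
const r2 = datetimeRange({ start: '$var_start', end: '$var_end', skeleton: 'y' }, env);
const r3 = datetimeRange({ start: '$var_name', end: 'f:NOW', skeleton: 'y' }, env);
console.log(r1, '|', r2, '|', r3);
```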

eemeli Jul 11, 2021
Maintainer

Now, for some commentary:

1. I'm honestly fascinated by the '$var_name', 'm:msg_start' and 'f:NOW' variable, message and function reference representations, as they reveal a misunderstanding that I, at least, had about the EM model. The EZ model can consider option values to be static opaque blobs, but this is clearly not the case with EM. Do you have definitions available somewhere for the f:, m: and $ prefixes? A pretty crucial consideration here is of course whether they allow for any options or arguments to be passed as well, i.e. something like 'f:FOO(x: "bar", y: $var)'.
2. The lazy and recursive resolution of these option values means that it's possible e.g. for a runtime variable to pass in a message reference as its value, which is then resolved during the original message's resolution. This isn't actually possible with the EZ model's proposed execution model, where the calling context needs to separately resolve it first. Is the desire to do this the reason why you've opted for a stringish representation of these references, as opposed to a fully resolved representation?
3. For the remaining messages, it might be enough if you could sketch out the options shapes for the msg_5 and msg_6 top-level message references? Or perhaps just confirm that these are correct? I suppose the main concerns here are whether I've understood correctly how you'd represent a variable number of arguments, and how the m: message references would be applied to messages that are in a sub-group.
```
options_5 = { 0: 'msg_group', 1: '$var_this_path', 2: '$var_two' }
options_6 = { 0: 'msg_group', 1: 'foo', 2: 'm:msg_group.default_foo', 3: 'bar', 4: '$var_two' }
```

I shall continue to ponder more on this, and look forward to tomorrow's live discussions.

mihnita · 2021-07-12T21:24:58Z

mihnita
Jul 12, 2021
Maintainer

Answer to 1

The "f:", "m:" and "$" prefixes are not documented anywhere, and they are opaque.
They are conventions created and understood by the DATETIME_RANGE function.
As you can see, the commit did not touch anything in the src part, which is the core stuff (which would be in a standard library, think ICU or ECMAScript). It only touches the test class, which is supposed to represent developer extensions.

A company might decide that this is a company wide convention, add some helper functions to make that easier.

They are not first class citizens, because I think they are bad ideas. With maybe limited and rare utility, but with the dangers outweighing the usefulness.

But! If in time they prove to be good ideas, we register a function that can eval that kind of string, and it becomes standard.
So "m:" and "f:" and "$" (and who knows what more) become an extension to the standard, at that time.

For a simple, cleaner approach I would define different parameters to DATETIME_RANGE, with better typing:
start: takes the starting time as a long, epoch time, milliseconds from X
start_string: takes the starting time as a string in ISO 8601 format
start_msg_ref: takes the starting time as a message reference
start_var_ref: takes the starting time as a variable reference

(Stas' idea)

And since these are developer written functions that we can't control, then yes, they can do f:FOO(x: "bar", y: $var), or even put some full JavaScript in there.
See my previous comment: "Heck, one can write and register a JS_EVAL function that takes a {script: 'document.alert($msg:msgId)'}"

Answer to 2

I don't think I understand this.
In a previous discussion I remember you saying that in the EZ model values (pointed at by a var ref) are of type Any.
This means they can also be message references.
So EZ can also have "a runtime variable to pass in a message reference as its value"

Answer to 3

Since one gets to write custom functions, they can write functions that can take anything they like:

options_5 = { 0: 'msg_group', 1: '$var_this_path', 2: '$var_two' }

or

options_5 = [  'msg_group', '$var_this_path', '$var_two' ]

or

options_5 = 'msg_group/$var_this_path/$var_two'

I don't think it is inherently wrong to have the "path" be a string instead of an array of items.
A path part in a URL is implemented as a string in a lot of programming languages.
A select in XPath is a string, and it is used exactly to point to an item / items in XML, same as messages.
Similar for file systems, all of them have a natural mapping to a string, even if they point to a tree folder structure.

It means that one can put in there a string-eval kind of thing (and many do): cat /messages/msg_$msgid.txt
(which can't be done in the EZ model, at least not "fully resolved")

If, for example, 10 years from now the community decides that a well defined "eval" kind of functionality is needed, then all you need to do is register a new function in the central registry, properly documented, and it becomes as standard as the stuff that was there on day one.

With the EZ model we have to say:
this works, and it is a first-class citizen: /messages/$msgid
but this does not work, and you have to do something custom (what?): /messages/msg_$msgid

0 replies

mihnita · 2021-07-12T21:50:44Z

mihnita
Jul 12, 2021
Maintainer

Backward compat

There is something that I think I didn't wrap my head around properly during the meeting.
(or I did, but the wrong way)

There is always a risk to break backward compat with non standard functions in both models.
But it does not "break the standard" (we don't need to change the standard to accommodate it).

Company A can define the DATE_RANGE function expecting 3 parameters: skeleton, start, end
Company B can define the DATE_RANGE function expecting more parameters: start, end, year, month, day, hour, ... (think ECMA)

Now company A buys company B. Conflict, problems.
(btw, named parameters / map is more flexible in this case, you only need to modify the (new) company wide implementation of DATE_RANGE and support both styles)

Same in the html world, I can write a custom component, call it <copyright>, and use it, with some custom attributes.
And HTML 7 decides to implement something in the standard and call it the same, but with different behavior and attributes.

It happens all the time when you allow developers the flexibility to extend a standard: PUA in Unicode, BCP 47
("subtags in the range 'qaa' through 'qtz' are reserved for private use in language tags", "script subtags 'Qaaa' through 'Qabx' are reserved for private use", "region subtags 'AA', 'QM'-'QZ', 'XA'-'XZ', and 'ZZ' are reserved for private use in language tags"), and the extension mechanism, some of it reserved for the standard itself, some for private use (-x-).
https://www.rfc-editor.org/rfc/rfc4646.txt

We can easily say: all functions that start with _priv are private use, and will never be part of the official registry.
And that would eliminate a big chunk of the problems (and below that best known naming practices apply, same as in libraries: company / library special prefixes / namespaces, etc).

0 replies

mihnita · 2021-07-12T22:00:14Z

mihnita
Jul 12, 2021
Maintainer

Thoughts about string / paths / trees

Don't take this as an argument pro / cons.
Look at it as an explanation why I went with strings.

The only "natural" place to have a path-like identifier (to access a tree) would be for messages.

The EZ model only has array of Parts in paths for message ref and variable ref, but not function refs.
Why?

Programming languages don't have anything like this for variables except namespaces, but those apply to variables AND functions.
And they are little more than syntactic sugar, there is no big difference between com.mihai.test.i18n.Foo and com.mihai.test_i18n.Foo.
The visibility is the same, nothing breaks if I do this. The only area where it can break things is if you use reflection. The fact that i18n is a sub-space of test in the first case does not have any practical implications.

When implementing MF2 the variable names (as they are in the source) are not available, so the dev has to put them in some kind of map anyway.

Both the EM and EZ implementations use a flat map, with the key a string.
So to access a variable at runtime you need to flatten the variable reference path to a string anyway.
(or store variables in a map-of-maps, which is probably overkill)

And by "flattening" we add another problem with the EZ model: varRef(path: ["foo", "bar", "baz"]) and varRef(path: ["foo/bar", "baz"]) flatten to the same thing if the "separator" for flattening is "/".

Using something other than "/" does not help: I can put in my plain text Part the string used for separator when flattening, whatever that is (as long as the key of the runtime map of vars is a string)

Unless we say "no, when you flatten you must escape the separator", so ["foo", "bar", "baz"] becomes "foo/bar/baz" and ["foo/bar", "baz"] becomes "foo\/bar/baz", and they don't overlap.
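
The collision and the escaping fix can be checked with a few lines of JS. The function names are hypothetical, and escaping the backslash itself is an added assumption needed to keep the mapping unambiguous.

```javascript
// Naive flattening: join the parts with the separator.
const flattenNaive = path => path.join('/');

// Escaped flattening: escape backslashes first, then the separator, then join.
const flattenEscaped = path =>
  path.map(part => part.replace(/\\/g, '\\\\').replace(/\//g, '\\/')).join('/');

const a = ['foo', 'bar', 'baz'];
const b = ['foo/bar', 'baz'];

// Both paths collapse to the same key without escaping.
const naiveCollides = flattenNaive(a) === flattenNaive(b);
// With escaping the two keys stay distinct.
const escapedCollides = flattenEscaped(a) === flattenEscaped(b);

console.log(naiveCollides, escapedCollides);
console.log(flattenEscaped(a), flattenEscaped(b));
```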

And since one of the parts is a message I can in theory have anything in there: Unicode characters and what not. And they become part of the "path to a variable". So some escaping might actually be required anyway, if we accept messages as part of a path.

But now we made this so much more clunky... and "fully resolved" (array of parts) is actually clunkier than a string.

TLDR:

There is no real benefit of an "array of things" as path in references.

variables don't naturally NEED it. Namespaces were added to programming languages when you needed A LOT of variables, with bad encapsulation, like C++. But for localization we are talking thousands of variables accessed at runtime, so the need is not that high.
the function references don't do that in the EZ model (why not?)
for messages a tree-like structure might be "nice to have", but see below

There is no current file format designed for localization that supports nested structures to store messages.

JSON/ XML / YAML don't count, they are not designed for localization, they were designed for something else, and people thought "hey, wouldn't that be nice? I already know how to parse this". But major platforms lived with flat messages for many-many years, and nobody had a problem. So even messages "can live without" (maybe with some "fake namespaces", like "." or "/" in the ID). Look at gettext, MacOS strings, Windows .rc files, Java and .NET properties, Android strings.xml, etc.

Even with a natural container for messages (json / xml) the problems are bigger with trees: if I "merge" the strings for my app with the strings from 10 libraries, it is easier to merge a "flat catalog of messages" than 10 full trees.
And loading "foo/bar/baz" from a flat map is easier to implement (cleaner code) and more performant than loading it from:

foo : {
  bar : {
    baz: "the message"
  }
}

We know the implications, because a lot of people / companies did it in the last 40-50 years, both in localization and in programming language design.

So for me this was YAGNI ("You aren't gonna need it"). And if we need it, we can easily "fake" it.

0 replies

stasm · 2021-07-16T16:12:40Z

stasm
Jul 16, 2021
Maintainer

I'd like to share a realization that helped me understand the crux of the discussion (or so I think).

We set out to test if there are messages that can't be represented in any of the proposed data models. We didn't define what "represent in the data model" means, however. As it turned out, the EZ model leans towards expressing complexity in the data model itself (through nested, composable AST nodes), while the EM model actively avoids it by moving the complexity into the implementation (i.e. the execution model).

I hypothesize that this allows both models to express the same set of messages.

3 replies

stasm Jul 16, 2021
Maintainer

A good example approximating the two different approaches came up in a conversation after Monday's extended meeting. It's based on HTML.

EZ

The EZ model would favor flexibility through composability (but still within some rules, e.g. you can't nest p in a p).

<p>
    <strong>H</strong>ello, world.
</p>

In the EZ model, it's easy to:

Make a one-off deviation in a long list of similar messages.
Capture complexity by composing and reusing elements in novel ways (but still within some basic rules).
Give power to translators to compose elements as they need to.

EM

The EM model, on the other hand, would instead prefer encapsulation through abstraction, with paragraph-with-initial defined in the registry by the standard, by Unicode extensions, or by the developer on a per-use-case basis.

<paragraph-with-initial>
    Hello, world.
</paragraph-with-initial>

In the EM model, it's easy to:

Make consistent changes en masse.
Write tooling for it because we can assume that elements don't nest.
Test the logic encapsulated by the elements.

stasm Jul 16, 2021
Maintainer

It's interesting to observe that given the above two approaches, it's possible to express EM in terms of EZ. In other words, EM can be thought of as a subset of EZ. If I understand correctly, this is where the idea of having a flexible model + restrictive validation and linting comes from. We could technically start with EM (designed as EZ + restrictions) and then allow lifting some restrictions in the future, if needed.

I think this is an interesting take which sounds like a good compromise, but I also strongly believe that we first need to agree on the compatibility strategy of the standard before we talk about lifting restrictions or extending the rules in the future.

grhoten Jul 19, 2021
Maintainer

EZ

<p>
    <strong>H</strong>ello, world.
</p>

EM

<paragraph-with-initial>
    Hello, world.
</paragraph-with-initial>

I have a preference for CSS style instead of dedicated tags. Our format actually has several transformations because it's important to transform variables.

We use <transform mode="mode">...</transform> where mode can be capitalize, capitalizeSentence, upperFirst, uppercase, lowercase, quote and so forth. The quotes are context sensitive. For example, Russian and Chinese languages vary the quotes depending on the script being used.

stasm · 2021-07-16T16:42:27Z

stasm
Jul 16, 2021
Maintainer

Coming back to the data model, here's how I imagine dynamic message references expressed in EZ and EM:

EZ

Message = Array<Part>;
Part = string | Expression;
Expression = string | VariableReference | MessageReference | FunctionReference | …;
VariableReference = {id: Expression, args: Map<string, Expression>};
MessageReference = {id: Expression, args: Map<string, Expression>};

msg1 = ["Hello, ", VariableReference {id: "userName"}]
msg2 = ["About ", MessageReference {id: "brandName", args: {case: "locative"}}]
msg3 = ["Recruit ", MessageReference {id: VariableReference {id: "selectedUnit"}, args: {case: "accusative"}}]

EM

Message = Array<Part>;
Part = string | Function;
Function = {name: string, args: Map<string, string | number>};

msg1 = ["Hello, ", Function {name: "VARIABLE", args: {id: "userName"}}]
msg2 = ["About ", Function {name: "MESSAGE", args: {id: "brandName", case: "locative"}}]
msg3 = ["Recruit ", Function {name: "DYNAMIC_MESSAGE", args: {id: "selectedUnit", case: "accusative"}}]

// Or overloading MESSAGE() with a new id_var parameter:
msg4 = ["Recruit ", Function {name: "MESSAGE", args: {id_var: "selectedUnit", case: "accusative"}}]
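
To illustrate the EM side, here is a JS sketch in which each part is either a string or a `{name, args}` record dispatched through a registry. The registry contents, the `lookupMessage` helper and the sample data (including the bracketed case markers) are placeholders invented for this sketch.

```javascript
// Placeholder data: case variants are marked with bracketed suffixes.
const env = {
  vars: { userName: 'Anna', selectedUnit: 'archer' },
  messages: {
    brandName: { other: 'Acme', locative: 'Acme[loc]' },
    archer: { other: 'archer', accusative: 'archer[acc]' }
  }
};

// Hypothetical lookup: pick the requested case variant, else fall back to `other`.
function lookupMessage(env, id, grammaticalCase) {
  const variants = env.messages[id];
  return grammaticalCase in variants ? variants[grammaticalCase] : variants.other;
}

// Registry of functions; all args are flat strings, nothing nests.
const registry = {
  VARIABLE: (args, env) => String(env.vars[args.id]),
  MESSAGE: (args, env) => lookupMessage(env, args.id, args.case),
  // The id names a variable whose value is the id of the message to load.
  DYNAMIC_MESSAGE: (args, env) => lookupMessage(env, String(env.vars[args.id]), args.case)
};

function format(message, env) {
  return message
    .map(part => (typeof part === 'string' ? part : registry[part.name](part.args, env)))
    .join('');
}

const msg1 = ['Hello, ', { name: 'VARIABLE', args: { id: 'userName' } }];
const msg2 = ['About ', { name: 'MESSAGE', args: { id: 'brandName', case: 'locative' } }];
const msg3 = ['Recruit ', { name: 'DYNAMIC_MESSAGE', args: { id: 'selectedUnit', case: 'accusative' } }];

const outputs = [msg1, msg2, msg3].map(m => format(m, env));
console.log(outputs);
```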

2 replies

stasm Jul 16, 2021
Maintainer

I see EZ as more elegant, almost to a point where it's pleasing from the language design point of view. Ultimately, however, I think it's too complex. I remember from Fluent that one of the major challenges was to design at which point in the execution model we'd want nested nodes to be stringified vs. passed around raw or as some kind of inspectable wrappers.

stasm Jul 16, 2021
Maintainer

Another thing that I like about the EM model in this particular example is that case is part of the MESSAGE's API, as opposed to being one of the arguments in the MessageReference node in EZ. This makes it much harder to change it and break existing translations that way.

eemeli · 2021-07-19T21:03:43Z

eemeli
Jul 19, 2021
Maintainer

At today's meeting, @mihnita made a significant clarification regarding the EM model, as the function reference name property should be considered to have the type string[] rather than string. Essentially, this means that during execution a mutable options bag may be passed to each of a sequence of functions in turn, with the output of all but the last being discarded.

With this approach, this would be a possible EM model representation of the first three messages I posted above:

{
  msg_start: { id: 'msg_start', locale: 'en', parts: ['1970'] },
  msg_1: {
    id: 'msg_1',
    locale: 'en',
    parts: [
      'Years ',
      {
        name: ['msg', 'var', 'range'],
        formatter_name: 'DATETIME_RANGE',
        options: {
          msg_ref: 'msg_start',
          msg_target: 'start',
          var_ref: 'var_end',
          var_target: 'end',
          skeleton: 'y'
        }
      }
    ]
  },
  msg_2: {
    id: 'msg_2',
    locale: 'en',
    parts: [
      'Years ',
      {
        name: ['var', 'var', 'range'],
        formatter_name: 'DATETIME_RANGE',
        options: {
          '0_var_ref': 'var_start',
          '0_var_target': 'start',
          '1_var_ref': 'var_end',
          '1_var_target': 'end',
          skeleton: 'y'
        }
      }
    ]
  },
  msg_3: {
    id: 'msg_3',
    locale: 'en',
    parts: [
      'Years ',
      {
        name: ['var', 'func', 'range'],
        formatter_name: 'DATETIME_RANGE',
        options: {
          var_ref: 'var_name',
          var_target: 'start',
          func_name: 'NOW',
          func_target: 'end',
          skeleton: 'y'
        }
      }
    ]
  }
}

Taking msg_1 as an example, the execution model for the function reference would then proceed as follows:

msg gets the options bag exactly as described above, but only cares about msg_ref and msg_target. It looks up the message msg_start and assigns its value to options.start.
var gets the options bag as modified by msg, but only cares about var_ref and var_target. It looks up the runtime variable var_end and assigns its value to options.end.
range gets the options bag as modified by both preceding functions, but only cares about start, end, and skeleton. It looks up the message and variable values those contain, and formats them using the year skeleton.

msg_2 is largely similar, with the extra detail of needing number prefixes on the options to differentiate between the repeated var call inputs. msg_3 introduces the func function, which hopefully is pretty obvious in its behaviour.
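
The steps for msg_1 can be sketched in JS as follows. This is only my reading of the described behaviour: the `callPart` helper, the shape of `env` and the year-only `range` stub are assumptions, and the number-prefixed options of msg_2 are not handled.

```javascript
const env = {
  messages: { msg_start: '1970' },
  vars: { var_end: new Date('2021-06-01') }
};

// Each function receives the same mutable options bag.
const functions = {
  msg: (options, env) => {
    options[options.msg_target] = env.messages[options.msg_ref];
  },
  var: (options, env) => {
    options[options.var_target] = env.vars[options.var_ref];
  },
  // Stub: only the 'y' skeleton is sketched, formatting each end as a UTC year.
  range: options => {
    if (options.skeleton !== 'y') throw new Error('Only the "y" skeleton is sketched');
    const yearOf = v => new Date(v).getUTCFullYear();
    return `${yearOf(options.start)} - ${yearOf(options.end)}`;
  }
};

// Run the named functions in sequence, keeping only the last output.
function callPart(part, env) {
  const options = { ...part.options }; // copy, so the stored message data is untouched
  let output;
  for (const name of part.name) {
    output = functions[name](options, env);
  }
  return output;
}

const part = {
  name: ['msg', 'var', 'range'],
  formatter_name: 'DATETIME_RANGE',
  options: {
    msg_ref: 'msg_start',
    msg_target: 'start',
    var_ref: 'var_end',
    var_target: 'end',
    skeleton: 'y'
  }
};

const formatted = `Years ${callPart(part, env)}`;
console.log(formatted);
```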

With this, I'm satisfied that the two models are indeed equally powerful in their abilities to represent messages. There are still strict differences between the models, but at least we should be able to represent all possible messages in either data model. Any message expressed in one data model can be mapped to the other.

This also means that we do not need to consider e.g. the XLIFF representation of MF2 messages or the translator's view of them when deciding between data models, as any representation achievable with one model may also be used with the other. The same goes for the syntax.

So all we're really left with as differentiating factors are 1) elegance and 2) the concerns of a programmer: How can we establish guarantees and confidence in correctness while writing code and during execution? Here, the two models do present different abilities and requirements, which we ought to consider in more depth.

0 replies

eemeli · 2021-08-04T11:47:22Z

eemeli
Aug 4, 2021
Maintainer

To help with comparisons between the models, I added a page to the wiki: Data & Execution Model Differences

As the title suggests, that expands the scope a bit from just the data model to also include the requirements each proposal puts on the execution or runtime behaviour. In particular, formatting functions are treated rather differently by the two proposals.

I invite anyone interested to add or amend the contents of the page, of course.

0 replies
