Match expressions should be re-usable in placeholders #736
Why do you assume that the
No. We presume it does because the The place that would be confusing would be a message like this:
The addition of numeric replacements at this point strikes me as superfluous and massively confusing (given MF1's use of positional arguments, about which much ink was spilled). |
An alternative solution would be: just as the expression that appears as a selector in a If we changed that, then your second example:
would be an error ("Missing Formatter Annotation" or something like that). I realize this has probably been considered before. However, I think it's preferable to introducing positional arguments (even in a limited way). Alternately, I can imagine this being addressed by a separate linter. |
Because we never say that it has any such effect. I could imagine such an approach providing a workable solution for the core problem, though. According to our current language, selector and placeholder expressions do not modify their operand beyond themselves. If we want to make selectors work differently, we'll need to add spec language to that effect.
Actually, atm all those examples only look like they work. Which is a problem. |
@eemeli noted:
I can't find where we do that? The closest I can find is this statement in
It doesn't say that it isn't later used for formatting. Obviously, I've already written an example of how this could be confusing (multiple selectors on the same variable) up above. But I think this is an important shorthand feature to provide. @catamorphism noted:
We could do that, although we also permit non-annotation there and allow implementations to supply the annotation. I think that's an important feature. We need to think of message authors here, I think. |
It's in Variable Resolution: message-format-wg/spec/formatting.md Lines 187 to 191 in 7a10ee2
The |
I don't buy that argument. The text you quote says how to determine what the operand's value is (it was either passed in or declared). The spec just does not say, one way or the other, whether the selector has any side effects on later placeholders. There are certainly implications (in both directions). Ultimately, this speaks to #645 being something we need to work on. |
Let's say you're right, and our current text does allow for a
you would have an expectation of a formatting call with In order for that to happen, the resolved value of the selector expression needs to be put somewhere so that resolving and formatting the placeholder picks up on it. Recalling the part that I quoted above, and again noting that Further, considering that a message like
really should format as So to make all that happen, custom function implementations need to be called with:
I don't think this is reasonable, or "simple and minimal" as we require in message-format-wg/spec/formatting.md Lines 294 to 296 in 7a10ee2
Or is there some other mechanism that I'm missing that our spec enables for allowing this behaviour? |
@eemeli asked:
Yes. Looking at that message, wouldn't that be your expectation? After all, we just did the selection on that format. We previously argued that we would require annotation in order for selection to happen (vs. allowing inference in placeholders).
I am not sure about your example here. The second I still maintain that it's easier on message writers (our largest audience) if there is a clearly defined behavior that allows them to write minimal messages in which the annotation on
I think function implementations need to be able to annotate the formatting context (otherwise I am not sure the syntax position is necessary, although the data model does communicate this. The function should only need to know what its inputs (operands and options) are. It's the MF processor that knows about the function's position. Modifying your example:
The Anyway, I think, overall, you have identified a problem with the spec. We really need to work through #645. |
I agree, something like this ought to be possible. The core issue here is that this isn't supported by our spec atm. My first thought was to enable this via For example, consider this message, which presupposes custom
If the resolved value of the selector is assigned to
but that's potentially breaking our current invariant of having only one meaning for
Or would we make references to My point here being, assigning declaratory powers to
Not quite. Declarations are handled by the Variable Resolution bit that I quoted earlier: message-format-wg/spec/formatting.md Lines 187 to 191 in 7a10ee2
Effectively, declarations together provide a value mapping that we check first, before looking in the formatting context's input mapping. This is also a key part of what allows for lazy evaluation, as the above quote is the only path through which we look at any of the declarations during Pattern Selection or Formatting. If we do want
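The lookup order described above can be sketched roughly as follows. This is only an illustration, not spec text or any implementation's API: `makeResolver`, `declarations`, and `inputMapping` are hypothetical names, and each declaration is modelled as a thunk so that it is evaluated only when something references it.

```javascript
// Hypothetical sketch: declarations are checked first, then the input mapping.
// Each declaration is a function, so nothing is evaluated until it is referenced.
function makeResolver(declarations, inputMapping) {
  const cache = new Map(); // values of declarations that have already been resolved

  return function resolve(name) {
    if (cache.has(name)) return cache.get(name);
    if (declarations.has(name)) {
      // Lazy evaluation: run the declaration's expression on first use only.
      const value = declarations.get(name)(resolve);
      cache.set(name, value);
      return value;
    }
    if (Object.hasOwn(inputMapping, name)) return inputMapping[name];
    throw new Error(`Unresolved variable: $${name}`);
  };
}

// Usage: `doubled` shadows the input value of the same name; `unused` never runs.
const declarations = new Map([
  ["doubled", (resolve) => resolve("count") * 2],
  ["unused", () => { throw new Error("never evaluated"); }],
]);
const resolve = makeResolver(declarations, { count: 21, doubled: 0 });
console.log(resolve("doubled"), resolve("count"));
```

In this model a selector would only affect later placeholders if its resolved value were written into the same mapping that `resolve` consults, which is the part the current text does not describe.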
I'm pretty sure that this is a separate issue from what #645 is looking to resolve. In large part, that's about refactoring the definition of "resolved value"; this is about |
I agree that this is not directly related to #645. Also consider:
where This is perhaps a contrived example, but the point is that custom functions can return anything, and always treating
If This shows how treating Another alternative to Eemeli's solution to this problem would be to change the syntax to introduce names along with the keys, where each variant binds a name to each of the selector expressions:
This is similar to how |
In general the idea of formatting a variable without a function specified was that the formatting function is determined by the type of the variable. This is something that was strongly argued from very early on by Zibi and I think Stas, and also by ICU people. But this example is really confusing, and I don't think we should inherit the selector for formatting. Because these are different things. As I argued many times, the selection / formatting functions implement different interfaces. The Think about a list:
We don't expect TLDR: so no, I don't think it should be inherited. |
As someone still catching up on the new syntax/spec, but working daily in MF1 with many devs and their misconceptions about it, I can see selectors mutating later placeholder usage as causing far more confusion than convenience. I think most efforts to make messages shorter are going to cause long-term confusion in usage. That is, of course, separate from reducing verbosity (e.g. dropping
Is it possible to give selectors some additional shorthand syntax for declarations?
That would give devs flexibility/convenience in the "setup" without introducing a sort of magical, and fragile, concept for patterns. Otherwise, I'd be perfectly happy with the explicit version:
|
One way of expressing the root issue here is that currently, the simplest way of expressing a message does not match with what ought to be used for the right formatting. So could we change that simplest message expression to match the right results, either by changing how we process a The explicit assignment within
In other words, if
|
I think I do prefer that. It's not as short, but again, I don't think that's the most important factor. It's very clear, and reducing multiple paths to the same result wherever reasonable and avoiding "magical" conveniences will have a big impact on getting valid messages in and keeping them valid through the whole process. |
Don't hesitate to shut me down if this has already been hashed out, but I'm also now wondering if the root of the confusion stems from expressions being sometimes mutating (inputs) and other times not (patterns). Does it make sense for inputs to either look more like they're just coercive redeclarations?
or perhaps avoid expressions altogether and have a sort of basic casting syntax:
Where anything more complex would require a new
|
The current declaration syntax is indeed the result of an extended "hashing out", so it might be best not to reopen that unless there's some very clear reason (see this design doc for some of the history). While it's related to the current issue (much like the "resolved value" discussion of #645), I don't see it providing a resolution to what's happening with the |
This only looks like "the right X" because we decided that If we think of it this way:
the "dissonance" goes away. Because Note: this is a discussion we had and arguments I made before (that we are confusing concepts, and will confuse people). |
Using a different function for selection does not change the way that leaving out the formatters looks like it's right:
Here, if |
What are you considering worse about this? Because some might have the impression that the selection annotation would do the work for them? Edit: I understand why "1.0 apples" is bad, but it's the same result as the unannotated simple pattern, no? |
(as chair) Discussion of number selection using the same function as the formatter is a WG consensus and is documented in Selection on Numerical Values in exploration. Discussion of this is closed in the LDML45 timeframe. We welcome feedback on users' lived experience with this as part of the tech preview. (as contributor) @eemeli's original issue is whether/how a |
Let's compare this to programming languages: We don't expect that the function in
We expect and it is natural that
|
I didn't post this earlier but it seems perhaps more relevant now:
This would not be my expectation at all. Obviously, this is subjective but I have thus far come to see functions as simply casting in place, and the different contexts then have different output effects. The way I see this:
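in JavaScript, is roughly:

// Assumed helper, not part of the original comment: `+ 1` adds to a number and concatenates to a string.
const plusOne = (x) => x + 1;

function format({ myNum: _myNum }) {
  const myNum = Number(_myNum ?? 0);
  const myStr = String(myNum);
  switch (Boolean(myNum)) {
    default:
      return `${plusOne(myNum)} is not ${plusOne(myStr)}`; // format({ myNum: 15 }) -> "16 is not 151"
  }
} |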
I would like to push back hard on this... I consider it critical that function implementations not be able to modify anything in the context, and really to not have any observable consequences beyond their output. MessageFormat behavior should hew as closely as possible to https://www.rfc-editor.org/rfc/rfc9535.html#section-2.4 :
|
I agree wholeheartedly with what you're saying. What I meant was different: while doing formatting, functions can access the contents of their copy of the formatting context. They should have no observable impact outside of the call to "format message". But it should be possible to write functions in messages that do useful stuff:
The value of the variable If I use the above message with pseudocode like:
The value of |
Yes, the resolved value of
I'm making a much stronger assertion that function evaluation must not directly modify the formatting context at all. Instead, it can only be the MessageFormat machinery itself that associates function output with variables. |
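A rough sketch of that division of labour, under the assumption that declarations are processed in order: function handlers are plain functions of their operand and options, and only the processor writes results into a mapping that is local to one formatting call. The names `runDeclarations`, `registry`, and the declaration shape are hypothetical.

```javascript
// Hypothetical sketch: handlers have no access to the variable mapping.
// Only the MessageFormat machinery associates a function's output with a name.
function runDeclarations(declarations, input, registry) {
  // Copy the caller's input so the caller's object is never touched.
  const bound = new Map(Object.entries(input));

  for (const { name, operand, fn, options = {} } of declarations) {
    const handler = registry[fn];
    if (typeof handler !== "function") throw new Error(`Unknown function: :${fn}`);
    // The handler sees only its operand value and options.
    const result = handler(bound.get(operand), options);
    // The processor, not the handler, binds the result to $name.
    bound.set(name, result);
  }
  return bound;
}

// Usage with an illustrative `upper` function.
const registry = { upper: (value) => String(value).toUpperCase() };
const bound = runDeclarations(
  [{ name: "shout", operand: "user", fn: "upper" }],
  { user: "ada" },
  registry
);
console.log(bound.get("shout"));
```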
@gibson042 Ironically, I think you and I are in violent agreement. The problem here is that our mental model of what the "formatting context" is differs. I think it is the set of resolved values visible only to formatting functions/selectors within the context of a specific message. The calling context for MessageFormat is something else entirely. And I agree that functions do not have write access to the formatting context ( |
According to the LDML45 spec, the only such value revealed to function implementations is the current locale. In the JS Intl.MessageFormat proposal spec, the localeMatcher and the expression source are also included. I think those are missing the current bidi setting, and it's conceivable for other constructor options to also get passed along, but are there use cases for any other values? |
You're overlooking the actual section on formatting context, which lists five things:
- locale
- base direction
- input mapping
- function registry/registries
- optional fallback string
Note that the formatting context, when defined this way, doesn't really describe a data structure so much as it is suggestive of a set of APIs inside MF. |
Also reader/listener's gender.
…On Thu, Apr 25, 2024, 06:19 Addison Phillips ***@***.***> wrote:
According to the LDML45 spec
<https://unicode.org/reports/tr35/tr35-messageFormat.html#function-resolution>,
the only such value revealed to function implementations is the current
locale. In the JS Intl.MessageFormat proposal spec
<https://tc39.es/proposal-intl-messageformat/#sec-resolvefunction>, the
localeMatcher and the expression source are also included.
You're overlooking the actual section on formatting context
<https://www.unicode.org/reports/tr35/tr35-messageFormat.html#formatting-context>,
which lists five things:
- locale
- base direction
- input mapping
- function registry/registries
- optional fallback string
Note that the formatting context, when defined this way, doesn't really
describe a data structure so much as it is suggestive of a set of APIs
inside MF.
|
I'm not, though. Note this important part of Function Resolution:
My understanding of "MUST be minimal" is that every part that is made available must be explicitly rationalised. In particular, as values from the input mapping and the function registry are made available via variables and declarations, those should never be available internally to a function implementation. Admittedly the spec language does currently leave the interpretation of "minimal" up to an implementation, so in theory anything's possible. |
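One way to picture this reading of "minimal" is a processor that holds the full formatting context but derives a much smaller, read-only object for function implementations. The sketch below is an assumption-laden illustration: `createFormattingContext` and `toFunctionContext` are hypothetical names, and exposing only the locale follows the description of the LDML45 text given earlier in this thread.

```javascript
// Hypothetical sketch: the processor keeps the five items of the formatting context,
// while a function implementation is handed only an explicitly justified subset.
function createFormattingContext({ locale, dir, input = {}, registry = {}, fallback }) {
  return { locale, dir, input, registry, fallback };
}

function toFunctionContext(formattingContext) {
  // `input` and `registry` are deliberately left out: their contents reach
  // functions only through variables and declarations, as operands and options.
  return Object.freeze({ locale: formattingContext.locale });
}

// Usage
const ctx = createFormattingContext({ locale: "en", dir: "ltr", input: { count: 1 } });
console.log(Object.keys(toFunctionContext(ctx)));
```

Any further field, such as the base direction, would be added to `toFunctionContext` only with an explicit rationale.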
* [DESIGN] Effect of selectors on placeholders

  Addresses #736, #747

  DO NOT REVIEW YET == WORK IN PROGRESS
* Flesh out background
* Add a use case
* Adding options and more user stories
* Fix example typo
* Update selection-declaration.md
* Update selection-declaration.md
* Update exploration/selection-declaration.md Co-authored-by: Eemeli Aro <[email protected]>
* Update exploration/selection-declaration.md Co-authored-by: Tim Chevalier <[email protected]>
* Address comments.
* Update exploration/selection-declaration.md Co-authored-by: Tim Chevalier <[email protected]>
* Update exploration/selection-declaration.md Co-authored-by: Tim Chevalier <[email protected]>
* Update exploration/selection-declaration.md Co-authored-by: Tim Chevalier <[email protected]>
* Update exploration/selection-declaration.md Co-authored-by: Richard Gibson <[email protected]>
* Format tweak

---------

Co-authored-by: Eemeli Aro <[email protected]>
Co-authored-by: Tim Chevalier <[email protected]>
Co-authored-by: Richard Gibson <[email protected]>
While preparing yet another presentation on MF2, I needed to write a simple example that got me thinking:
It's not necessarily obvious that the above is what we expect, when this looks like it'll work just as well:
However, that doesn't format the $count explicitly as a number; we just presume it does, because "count" sounds numeric.

So the thought I had here is that this is pretty much exactly why & how MF1 ended up with # being special in plural selectors, and that the solution we're providing is much less obvious and requires writing a whole new .input statement.

Could we consider making the .match expressions also act as implicit declarations, and make them usable in placeholders? The somewhat obvious way to address them is by index position:

Assigning values to $0, $1, ... would not conflict with any input values, as numbers are invalid name-start characters. That's by design so that we encourage at least some name for each variable; here that's effectively provided by the .match expressions.

I suspect that adding this shorthand would provide a more ergonomic solution for most .input use cases, and would enable the representation of many messages without any declarations, which currently would require one to avoid significant repetition.

The syntax change required by this would probably look something like this:

with accompanying spec language making numeric variables resolve to the .match selectors in placeholders, and a data model error otherwise.
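As a rough illustration of the proposed lookup (not spec text), a placeholder variable whose name is purely numeric could be resolved against the list of .match selectors by position. `resolvePlaceholderVariable`, `selectors`, and `resolveNamed` are hypothetical names, and treating an out-of-range index as the data model error is one assumed reading of "otherwise".

```javascript
// Hypothetical sketch of the proposal: $0, $1, ... refer to .match selectors by index.
function resolvePlaceholderVariable(name, selectors, resolveNamed) {
  if (/^\d+$/.test(name)) {
    const index = Number(name);
    if (index >= selectors.length) {
      // Assumed reading: a numeric variable with no matching selector is a data model error.
      throw new Error(`Data model error: no .match selector at index ${index}`);
    }
    // Reuse the selector's resolved value, annotation included.
    return selectors[index];
  }
  // Ordinary names cannot start with a digit, so they never collide with indices.
  return resolveNamed(name);
}

// Usage: one selector, resolved elsewhere as a number-annotated value.
const selectors = [{ value: 1, annotation: "number" }];
const named = (n) => ({ value: `<${n}>`, annotation: null });
console.log(resolvePlaceholderVariable("0", selectors, named));
```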