Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix #782: give implementations more flexibility in error handling #795

Closed
wants to merge 1 commit into from

Conversation

mihnita
Copy link
Collaborator

@mihnita mihnita commented May 16, 2024

No description provided.

Comment on lines +27 to +30
In all cases, when encountering an error during formatting,
a message formatter MUST provide some representation of the message,
or MUST provide an informative error or errors.
An implementation MAY provide both.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The "in all cases" text with the "MUST or MUST" is a bit unusual in terms of standards. Perhaps:

Suggested change
In all cases, when encountering an error during formatting,
a message formatter MUST provide some representation of the message,
or MUST provide an informative error or errors.
An implementation MAY provide both.
When an error is encountered during formatting,
a message formatter can provide an informative error (or errors)
or some representation of the message or both.

Copy link
Collaborator Author

@mihnita mihnita May 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK. I am OK with the suggestion, and I trust you on what standards do :-)

But isn't the final form a bit too mild?
The result is "can A or B or both"
Does it leave an open door to do neither A nor B?

Would something like this be more in line with what standards do?: "MUST A or B or both"
Which would preserve (at least some) of the "strength" of the original text?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@macchiati I think maybe a better pseudo-model would look like the below. This ignores the try/catch pattern, which would look like your example, to show some of the other ways APIs might be provided:

MessageFormatter mf = MessageFormatter.withMessage(pattern);
int result = mf.format(parameters); // some APIs return a status code, e.g. ICU4C
String output = mf.toString();      // fallback message if errors
MessageFormatter.Part[] parts = mf.toParts(); // let's not forget formatToParts
if (mf.hasErrors()) {
    log.error(mf.getErrors());      // you can get the errors out too
}

@eemeli is correct that any of the three models he lists would be permitted by the proposed text. I think I agree that we should make a design doc, even if it is a small one, since I note that some programming languages (such as Java) use exception throwing as the primary error mechanism. But some do not. We need to enable those languages also.

I think it is probably incorrect to only have a fallback message with no way to find out if the formatting call failed in some way.

So I'd suggest something like:

Suggested change
In all cases, when encountering an error during formatting,
a message formatter MUST provide some representation of the message,
or MUST provide an informative error or errors.
An implementation MAY provide both.
Programming languages and runtime environments differ widely,
so this specification does not define the exact mechanism for reporting errors.
When an error or errors are encountered during formatting,
an implementation MUST provide a means for the caller to discover the errors.
For example, it might throw an exception or it might return a status code signaling an error.
In some environments, callers will expect the formatting call to return
a representation of the formatted message
in addition to any error handling or signaling mechanism.
When an error or errors are encountered during formatting,
an implementation SHOULD provide a means for the caller to obtain
some representation of the formatted message.
Such a representation MUST follow the fallback representation defined in this specification.
Implementations MAY provide both an error signal and a fallback representation.
Implementations MAY allow callers to ignore errors
(such as by disabling exception throwing or providing non-erroring APIs).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd be okay with any of: 1) @mihnita's proposed change 2) @aphillips's proposed change so long as the change below is made 3) just changing the existing spec text's "MUST"s to "MAY"s, or 4) some combination thereof.

The change to @aphillips's proposed change above is:

- an implementation MUST provide a means for the caller to discover the errors.
+ an implementation MUST provide a means for the caller to discover the occurrence of an error.

I think that change supports the constraints of something like ICU4X in Rust where it only reports 1 error if there are many errors encountered, due to the constraint that they don't want to allocate but allocation would be required if they had to report any & all errors, according to the feedback that we asked them to give. And I think it better reflects @aphillips's original intent when he wrote above

way to find out if the formatting call failed in some way

I say this also because the hypothetical Java-ish pseudocode above also includes the line

log.error(mf.getErrors())

which implies getting multiple errors, but that's probably unnecessary/undesirable for many places in which Rust is used, and is not supported by ICU4X in particular.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

an implementation SHOULD provide a means for the caller to obtain
some representation of the formatted message

This PR, plus maybe at least this bit of editing, is an improvement over the original. One key thing to note is that the most idiomatic way to implement in many languages will be more than one formatter function. For example in rust where you can return a Result<message, error>, where you only get either message or error but not both, providing "a means for the caller to obtain some representation" will look like

trait Message {
    fn format(&self, args: Arguments) -> Result<String, Error>;
    fn formatFallback(&self, args: Arguments) -> String;
}

The usage would be to try with message.format() first using normal idiomatic error handling to determine if it worked or not. In the event an error was returned instead of a formatted string, then the programmer has the option to use the infallible formatFallback variant that does whatever it needs to do to return something (even if it is just the message source or even the id) but is unable to return any error info at the same time.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd be fine with a change that separated the two current MUST statements from each other to more explicitly allow an implementation like @alerque's above to be conformant.

As I see it, the two key requirements we are and should be imposing are:

  1. You can always get some formatted representation of a valid message.
  2. For a message with errors, you can get at least one of the errors.

Note in particular the use of the word "valid" there; that spec-defined qualifier is intentional, as we probably should not require messages with syntax or data model errors to still be nominally formattable.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@eemeli noted about valid messages. It should be pointed out that invalid messages would fail before one starts the formatting process. (Of course it is possible to visualize a method/function that parses and formats a message all in one go, but under the hood the implementation will fail the message before it gets to the formatting steps in our spec)

Note that the fallback representation of syntax and data model errors (these errors define what it means to be a "valid" message) is the logo string, specifically:

If the message being formatted has any Syntax Errors or Data Model Errors, the result of pattern selection MUST be a pattern resolving to a single fallback value using the message's fallback string defined in the formatting context or if this is not available or empty, the U+FFFD REPLACEMENT CHARACTER �.

@echeran noted:

I think that change supports the constraints of something like ICU4X in Rust where it only reports 1 error if there are many errors encountered, due to the constraint that they don't want to allocate but allocation would be required if they had to report any & all errors, according to the feedback we asked them to give. And I think it better reflects @aphillips's original intent when he wrote above...

I would probably make a change to your proposal. An implementation that doesn't want to return a structure containing all of the errors might reasonably be "fast fail" (stop trying to format the message and return the first error it encounters rather than trying to "collect them all": it's not clear that an implementation can always recover to find all of the errors anyway or that removing an error might not result in a different outcome).

- an implementation MUST provide a means for the caller to discover the occurrence of an error.
+ an implementation MUST provide a means for the caller to discover than an error or errors have occurred.
+ Implementations are not required to enumerate all possible errors.

@alerque I agree that separate functions are one possibility (another example that somewhere I call out is ICU4J sometimes has "throwing" and "non-throwing" format methods).

Note that fn format($self, args: Arguments) -> Result<String,Error>; could return both String and Error for every call, with Error::NONE being the "good result", saving the expense of formatting twice. When there is an error, the string can be the fallback string. I don't think we should require this behavior. Just calling it out.


As a user, I'd be unamused by the need to write tons of logic around my format calls. One of the nice things about message format is the ability to get rid of tons of fiddly code related to the arguments and their formatting. I tend to want my calls to MF2 to be as sleek as possible. In most cases I'm going to try to solve all of my errors in testing, so I can be "free to fly" in production.

Copy link
Contributor

@alerque alerque May 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that fn format($self, args: Arguments) -> Result<String,Error>; could return both String and Error for every call, with Error::NONE being the "good result"

No it can't, that's impossible. Rust results are either a successful something (type specified) or an error, but never both.

The only way around that is to re-roll your own error handling from scratch by making your own types (with a tuple struct or whatever) with results and errors inside them—and eschewing all of Rust's error handling patterns. And that's crazy. This is the whole reason this issue came up because the way the spec is written would force people to write non-ergonomic and very much non-idiomatic code in many languages just to be spec compliant.

As a user, I'd be unamused by the need to write tons of logic around my format calls.

That's exactly what an end user would have to do in many languages if the API followed the patterns the spec is currently trying to mandate and/or keep getting suggested here as possible patterns. Maybe in some languages that is an ergonomic/idiomatic possibility, but not Rust and not many other languages I know of. The message format shouldn't be mandating this aspect of an implementation API. It can make suggestions so that robust usage is possible, but shouldn't try to foist a particular error handling pattern onto languages that may or may not actually work that way.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alerque Never mind then. I have no desire to create special work for Rust implementers (the whole point of my suggestions is to allow idiomatic implementation everywhere)

Copy link
Collaborator

@eemeli eemeli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is going to need a design doc to explain the change.

At the moment, the spec is providing exactly one choice for what an MF2 formatter is expected to do when a runtime error occurs: Provide some representation of the message, and separately report the error or errors.

The proposed change would be to remove this requirement completely, and allow any of the following to occur:

  1. A message with fallback contents is provided, and errors are swallowed.
  2. A message with fallback contents is provided, and errors are also emitted.
  3. No message is provided, and an error is emitted.

I don't think that's a good idea, and I think this is something on which it's appropriate for us to be opinionated.

@macchiati
Copy link
Member

macchiati commented May 16, 2024 via email

@aphillips aphillips added normative errors Issues related to the errors section of the spec LDML46 LDML46 Release (Tech Preview - October 2024) labels May 17, 2024
@aphillips
Copy link
Member

closed by #816 in 2024-08-26

@aphillips aphillips closed this Aug 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
errors Issues related to the errors section of the spec LDML46 LDML46 Release (Tech Preview - October 2024) normative
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants