FCP HIPE: localized messages #64

dhh1128 · 2018-12-01T05:19:32Z

Signed-off-by: Daniel Hardman <[email protected]>

swcurran · 2018-12-01T07:41:40Z

My first reaction is that I really like this. It overlaps (overlays?) with the schemas and overlays work, since a key use case there is localization, specifically as it relates to schemas. This is more general, but still complementary. With VON, we've already experienced the same challenge. Our first cut was the traditional approach you mention, where the UI software presenting the data does the localization, and that does not scale to our many Issuers. The Issuers need to be able to convey to the Holder the localization data about what is being issued.

The only concern I have is whether we need to focus on the trustworthiness of the catalog. I think the catalog is necessary. But is there a concern that the translation mechanism could be used to confuse users, e.g. by translating "Yes" to "No" and vice versa for an important transaction? Or is my tinfoil hat on too tight?

swcurran · 2018-12-01T07:45:06Z

text/localized-messages/README.md

+in a message; wherever it appears, it overrides any message catalog specified at a more general
+level. The value of `@msg_catalog` is a URI (ideally, a DID reference):
+
+[![sample5.png](sample5.png)](sample5.json)


Can/should the @msg_catalog be scoped to the friendly_ltxt context to support different catalogs for multiple *_ltxt contexts within a single message?

BTW - not sure I'm a fan of the images for JSON approach. Looks good, but it's hard to comment on.

I'm a fan of using the inline markdown formatting for code examples.

@TelegramSam when you say "inline markdown formatting", are you talking about just doing codeblocks with a triple backtick and a declaration of the code type--or something fancier? If fancier, I want to learn how. I tried the backticks and was very unhappy with it. I wanted syntax highlighting, and I also wanted to be able to bold or select a subset of the JSON to call it out.

@swcurran I checked in a .json file for each of the JSON graphics--so you can leave comments on the lines of the json file instead of the image.

I don't love this solution. If there's something better, please tell.

I use triple backticks with the language right after; I use json as the language. Different markdown parsers highlight differently; some do useful things, some do nothing.

TelegramSam · 2018-12-03T16:58:32Z

Ideas:

1. Declare the locale at the message level. Then all strings can be assumed to be that locale unless otherwise specified. At a minimum, this allows machine translation in the absence of more localization detail.
2. Use the attribute string itself as a reference to the string in the msg_catalog. This works well with short strings.
3. Include localization detail with sibling attributes, not manipulation of the main attribute.

Example:

{
  //... normal message stuff
  "@locale":"en",
  "@msg_catalog": "<catalog uri>",
  "some_string_attribute":"This is a test",
  "some_string_attribute_loc": {
    "code": "this_is_a_test",
    "es": "<translation of 'this is a test'>"
  }
}

Example Notes:

- Even without the ..._loc attribute, you can look up "This is a test" in the catalog.
- The sibling ..._loc attribute makes it easier for schema parsers. They will ignore it if they are not expecting it.
- These changes allow the localization to be 'additive', in that you can add localization without modifying the existing attribute structures. This removes barriers to making something localized when it wasn't to start out.
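A minimal sketch of how a receiver might resolve a string under this proposal. The catalog layout (lookup key mapped to per-locale translations), the `localize` helper, and the sample translations are assumptions made for illustration; the thread does not define a catalog format.

```python
# Hypothetical catalog layout: lookup key -> {locale: translated text}.
CATALOG = {
    "this_is_a_test": {"es": "Esto es una prueba"},
    "This is a test": {"fr": "Ceci est un test"},
}


def localize(msg: dict, field: str, target: str, catalog: dict) -> str:
    """Return the best available text for `field` in the `target` locale."""
    text = msg[field]
    if msg.get("@locale") == target:
        return text  # already in the requested locale

    loc = msg.get(field + "_loc", {})
    if target in loc:
        return loc[target]  # inline translation carried in the sibling attribute

    # Try the explicit code first, then fall back to the string itself as the key.
    for key in (loc.get("code"), text):
        if key is not None and target in catalog.get(key, {}):
            return catalog[key][target]

    return text  # nothing found: show the original string


msg = {
    "@locale": "en",
    "some_string_attribute": "This is a test",
    "some_string_attribute_loc": {"code": "this_is_a_test"},
}
print(localize(msg, "some_string_attribute", "es", CATALOG))
```

A receiver that knows nothing about the `_loc` sibling can still read `some_string_attribute` unchanged, which is the 'additive' property described in the notes.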

dhh1128 · 2018-12-07T03:31:06Z

> Declare the locale at the message level.

Good improvement. I'll update.

> Use the attribute string itself as a reference to the string in the msg_catalog. This works well with short strings.

I thought about this. I know it's sometimes done, but there are two drawbacks: you can't change the value of a string without invalidating its lookup in a catalog, and you get ambiguities where the same text means two different things (e.g., "control" as a verb in one place, and as a noun in another). I don't know if either of these is a big deal, but that's why I said the code was required. Maybe we make the code optional, so either lookup key could be used?
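Both drawbacks can be seen in a small sketch. The catalog layout, the code names `control_noun` and `control_verb`, and the `lookup` helper are hypothetical and chosen only for illustration.

```python
# Keyed by source text: one entry per string, so only one sense of "control" fits.
by_text = {"control": {"es": "control"}}

# Keyed by code: each sense gets its own entry, independent of the English wording.
by_code = {
    "control_noun": {"es": "control"},
    "control_verb": {"es": "controlar"},
}


def lookup(catalog: dict, key: str, locale: str):
    """Return the translation for `key` in `locale`, or None if there is none."""
    return catalog.get(key, {}).get(locale)


# Ambiguity: the text key cannot tell the verb from the noun.
print(lookup(by_text, "control", "es"), lookup(by_code, "control_verb", "es"))

# Fragility: rewording the source string silently breaks a text-keyed lookup.
print(lookup(by_text, "Control", "es"))
```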

> Include localization detail with sibling attributes, not manipulation of the main attribute.
> The sibling ..._loc attribute makes it easier for schema parsers. They will ignore it if they are not expecting it.
> These changes allow the localization to be 'additive' in that you can add localization without modifying the existing attribute structures. This removes barriers to making something localized when it wasn't to start out.

These are good points. However, if we do this, then it is no longer possible to look at a field and realize by its naming convention that it is localizable. You only know it is localizable if it has a sibling attribute. And since I expect most people who write localizable messages will make no effort to localize them (e.g., the values of the attributes will be generated dynamically), we'd end up either with no clues about which attributes to localize, or with a lot of stuff that looks like this:

"some_string_attribute":"This is a test",
"some_string_attribute_loc": {}

I am also dubious about the utility of making something localized when it wasn't to start out. My experience has been that if you weren't thinking about localization from the beginning, you usually have bigger problems than schema adjustments.

I don't know what to do about this, though--because even though I feel pretty strongly about my own reasoning, the other points raised by @TelegramSam are equally good. Is there some way we can have our cake and eat it too?

dhh1128 · 2018-12-07T03:34:26Z

> focus on the trustworthiness of the catalog

This is a really important point, @swcurran . I'll add some text about it. Can you think of any ways to strengthen the security around it? An obvious way would be to publish the hash of the catalog so it couldn't be tampered with--but that feels like tedious, kludgey overkill... Yet the risk of hacking via the catalog is real...

dhh1128 · 2018-12-07T03:44:11Z

@TelegramSam:

> Is there some way we can have our cake and eat it too?

What if we said that messages purporting to belong to a schema with a required attribute "some_string_attribute", but lacking that field, could still be valid if a field named "some_string_attribute_ltxt" is present? If it is, then the latter attribute is the localized variant of "some_string_attribute" and should be interpreted as satisfying that field's place in the schema. In this way, messages could gain localization support without doing violence to a schema.

I dunno. I don't love it. Requires a parser/validator to do something quirky.
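For a sense of how small the quirk might be, here is a sketch of a required-field check that accepts the `_ltxt` variant. The function name and the idea of passing the schema as a plain list of required names are assumptions made for illustration, not anything defined by the HIPE.

```python
from typing import List


def missing_required(msg: dict, required: List[str], suffix: str = "_ltxt") -> List[str]:
    """List required fields that are absent in both plain and localized form."""
    return [
        name
        for name in required
        if name not in msg and name + suffix not in msg
    ]


# A message that carries only the localized variant still satisfies the schema.
localized_only = {"some_string_attribute_ltxt": "This is a test"}
print(missing_required(localized_only, ["some_string_attribute"]))
```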

dhh1128 · 2018-12-07T05:19:01Z

@swcurran and @TelegramSam : I updated the HIPE to address all your comments. There's a security warning and best practices around the catalog hacking issue. The _ltxt field now has a sibling field instead of being a dict. (I used _l10n instead of _loc for the suffix, because _loc is likely to be used as a short form of _location in many schemas.) The HIPE now describes how to do a lookup on a string value if no code is given. And I have also included a note about how schemas that declare a field without the _ltxt suffix can upgrade to localization support--not automatically, but by noting this feature in their message family definition docs.

@swcurran Can you make the corresponding changes in the problem_report HIPE, such that friendly_ltxt (which I've noted in a separate comment should be renamed to explain_ltxt) has a simple string value and a sibling field explain_l10n, instead of having a value that's a dict?

swcurran · 2018-12-07T15:16:13Z

> > focus on the trustworthiness of the catalog
>
> This is a really important point, @swcurran . I'll add some text about it. Can you think of any ways to strengthen the security around it? An obvious way would be to publish the hash of the catalog so it couldn't be tampered with--but that feels like tedious, kludgey overkill... Yet the risk of hacking via the catalog is real...

I think there are a couple of pieces of prior art to look at here.

In the open source world, "catalogs" (as we are calling them here) evolve over time through community contributions. For example, many open source applications have releases that consist only of new translations done by community contributors. I think in this case, we have to expect the same model and we should design a system to support that. For example, a decentralized way to extend and rate (approve?) a translation.

The Schemas and Overlays group are planning to have Schema Overlays (metadata associated with a Schema) that (in some cases) are for localization on the ledger. I don't know how close that plan is to reality - and whether the indy-node team is comfortable with that. While the catalogs we are discussing are tied to messages vs. schema - would it be worthwhile to have them on the ledger - or at least immutable? Then the message receiver could be notified of what immutable message overlays are available and which ones should be used. I think it's doable, but it's complicated and adds a bunch of state for agents to track...

AFAIK, with, say, Python applications, given a catalog (which, in the case of an app, lives in the codebase), all strings to be presented are first checked for a mapping to the localized string in the requested locale. With Sam's proposal of the catalog and locale in the message, and a user providing the desired locale to present, that should be pretty easy. With the catalog and code field it should be easy as well: the code field would be used as the neutral form of the text.

dhh1128 · 2018-12-11T00:19:45Z

@swcurran: regarding the relation to schemas and overlays, and publishing on the ledger, I would say that that's a very interesting idea, but I don't want to hold up this HIPE for it. Let's keep a bookmark in it and circle back to it when the progress there makes the link to their work easier. (Note as well that W3C just decided to change the name of the "identifier registry" in the VC spec to "verifiable data registry", exactly to accommodate stuff like this, where you must have a provably correct version of something. We don't necessarily need to publish on the ledger; we could publish anywhere if the hash of the message catalog were included in the message. But we can work that out later.)
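As a rough sketch of the hash idea: if a message carried a digest of the catalog, the receiver could fetch the catalog from anywhere and check it before use. The choice of SHA-256, the hex encoding, and the `catalog_matches` helper are all assumptions for illustration; nothing in this thread fixes an algorithm or a field name.

```python
import hashlib


def catalog_matches(catalog_bytes: bytes, expected_sha256_hex: str) -> bool:
    """Check that the fetched catalog bytes hash to the digest stated in the message."""
    actual = hashlib.sha256(catalog_bytes).hexdigest()
    return actual == expected_sha256_hex.lower()


# The sender would compute the digest once and include it in the message.
catalog_bytes = b'{"this_is_a_test": {"es": "Esto es una prueba"}}'
stated_digest = hashlib.sha256(catalog_bytes).hexdigest()

# The receiver recomputes it over whatever bytes were actually fetched.
print(catalog_matches(catalog_bytes, stated_digest))
```

Note that the digest is computed over exact bytes, so any re-serialization of the catalog (whitespace, key order) would change it.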

TelegramSam · 2018-12-11T20:39:44Z

I love the _l10n suffix for a sibling decorator.

I wonder if the locale and catalog would be better organized under a structure like this:

"@l10n": {
"locale": "en",
"catalog": "<did ref>"
}

This feels simpler, and the @l10n matches the _l10n. This would be the first block-form @annotation, which makes me a little nervous, but it feels better than having both a @locale and a @msg_catalog.

My remaining issue is knowing which fields can be localized. It is fairly important that we articulate why not just any string should be localized: there is a security risk of accidentally sending a secret to a translation service.

A few options have already been articulated:

- Message Family Documentation: This one seems bad at the outset, but I don't think it's harmful at all. There are NO expectations of an unknown message family being processed. During development, the list of fields would need to be provided to the Message class to allow automatic localization. For now this is done by hand, but in the future it could be automated via formalized message family docs.
- Field Suffix: This is Daniel's favorite, but not mine. I feel like the cases where the field suffix would be used without a sibling field would be fairly rare in the advanced case, and the simple case likely isn't going to declare the suffix at all. This also requires a breaking change to add localization if the message family wasn't designed that way to begin with.
- Sibling Field: I think this is a better explicit option, and will likely be the most common in advanced uses due to the need for a code.
- In-Message Field List: If we use the @l10n block I suggested, we could include a list of fields in there.

We certainly shouldn't allow all of these options, because of the resulting complexity.

It occurs to me that there is a progression of localization maturity:

0. None at all.
1. Stated locale. Allows machine translation. Assumes discovery (or development-time encoding) of localizable fields.
2. Stated locale and catalog. Lookup via full field values.
3. Stated locale and catalog; fields have explicit catalog codes in sibling fields.

The inline localizations fit in with levels 2 and 3 as an alternative or addition to a catalog.

I'm going to guess that many families will be prototyped and tested at level 0 or 1. If breaking changes are required to progress up the levels of localization, that would force a major version update to the message family, which may or may not be a desirable quality.
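The progression above, numbered from 0 as the references to "level 0 or 1" imply, could be detected mechanically. This sketch assumes the flat `@locale` and `@msg_catalog` attributes and the `_l10n` sibling suffix used earlier in the thread; the `l10n_level` function is hypothetical.

```python
def l10n_level(msg: dict) -> int:
    """Classify a message by the localization maturity levels 0 through 3."""
    if "@locale" not in msg:
        return 0  # no localization information at all
    if "@msg_catalog" not in msg:
        return 1  # stated locale only: machine translation is possible
    has_codes = any(
        key.endswith("_l10n") and isinstance(value, dict) and "code" in value
        for key, value in msg.items()
    )
    # With a catalog, explicit codes in sibling fields distinguish level 3 from 2.
    return 3 if has_codes else 2


print(l10n_level({"@locale": "en", "comment": "This is a test"}))
```

A message family could move from level 1 to level 3 purely by adding attributes, which is the non-breaking upgrade path the comment is arguing for.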

On catalog attack prevention: Can we sign the catalog with the key in the DID doc used to reference the catalog? Just an idea, we should handle this in a future HIPE.

swcurran · 2018-12-16T19:41:06Z

> @swcurran: regarding the relation to schemas and overlays, and publishing on the ledger, I would say that that's a very interesting idea, but I don't want to hold up this HIPE for it. Let's keep a bookmark in it and circle back to it when the progress there makes the link to their work easier. (Note as well that W3C just decided to change the name of the "identifier registry" in the VC spec to "verifiable data registry", exactly to accommodate stuff like this, where you must have a provably correct version of something. We don't necessarily need to publish on the ledger; we could publish anywhere if the hash of the message catalog were included in the message. But we can work that out later.)

Agreed - that's the right approach.

TelegramSam · 2018-12-18T17:46:31Z

@dhh1128 Your work on this HIPE is incredible. Thank you for your effort.

The only changes I can see that are needed are cleaning up the last section and filling out the complex example.

dhh1128 · 2019-05-28T21:34:37Z

This is superseded by hyperledger/aries-rfcs#43.

Initial proposal

f67741a

Signed-off-by: Daniel Hardman <[email protected]>

dhh1128 changed the title ~~Initial proposal~~ propose HIPE: localized messages Dec 1, 2018

swcurran reviewed Dec 1, 2018

TelegramSam reviewed Dec 3, 2018

text/localized-messages/README.md (outdated)

Improvements from Sam and Stephen

45f35a7

Signed-off-by: Daniel Hardman <[email protected]>

Cleanup image hyperlinks ('json' in hovertext)

39d33d4

Signed-off-by: Daniel Hardman <[email protected]>

dhh1128 added 2 commits December 10, 2018 16:53

Cosmetic fixes

6423730

Signed-off-by: Daniel Hardman <[email protected]>

another tiny fix

801db6c

Signed-off-by: Daniel Hardman <[email protected]>

dhh1128 force-pushed the localized-messages branch from 7dbd28d to 801db6c December 10, 2018 23:54

dhh1128 added 5 commits December 17, 2018 04:57

Various improvements

0fe644c

Signed-off-by: Daniel Hardman <[email protected]>

Major revision to narrative

00ddfe0

Signed-off-by: Daniel Hardman <[email protected]>

Tweak images

bb69b6f

Signed-off-by: Daniel Hardman <[email protected]>

catalog section

95c6261

Signed-off-by: Daniel Hardman <[email protected]>

Minor tweaks to verbiage

318f265

Signed-off-by: Daniel Hardman <[email protected]>

dhh1128 added 5 commits March 18, 2019 14:35

@l10n --> ~l10n

91a122f

Signed-off-by: Daniel Hardman <[email protected]>

fix decorator syntax

a841e47

Signed-off-by: Daniel Hardman <[email protected]>

tweak typo

4b02c3b

Signed-off-by: Daniel Hardman <[email protected]>

Add text about localized keys

6d70dca

Signed-off-by: Daniel Hardman <[email protected]>

Fill out advanced section

445d963

Signed-off-by: Daniel Hardman <[email protected]>

dhh1128 added 4 commits March 18, 2019 18:08

cleanup unused files

c0980ca

Signed-off-by: Daniel Hardman <[email protected]>

add details field

d8e5252

Signed-off-by: Daniel Hardman <[email protected]>

Update verbiage about message scope

ee46b85

Signed-off-by: Daniel Hardman <[email protected]>

Tweak ~l10n on message type

b4c3f07

Signed-off-by: Daniel Hardman <[email protected]>

TelegramSam reviewed Mar 22, 2019

text/localized-messages/README.md

TelegramSam reviewed Mar 22, 2019

text/localized-messages/localizable-in-message.json (outdated)

TelegramSam reviewed Mar 22, 2019

text/localized-messages/localization-section.json

dhh1128 added 2 commits March 22, 2019 18:27

Explain advanced use case more clearly

7a340f7

Signed-off-by: Daniel Hardman <[email protected]>

Fix mistake in localizable example

569357c

Signed-off-by: Daniel Hardman <[email protected]>

TelegramSam changed the title ~~propose HIPE: localized messages~~ FCP HIPE: localized messages Apr 3, 2019

Supersede with Aries RFC 0033

0192e55

Signed-off-by: Daniel Hardman <[email protected]>

dhh1128 closed this May 28, 2019

dhh1128 deleted the localized-messages branch May 31, 2019 23:30
