FCP HIPE: localized messages #64
Signed-off-by: Daniel Hardman <[email protected]>
First take: I really like this. This overlaps (overlays?) with the schemas and overlays work, since a key use case of that work is localization as it relates to schemas. This is more general, but still complementary. With VON, we've already experienced the same challenge. Our first cut was the traditional approach you mention - the UI software presenting the data does the localization - which does not scale to our many Issuers. The Issuers need to be able to convey to the Holder the localization data about what is being issued.

The only concern I have with this is whether there is a need to focus on the trustworthiness of the catalog. I think the catalog is necessary. Is there a concern that the translation mechanism could be used to confuse users - e.g. translate "Yes" to "No" and vice-versa for an important transaction? Or is my tinfoil hat on too tight?
text/localized-messages/README.md
in a message; wherever it appears, it overrides any message catalog specified at a more general
level. The value of `@msg_catalog` is a URI (ideally, a DID reference):

[![sample5.png](sample5.png)](sample5.json)
Can/should the @msg_catalog be scoped to the friendly_ltxt context to support different catalogs for multiple *_ltxt contexts within a single message?
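For illustration only, here is a minimal Python sketch of the override rule quoted above (the most specific `@msg_catalog` wins). It assumes, hypothetically, that the decorator may appear on any nested object in the message; the `resolve_catalog` helper, the `did:example:...` URIs, and the message layout are made up for the example and are not part of the HIPE.

```python
from typing import Any, List, Optional

CATALOG_KEY = "@msg_catalog"


def resolve_catalog(message: dict, path: List[str]) -> Optional[str]:
    """Return the catalog URI in effect for the object found at `path`.

    Walks from the message root down to the target; a catalog declared
    closer to the target overrides one declared at a more general level.
    """
    node: Any = message
    catalog = node.get(CATALOG_KEY)
    for key in path:
        node = node[key]
        if isinstance(node, dict) and CATALOG_KEY in node:
            catalog = node[CATALOG_KEY]
    return catalog


# Hypothetical message: one catalog for the whole message, another one
# scoped to the nested "problem" object.
message = {
    "@msg_catalog": "did:example:general-catalog",
    "problem": {
        "@msg_catalog": "did:example:problem-catalog",
        "friendly_ltxt": {"code": "out_of_memory"},
    },
    "comment_ltxt": {"code": "hello"},
}

print(resolve_catalog(message, ["problem", "friendly_ltxt"]))
print(resolve_catalog(message, ["comment_ltxt"]))
```

Under that assumption, different `*_ltxt` contexts in one message can use different catalogs simply by declaring the decorator on their enclosing object.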
BTW - not sure I'm a fan of the images for JSON approach. Looks good, but it's hard to comment on.
I'm a fan of using the inline markdown formatting for code examples.
@TelegramSam when you say "inline markdown formatting", are you talking about just doing codeblocks with a triple backtick and a declaration of the code type--or something fancier? If fancier, I want to learn how. I tried the backticks and was very unhappy with it. I wanted syntax highlighting, and I also wanted to be able to bold or select a subset of the JSON to call it out.
@swcurran I checked in a .json file for each of the JSON graphics--so you can leave comments on the lines of the json file instead of the image.
I don't love this solution. If there's something better, please tell.
I use triple backticks with the language right after. I use json as the language. Different markdown parsers highlight differently; some do useful things, some do nothing.
Ideas:
    {
      //... normal message stuff
      "@locale": "en",
      "@msg_catalog": "<catalog uri>",
      "some_string_attribute": "This is a test",
      "some_string_attribute_loc": {
        "code": "this_is_a_test",
        "es": "<translation of 'this is a test'>"
      }
    }

Example Notes:
Good improvement. I'll update.
I thought about this. I know it's sometimes done, but there are two drawbacks: you can't change the value of a string without invalidating its lookup in a catalog, and you have ambiguities where the same text means two different things (e.g., "control" as a verb in one place, and as a noun in another). I don't know if either of these is a big deal, but that's why I said the code was required. Maybe we make the code optional, so either lookup key could be used?
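A rough Python sketch of the "either lookup key" idea, purely as an assumption about how it might work: prefer the code when one is supplied, and fall back to the source text otherwise. The `localize` function, the catalog layout (locale, then key, then translation), and the sample entries are all hypothetical.

```python
from typing import Dict, Optional

# Hypothetical catalog layout: locale -> lookup key -> translated text.
Catalog = Dict[str, Dict[str, str]]


def localize(text: str, code: Optional[str], catalog: Catalog, locale: str) -> str:
    """Look up a translation by code first, then by the source text itself.

    Returns the original text unchanged when no translation is found.
    """
    entries = catalog.get(locale, {})
    if code is not None and code in entries:
        return entries[code]
    return entries.get(text, text)


catalog: Catalog = {
    "es": {
        "this_is_a_test": "Esto es una prueba",
        # Codes disambiguate identical source text with different meanings.
        "control_verb": "controlar",
        "control_noun": "control",
        # Text-keyed entry: breaks as soon as the source wording changes.
        "Goodbye": "Adiós",
    }
}

print(localize("This is a test", "this_is_a_test", catalog, "es"))
print(localize("control", "control_verb", catalog, "es"))
print(localize("Goodbye", None, catalog, "es"))
```

The text-keyed entry shows the first drawback directly: rewording "Goodbye" in the message would silently lose its translation, while a code-keyed entry survives the edit.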
These are good points. However, if we do this, then it is no longer possible to look at a field and realize by its naming convention that it is localizable. You only know it is localizable if it has a sibling attribute. And since I expect most people who write localizable messages will make no effort to localize them (e.g., the values of the attributes will be generated dynamically), we'd end up either with no clues about which attributes to localize, or with a lot of stuff that looks like this:

    "some_string_attribute": "This is a test",
    "some_string_attribute_loc": {}

I am also dubious about the utility of making something localized when it wasn't to start out. My experience has been that if you weren't thinking about localization from the beginning, you usually have bigger problems than schema adjustments. I don't know what to do about this, though--because even though I feel pretty strongly about my own reasoning, the other points raised by @TelegramSam are equally good. Is there some way we can have our cake and eat it too?
This is a really important point, @swcurran. I'll add some text about it. Can you think of any ways to strengthen the security around it? An obvious way would be to publish the hash of the catalog so it couldn't be tampered with--but that feels like tedious, kludgey overkill... Yet the risk of hacking via the catalog is real...
What if we said that messages purporting to belong to a schema with required attribute "some_string_attribute", but lacking that field, could still be valid if a field named "some_string_attribute_ltxt" is present? If it is, then the latter attribute is the localized variant of "some_string_attribute" and should be interpreted as satisfying that field's place in the schema. In this way, messages could gain localization support without doing violence to a schema. I dunno. I don't love it. It requires a parser/validator to do something quirky.
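To make the "quirky validator" concrete, here is a small Python sketch of the rule as I read it. It is only an assumption about the behavior: a required field counts as present if either the plain name or its `_ltxt` variant exists. The `missing_required` helper and the sample schema are hypothetical.

```python
from typing import Iterable, List


def missing_required(message: dict, required: Iterable[str],
                     suffix: str = "_ltxt") -> List[str]:
    """List required fields satisfied by neither `name` nor `name + suffix`."""
    return [
        name for name in required
        if name not in message and (name + suffix) not in message
    ]


required_fields = ["some_string_attribute", "id"]

plain = {"id": "1", "some_string_attribute": "This is a test"}
localized = {"id": "1", "some_string_attribute_ltxt": "This is a test"}
broken = {"id": "1"}

print(missing_required(plain, required_fields))      # nothing missing
print(missing_required(localized, required_fields))  # _ltxt variant accepted
print(missing_required(broken, required_fields))     # field truly absent
```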
@swcurran and @TelegramSam: I updated the HIPE to address all your comments. There's a security warning and best practices around the catalog hacking issue. The _ltxt field now has a sibling field instead of being a dict. (I used @swcurran Can you make the corresponding changes in the problem_report HIPE, such that
I think there are a couple of pieces of prior art to look at here.

In the open source world, "catalogs" (as we are calling them here) evolve over time through community contributions. For example, many open source applications have releases that consist only of new translations done by community contributors. I think in this case, we have to expect the same model and we should design a system to support that: for example, a decentralized way to extend and rate (approve?) a translation.

The Schemas and Overlays group are planning to have Schema Overlays (metadata associated with a Schema) that (in some cases) are for localization on the ledger. I don't know how close that plan is to reality, or whether the indy-node team is comfortable with it. While the catalogs we are discussing are tied to messages rather than schemas, would it be worthwhile to have them on the ledger, or at least immutable? Then the message receiver could be notified of what immutable message overlays are available and which ones should be used. I think it's doable, but it's complicated and adds a bunch of state for agents to track...

AFAIK with, say, Python applications, given a catalog (in the case of an app, that's in the codebase), all strings to be presented are first checked for a mapping to the localized string in the requested locale. With Sam's proposal of the catalog and locale in the message, and a user providing the desired locale to present, that should be pretty easy. Further, with the catalog and
@swcurran: regarding the relation to schemas and overlays, and publishing on the ledger, I would say that that's a very interesting idea, but I don't want to hold up this HIPE for it. Let's keep a bookmark in it and circle back to it when the progress there makes the link to their work easier. (Note as well that W3C just decided to change the name of the "identifier registry" in the VC spec to "verifiable data registry", exactly to accommodate stuff like this, where you must have a provably correct version of something. We don't necessarily need to publish on the ledger; we could publish anywhere if the hash of the message catalog were included in the message. But we can work that out later.)
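As a sketch of the "hash of the message catalog included in the message" idea, the check on the receiving side could look roughly like the Python below. This assumes SHA-256 over the raw catalog bytes, which nothing in this thread has actually decided; the function names and the sample catalog content are hypothetical.

```python
import hashlib
import hmac


def catalog_digest(catalog_bytes: bytes) -> str:
    """Hex SHA-256 digest of the catalog exactly as published (assumed algorithm)."""
    return hashlib.sha256(catalog_bytes).hexdigest()


def catalog_matches(catalog_bytes: bytes, expected_hex: str) -> bool:
    """True only if the fetched catalog hashes to the value carried in the message."""
    return hmac.compare_digest(catalog_digest(catalog_bytes), expected_hex.lower())


# The sender computes the digest once and ships it alongside the catalog URI.
published = b'{"es": {"yes": "S\\u00ed", "no": "No"}}'
digest_in_message = catalog_digest(published)

# A catalog that swaps "yes" and "no" no longer matches the digest.
tampered = b'{"es": {"yes": "No", "no": "S\\u00ed"}}'

print(catalog_matches(published, digest_in_message))
print(catalog_matches(tampered, digest_in_message))
```

With a check like this the catalog can live anywhere, since any change to its bytes is detectable by the receiver; the cost is that every catalog update requires senders to pick up a new digest.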
I love the _l10n suffix for a sibling decorator. I wonder if the locale and catalog would be better organized under a structure like this:

    "@l10n": {
      "locale": "en",
      "catalog": "<did ref>"
    }

This feels simpler, and the @l10n matches the _l10n. This would be the first block-form @annotation, which makes me a little nervous, but it feels better than having both a @locale and a @msg_catalog.

My remaining issue is knowing which fields can be localized. It is fairly important that we articulate why not just any string should be localized, because of the security risk of accidentally sending a secret to a translation service.
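One way to picture the "which fields can be localized" concern is an allow-list driven by the sibling decorator: only string fields that have an explicit `_l10n` sibling are ever handed to a translation step. The Python below is a sketch under that assumption; the `localizable_fields` helper and the field names in the sample message are hypothetical.

```python
from typing import Dict


def localizable_fields(message: dict, suffix: str = "_l10n") -> Dict[str, str]:
    """Return only the string fields that have an explicit `<name><suffix>` sibling.

    Anything without a sibling decorator is treated as not localizable and
    is never passed on, so secrets are not sent to a translation service.
    """
    return {
        name: value
        for name, value in message.items()
        if isinstance(value, str) and (name + suffix) in message
    }


message = {
    "@l10n": {"locale": "en", "catalog": "<did ref>"},
    "comment_ltxt": "Hello",
    "comment_ltxt_l10n": {"code": "hello"},
    "api_key": "s3cr3t",  # no sibling decorator, so it is left alone
}

print(localizable_fields(message))
```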
We certainly shouldn't allow all of these options due to the resulting complexity. It occurs to me that there is a progression of localization maturity:
The inline localizations fit in with 2 and 3 as an alternative or addition to a catalog. I'm going to guess that many families will be prototyped and tested at level 0 or 1. If breaking changes are required to progress up the levels of localization, it would cause a Major version update to the message family, which may or may not be a desirable quality.

On catalog attack prevention: Can we sign the catalog with the key in the DID doc used to reference the catalog? Just an idea; we should handle this in a future HIPE.
Agreed - that's the right approach.
@dhh1128 Your work on this HIPE is incredible. Thank you for your effort. The only changes I can see that are needed are the cleanup of the last section and filling out the complex example.
This is superseded by hyperledger/aries-rfcs#43.