Handle renamed writing systems #230

Open
rmunn opened this issue May 9, 2022 · 1 comment

@rmunn
Collaborator

rmunn commented May 9, 2022

In the LCM data model, TsStrings can have multiple runs. Each run can be tagged with styles like bold or italic, or with language tags, so that you can say "this part in the middle of the English sentence is actually Greek, so it should be displayed with a Greek font". LfMerge represents this by converting any runs in other writing systems to <span ws="xyz">text in the xyz language</span>. What we didn't account for is the case where a writing system's tag changes in FieldWorks.
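To make the representation concrete, here is a minimal Python sketch of turning runs into span markup. It assumes a hypothetical, simplified model in which a TsString is a list of (text, writing system tag) pairs and runs in the field's main writing system are left unwrapped. `runs_to_span_str` is an illustrative name, not LfMerge's API, and the sketch ignores styles such as bold or italic, which a real conversion would also need to carry.

```python
from html import escape
from typing import List, Tuple

# Hypothetical, simplified model: a TsString is a list of (text, ws_tag) runs.
Run = Tuple[str, str]


def runs_to_span_str(runs: List[Run], main_ws: str) -> str:
    """Render runs as markup, wrapping any run not in main_ws in <span ws="...">."""
    parts = []
    for text, ws in runs:
        if ws == main_ws:
            # Runs in the field's own writing system are emitted as plain (escaped) text.
            parts.append(escape(text, quote=False))
        else:
            # Runs in another writing system carry their tag in the ws attribute.
            parts.append(f'<span ws="{escape(ws)}">{escape(text, quote=False)}</span>')
    return "".join(parts)


if __name__ == "__main__":
    runs = [("An alternate spelling is ", "en"), ("blahblah", "el"), (", rarely seen.", "en")]
    print(runs_to_span_str(runs, "en"))
```

Running it prints `An alternate spelling is <span ws="el">blahblah</span>, rarely seen.` The important point for this issue is that the span stores the tag as a string, so the tag is looked up again when converting back to a TsString.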

For example, suppose LfMerge has <span ws="en">foo</span>, but the writing system's tag has changed to en-Latn-US in FieldWorks. The current code looks up the "en" writing system, finds that it's not present (so the lookup returns 0), and tries to insert text with a writing system of 0 into the FieldWorks TsString. Since 0 is not a valid writing system ID, FieldWorks throws the error "The specified writing system code is invalid" from the SIL.LCModel.Core.Text.TsPropsBldr.SetIntPropValues method.
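The failure mode can be reproduced in miniature. In this Python sketch, the dictionary `ws_handles` and the functions `get_ws_from_str` and `set_run_ws` are hypothetical stand-ins for the project's writing system table, GetWsFromStr, and the validity check in TsPropsBldr; only the "0 means not found" behavior and the error message come from the issue.

```python
from typing import Dict

# Hypothetical stand-in for the project's writing system table: tag -> handle.
# After the rename, only "en-Latn-US" exists; plain "en" is gone.
ws_handles: Dict[str, int] = {"en-Latn-US": 1, "el": 2}


def get_ws_from_str(tag: str) -> int:
    """Mimics the lookup described in the issue: returns 0 when the tag is unknown."""
    return ws_handles.get(tag, 0)


def set_run_ws(handle: int) -> None:
    """Mimics TsPropsBldr rejecting an invalid writing system handle."""
    if handle == 0:
        raise ValueError("The specified writing system code is invalid")


if __name__ == "__main__":
    handle = get_ws_from_str("en")  # stale tag from the span, so the lookup yields 0
    try:
        set_run_ws(handle)
    except ValueError as err:
        print(err)
```

The bug is that the 0 from the lookup is passed straight through instead of being treated as "not found".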

LfMerge's SpanStrToTsString code needs to handle the case where GetWsFromStr returns 0 by trying several other valid forms of the writing system tag. For example, if en is not found, try en-Latn, then en-Latn-US (and also en-US), using the current data from langtags.json to determine the default script(s) and region(s) to try. If none of those produce a valid result, SpanStrToTsString should "punt" and use the project's main writing system, which will usually be correct anyway. (Most of the time this happens because the data is in a Notes field that looks something like: "An alternate spelling is <span ws="xyz">blahblah</span>, but this is rarely encountered except in books from a century ago".)
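The fallback order proposed above can be sketched as follows. This is a Python illustration under stated assumptions, not LfMerge code: `DEFAULTS` is a hypothetical, hand-written stand-in for the default script and region that langtags.json would supply (the real file is structured differently), and `candidate_tags` and `resolve_ws` are illustrative names. It only derives variants from the bare language subtag, so a tag that already carries a region or variant (say en-GB) would need more careful parsing.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for langtags.json data: language subtag -> (default script, default region).
DEFAULTS: Dict[str, Tuple[str, str]] = {"en": ("Latn", "US"), "el": ("Grek", "GR")}


def candidate_tags(tag: str) -> List[str]:
    """Return the original tag followed by the variants suggested in the issue, in order."""
    candidates = [tag]
    lang = tag.split("-")[0]
    if lang in DEFAULTS:
        script, region = DEFAULTS[lang]
        for variant in (f"{lang}-{script}", f"{lang}-{script}-{region}", f"{lang}-{region}"):
            if variant not in candidates:
                candidates.append(variant)
    return candidates


def resolve_ws(tag: str, ws_handles: Dict[str, int], main_ws_handle: int) -> int:
    """Try each candidate tag; if none is known (handle 0), punt to the main writing system."""
    for candidate in candidate_tags(tag):
        handle = ws_handles.get(candidate, 0)
        if handle != 0:
            return handle
    return main_ws_handle


if __name__ == "__main__":
    handles = {"en-Latn-US": 1, "el": 2}
    print(candidate_tags("en"))
    print(resolve_ws("en", handles, main_ws_handle=2))
    print(resolve_ws("xyz", handles, main_ws_handle=2))
```

With these handles, "en" misses, then en-Latn misses, then en-Latn-US matches and returns 1; an unknown tag such as "xyz" has no defaults to try, so it falls back to the main writing system handle 2.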

@megahirt
Contributor

We need a clear set of steps to reproduce this scenario; then we can begin designing a fix.

I'm guessing the repro steps go something like:

  • start with an existing FLEx project in LD
  • create a string in FLEx data that is a "writing system string" which defines the WS inside the string. (this step needs more explanation)
  • clone project into LF
  • change writing system in FLEx to be a different code (modify it with an x- or add script/region/variant)
  • do a FLEx S/R
  • do a LF S/R
  • observe a broken LF or a broken LF S/R (this is an assumption; I'm not sure what the expected error is)
