avoid locale-dependent string comparisons #3

aoakley-roku · 2023-07-14T16:21:32Z

It doesn't seem appropriate to use locale dependent ordering to sort the variants and extensions for normalization. The result of normalization should be consistent.

The Unicode CLDR specficiation is not specific about how these tags are sorted, it just says that they should be in "alphabetical order". They only contain a-z (lower case) and 0-9, so for most locales alphabetical order is the same as the code point order we are now using.

This commit removes a dependency on the ECMAScript internationalization API. This API is not available everywhere, particularly on embedded devices like TVs where storage space for internationalization data is limited. This is important as bcp-47-normalize is now used by dash.js which in turn is used by many HbbTV applications.

wooorm · 2023-07-15T16:44:25Z

Hey!

The build was failing due to a dev-dependency, I fixed that, and merged with you.
But now coverage is failing, likely due to your work?

The idea looks good, but can you fix the coverage?

And, the style is off compared to the rest of the project, e.g., semicolons and early exits. Something like:

if (left.singleton > right.singleton) {
  return 1
}

if (left.singleton < right.singleton) {
  return -1
}

return 0

aoakley-roku · 2023-07-17T11:17:58Z

Thanks. I hadn't managed to get the test coverage check to run locally, but it works now with the dependency updates.

The missing bit of test coverage was for when there are multiple extensions with the same singleton. e.g. en-a-ext1-a-ext2. The specification says that this is not valid, and I think that bcp47.parse should reject these (but it doesn't at the moment).

Rather than add a test which will fail if bcp47.parse is updated to reject this case, I've removed the return 0 statement and replaced it with a comment. This doesn't change the behavior in any way, but it makes the coverage checker happy.

It doesn't seem appropriate to use locale dependent ordering to sort the variants and extensions for normalization. The result of normalization should be consistent. The Unicode CLDR specficiation is not specific about how these tags are sorted, it just says that they should be in "alphabetical order". They only contain a-z (lower case) and 0-9, so for most locales alphabetical order is the same as the code point order we are now using. This commit removes a dependency on the ECMAScript internationalization API. This API is not available everywhere, particularly on embedded devices like TVs where storage space for internationalization data is limited. This is important as bcp-47-normalize is now used by dash.js which in turn is used by many HbbTV applications. The new test case isn't strictly valid - there should not be more than one extension with the same singleton.

aoakley-roku · 2023-07-17T15:15:14Z

Argh, now it's failing some other check.

At this point I'm really not sure what the "right" thing to do is. Options I can see are:

Include the "return 0" line and add a test which isn't really valid, like the en-a-ext1-a-ext2. I think that bcp47.parse should reject this and therefore the test should fail, but at the moment the test will pass.
Not handle the equal case at, making compareSingleton always return +1 or -1. This isn't a "consistent comparator" according to ECMAScript and the sort order is therefore implementation-defined.
Rewrite compareSingleton as a one liner. This silences the coverage checker, but makes the code harder to read.

The first of these options seems the least bad to me, so I've now pushed code which does that.

wooorm · 2023-07-17T15:50:03Z

yep, 1 is fine

wooorm · 2023-07-17T15:51:00Z

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

avoid locale-dependent string comparisons #3

avoid locale-dependent string comparisons #3

aoakley-roku commented Jul 14, 2023

wooorm commented Jul 15, 2023

aoakley-roku commented Jul 17, 2023

aoakley-roku commented Jul 17, 2023

wooorm commented Jul 17, 2023

wooorm commented Jul 17, 2023

wooorm commented Jul 17, 2023

avoid locale-dependent string comparisons #3

avoid locale-dependent string comparisons #3

Conversation

aoakley-roku commented Jul 14, 2023

wooorm commented Jul 15, 2023

aoakley-roku commented Jul 17, 2023

aoakley-roku commented Jul 17, 2023

wooorm commented Jul 17, 2023

wooorm commented Jul 17, 2023

wooorm commented Jul 17, 2023