Implement Intl.NumberFormat.prototype.formatToParts #5105

jackhorton · 2018-05-05T19:07:34Z

Also, somewhat randomly, fixes #5097 since I was already regenerating bytecode

MSLaguana · 2018-05-05T21:50:03Z

lib/Runtime/Library/IntlEngineInterfaceExtensionObject.cpp

+    // In more concrete words, we want to assign the entire formatted string to some number of parts,
+    // and each character should get the most specific part possible. We can accomplish this by
+    // keeping an auxiliary array with the same size as the formatted string that, at each index,
+    // holds the part for that character. For the following call pattern:


I find this explanation difficult to follow. Does the example you give below relate to the tree/example above? The example above has 11 entries, while below you refer to [0,9) which only has 10.

I'm also not clear on how the array of 0s below ends up turning into 1,000,000.

Yeah, it took me a while to get to this data structure, so I think those are from two different nights. I can try to clarify the comment a bit. The goal of the structure is basically this: ICU reports back a tree of parts where each node in the tree has a type and a width, and there must be at least one node corresponding to each character in the string (nodes can be wider than one character). Nodes can have children, and the parent-child relationship is that a child node represents a more specific type for a given character range than its parent. Since ICU doesn't ever report "literal" parts of strings (like spaces or other extra characters), the root node in the tree will always be the entire width of the string with the type UnsetField. Then, for a string like "US$ 1,000", there will be two child nodes, one of type currency with width [0, 3) and one of type integer with width [4, 9). The integer node will have a child of type group and width [5, 6). So the most specific type for characters 0 to 3 is currency, 3 to 4 is unset (literal), 4 to 5 is integer, 5 to 6 is group, and 6 to 9 is integer.

I thought about this a bunch and I didn't like the idea of traversing an actual tree structure to get that information, because it felt awkward to encode the "width" of nodes with specific meaning. So, I came up with the array structure where, when we are told a part exists from position x to y, we can figure out what type used to apply to that span and update that section of the array with the new type. We skip over sections of the span [x, y) that have a type that doesn't match the start and end, because that means we have already gotten a more specific part for that sub-span (for instance, if we got a grouping separator before its parent integer).

Does that make more sense? If not I can try to figure out a clearer explanation in person on Monday.
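
For readers following the thread, the array approach described above can be sketched roughly as follows. This is a minimal standalone illustration, not the code in this PR: `PartType`, `PartsSketch`, and `ExampleParts` are hypothetical names, `std::string` stands in for the engine's string type, and the enum stands in for ICU's `UNumberFormatFields`. It assumes, as the explanation does, that the first and last characters of an inserted span currently share the same type.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for ICU's UNumberFormatFields; Unset plays the role of UnsetField.
enum class PartType { Unset, Currency, Integer, Group };

// Hypothetical sketch of the per-character array idea (not the PR's NumberFormatPartsBuilder).
class PartsSketch {
public:
    explicit PartsSketch(std::string formatted)
        : formatted_(std::move(formatted)),
          fields_(formatted_.size(), PartType::Unset) {}

    // Record that `type` applies to the half-open character range [start, end).
    void InsertPart(PartType type, int start, int end) {
        assert(start >= 0 && start < end);
        assert(static_cast<std::size_t>(end) <= fields_.size());
        // The type that used to apply to this span (assumed equal at both ends).
        const PartType previous = fields_[start];
        assert(fields_[end - 1] == previous);
        for (int i = start; i < end; ++i) {
            // Characters holding a different type already received a more
            // specific part (e.g. a group separator seen before its integer).
            if (fields_[i] == previous) {
                fields_[i] = type;
            }
        }
    }

    // Collapse runs of equal types into (type name, substring) pairs.
    std::vector<std::pair<std::string, std::string>> ToParts() const {
        std::vector<std::pair<std::string, std::string>> parts;
        std::size_t runStart = 0;
        for (std::size_t i = 1; i <= fields_.size(); ++i) {
            if (i == fields_.size() || fields_[i] != fields_[runStart]) {
                parts.emplace_back(Name(fields_[runStart]),
                                   formatted_.substr(runStart, i - runStart));
                runStart = i;
            }
        }
        return parts;
    }

private:
    static const char* Name(PartType type) {
        switch (type) {
            case PartType::Currency: return "currency";
            case PartType::Integer:  return "integer";
            case PartType::Group:    return "group";
            default:                 return "literal";  // Unset characters become literals
        }
    }

    std::string formatted_;          // declared first: fields_ is sized from it
    std::vector<PartType> fields_;   // one entry per character of formatted_
};

// Usage with the "US$ 1,000" example, deliberately inserting the child
// (group) before its parent (integer).
std::vector<std::pair<std::string, std::string>> ExampleParts() {
    PartsSketch builder("US$ 1,000");
    builder.InsertPart(PartType::Group, 5, 6);
    builder.InsertPart(PartType::Integer, 4, 9);
    builder.InsertPart(PartType::Currency, 0, 3);
    return builder.ToParts();
}
```

In this sketch the order of insertion does not matter for the example: inserting the integer first and the group second yields the same array, because the group then overwrites the integer entry at index 5.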

MSLaguana · 2018-05-05T21:52:46Z

lib/Runtime/Library/IntlEngineInterfaceExtensionObject.cpp

+            AssertOrFailFast(start < end);
+
+            // the asserts above mean the cast to charcount_t is safe
+            charcount_t ccStart = static_cast<charcount_t>(start);


Why use charcount_t here? Does the fields array have one entry for each character in the formatted string?

Correct. See my comment above.

sethbrenith · 2018-05-07T15:20:52Z

lib/Runtime/Library/IntlEngineInterfaceExtensionObject.cpp

+            {
+                if (JavascriptNumber::IsNan(num))
+                {
+                    return JavascriptString::NewCopySz(_u("nan"), sc);


NewCopySz

Why copy when making these strings? I thought it was fine for a JavascriptString object to point to static data for its buffer.

Honestly, I didn't want to type out the numbers for each length to do NewWithBuffer, and regardless I planned on moving these to JavascriptLibrary's string cache before merging. This was mostly to prove that it would work.

sethbrenith · 2018-05-07T15:31:23Z

lib/Runtime/Library/IntlEngineInterfaceExtensionObject.cpp

+            , formatted(formatted)
+            , formattedLength(formattedLength)
+            , sc(scriptContext)
+            , fields(RecyclerNewArrayLeaf(sc->GetRecycler(), UNumberFormatFields, formattedLength))


I'm surprised the Linux build didn't complain at you for not having fields marked as a Field, since it can hold a recycler pointer. Any idea why that would be?

Oh, looks like the checker only applies that rule to classes that have ever been recycler-allocated themselves, and NumberFormatPartsBuilder is only created on the stack. Carry on.

I should probably still mark it as a Field in case this ever needs to be created in the heap in the future.

I thought the pointer itself would be a Field, not the thing being pointed at, like Field(UNumberFormatFields *). Do I have this backwards?

I believe you're correct; all the Field specifiers on this class should contain the whole type.

Nope, you're right.

sethbrenith · 2018-05-07T15:44:30Z

lib/Runtime/Library/IntlEngineInterfaceExtensionObject.cpp

+            , fields(RecyclerNewArrayLeaf(sc->GetRecycler(), UNumberFormatFields, formattedLength))
+        {
+            // this will allow InsertPart to tell what fields have been initialized or not, and will
+            // be the valid of resulting { type: "literal" } fields in ToPartsArray


be the valid of resulting

what?

I think I meant "the value of resulting...". Does that make more sense?

sethbrenith · 2018-05-07T15:47:00Z

lib/Runtime/Library/IntlEngineInterfaceExtensionObject.cpp

+        void InsertPart(UNumberFormatFields field, int start, int end)
+        {
+            AssertOrFailFast(start >= 0);
+            AssertOrFailFast(end > 0);


This second assertion seems adequately covered by the third one (start < end).
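
A small sketch of why the `end > 0` check adds nothing once the other two hold: if `start >= 0` and `start < end`, then `end > start >= 0`. The helper name `IsValidSpan` is hypothetical and only illustrates the reasoning; it is not part of the PR.

```cpp
// Hypothetical helper: the two checks that remain after dropping `end > 0`.
bool IsValidSpan(int start, int end) {
    // start >= 0 and start < end together imply end > 0,
    // so a separate `end > 0` check can never fail on its own.
    return start >= 0 && start < end;
}
```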

Also, fix chakra-core#5097 while we are updating the bytecode and add tests for it.

…tToParts Merge pull request #5105 from jackhorton:intl/nf-formattoparts Also, somewhat randomly, fixes #5097 since I was already regenerating bytecode

jackhorton added the Intl-ICU label May 5, 2018

jackhorton requested review from dilijev, sethbrenith, MSLaguana and jefgen May 5, 2018 19:07

MSLaguana reviewed May 5, 2018

sethbrenith reviewed May 7, 2018

sethbrenith approved these changes May 7, 2018

jackhorton force-pushed the intl/nf-formattoparts branch 2 times, most recently from 3d90e7e to bc6ea26 Compare May 7, 2018 20:39

jackhorton added 2 commits May 8, 2018 16:21

Implement Intl.NumberFormat.prototype.formatToParts

fc14ec2

Also, fix chakra-core#5097 while we are updating the bytecode and add tests for it.

Update bytecode

1d2a952

jackhorton force-pushed the intl/nf-formattoparts branch from bc6ea26 to 1d2a952 Compare May 8, 2018 23:44

chakrabot merged commit 1d2a952 into chakra-core:master May 9, 2018

jackhorton deleted the intl/nf-formattoparts branch May 9, 2018 18:08
