Implement Intl.NumberFormat.prototype.formatToParts #5105
// In more concrete words, we want to assign the entire formatted string to some number of parts,
// and each character should get the most specific part possible. We can accomplish this by
// keeping an auxiliary array with the same size as the formatted string that, at each index,
// holds the part for that character. For the following call pattern:
I find this explanation difficult to follow. Does the example that you give below relate to the tree/example above? The example above has 11 entries, while below you refer to [0,9), which only has 10.
I'm also not clear on how the array of 0s below ends up turning into 1,000,000.
Yeah, it took me a while to get to this data structure, so I think those are from two different nights. I can try to clarify the comment a bit. The goal of the structure is basically this: ICU reports back a tree of parts where each node has a type and a character range, and every character in the string is covered by at least one node (nodes can span more than one character). Nodes can have children, and a child node represents a more specific type for its character range than its parent. Since ICU never reports "literal" parts of strings (like spaces or other extra characters), the root node of the tree always spans the entire string with the type UnsetField. Then, for a string like "US$ 1,000", there will be two child nodes, one of type currency spanning [0, 3) and one of type integer spanning [4, 9). The integer node will have a child of type group spanning [5, 6). So the most specific type for characters 0 to 3 is currency, 3 to 4 is unset (literal), 4 to 5 is integer, 5 to 6 is group, and 6 to 9 is integer.
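A minimal C++ sketch of the tree described above may help pin down what "most specific type" means. It is an illustration only, not code from this PR: the PartType enum, PartNode struct, and MostSpecificTypes function are hypothetical names (ChakraCore itself works with ICU's UNumberFormatFields), and it assumes the root always starts at index 0. Each node writes its type over its range and then lets its children overwrite their sub-ranges, so every character ends up with the deepest type covering it. This is the explicit tree walk that the next paragraph argues against.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical part types for illustration; Unset marks characters that
// no specific part covers (they become "literal" parts).
enum class PartType { Unset, Currency, Integer, Group };

// One node in the tree of parts: a type, a half-open character range
// [start, end), and children carrying more specific types for sub-ranges.
struct PartNode {
    PartType type;
    std::size_t start;
    std::size_t end;
    std::vector<PartNode> children;
};

// Writes the node's type over its range, then lets each child overwrite
// its own sub-range, so deeper (more specific) nodes win.
void AssignMostSpecific(const PartNode& node, std::vector<PartType>& out) {
    for (std::size_t i = node.start; i < node.end; ++i) {
        out[i] = node.type;
    }
    for (const PartNode& child : node.children) {
        AssignMostSpecific(child, out);
    }
}

// Assumes the root spans the whole string starting at index 0.
std::vector<PartType> MostSpecificTypes(const PartNode& root) {
    std::vector<PartType> out(root.end, PartType::Unset);
    AssignMostSpecific(root, out);
    return out;
}

// The tree for "US$ 1,000" described above.
PartNode MakeUsdExample() {
    PartNode group{PartType::Group, 5, 6, {}};
    PartNode integer{PartType::Integer, 4, 9, {group}};
    PartNode currency{PartType::Currency, 0, 3, {}};
    return PartNode{PartType::Unset, 0, 9, {currency, integer}};
}
```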
I thought about this a bunch, and I didn't like the idea of traversing an actual tree structure to get that information, because it felt awkward to encode the "width" of nodes with specific meaning. So I came up with the array structure: when we are told a part exists from position x to y, we figure out what type used to apply to that span and update that section of the array with the new type. We skip over sections of the span [x, y) whose type doesn't match the type at the start and end, because that means we have already gotten a more specific part for that sub-span (for instance, if we got a grouping separator before its parent integer).
Does that make more sense? If not I can try to figure out a clearer explanation in person on Monday.
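The array approach described above can be sketched in the same terms. This is a hedged illustration rather than the PR's implementation: PartsBuilder and PartType are hypothetical, std::vector stands in for the recycler-allocated fields array, and assert stands in for AssertOrFailFast. Following the description, the type on the span's first and last characters is treated as the parent type being refined. That assumes a more specific part that arrived earlier never sits on the first or last character of a later, wider part, which holds for a group separator inside an integer.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical part types; Unset marks characters that become "literal" parts.
enum class PartType { Unset, Currency, Integer, Group };

class PartsBuilder {
public:
    explicit PartsBuilder(std::size_t formattedLength)
        : fields(formattedLength, PartType::Unset) {}

    // Records that `type` covers [start, end). Characters still holding the
    // parent type are overwritten; characters holding any other type already
    // received a more specific part and are skipped, so parts may arrive in
    // any order.
    void InsertPart(PartType type, int start, int end) {
        assert(start >= 0);
        assert(start < end);
        // The asserts above make both casts to std::size_t safe.
        std::size_t s = static_cast<std::size_t>(start);
        std::size_t e = static_cast<std::size_t>(end);
        assert(e <= fields.size());

        // The parent type is read from the span's edges (see assumption above).
        PartType parent = fields[s];
        assert(fields[e - 1] == parent);

        for (std::size_t i = s; i < e; ++i) {
            if (fields[i] == parent) {
                fields[i] = type;
            }
        }
    }

    const std::vector<PartType>& Fields() const { return fields; }

private:
    std::vector<PartType> fields;
};
```

Once every part is inserted, each run of equal values in fields corresponds to one part, and runs still marked Unset would become { type: "literal" } parts, as the constructor comment quoted further down describes.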
AssertOrFailFast(start < end);
// the asserts above mean the cast to charcount_t is safe
charcount_t ccStart = static_cast<charcount_t>(start);
Why use charcount_t here? Does the fields array have one entry for each character in the formatted string?
Correct. See my comment above.
{
if (JavascriptNumber::IsNan(num))
{
return JavascriptString::NewCopySz(_u("nan"), sc);
NewCopySz: Why copy when making these strings? I thought it was fine for a JavascriptString object to point to static data for its buffer.
Honestly, I didn't want to type out the length of each string to use NewWithBuffer, and regardless, I planned on moving these to JavascriptLibrary's string cache before merging. This was mostly to prove that it would work.
, formatted(formatted)
, formattedLength(formattedLength)
, sc(scriptContext)
, fields(RecyclerNewArrayLeaf(sc->GetRecycler(), UNumberFormatFields, formattedLength))
I'm surprised the Linux build didn't complain at you for not having fields marked as a Field, since it can hold a recycler pointer. Any idea why that would be?
Oh, looks like the checker only applies that rule to classes that have ever been recycler-allocated themselves, and NumberFormatPartsBuilder is only created on the stack. Carry on.
I should probably still mark it as a Field in case this ever needs to be created in the heap in the future.
I thought the pointer itself would be a Field, not the thing being pointed at, like Field(UNumberFormatFields *). Do I have this backwards?
I believe you're correct; all the Field specifiers on this class should contain the whole type.
Nope, you're right.
, fields(RecyclerNewArrayLeaf(sc->GetRecycler(), UNumberFormatFields, formattedLength))
{
// this will allow InsertPart to tell what fields have been initialized or not, and will
// be the valid of resulting { type: "literal" } fields in ToPartsArray
"be the valid of resulting": what?
I think I meant "the value of resulting...". Does that make more sense?
void InsertPart(UNumberFormatFields field, int start, int end)
{
AssertOrFailFast(start >= 0);
AssertOrFailFast(end > 0);
This second assertion seems adequately covered by the third one.
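The reviewer's point follows from the arithmetic: if start >= 0 and start < end, then end >= 1, so a separate end > 0 check can never fail on its own. A minimal sketch of that reasoning, using a hypothetical ToUnsignedRange helper that returns std::nullopt instead of calling AssertOrFailFast, with unsigned int standing in for charcount_t (assumed here to be an unsigned type):

```cpp
#include <cassert>
#include <optional>
#include <utility>

// Hypothetical helper: validates an ICU-style [start, end) pair of ints and
// converts it to unsigned offsets. Only two checks are needed.
std::optional<std::pair<unsigned, unsigned>> ToUnsignedRange(int start, int end) {
    if (start < 0 || start >= end) {
        return std::nullopt;
    }
    // Here 0 <= start < end, so end >= 1: an explicit end > 0 check is redundant.
    assert(end > 0);
    // Both values are non-negative ints, so they fit in unsigned int.
    return std::make_pair(static_cast<unsigned>(start), static_cast<unsigned>(end));
}
```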
3d90e7e to bc6ea26
Also, fix chakra-core#5097 while we are updating the bytecode and add tests for it.
bc6ea26 to 1d2a952
Also, somewhat randomly, fixes #5097 since I was already regenerating bytecode.