Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use utf conversions from minipal #89036

Merged
merged 9 commits into from
Jul 21, 2023
Merged

Conversation

LakshanF
Copy link
Member

Leverage the UTF conversion functionality needed for EventPipe from minipal.

@LakshanF LakshanF added this to the 8.0.0 milestone Jul 17, 2023
@LakshanF LakshanF self-assigned this Jul 17, 2023
@ghost
Copy link

ghost commented Jul 17, 2023

Tagging subscribers to this area: @agocke, @MichalStrehovsky, @jkotas
See info in area-owners.md if you want to be subscribed.

Issue Details

Leverage the UTF conversion functionality needed for EventPipe from minipal.

Author: LakshanF
Assignees: LakshanF
Labels:

area-NativeAOT-coreclr

Milestone: 8.0.0

src/native/minipal/utf8.c Outdated Show resolved Hide resolved
Copy link
Member

@am11 am11 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

src/native/minipal/utf8.c Outdated Show resolved Hide resolved
@lewing
Copy link
Member

lewing commented Jul 18, 2023

The test failures here are known

ep_char16_t* lpDestStr = NULL;

if (static_cast<int>(len) < 0)
len = strlen(str) + 1;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need the +1 here? Adding the null terminator is handled explicitly below?

Copy link
Member

@am11 am11 Jul 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an existing pattern:

if (len < 0)
len = (glong)strlen(str) + 1;
glong ret = (glong)minipal_get_length_utf8_to_utf16 (str, len, flags);

Reason is, minipal get_length API needs the source length with null-terminator accounted for, so it returns the correct destination count. Handling it within minipal was requiring some additional conditions which we decided to leave out for the caller to handle.

Copy link
Member

@elinor-fung elinor-fung Jul 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the length in ep_rt_utf16_to_utf8_string need +1 too, then?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an existing pattern:

It has the same issue of allocating one extra byte than necessary, and it is further complicated by returning items_written.

We do not need to replicate the issue here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jkotas, not sure which issue you are referring to. The easiest way to understand what goes wrong without +1 is to remove it from coreclr version (it's on the callsite of PAL API) and run PAL tests. fb613a5 did the same thing.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jkotas, not sure which issue you are referring to.

Allocating buffer that is one character longer than necessary.

For example, if ep_rt_utf8_to_utf16le_string is called with a string that is 1 ascii character + zero terminator, the current code allocates 3 * sizeof(CHAR16_T). It is one CHAR16_T more than what is actually needed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like the pattern with the extra +1 that @am11 mentions is needed to preserve the current behavior. For example, this test fails without +1.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you saying that the event pipe code expects the strings returned by this method to have double zero terminators? It is very atypical contract. If it is really the case, these should be a long comment explaining it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Allocating buffer that is one character longer than necessary.

PAL, COM and reflection tests are very sensitive about "fill the exact slot or fail" cases. So if that was the case, they all will fail. IOW, that +1 effect gets cancelled out how the code in utf8.c works.

src/coreclr/nativeaot/Runtime/eventpipe/ep-rt-aot.h Outdated Show resolved Hide resolved
src/coreclr/nativeaot/Runtime/eventpipe/ep-rt-aot.h Outdated Show resolved Hide resolved

ep_char8_t *str_utf8 = reinterpret_cast<ep_char8_t *>(malloc ((len_utf16 + 1) * sizeof (ep_char8_t)));
if (!str_utf8)
ep_char8_t* lpDestStr = NULL;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: rename to str_utf8 and move the declaration down to where it is first assigned.

src/coreclr/nativeaot/Runtime/eventpipe/ep-rt-aot.h Outdated Show resolved Hide resolved
@LakshanF LakshanF merged commit c0d7d2d into dotnet:main Jul 21, 2023
177 checks passed
@LakshanF LakshanF deleted the UtfConversions branch July 21, 2023 18:46
@ghost ghost locked as resolved and limited conversation to collaborators Aug 20, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

5 participants