Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[mono][aot] Fix support for files with non-ascii characters on windows #92279

Merged
merged 2 commits into from
Jan 31, 2024

Conversation

BrzVlad
Copy link
Member

@BrzVlad BrzVlad commented Sep 19, 2023

Add g_fopen, g_unlink and g_rename which on windows do a utf8 to utf16 conversion and then call the corresponding wide char api.

Contributes to #83203

@ghost ghost assigned BrzVlad Sep 19, 2023
@BrzVlad BrzVlad changed the title [mono][aot] Fix support for files with non-ascii characters [mono][aot] Fix support for files with non-ascii characters on windows Sep 19, 2023
@BrzVlad
Copy link
Member Author

BrzVlad commented Sep 19, 2023

cc @fanyang-mono. Shamelessly stole some code from #90436.

FILE *fp;

#ifdef HOST_WIN32
gunichar2 *wPath = g_utf8_to_utf16 (path, -1, 0, 0, 0);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One minor optimization that Johan suggested and I wasn't able to do it correctly is that if path only contains ascii characters, it could simply use fopen. I tried adding the optimization here. But it failed the test on CI. I haven't figured out how to test it locally. So I took it out. If you could get it right, that would be great!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see this optimization having any impact in the real world, especially since we are dealing with file access which is bound to be quite slow. Unless the encoding conversion is craazy slow, which I doubt.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really have any real world experience with the relevant API's. By looking at the code, we could avoid two calculations of g_utf8_to_utf16.
@lateralusX Do you have any insights here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have not done any measurements on the speed of encoding conversions on any platform and in relation to file IO its probably a small amount of time. The idea was to avoid doing the heap allocation and encoding conversion when not really needed (all characters in ascii format so standard function can be called) and since we have the logic put into a couple of g_ functions, the optimization will be isolated, simple and straightforward. I still think avoiding string encoding conversions and heap allocations when not needed is worthwhile, but I agree that in this case the frequency of calls coming into these functions will be low.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lateralusX I added the ascii string check to avoid utf16 conversion. Could you do a final review on this PR ?

@SamMonoRT
Copy link
Member

/azp run runtime-extra-platforms

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

Add g_fopen, g_unlink and g_rename which on windows do a utf8 to utf16 conversion and then call the corresponding wide char api.
@BrzVlad BrzVlad merged commit 3bf9bee into dotnet:main Jan 31, 2024
108 of 111 checks passed
@github-actions github-actions bot locked and limited conversation to collaborators Mar 2, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants