-
Notifications
You must be signed in to change notification settings - Fork 17.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
proposal: x/sys/windows/mkwinsyscall: support UTF16 functions not ending with W #51786
Comments
What if we just unconditionally change it to default to utf16? Or perhaps always require either The reason I suggest this is that I'd hate for the behavior of mkwinsyscall to become, "it does the old bad thing unless you remember to pass this special switch." I'd rather it be "it does the new good thing, unless you tell it to do the old bad thing" or "you have to tell it which thing you want to do, so that it won't do the old bad thing without you telling it to." I suspect changing it to default to utf16 would make most sense, considering most if not all functions are already wchar. |
I don't see using I suggest you copy
That will break some users code. I suggested a reasonable workaround for @qmuntal problem. Why breaking users code is better than my workaround?
That will definitely break users code. And your proposed flags will force current users split their code into 2 different source files: ascii version and utf16 version. I am not convinced this is a problem that we need to solve. Yet. But feel free to ignore my opinion. Alex |
windows.UTF16PtrFromString returns a pointer to the UTF16 string and an error, so we are forcing users to check that error before calling the autogenerated function, whereas if the autogenerated function would do the conversion this boilerplate code could be eliminated.
I agree with your statement about not hiding allocations, but this proposal don't preclude Having low memory allocation is primordial when using BCrypt library, but for my use case functions accepting UTF16 strings are not supposed to be called multiple times with the same string, so it makes no sense to cache the windows.UTF16PtrFromString result in order to reduce allocations. Passing the string directly to the autogenerated function would not have any performance impact, but would be more ergonomic.
If you think this proposal covers a niche use case, I'm fine patching mkwinsyscall. But, as @zx2c4 has also mentioned, UTF16 strings are the standard for new Windows APIs, and it would be a pity to lose the capacity of passing |
Yes. If you don't care about allocations, then you can save your users from writing boilerplate code.
Agree.
Thanks for explaining.
I do think that. Maybe my opinion will change in the future.
You would have to copy mkwinsyscall program and make small changes to it. If you find this troublesome, I am happy to reconsider. I did exactly that in many of my personal projects. I do not regret making that decision after many years. I also happy to reconsider if other projects find this functionality useful. I am just trying to defer breaking users code till later. Alex |
Change https://go.dev/cl/425054 mentions this issue: |
Using utf16.AppendRune instead of utf16.Encode safe a bunch of allocations across the board, as many higher level functions use it to call Windows syscalls, for example to `os` package: name old alloc/op new alloc/op delta Readdirname-12 15.6kB ± 0% 15.6kB ± 0% +0.26% (p=0.008 n=5+5) Readdir-12 29.4kB ± 0% 29.4kB ± 0% +0.14% (p=0.008 n=5+5) ReadDir-12 29.4kB ± 0% 29.4kB ± 0% +0.14% (p=0.016 n=4+5) StatDot-12 552B ± 0% 560B ± 0% +1.45% (p=0.008 n=5+5) StatFile-12 512B ± 0% 336B ± 0% -34.38% (p=0.008 n=5+5) StatDir-12 432B ± 0% 288B ± 0% -33.33% (p=0.008 n=5+5) LstatDot-12 552B ± 0% 560B ± 0% +1.45% (p=0.008 n=5+5) LstatFile-12 512B ± 0% 336B ± 0% -34.38% (p=0.008 n=5+5) LstatDir-12 432B ± 0% 288B ± 0% -33.33% (p=0.008 n=5+5) StatFile-12 4.00 ± 0% 2.00 ± 0% -50.00% (p=0.008 n=5+5) StatDir-12 4.00 ± 0% 2.00 ± 0% -50.00% (p=0.008 n=5+5) LstatFile-12 4.00 ± 0% 2.00 ± 0% -50.00% (p=0.008 n=5+5) LstatDir-12 4.00 ± 0% 2.00 ± 0% -50.00% (p=0.008 n=5+5) Updates #51786 Change-Id: I0a088cf1a96e9c304da9311bb3895b70443c1637 Reviewed-on: https://go-review.googlesource.com/c/go/+/425054 Reviewed-by: Tobias Klauser <[email protected]> Run-TryBot: Ian Lance Taylor <[email protected]> Reviewed-by: Ian Lance Taylor <[email protected]> TryBot-Result: Gopher Robot <[email protected]> Reviewed-by: David Chase <[email protected]> Auto-Submit: Ian Lance Taylor <[email protected]>
Background
Microsoft normally provides two parallel sets of APIs, one for ANSI strings, whose function names end with
A
, and the other for Unicode strings, whose function names end withW
(source).This pattern is well supported in
x/sys/windows/mkwinsyscall
: function parameters of typestring
are transformed usingsyscall.UTF16PtrFromString
if the function names ends withW
, and usingsyscall.BytePtrFromString
otherwise:https://github.com/golang/sys/blob/2edf467146b5fc89e484991587e3032c8421ae8c/windows/mkwinsyscall/mkwinsyscall.go#L616-L622
Problem
Newer libraries that were created after Windows introduced Unicode tend to provide just the UTF16 API set without the
W
-ending convention. In my particular case, I'm creating bindings for the BCrypt library, in which all functions accepting strings expect them to be UTF16 encoded even when the function name does not end withW
.This forces me to define the //sys functions using
*uint16
instead ofstring
and to do the conversion outside the autogenerated code.Request
Add a knob to
x/sys/windows/mkwinsyscall
to decide if a string parameter should be transformed to ASCII or Unicode.Two complementary solutions come to mind:
Add a cli flag, i.e.
-utf16
, which changes the code generation behavior fromeverything is ASCII unless the function name ends with W
to
everything is UTF16 unless the function name ends with A.
Add a new optional condition to each function definition, i.e.
[encoding=utf16/ascii]
, which will override the encoding selected from the function name or the cli flag (if finally added).The first solution will probably be enough to cover all cases, as new Windows APIs are Unicode by default and only legacy APIs expect ASCII strings, in which case they already provide the both API sets.
cc @zx2c4
The text was updated successfully, but these errors were encountered: