<system_error>: FormatMessageA needs be used in a different way #2451

Closed
snnn opened this issue Jan 1, 2022 · 8 comments · Fixed by #2669
Labels
bug (Something isn't working), fixed (Something works now, yay!)

Comments

@snnn
Member

snnn commented Jan 1, 2022

Describe the bug

Windows has separate settings for the “user language id” and the “system language id”, and the two can differ. When they do, std::system_category().message() returns "???".

By default, FormatMessage returns a string in the “user language”. The Unicode version (FormatMessageW) works correctly in all cases. But the ANSI version, like every other ANSI Windows API, converts the result to a multibyte string using the code page of the “system language id”. That goes wrong when the two ids differ. For example, if your user language is Chinese but the system language is English, it can only return “????”, because a Chinese string can't be encoded in ISO-8859-1.
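A rough, portable model of that lossy step may help show why the message collapses to question marks. This is not the real WideCharToMultiByte: narrow_to_latin1 is a hypothetical helper that takes UTF-32 input for simplicity, treats the target as ISO-8859-1 (where only U+0000 to U+00FF are representable), and ignores the "best-fit" mappings real code pages can apply.

```cpp
#include <string>

// Hypothetical model, not the Win32 implementation: narrow UTF-32 text to
// ISO-8859-1. Code points U+0000..U+00FF map to the same byte value; every
// other code point has no encoding and is replaced by the default char '?'.
std::string narrow_to_latin1(const std::u32string& text) {
    std::string out;
    out.reserve(text.size());
    for (char32_t cp : text) {
        out.push_back(cp <= 0xFF ? static_cast<char>(cp) : '?');
    }
    return out;
}
```

Under this model, an English message passes through unchanged, while a five-character Chinese message such as U"找不到文件" comes out as "?????".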

So if you need to return the error message as a std::string, there are two solutions:

  1. Pass the system language id to FormatMessageA.
  2. Call FormatMessageW with the language id set to either the system language or the user language, then convert the result with WideCharToMultiByte, using CP_ACP for the system language or CP_THREAD_ACP for the user language.
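A minimal, Windows-only sketch of solution 2, assuming the goal is a string in the same encoding std::cout expects. message_via_wide is a hypothetical helper name, not the STL's code; FormatMessageW and WideCharToMultiByte are the real Win32 calls. The caller picks the language id and code page as a pair, for example GetSystemDefaultLangID() with CP_ACP.

```cpp
#ifdef _WIN32  // Windows-only sketch
#include <windows.h>
#include <cstddef>
#include <string>

// Hypothetical helper: fetch the system message for `error_code` as UTF-16 in
// language `lang_id`, then narrow it with `code_page`. Returns an empty string
// if FormatMessageW fails (e.g. no message exists in that language).
std::string message_via_wide(DWORD error_code, LANGID lang_id, UINT code_page) {
    wchar_t* wide = nullptr;
    const DWORD wide_len = FormatMessageW(
        FORMAT_MESSAGE_ALLOCATE_BUFFER | FORMAT_MESSAGE_FROM_SYSTEM | FORMAT_MESSAGE_IGNORE_INSERTS,
        nullptr, error_code, lang_id, reinterpret_cast<wchar_t*>(&wide), 0, nullptr);
    if (wide_len == 0) {
        return {};
    }
    std::string narrow;
    // First call computes the required size, second call converts.
    const int narrow_len = WideCharToMultiByte(code_page, 0, wide, static_cast<int>(wide_len),
                                               nullptr, 0, nullptr, nullptr);
    if (narrow_len > 0) {
        narrow.resize(static_cast<std::size_t>(narrow_len));
        WideCharToMultiByte(code_page, 0, wide, static_cast<int>(wide_len),
                            &narrow[0], narrow_len, nullptr, nullptr);
    }
    LocalFree(wide);  // buffer was allocated by FORMAT_MESSAGE_ALLOCATE_BUFFER
    return narrow;
}
#endif
```

Mixing the pairs (a user-language message narrowed with CP_ACP) reproduces the "???" symptom, which is why the sketch leaves both choices to the caller.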

Command-line test case

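#include <windows.h>
#include <cassert>
#include <clocale>
#include <iostream>
#include <string>
#include <system_error>
#include <wil/resource.h> // wil::unique_hfile (Windows Implementation Libraries)

int main() {
    setlocale(LC_ALL, "");

    // Print the user language id and locale name.
    LANGID user_language_id = GetUserDefaultLangID();
    WCHAR buf[1024];
    int ret = GetUserDefaultLocaleName(buf, sizeof(buf) / sizeof(buf[0]));
    assert(ret != 0);
    std::wcout << user_language_id << L" " << buf << L"\n";

    // Print the system language id and locale name; these can differ from the user's.
    ret = GetSystemDefaultLocaleName(buf, sizeof(buf) / sizeof(buf[0]));
    assert(ret != 0);
    LANGID system_language_id = GetSystemDefaultLangID();
    std::wcout << system_language_id << L" " << buf << std::endl;

    // Trigger an error by opening a file that doesn't exist (CreateFileW, since the path is wide).
    wil::unique_hfile file_handle(CreateFileW(L"D:\\non_exist_file.txt", GENERIC_READ, FILE_SHARE_READ,
                                              nullptr, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, nullptr));
    if (file_handle.get() == INVALID_HANDLE_VALUE) {
        const auto error_code = GetLastError();
        // Sometimes this string contains only "???". Inspect its raw bytes in the debugger.
        std::string errmsg = std::system_category().message(error_code);
        (void)errmsg;
    }
    return 0;
}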

To compile it:
C:\Temp>cl /EHsc .\repro.cpp

To reproduce the error:
I assume you start from a clean Windows installation. Create a new user without Administrator privileges, log in as that user, open Settings, and change the display language to something different, such as Chinese. This changes only the user's locale, not the system's, because the user doesn't have the privileges to change system-level settings.

Then run the code and use your debugger to inspect the raw bytes of the error message. Do not print it out; printing is a different problem.

Expected behavior
The error message should contain useful information.

STL version
VS 2022 17.0.3


@sylveon
Contributor

sylveon commented Jan 1, 2022

Before #457, the implementation called FormatMessageW with no explicit language id and then converted the result using CP_ACP, so I wouldn't be opposed to passing the system language ID to FormatMessageA. This might surprise people who expect the user language ID to be used, but that was always a broken or poorly supported scenario (the issue you raised was present even before #457).

@snnn
Member Author

snnn commented Jan 3, 2022

Why was the ANSI version of FormatMessage preferred? I'm trying to understand which encoding std::string should use, for example when you pass such a string to fputs or std::cout. Based on my test, it should be in the system ACP, not the user ACP. It would also be better if we could avoid non-Unicode functions entirely, because some languages don't even have a code page.

@sylveon
Contributor

sylveon commented Jan 3, 2022

std::string is a requirement, using anything else would be non-standard. I preferred FormatMessageA because it gives us a result in char (which is what we need for std::string) so it's less work for us to do.

If std::cout and fputs expect the system ACP, then it seems even more obvious that we should pass the system language ID to FormatMessageA. Calling FormatMessageW with no language ID and then converting to the system ACP would have the exact same problem: you get a string in the user language and then attempt to convert it to the system ACP.

So you end up calling FormatMessageW with the system language ID and then converting to char with the system ACP. But... that's exactly what FormatMessageA does when you pass it the system language ID :)

@StephanTLavavej added the bug label on Jan 12, 2022
@fsb4000
Contributor

fsb4000 commented Apr 24, 2022

> Then run the code, use your debugger to inspect the binary content of the error message. Do not print it out. Printing is a different problem. Just look the raw bytes.
>
> Expected behavior
> The error message should contain useful information.

I tested with System Language: Russian and User Language: Chinese.
Yes, the debugger shows '?', but printing works.

[screenshot]

[screenshot]

So is this a debugger visualizer issue?

@snnn
Member Author

snnn commented Apr 24, 2022

No, it's not.

[screenshot]

@snnn
Member Author

snnn commented Apr 24, 2022

I reproduced this by setting the Windows display language to Chinese for the current user.

[screenshot]

This setting is per user. If I click "Administrative Language Settings" at the bottom, it opens the traditional dialog:
[screenshot]

Most Windows users are more familiar with the latter. Most importantly, as the dialog shows, the system-level locale is still "English (US)". This mismatch causes problems for anyone who uses ANSI APIs: because these APIs have no user-level locale information, they use the system-level locale setting. FormatMessage is different: it has a "dwLanguageId" parameter for this purpose, and if you pass zero, it prefers the user language settings over the system language settings. See: https://docs.microsoft.com/en-us/windows/win32/api/winbase/nf-winbase-formatmessage
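A portable model of that lookup may make the "prefers user over system" point concrete. This is not Windows code: LangId, resolve_language, and the `available` set are hypothetical stand-ins for FormatMessage's internal message-table search. The order encoded here is the one the FormatMessage documentation lists for dwLanguageId == 0: language neutral, thread, user default, system default, then US English.

```cpp
#include <cstdint>
#include <optional>
#include <set>
#include <vector>

using LangId = std::uint16_t;  // stand-in for the Win32 LANGID type

// Hypothetical model: return the first language in the documented search
// order for which the message table has an entry, or nullopt if none does.
std::optional<LangId> resolve_language(const std::set<LangId>& available,
                                       LangId thread_lang, LangId user_lang,
                                       LangId system_lang) {
    const LangId neutral = 0x0000;     // MAKELANGID(LANG_NEUTRAL, SUBLANG_NEUTRAL)
    const LangId us_english = 0x0409;  // en-US
    const std::vector<LangId> order = {neutral, thread_lang, user_lang, system_lang, us_english};
    for (LangId id : order) {
        if (available.count(id) != 0) {
            return id;
        }
    }
    return std::nullopt;
}
```

With Chinese (0x0804) installed and selected as the user language on an English (0x0409) system, this model picks 0x0804, which is exactly the string an ANSI conversion to the English system ACP cannot represent.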

@sylveon
Contributor

sylveon commented Apr 24, 2022

If we update this function to use the user locale, it becomes the odd one out. Every other standard function that takes char calls the A APIs, so you'd essentially be passing them garbage. As you've pointed out yourself, std::cout and fputs expect content encoded in the system ACP. We can't return std::wstring either, as the standard mandates std::string.

So I think the best course of action here is to explicitly pass the system locale ID (which we can obtain by calling GetSystemDefaultLangID) to FormatMessage instead of 0.
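A minimal, Windows-only sketch of that proposal for the FORMAT_MESSAGE_FROM_SYSTEM case that std::system_category() needs. system_error_message is a hypothetical name, not the STL's actual fix; the retry with 0 when no message exists in the system language (ERROR_RESOURCE_LANG_NOT_FOUND) is an assumption of this sketch, not something agreed in this thread.

```cpp
#ifdef _WIN32  // Windows-only sketch
#include <windows.h>
#include <string>

// Hypothetical helper: ask FormatMessageA for the message in the *system*
// language, so the text it narrows to the system ACP is representable.
std::string system_error_message(DWORD error_code) {
    const DWORD flags = FORMAT_MESSAGE_ALLOCATE_BUFFER | FORMAT_MESSAGE_FROM_SYSTEM |
                        FORMAT_MESSAGE_IGNORE_INSERTS;
    char* buffer = nullptr;
    DWORD length = FormatMessageA(flags, nullptr, error_code, GetSystemDefaultLangID(),
                                  reinterpret_cast<char*>(&buffer), 0, nullptr);
    if (length == 0 && GetLastError() == ERROR_RESOURCE_LANG_NOT_FOUND) {
        // Assumption: fall back to the default search order (dwLanguageId == 0).
        length = FormatMessageA(flags, nullptr, error_code, 0,
                                reinterpret_cast<char*>(&buffer), 0, nullptr);
    }
    if (length == 0) {
        return "unknown error";
    }
    std::string message(buffer, length);
    LocalFree(buffer);  // buffer was allocated by FORMAT_MESSAGE_ALLOCATE_BUFFER
    // System messages usually end with "\r\n"; strip trailing whitespace.
    while (!message.empty() &&
           (message.back() == '\r' || message.back() == '\n' || message.back() == ' ')) {
        message.pop_back();
    }
    return message;
}
#endif
```

This is the same pairing as calling FormatMessageW with the system language and converting with CP_ACP, with the narrowing done inside FormatMessageA.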

@fsb4000
Contributor

fsb4000 commented Apr 24, 2022

> If you pass in zero, it will prefer user language settings to system language settings.

Yes, but not always.

[screenshot]
[screenshot]

With these settings (User Interface: Russian, Region: China), I had System Language: Russian and User Language: Chinese, and FormatMessageA still preferred the system language.

Either way, I think explicitly passing GetSystemDefaultLangID is correct.

4 participants