[io] Windows legacy mode mistakenly ignores the device encoding #86427
Comments
In Python 3.8+, legacy standard I/O mode uses the process code page from GetACP instead of the correct device encoding from GetConsoleCP and GetConsoleOutputCP. For example:
This is based on config_init_stdio_encoding() in Python/initconfig.c, which sets config->stdio_encoding via config_get_locale_encoding(). Couldn't config->stdio_encoding be set to NULL to get the default behavior? Computing this ahead of time would require separate encodings: config->stdin_encoding, config->stdout_encoding, and config->stderr_encoding. And _Py_device_encoding would have to be duplicated as something like config_get_device_encoding(PyConfig *config, int fd, wchar_t **device_encoding).
There's a related issue that affects opening duplicated file descriptors and opening "CON", "CONIN$", and "CONOUT$" in legacy I/O mode, but this case has always been broken. For Windows, _Py_device_encoding needs to be generalized to use _get_osfhandle and GetNumberOfConsoleInputEvents to detect and differentiate console input and output, instead of using isatty() and hard-coding file descriptors 0-2.
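A rough sketch of that kind of detection, written in Python with ctypes rather than in C. The function name console_kind is made up for illustration, and the preliminary GetConsoleMode check is an assumption added here to filter out non-console handles before GetNumberOfConsoleInputEvents is tried. It is not the actual _Py_device_encoding code.

```python
import ctypes
import sys


def console_kind(fd):
    """Return 'input', 'output', or None for a file descriptor.

    Hypothetical helper: None means "not a console" (or not on Windows).
    """
    if sys.platform != 'win32':
        return None
    import msvcrt
    from ctypes import wintypes

    kernel32 = ctypes.WinDLL('kernel32', use_last_error=True)
    kernel32.GetConsoleMode.argtypes = (wintypes.HANDLE, wintypes.LPDWORD)
    kernel32.GetNumberOfConsoleInputEvents.argtypes = (
        wintypes.HANDLE, wintypes.LPDWORD)

    try:
        # Map the C runtime file descriptor to the underlying OS handle.
        handle = msvcrt.get_osfhandle(fd)
    except OSError:
        return None

    mode = wintypes.DWORD()
    if not kernel32.GetConsoleMode(handle, ctypes.byref(mode)):
        return None  # not a console handle at all

    # Only a console input buffer can report pending input events.
    count = wintypes.DWORD()
    if kernel32.GetNumberOfConsoleInputEvents(handle, ctypes.byref(count)):
        return 'input'
    return 'output'


if __name__ == '__main__':
    for fd in (0, 1, 2):
        print(fd, console_kind(fd))
```

With this, a duplicated descriptor or a file opened on "CONIN$" would be classified by what the handle is, not by whether its number happens to be 0, 1, or 2.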
I would like to get a PyConfig structure fully populated to make the Python initialization more deterministic and reliable. That way, PyConfig fully controls the encodings used. The solution here is to fix config_init_stdio_encoding() to use GetConsoleCP() and GetConsoleOutputCP() to build a "cpXXX" string. This issue seems to be a regression that I introduced in Python 3.8 with PEP 587 (PyConfig). I didn't notice this subtle case during my refactoring. Relying on os.device_encoding() when the encoding is NULL is not obvious. That's why I prefer to get PyConfig fully populated ;-) It would be nice to get a unit test for this case.
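To make the "cpXXX" idea concrete, here is a minimal Python sketch of computing one encoding name per standard stream from the console code pages. The names codepage_name and standard_stream_encodings are hypothetical, and this only mirrors the proposal in spirit; the real change would live in C in Python/initconfig.c.

```python
import ctypes
import os
import sys


def codepage_name(cp):
    """Format a Windows code page number as a codec name (850 -> 'cp850')."""
    return f'cp{cp}'


def standard_stream_encodings():
    """Map fds 0-2 to a 'cpXXX' name, or None when the fd is not a tty.

    Hypothetical helper; returns an empty dict outside Windows.
    """
    if sys.platform != 'win32':
        return {}
    kernel32 = ctypes.WinDLL('kernel32', use_last_error=True)
    # stdin uses the console input code page; stdout and stderr use the
    # console output code page.
    getters = {
        0: kernel32.GetConsoleCP,
        1: kernel32.GetConsoleOutputCP,
        2: kernel32.GetConsoleOutputCP,
    }
    result = {}
    for fd, get_cp in getters.items():
        # Both getters return 0 when no console is attached.
        cp = get_cp() if os.isatty(fd) else 0
        result[fd] = codepage_name(cp) if cp else None
    return result


if __name__ == '__main__':
    print(standard_stream_encodings())
```

Note that the input and output code pages can differ, which is why a single shared value cannot represent all three streams.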
But, as I mentioned, that's only possible by replacing config->stdio_encoding with three separate settings: config->stdin_encoding, config->stdout_encoding, and config->stderr_encoding.
The process code page from GetACP() is either an ANSI code page or CP_UTF8 (65001). It should never be a Western OEM code page such as 850. In that case, a reliable unit test would check that the configured encoding is a particular OEM code page. For example, spawn a new interpreter in a windowless console session (i.e. creationflags=CREATE_NO_WINDOW). Set the session's input code page to 850 via ctypes.WinDLL('kernel32').SetConsoleCP(850). Set os.environ['PYTHONLEGACYWINDOWSSTDIO'] = '1'. Then spawn [sys.executable, '-c', 'import sys; print(sys.stdin.encoding)'], and verify that the output is 'cp850'.
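The steps above could be sketched roughly as follows. HELPER and stdin_encoding_in_cp850_console are hypothetical names, and opening "CONIN$" to give the child a console stdin is an assumption on my part (the description above doesn't say how stdin is wired up). Treat it as a starting point for a test, not a finished one.

```python
import subprocess
import sys
import textwrap

CREATE_NO_WINDOW = 0x08000000  # value of subprocess.CREATE_NO_WINDOW

# Script for the intermediate interpreter that owns the windowless console.
HELPER = textwrap.dedent("""
    import ctypes, os, subprocess, sys
    kernel32 = ctypes.WinDLL('kernel32', use_last_error=True)
    # Set the console session's input code page to an OEM code page.
    if not kernel32.SetConsoleCP(850):
        raise ctypes.WinError(ctypes.get_last_error())
    os.environ['PYTHONLEGACYWINDOWSSTDIO'] = '1'
    # Assumption: pass the console input buffer explicitly so the child's
    # stdin is a console handle rather than an inherited pipe.
    conin = os.open('CONIN$', os.O_RDWR)
    try:
        out = subprocess.check_output(
            [sys.executable, '-c', 'import sys; print(sys.stdin.encoding)'],
            stdin=conin, text=True)
    finally:
        os.close(conin)
    sys.stdout.write(out)
""")


def stdin_encoding_in_cp850_console():
    """Return the child's sys.stdin.encoding, or None outside Windows."""
    if sys.platform != 'win32':
        return None
    out = subprocess.check_output(
        [sys.executable, '-c', HELPER],
        creationflags=CREATE_NO_WINDOW, text=True)
    return out.strip()


if __name__ == '__main__':
    # A test would assert that this equals 'cp850' on a fixed interpreter.
    print(stdin_encoding_in_cp850_console())
```

Running the code page change in a separate windowless session keeps the test from altering the code page of the console the test runner itself is attached to.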
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.