Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

<format> assumes strings are encoded in the active code page #1834

Merged
merged 7 commits into from
Apr 20, 2021

Conversation

CaseyCarter
Copy link
Member

Refactors all format functionality with knowledge of character encodings into a new class _Fmt_codec. _Fmt_codec internally caches the _Cvtvec structure that other functions previously passed around, and makes use of a new "pure C ABI function" __std_get_cvt to retrieve character conversion info for the active codepage.

The ABI function reuses the __std_code_page and __std_win_error types from <xfilesystem_abi.h>. These could be extracted into a more general __msvc_win_abi.hpp header, but I think <xfilesystem_abi.h> is lightweight enough to just include directly for now.

Drive-by:

  • BUG: _Parse_precision was not skipping the trailing } in a dynamic precision.
  • Improvement: test_parse_helper now verifies that the parse consumes the entire input when passed an expected length of npos.

@CaseyCarter CaseyCarter added cxx20 C++20 feature format C++20/23 format labels Apr 14, 2021
@CaseyCarter CaseyCarter requested a review from a team as a code owner April 14, 2021 03:06
stl/inc/format Outdated Show resolved Hide resolved
stl/inc/format Show resolved Hide resolved
stl/inc/format Outdated Show resolved Hide resolved
Copy link
Contributor

@miscco miscco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fact that you cooked that up in a few days makes my imposter syndrome super happy

stl/src/format.cpp Show resolved Hide resolved
stl/CMakeLists.txt Show resolved Hide resolved
stl/src/format.cpp Outdated Show resolved Hide resolved
stl/inc/format Outdated Show resolved Hide resolved
Copy link
Member

@barcharcraz barcharcraz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks pretty good to me, although the hack to hook into format.cpp to change the codepage needs to change.

I do observe that the ucrt uses the _setmbcp and _getmbcp functions, which seem to set a global (not great).

stl/src/format.cpp Show resolved Hide resolved
stl/inc/format Outdated Show resolved Hide resolved
stl/inc/format Show resolved Hide resolved
stl/inc/format Show resolved Hide resolved
stl/inc/format Show resolved Hide resolved
stl/inc/format Outdated Show resolved Hide resolved
stl/inc/format Show resolved Hide resolved
stl/inc/format Outdated Show resolved Hide resolved
tests/std/tests/P0645R10_text_formatting_utf8/test.cpp Outdated Show resolved Hide resolved
@CaseyCarter
Copy link
Member Author

Looks pretty good to me, although the hack to hook into format.cpp to change the codepage needs to change.

No lie: this looked even jankier to me today than it did yesterday. I won't mourn its passing.

stl/inc/format Outdated Show resolved Hide resolved
Refactors all `format` functionality with knowledge of character encodings into a new class `_Fmt_codec`. `_Fmt_codec` internally caches the `_Cvtvec` structure that other functions previously passed around, and makes use of a new "pure C ABI function" `__std_get_cvt` to retrieve character conversion info for the active codepage.

The ABI function reuses the `__std_code_page` and `__std_win_error` types from `<xfilesystem_abi.h>`. These could be extracted into a more general `__msvc_win_abi.hpp` header, but I think `<xfilesystem_abi.h>` is lightweight enough to just include directly for now.

Drive-by:
* BUG: `_Parse_precision` was not skipping the trailing `}` in a dynamic precision.
* Improvement: `test_parse_helper` now verifies that the parse consumes the entire input when passed an expected length of `npos`.
@StephanTLavavej
Copy link
Member

Now that #1821 has merged, I've force-pushed and retargeted this to main. I inspected the history carefully - acp had accumulated 3 commits beyond feature/format. Because we took feature/format, added more commits to it in merging_format, and then squashed everything into main (squashing is awesome for linear history and I wouldn't give it up for anything, but it means that situations like this need extra care), I determined that the easiest and safest thing to do was to start from main and cherry-pick acp's 3 unique commits. This cherry-picked 99.9% cleanly, with the only conflict being this PR adding inclusion of <xfilesystem_abi.h> next to the removal of <variant>, which was trivial to resolve.

Apologies for the force-push! 😸

@CaseyCarter CaseyCarter changed the title format assumes strings are encoded in the active code page <format> assumes strings are encoded in the active code page Apr 15, 2021
stl/inc/format Outdated Show resolved Hide resolved
stl/inc/format Outdated Show resolved Hide resolved
... fix 2 bugs exposed by that coverage, and address review comments. Drive-by: Remove the `_FORMAT_CODEPAGE` override from the `formatting_utf8` test which is unnecessary since we compile it with `/utf-8`.
@StephanTLavavej StephanTLavavej merged commit ccc5aaa into microsoft:main Apr 20, 2021
@StephanTLavavej
Copy link
Member

Thanks for fixing this bug! 😻 🎉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cxx20 C++20 feature format C++20/23 format
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants