Add config "outputLogEncoding" #550

Merged
merged 2 commits into from
Oct 9, 2018

pengbins
Contributor

@pengbins pengbins commented Sep 27, 2018

This change tries to fix a codepage problem mentioned in several issues, such as #518, #267, #245 and #153.
The output messages of the cmake build command are garbled when the default system codepage is not utf8.

It is fixed by adding a user config item "outputEncoding" to set the character encoding of messages in the output console.

  • How to use

    • Set "outputEncoding" to your system default codepage. (Press Ctrl+, and search for outputEncoding.)
      image
    • The default value is "utf8". Chinese users should set it to "GBK".
    • The list of supported values can be found here.
  • Results

    • The output console used to be unreadable when the default system codepage was not utf8.
      image
    • The output console is readable after setting "outputEncoding" to GBK.
      image
  • How it works
    This change introduces a new dependency, iconv-lite.
    Console programs emit text encoded with the default system codepage. This change uses iconv-lite to decode that text to utf8, so it can be displayed correctly in the output console. To decode, iconv-lite
    needs to know the default system codepage, which is given by the user's config "outputEncoding".
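To illustrate why the encoding has to be known, here is a minimal sketch, not the code in this PR. It uses Node's built-in `Buffer` decoding with latin1 as a stand-in for a non-utf8 codepage, because Node does not decode GBK without a library such as iconv-lite. The helper name `decodeOutput` is hypothetical.

```typescript
// Bytes a console program might emit for "café" under a single-byte
// codepage (latin1 here, standing in for GBK and similar codepages).
const rawOutput = Buffer.from([0x63, 0x61, 0x66, 0xe9]);

// Hypothetical helper: decode a raw output chunk with the given encoding.
// The PR delegates this step to iconv-lite so that more encodings work.
function decodeOutput(chunk: Buffer, encoding: "utf8" | "latin1"): string {
  return chunk.toString(encoding);
}

// Decoding with the wrong encoding: 0xe9 is not valid UTF-8 on its own,
// so it is replaced with U+FFFD and the text comes out garbled.
const garbled = decodeOutput(rawOutput, "utf8");

// Decoding with the encoding the program actually used gives readable text.
const readable = decodeOutput(rawOutput, "latin1");

console.log(JSON.stringify(garbled), JSON.stringify(readable));
```

The same idea applies to GBK: the bytes are fine, and only the decoder needs to be told which codepage produced them.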

Add a config item "outputEncoding" to
set the encoding of messages from cmake's
build command which are displayed in the
output console.
@vector-of-bool
Contributor

This change is excellent!!

The output codepage issues have been persistently troublesome almost from the very beginning. I've been unable to test and was never sure how to approach this problem.

One thing I am wondering is whether it is possible to auto-detect the code page. I found that Windows exposes a _CODEPAGE environment variable for the current codepage number. I've found a table mapping these numbers to their names, but I don't think the names that Microsoft provides will work with iconv-lite.

Instead of trying to get them all down immediately, could you add a lookup that uses the _CODEPAGE value to default to your GBK when _CODEPAGE is the value you have on your system? As users report this issue I can add entries to this table so that people won't have to manually change their encoding.

@vector-of-bool
Contributor

Alternatively, you could try using this table I made that maps codepage numbers to names. Maybe iconv just supports it out of the box? That would be awesome, and changing the output encoding setting wouldn't ever be required.

code-pages.ts.txt

@pengbins
Contributor Author

pengbins commented Oct 8, 2018

I found that Windows exposes a _CODEPAGE environment variable for the current codepage number.

It seems "_CODEPAGE" is not exposed on my Windows 10.

$ > echo %_CODEPAGE%
%_CODEPAGE%

$ >  set | find "CODEPAGE"
// no result

But we can get the current active code page by running the command 'chcp' with no parameter.

$ >chcp /?
Displays or sets the active code page number.

CHCP [nnn]

  nnn   Specifies a code page number.

Type CHCP without a parameter to display the active code page number.

$ >chcp
Active code page: 65001

Maybe we can try to run 'chcp', extract the codepage number, and translate it to an encoding name for iconv-lite.
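The idea could be sketched roughly as follows. This is an illustration only: the function names and the table are hypothetical, and the table is a small subset, not the contents of code-pages.ts.txt. The 936 entry follows the value reported later in this thread; the other names are assumed to be accepted by iconv-lite.

```typescript
// Illustrative subset of a codepage-number-to-encoding-name table.
// The real table attached to this thread is much larger.
const CODE_PAGE_NAMES: Readonly<Record<number, string>> = {
  437: "cp437",
  936: "gb2312",
  1252: "windows-1252",
  65001: "utf8",
};

// Extract the codepage number from the output of `chcp`.
// The label text is localized, so only the first run of digits is used.
function parseChcpOutput(output: string): number | undefined {
  const match = /\d+/.exec(output);
  return match ? parseInt(match[0], 10) : undefined;
}

// Look up the encoding name for a codepage number, if it is in the table.
function encodingForCodePage(codePage: number): string | undefined {
  return Object.prototype.hasOwnProperty.call(CODE_PAGE_NAMES, codePage)
    ? CODE_PAGE_NAMES[codePage]
    : undefined;
}

const codePage = parseChcpOutput("Active code page: 65001");
console.log(codePage, codePage !== undefined ? encodingForCodePage(codePage) : undefined);
```

Matching only the digits keeps the parser independent of the display language of the system, which matters because the users affected by this issue are mostly on non-English Windows installations.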

code-pages.ts.txt

P.S. Is this table generated from this page?

1. Change the default value of "outputEncoding" to "auto".
2. When "outputEncoding" is "auto" and the platform is Windows,
auto-detect the default system codepage by running the command chcp.
3. Translate the codepage number to an encoding name using CodePageTable.
4. If auto-detection fails, the user can still set this config manually.
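The steps in this commit message can be read as a small decision function. The sketch below is an assumption about the shape of that logic, not the merged code: `resolveOutputEncoding` and `DetectEncoding` are hypothetical names, and falling back to "utf8" when detection fails or the platform is not Windows is a guess based on the previous default.

```typescript
// Hypothetical detector: would run `chcp` and translate the codepage number
// to an encoding name, returning undefined when detection fails.
type DetectEncoding = () => string | undefined;

function resolveOutputEncoding(
  configured: string,
  platform: string,
  detect: DetectEncoding
): string {
  // An explicit user setting always wins over detection.
  if (configured !== "auto") {
    return configured;
  }
  // Auto-detection is only attempted on Windows (assumed "win32" platform id).
  if (platform !== "win32") {
    return "utf8";
  }
  // Assumed fallback: use utf8 when detection yields nothing.
  const detected = detect();
  return detected !== undefined ? detected : "utf8";
}

console.log(resolveOutputEncoding("auto", "win32", () => "gb2312"));
```

In an extension, `platform` would typically come from `process.platform`. Passing the detector in as a function keeps the decision logic testable without actually spawning `chcp`.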
@vector-of-bool
Contributor

It seems "_CODEPAGE" is not exposed on my Windows 10.

That's annoying.

This looks good! I may do some refactoring after merging it, but it won't be a big hassle.

Last question: Can you verify that the auto-detection works on your system?

@pengbins
Contributor Author

pengbins commented Oct 9, 2018

Last question: Can you verify that the auto-detection works on your system?

Yes, it works on my system. But I haven't tested it on systems set to any language other than mine.
My code page number is 936, and the encoding name is gb2312.

@vector-of-bool vector-of-bool merged commit a4454bd into microsoft:develop Oct 9, 2018
@pengbins pengbins deleted the outputEncoding branch October 10, 2018 02:03
@MBurtsev

The problem is still here.

@no-realm
Contributor

@Nim Can you please open a separate issue for this? It's easier to track that way :)

@MBurtsev

microsoft/vscode#72270

@github-actions github-actions bot locked and limited conversation to collaborators Jan 31, 2022