A file open with auto-detected encoding. #10013

tomoki1207 · 2016-08-01T09:59:05Z

This is related to #5388.

A text file will be opened with the encoding detected by jschardet.

msftclas · 2016-08-01T09:59:09Z

Hi @tomoki1207, I'm your friendly neighborhood Microsoft Pull Request Bot (You can call me MSBOT). Thanks for your contribution!

In order for us to evaluate and accept your PR, we ask that you sign a contribution license agreement. It's all electronic and will take just minutes. I promise there's no faxing. https://cla.microsoft.com.

TTYL, MSBOT;

msftclas · 2016-08-01T23:15:09Z

@tomoki1207, Thanks for signing the contribution license agreement so quickly! Actual humans will now validate the agreement and then evaluate the PR.

Thanks, MSBOT;

bpasero · 2016-08-06T10:04:21Z

@tomoki1207 I am not sure this works the way you coded it because the encoding is a user setting and you always try to detect the encoding now. How can you still respect the user preference if the encoding is not clear?

My argument is that really the only way of detecting an encoding is by looking at the BOM (Byte Order Mark) for UTF (and we do this already). Any other file encoding can only be guessed.
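For illustration, BOM-based detection of the kind described here can be sketched as below. This is a minimal sketch, not the code used in the editor: the names `detectEncodingByBom`, `BomEncoding`, and `BOMS` are hypothetical, and only the UTF-8 and UTF-16 byte order marks are covered.

```typescript
// Hypothetical names; only UTF-8 and UTF-16 BOMs are handled in this sketch.
type BomEncoding = "utf8" | "utf16le" | "utf16be";

const BOMS: ReadonlyArray<{ encoding: BomEncoding; bytes: number[] }> = [
  { encoding: "utf8", bytes: [0xef, 0xbb, 0xbf] },
  { encoding: "utf16be", bytes: [0xfe, 0xff] },
  { encoding: "utf16le", bytes: [0xff, 0xfe] },
];

// Returns the encoding announced by a leading BOM, or null when the buffer
// has no recognized BOM (in which case the encoding can only be guessed).
function detectEncodingByBom(buffer: Uint8Array): BomEncoding | null {
  for (const { encoding, bytes } of BOMS) {
    const matches =
      buffer.length >= bytes.length &&
      bytes.every((b, i) => buffer[i] === b);
    if (matches) {
      return encoding;
    }
  }
  return null;
}
```

A `null` result is the case under discussion: without a BOM, nothing in the file states its encoding.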

I think one thing we could add is an action in the encoding picker to "Auto Detect" the encoding via this code and then set the encoding for the file. But always detecting the encoding for each file being opened is not right imho.

tomoki1207 · 2016-08-09T03:15:48Z

@bpasero I understand your opinion.
However, I think many people find it quite inconvenient to open small files that are not encoded in UTF.
So I hope the encoding can be detected automatically in some way.

Would you prefer the following approach? It is similar to Atom's auto-detect package.

Provide a SetEncoding API for extensions
Call it from an auto-detect extension

bpasero · 2016-08-09T04:59:13Z

@tomoki1207 the approach works if every encoding can be detected with 100% certainty but I doubt that is possible for any file that does not include a BOM. What does jschardet do if the encoding is ambiguous?

Nevertheless, we have global and workspace settings for the encoding that we cannot just drop, so I see little chance of changing this to always auto-detect the encoding. The only option I see is to offer a "Guess Encoding" action in the encoding picker that runs jschardet. I believe Atom does the same.
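One way to reconcile a guess with the user setting is to accept the guess only above a confidence threshold and otherwise fall back to the configured encoding. The sketch below is an assumption, not code from this PR: `resolveEncoding`, `Guess`, `Detector`, and the `0.9` threshold are hypothetical, and the detector is passed in as a function so the example does not depend on a specific library. As far as I know, jschardet's `detect` reports an encoding name together with a confidence value, which is the shape assumed here.

```typescript
// Hypothetical shape of a detector result: an encoding name (or null when
// nothing was recognized) and a confidence between 0 and 1.
interface Guess {
  encoding: string | null;
  confidence: number;
}

// The detector is injected; a wrapper around a charset-detection library
// could be passed here.
type Detector = (bytes: Uint8Array) => Guess;

// Uses the guessed encoding only when the detector is confident enough;
// otherwise the user's configured encoding wins.
function resolveEncoding(
  bytes: Uint8Array,
  configuredEncoding: string,
  detect: Detector,
  minConfidence: number = 0.9 // assumed threshold, chosen for illustration
): string {
  const guess = detect(bytes);
  if (guess.encoding !== null && guess.confidence >= minConfidence) {
    return guess.encoding;
  }
  return configuredEncoding;
}
```

With this shape, an ambiguous guess never overrides the global or workspace setting, and the same function could back an explicit "Guess Encoding" action.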

bpasero · 2016-08-23T08:51:20Z

Closing for inactivity.

detect encoding

4096245

msftclas added the cla-required label Aug 1, 2016

msftclas added cla-signed and removed cla-required labels Aug 1, 2016

Add jschardet to shrinkwrap

439ee8c

kieferrm assigned bpasero Aug 2, 2016

bpasero added this to the Backlog milestone Aug 6, 2016

bpasero closed this Aug 23, 2016

bpasero removed their assignment Aug 23, 2016

tomoki1207 mentioned this pull request Oct 3, 2016

Show message when detected encoding of file is non UTF #13142

Closed

katainaka0503 mentioned this pull request Feb 25, 2017

Auto guess encoding #21416

Merged

buzzzzer mentioned this pull request Apr 11, 2017

Test: encoding auto detection #23322

Closed

3 tasks

github-actions bot locked and limited conversation to collaborators Mar 27, 2020
