A file open with auto-detected encoding. #10013

Closed
wants to merge 2 commits into from

tomoki1207
Contributor

@tomoki1207 tomoki1207 commented Aug 1, 2016

This is related to #5388.

A text file will be opened with the encoding detected by jschardet.

@msftclas

msftclas commented Aug 1, 2016

Hi @tomoki1207, I'm your friendly neighborhood Microsoft Pull Request Bot (You can call me MSBOT). Thanks for your contribution!

In order for us to evaluate and accept your PR, we ask that you sign a contribution license agreement. It's all electronic and will take just minutes. I promise there's no faxing. https://cla.microsoft.com.

TTYL, MSBOT;

@msftclas

msftclas commented Aug 1, 2016

@tomoki1207, Thanks for signing the contribution license agreement so quickly! Actual humans will now validate the agreement and then evaluate the PR.

Thanks, MSBOT;

@bpasero
Member

bpasero commented Aug 6, 2016

@tomoki1207 I am not sure this works the way you coded it because the encoding is a user setting and you always try to detect the encoding now. How can you still respect the user preference if the encoding is not clear?

My argument is that really the only way of detecting an encoding is by looking at the BOM (Byte Order Mark) for UTF (and we do this already). Any other file encoding can only be guessed.
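As an illustration of the BOM check described above, here is a minimal sketch. It is not the editor's actual implementation; the function name `detectEncodingByBOM` and the encoding labels are assumptions made for this example, and UTF-32 BOMs are deliberately ignored.

```typescript
// Hypothetical labels for the encodings this sketch can identify with certainty.
type BomEncoding = "utf8" | "utf16le" | "utf16be";

// Returns the encoding named by a leading byte order mark, or null if none is present.
// Assumption: UTF-32 is not handled, so a UTF-32 LE BOM (FF FE 00 00) would be
// reported as UTF-16 LE by this sketch.
function detectEncodingByBOM(bytes: Uint8Array): BomEncoding | null {
  // UTF-8 BOM: EF BB BF
  if (bytes.length >= 3 && bytes[0] === 0xef && bytes[1] === 0xbb && bytes[2] === 0xbf) {
    return "utf8";
  }
  // UTF-16 little-endian BOM: FF FE
  if (bytes.length >= 2 && bytes[0] === 0xff && bytes[1] === 0xfe) {
    return "utf16le";
  }
  // UTF-16 big-endian BOM: FE FF
  if (bytes.length >= 2 && bytes[0] === 0xfe && bytes[1] === 0xff) {
    return "utf16be";
  }
  // No BOM: any further answer would be a guess, not a detection.
  return null;
}
```

A file without a BOM yields `null` here, which is exactly the case where the discussion turns to guessing.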

I think one thing we could add is an action in the encoding picker to "Auto Detect" the encoding via this code and then set the encoding for the file. But always detecting the encoding for each file being opened is not right imho.

@bpasero bpasero added this to the Backlog milestone Aug 6, 2016
@tomoki1207
Contributor Author

@bpasero I understand your opinion.
However, I think many people find it inconvenient to open small files that are not encoded in UTF.
So I hope the encoding can be detected automatically in some way.

Would you prefer the following approach? It is similar to Atom's auto-detect package.

  1. Provide a SetEncoding API for extensions
  2. Call it from an auto-detect extension
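The two steps above could look roughly like the following sketch. Everything in it is hypothetical: `EncodingHost`, `setEncoding`, `Detector`, and `autoDetectOnOpen` are names invented for illustration and are not an existing editor API.

```typescript
// Hypothetical host-side API (step 1). Nothing in this thread confirms such an API exists.
interface EncodingHost {
  setEncoding(filePath: string, encoding: string): void;
}

// Hypothetical detector signature. A real extension might wrap jschardet here.
type Detector = (bytes: Uint8Array) => string | null;

// Hypothetical extension-side logic (step 2): detect, then ask the host to switch encoding.
// Returns true if an encoding was applied, false if the file was left untouched.
function autoDetectOnOpen(
  host: EncodingHost,
  detect: Detector,
  filePath: string,
  bytes: Uint8Array
): boolean {
  const encoding = detect(bytes);
  if (encoding === null) {
    // Nothing detected: leave the user's configured encoding in place.
    return false;
  }
  host.setEncoding(filePath, encoding);
  return true;
}
```

Keeping the detector behind a function type means the host never depends on a particular detection library.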

@bpasero
Member

bpasero commented Aug 9, 2016

@tomoki1207 the approach works if every encoding can be detected with 100% certainty but I doubt that is possible for any file that does not include a BOM. What does jschardet do if the encoding is ambiguous?

Nevertheless, we do have a global and workspace setting for the encoding that we cannot just drop, so I see little chance of changing this to always auto-detect the encoding. The only possible thing I see is to offer an action to "Guess Encoding" from the encoding picker that runs jschardet. I believe Atom does the same.
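One way to reconcile a guess with the configured setting is sketched below. It is only an illustration: the `EncodingGuess` shape is loosely modeled on the encoding-plus-confidence result that detectors such as jschardet report, while `resolveEncoding` and the 0.9 threshold are assumptions chosen for this example.

```typescript
// Assumed shape of a detector result: a best guess plus a confidence between 0 and 1.
interface EncodingGuess {
  encoding: string | null;
  confidence: number;
}

// Picks an encoding in order of certainty:
//   1. a BOM, which identifies the encoding definitively;
//   2. a guess, but only if it clears the confidence threshold;
//   3. the user's configured (global or workspace) encoding.
// The default threshold of 0.9 is an arbitrary assumption for this sketch.
function resolveEncoding(
  bomEncoding: string | null,
  guess: EncodingGuess | null,
  configuredEncoding: string,
  minConfidence: number = 0.9
): string {
  if (bomEncoding !== null) {
    return bomEncoding;
  }
  if (guess !== null && guess.encoding !== null && guess.confidence >= minConfidence) {
    return guess.encoding;
  }
  // Ambiguous or missing guess: respect the user preference.
  return configuredEncoding;
}
```

With this ordering an ambiguous guess never overrides the user preference, which is the concern raised in the review.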

@bpasero
Member

bpasero commented Aug 23, 2016

Closing for inactivity.

@bpasero bpasero closed this Aug 23, 2016
@bpasero bpasero removed their assignment Aug 23, 2016
@buzzzzer buzzzzer mentioned this pull request Apr 11, 2017
3 tasks
@github-actions github-actions bot locked and limited conversation to collaborators Mar 27, 2020
3 participants