Jupyter Notebook /files/ endpoint doesn't correctly send UTF-8 text files #2397

Closed
digitalresistor opened this issue Apr 11, 2017 · 9 comments · Fixed by #2402

@digitalresistor

I have various log files, generated by an outside process, that are accessible from the Jupyter notebook. We would like users to be able to use these log files to verify that their actions are correct. However, as soon as they click on one, the file is downloaded as ASCII text/plain, which causes the fancy quotes to change:

from:

‘text’

to

â€˜textâ€™

Is there some way to force Jupyter to correctly identify files ending with .log as UTF-8 plain text files?

@takluyver
Member

There's nothing wrong with text/plain for UTF-8. It's the browser that chooses how to display it, and in that case it looks like it's using latin-1 or cp1252. We could do encoding detection and tell the browser to use a specific encoding with the HTTP Content-Type header, but it's basically just educated guessing, and I'm not sure it's worth adding dependencies for this.
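The effect described here can be reproduced outside the browser. The following is a minimal sketch, assuming the browser fell back to cp1252 (one of the two guesses above): it encodes the quoted string as UTF-8 and decodes the same bytes with the wrong codec. The variable names are illustrative only.

```python
# Illustration only: decode UTF-8 bytes with the wrong codec (cp1252 is an
# assumption about what the browser picked) to reproduce the garbling.
original = "‘text’"

utf8_bytes = original.encode("utf-8")   # what the server actually sends
garbled = utf8_bytes.decode("cp1252")   # what a cp1252 reader displays

print(garbled)

# The bytes themselves are intact: reversing the wrong decoding step
# recovers the original string.
recovered = garbled.encode("cp1252").decode("utf-8")
print(recovered)
```

Because the round trip is lossless in this case, the file on disk is fine; only the declared (or assumed) charset is wrong.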

@digitalresistor
Author

According to the HTTP spec, sending text/plain without a charset means the browser should treat the content as ISO-8859-1, because the standard specifies that as the default encoding.

I would argue that sending text/plain; charset=UTF-8 is better than text/plain for files Jupyter doesn't recognise, as ISO-8859-1 documents will mostly render correctly as UTF-8, and browsers already fall back to ISO-8859-1 for files that don't correctly decode as UTF-8.
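As a rough sketch of the fallback idea, assuming a server wanted to make the same decision itself rather than leave it to the browser: try a strict UTF-8 decode and only advertise ISO-8859-1 when that fails. The helper name `content_type_for_text` is hypothetical and is not part of Jupyter.

```python
def content_type_for_text(data: bytes) -> str:
    """Pick a Content-Type value for a blob that is assumed to be text.

    Hypothetical helper for illustration; not taken from the notebook code.
    """
    try:
        # Strict decoding raises on byte sequences that are not valid UTF-8.
        data.decode("utf-8")
    except UnicodeDecodeError:
        return "text/plain; charset=ISO-8859-1"
    return "text/plain; charset=UTF-8"


utf8_header = content_type_for_text("‘text’".encode("utf-8"))
latin1_header = content_type_for_text("café".encode("latin-1"))
ascii_header = content_type_for_text(b"plain ASCII log line")

print(utf8_header)
print(latin1_header)
print(ascii_header)
```

Pure ASCII content is valid UTF-8, so it is labelled UTF-8 here and renders identically under either charset.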


Another option would be to check the LANG and LC_ALL environment variables and use that setting as the default charset returned in the Content-Type header.
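A minimal sketch of that option, assuming locale values of the usual `language_TERRITORY.codeset@modifier` shape. The function name `charset_from_environment` and the UTF-8 default are assumptions for illustration; `LC_CTYPE` is also consulted because POSIX gives it precedence between `LC_ALL` and `LANG`.

```python
import os


def charset_from_environment(environ=os.environ, default="UTF-8"):
    """Return the codeset named by the locale variables, or a default.

    Hypothetical helper: checks LC_ALL, then LC_CTYPE, then LANG.
    """
    for name in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = environ.get(name, "")
        if "." in value:
            # "de_DE.ISO-8859-1@euro" -> "ISO-8859-1"
            codeset = value.split(".", 1)[1].split("@", 1)[0]
            if codeset:
                return codeset
    # Values such as "C" or "POSIX" carry no codeset, so use the default.
    return default


header = "text/plain; charset=" + charset_from_environment({"LANG": "en_US.UTF-8"})
print(header)
```

Passing a plain dict instead of `os.environ` keeps the helper easy to test without touching the real environment.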

@gnestor
Contributor

gnestor commented Apr 12, 2017

@bertjwregeer I'm looking for where this change may need to happen. Does this look right?

diff --git a/notebook/files/handlers.py b/notebook/files/handlers.py
index c54151125..be3197ce7 100644
--- a/notebook/files/handlers.py
+++ b/notebook/files/handlers.py
@@ -52,7 +52,7 @@ class FilesHandler(IPythonHandler):
                if model['format'] == 'base64':
                    self.set_header('Content-Type', 'application/octet-stream')
                else:
                    self.set_header('Content-Type', [-'text/plain')-]{+'text/plain; charset=UTF-8')+}

        if include_body:
            if model['format'] == 'base64':

@digitalresistor
Author

The diff format you used is not one I recognise, but yes, changing that header should work to change the default.

@gnestor
Contributor

gnestor commented Apr 12, 2017

@bertjwregeer Ya I'm using the git-plus Atom extension ¯_(ツ)_/¯

I submitted a PR, can you check it out and let me know if it resolves your issue?

git fetch jupyter pull/2402/head:pr/2402
git checkout pr/2402

@digitalresistor
Author

@gnestor will pull it down and check it out, I'll get back to you :-).

@gnestor
Contributor

gnestor commented Apr 20, 2017

@bertjwregeer Any luck with this PR?

@digitalresistor
Author

I have been traveling, will verify later today.

@digitalresistor
Author

Sorry for the late reply. This fixes the issue for me :-)

@takluyver takluyver added this to the 5.1 milestone Jul 18, 2017
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Apr 16, 2021
3 participants