Default to UTF-8 source encoding (PEP8: python3) #399
# In Python2, the default source encoding is US-ASCII
# In Python3, the default source encoding is UTF-8
# http://legacy.python.org/dev/peps/pep-0008/#source-file-encoding
default_encoding = 'US-ASCII' if PY2 else 'UTF-8'
What will now happen when scanning a Python file that was encoded using latin-1 but without the encoding declared? Success if strings are ASCII, encoding error if something like 'ë' is encountered?
If the file is pure ASCII, it will succeed in both Python 2 and Python 3. Latin-1 is a superset of ASCII.
If a file has a Latin-1 encoded non-ASCII character and does not have an encoding pragma, it will be a syntax error in both Python 2 and Python 3 (without even factoring in Babel).
If the file has a glyph that is Latin-1 representable (such as the one presented) but is encoded as UTF-8, and the file does not have an encoding pragma:
- python2: syntax error
- python3: this is fine (the default source encoding is UTF-8)
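The cases above can be checked at the byte level. This is only a sketch: it uses a hypothetical one-line source file and a made-up helper `can_decode`, and it tests plain decoding rather than the interpreter or Babel itself.

```python
# Hypothetical one-line source file containing 'ë' (U+00EB).
SOURCE_TEXT = "name = '\u00eb'\n"

latin1_bytes = SOURCE_TEXT.encode("latin-1")  # 'ë' becomes the single byte 0xEB
utf8_bytes = SOURCE_TEXT.encode("utf-8")      # 'ë' becomes the bytes 0xC3 0xAB
ascii_bytes = b"name = 'e'\n"                 # pure ASCII variant


def can_decode(raw, encoding):
    """Return True if the raw bytes decode cleanly with the given encoding."""
    try:
        raw.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True


# "ascii" stands in for the Python 2 default, "utf-8" for the Python 3 default.
for label, raw in [("ascii", ascii_bytes), ("latin-1", latin1_bytes), ("utf-8", utf8_bytes)]:
    print(label, "file:",
          "py2 default ok" if can_decode(raw, "ascii") else "py2 default fails",
          "/",
          "py3 default ok" if can_decode(raw, "utf-8") else "py3 default fails")
```

Only the UTF-8 encoded file behaves differently under the two defaults, which is the case this patch is about.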
Changes look good to me. I have no idea why codecov croaked. @akx ?
Forgot to import pytest, maybe that's why it croaked. /me rebases
This is technically fine as a patch -- thanks @asottile -- but I'm wondering... Should we maybe default to UTF-8 on both Py2 and Py3? I kind of dislike the idea that Babel acts differently depending on the platform it's being run on, especially considering all of the other compat stuff we have that makes it work equivalently on both! I kinda do think it'd be better to assume UTF-8 on every .py file regardless of interpreter version, unless told otherwise. Is there something I'm missing with that approach?
I thought about that as well. The only issue I have with it is that it would cause Babel to potentially scan files which are syntactically incorrect -- not sure how much of an issue that is though. In most tools I've written I've defaulted to UTF-8 in both to simplify code, so maybe it's OK here too?
I think it's okay. Really, as a user of Babel, if you have a source file that
then you have bigger problems already. 😁
@akx now defaulting to UTF-8 everywhere :)
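For illustration, "UTF-8 everywhere unless told otherwise" could look roughly like the sketch below. The function name `guess_source_encoding` is hypothetical and this is not Babel's actual code; the pattern is the one given in PEP 263, and the check is simplified to "either of the first two lines".

```python
import re

# PEP 263 pattern for an encoding declaration ("pragma") in a comment.
CODING_RE = re.compile(rb"^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)")


def guess_source_encoding(raw, default="utf-8"):
    """Return the encoding declared in raw source bytes, or the default.

    Simplification: any declaration on line 1 or 2 is accepted, without
    checking that line 1 is itself a comment or blank.
    """
    for line in raw.splitlines()[:2]:
        match = CODING_RE.match(line)
        if match:
            return match.group(1).decode("ascii")
    # No pragma found: same answer regardless of interpreter version.
    return default


print(guess_source_encoding(b"x = 1\n"))
print(guess_source_encoding(b"# -*- coding: latin-1 -*-\nx = 1\n"))
```

With a single default, the result depends only on the file contents, not on whether the scan runs under Python 2 or Python 3.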
Current coverage is 88.78%
@@            master   #399   diff @@
========================================
  Files           24     24
  Lines         3950   3950
  Methods          0      0
  Branches         0      0
========================================
- Hits          3561   3507    -54
- Misses         389    443    +54
  Partials         0      0
The following python3 source file:
has UTF-8 encoding (the default when there is no pragma). Before my change, the code I touched caused it to be scanned as latin-1, producing mojibake.
The section of PEP8 which covers this: http://legacy.python.org/dev/peps/pep-0008/#source-file-encoding
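The mojibake described above is easy to reproduce. This is a minimal sketch with a hypothetical string, not the source file from the original report:

```python
# A hypothetical string literal as it would appear in a UTF-8 source file.
original = "\u00eb"  # 'ë'

# What is stored on disk when the file is saved as UTF-8.
on_disk = original.encode("utf-8")  # b'\xc3\xab'

# Scanning those bytes as latin-1 never fails, because every byte value
# maps to some latin-1 character, so the result is silently wrong.
scanned_as_latin1 = on_disk.decode("latin-1")
scanned_as_utf8 = on_disk.decode("utf-8")

print(repr(scanned_as_latin1))  # 'Ã«'
print(repr(scanned_as_utf8))    # 'ë'
```

Because latin-1 decoding cannot raise an error, the wrong default shows up as corrupted extracted strings rather than as a failure.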