Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Default to UTF-8 source encoding (PEP8: python3) #399

Merged
merged 1 commit into from
Apr 28, 2016
Merged

Default to UTF-8 source encoding (PEP8: python3) #399

merged 1 commit into from
Apr 28, 2016

Conversation

asottile
Copy link
Contributor

The following python3 source file:

print(_('☃'))

has UTF-8 encoding (by default without a pragma). The code I touched (before my change) caused this to be scanned as latin-1 (mojibake)

The section of PEP8 which covers this: http://legacy.python.org/dev/peps/pep-0008/#source-file-encoding

# In Python2, the default source encoding is US-ASCII
# In Python3, the default source encoding is UTF-8
# http://legacy.python.org/dev/peps/pep-0008/#source-file-encoding
default_encoding = 'US-ASCII' if PY2 else 'UTF-8'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What will now happen when scanning a Python file that was encoded using latin-1 but without the encoding declared? Success if strings are ASCII, encoding error if something like 'ë' is encountered?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the file is ascii it will succeed in both python 2 and python 3. Latin1 is a superset of ascii.

If a file has a latin1 encoded character and does not have an encoding pragma it will be a syntax error in both python2 or python3 (without even factoring in babel)

If the file has a glyph that is latin1 representable (such as the one presented) that is encoded as utf8 and does not have an encoding pragma:

  • python2: syntax error
  • python3: this is fine (the default source encoding is utf8)

@jtwang
Copy link
Contributor

jtwang commented Apr 27, 2016

Changes look good to me. I have no idea why codecov croaked. @akx ?

@asottile
Copy link
Contributor Author

forgot to import pytest, maybe that's why it croaked /me rebases

@akx akx self-assigned this Apr 28, 2016
@akx
Copy link
Member

akx commented Apr 28, 2016

This is technically fine as a patch -- thanks @asottile -- but I'm wondering... Should we maybe default to UTF-8 on both Py2 and Py3?

I kind of dislike the idea that Babel acts differently depending on the platform it's being run on, especially considering all of the other compat stuff we have that makes it work equivalently on both!

I kinda do think it'd be better to assume UTF-8 on every .py file regardless of interpreter version, unless told otherwise. Is there something I'm missing with that approach?

@asottile
Copy link
Contributor Author

I thought about that as well too, the only issue I have with that is it would cause babel to potentially scan files which are syntactically incorrect -- not sure how much of an issue that is though. In most tools I've written I've defaulted to UTF-8 in both to simplify code so maybe it's OK here too?

@akx
Copy link
Member

akx commented Apr 28, 2016

I think it's okay. Really, as an user of Babel, if you have a source file that

  • has internationalizable strings
  • is using an esoteric encoding
  • doesn't declare the encoding in the standard-format header

then you have bigger problems already. 😁

@asottile asottile changed the title Properly default source encoding according to PEP8 Default to UTF-8 source encoding (PEP8: python3) Apr 28, 2016
@asottile
Copy link
Contributor Author

@akx now defaulting to UTF-8 everywhere :)

@codecov-io
Copy link

codecov-io commented Apr 28, 2016

Current coverage is 88.78%

Merging #399 into master will increase coverage by -1.36%

@@           master       #399   diff @@
========================================
  Files          24         24          
  Lines        3950       3950          
  Methods         0          0          
  Branches        0          0          
========================================
- Hits         3561       3507    -54   
- Misses        389        443    +54   
  Partials        0          0          
  1. 3 files (not in diff) in babel/localtime were modified. more
    • Misses +19
    • Hits -19
  2. 6 files (not in diff) in babel were modified. more
    • Misses +32
    • Hits -32
  3. File .../messages/extract.py was modified. more
    • Misses +2
    • Partials 0
    • Hits -2

Powered by Codecov. Last updated by 6c3d405

@akx
Copy link
Member

akx commented Apr 28, 2016

@asottile Cool beans! The test cases from da87edd and 51df457 nicely cover the other encoding cases, I think, so just waiting for CI before merging.

@akx akx merged commit 8b5e5b7 into python-babel:master Apr 28, 2016
@asottile asottile deleted the default_source_encoding branch April 28, 2016 17:16
@asottile asottile mentioned this pull request Jan 11, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants