Default to UTF-8 source encoding (PEP8: python3) #399

asottile · 2016-04-27T19:29:30Z

The following python3 source file:

print(_('☃'))

has UTF-8 encoding (by default without a pragma). The code I touched (before my change) caused this to be scanned as latin-1 (mojibake)

The section of PEP8 which covers this: http://legacy.python.org/dev/peps/pep-0008/#source-file-encoding

jtwang · 2016-04-27T19:42:51Z

babel/messages/extract.py

+    # In Python2, the default source encoding is US-ASCII
+    # In Python3, the default source encoding is UTF-8
+    # http://legacy.python.org/dev/peps/pep-0008/#source-file-encoding
+    default_encoding = 'US-ASCII' if PY2 else 'UTF-8'


What will now happen when scanning a Python file that was encoded using latin-1 but without the encoding declared? Success if strings are ASCII, encoding error if something like 'ë' is encountered?

If the file is ascii it will succeed in both python 2 and python 3. Latin1 is a superset of ascii.

If a file has a latin1 encoded character and does not have an encoding pragma it will be a syntax error in both python2 or python3 (without even factoring in babel)

If the file has a glyph that is latin1 representable (such as the one presented) that is encoded as utf8 and does not have an encoding pragma:

python2: syntax error

python3: this is fine (the default source encoding is utf8)

jtwang · 2016-04-27T19:54:06Z

Changes look good to me. I have no idea why codecov croaked. @akx ?

asottile · 2016-04-27T20:22:33Z

forgot to import pytest, maybe that's why it croaked /me rebases

akx · 2016-04-28T16:02:12Z

This is technically fine as a patch -- thanks @asottile -- but I'm wondering... Should we maybe default to UTF-8 on both Py2 and Py3?

I kind of dislike the idea that Babel acts differently depending on the platform it's being run on, especially considering all of the other compat stuff we have that makes it work equivalently on both!

I kinda do think it'd be better to assume UTF-8 on every .py file regardless of interpreter version, unless told otherwise. Is there something I'm missing with that approach?

asottile · 2016-04-28T16:35:07Z

I thought about that as well too, the only issue I have with that is it would cause babel to potentially scan files which are syntactically incorrect -- not sure how much of an issue that is though. In most tools I've written I've defaulted to UTF-8 in both to simplify code so maybe it's OK here too?

akx · 2016-04-28T16:37:32Z

I think it's okay. Really, as an user of Babel, if you have a source file that

has internationalizable strings
is using an esoteric encoding
doesn't declare the encoding in the standard-format header

then you have bigger problems already. 😁

asottile · 2016-04-28T16:42:28Z

@akx now defaulting to UTF-8 everywhere :)

codecov-io · 2016-04-28T16:52:24Z

Current coverage is 88.78%

Merging #399 into master will increase coverage by -1.36%

@@           master       #399   diff @@
========================================
  Files          24         24          
  Lines        3950       3950          
  Methods         0          0          
  Branches        0          0          
========================================
- Hits         3561       3507    -54   
- Misses        389        443    +54   
  Partials        0          0

3 files (not in diff) in babel/localtime were modified. more
- Misses +19
- Hits -19
6 files (not in diff) in babel were modified. more
- Misses +32
- Hits -32
File .../messages/extract.py was modified. more
- Misses +2
- Partials 0
- Hits -2

Powered by Codecov. Last updated by 6c3d405

akx · 2016-04-28T16:54:34Z

@asottile Cool beans! The test cases from da87edd and 51df457 nicely cover the other encoding cases, I think, so just waiting for CI before merging.

gitmate-bot added process/pending review size/S pending review labels Apr 27, 2016

jtwang reviewed Apr 27, 2016
View reviewed changes

akx self-assigned this Apr 28, 2016

asottile changed the title ~~Properly default source encoding according to PEP8~~ Default to UTF-8 source encoding (PEP8: python3) Apr 28, 2016

Default to UTF-8 source encoding (PEP8: python3)

33a8fc4

akx merged commit 8b5e5b7 into python-babel:master Apr 28, 2016

asottile deleted the default_source_encoding branch April 28, 2016 17:16

asottile mentioned this pull request Jan 11, 2017

Make a new release? #468

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Default to UTF-8 source encoding (PEP8: python3) #399

Default to UTF-8 source encoding (PEP8: python3) #399

asottile commented Apr 27, 2016

jtwang Apr 27, 2016

asottile Apr 27, 2016

jtwang commented Apr 27, 2016

asottile commented Apr 27, 2016

akx commented Apr 28, 2016

asottile commented Apr 28, 2016

akx commented Apr 28, 2016

asottile commented Apr 28, 2016

codecov-io commented Apr 28, 2016 •

edited

Loading

akx commented Apr 28, 2016

Default to UTF-8 source encoding (PEP8: python3) #399

Default to UTF-8 source encoding (PEP8: python3) #399

Conversation

asottile commented Apr 27, 2016

jtwang Apr 27, 2016

Choose a reason for hiding this comment

asottile Apr 27, 2016

Choose a reason for hiding this comment

jtwang commented Apr 27, 2016

asottile commented Apr 27, 2016

akx commented Apr 28, 2016

asottile commented Apr 28, 2016

akx commented Apr 28, 2016

asottile commented Apr 28, 2016

codecov-io commented Apr 28, 2016 • edited Loading

Current coverage is 88.78%

akx commented Apr 28, 2016

codecov-io commented Apr 28, 2016 •

edited

Loading