
UnicodeDecodeError: 'gbk' codec can't decode: #89

Open
ffreemt opened this issue Mar 27, 2021 · 4 comments

Comments

@ffreemt

ffreemt commented Mar 27, 2021

```
...
   change_request
  File "c:\path-to-project-folder\.venv\lib\site-packages\tbump\file_bumper.py", line 219, in compute_patches_for_change_request
    old_lines = file_path.read_text().splitlines(keepends=False)
  File "C:\Python\Python37\lib\pathlib.py", line 1222, in read_text
    return f.read()
UnicodeDecodeError: 'gbk' codec can't decode byte 0x80 in position 118: illegal multibyte sequence
```

A simple patch at line 219 of file_bumper.py fixes the problem:
file_path.read_text() -> file_path.read_text("utf8")

It would be nice if the next version added this "utf8" argument.
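A minimal way to reproduce this without a Chinese-locale Windows machine, assuming the file being bumped is saved as UTF-8: passing `encoding="gbk"` explicitly mimics what a bare `read_text()` does when the locale encoding is GBK. The file name, content and the `decodes_as` helper are made up for illustration.

```python
import tempfile
from pathlib import Path

# Hypothetical file content: ASCII plus one non-ASCII character (an em dash, U+2014).
CONTENT = 'version = "1.2.3"  # managed by tbump \u2014 do not edit\n'


def decodes_as(path: Path, encoding: str) -> bool:
    """Return True if the file at `path` decodes cleanly with `encoding`."""
    try:
        path.read_text(encoding=encoding)
    except UnicodeDecodeError as exc:
        print(f"{encoding}: UnicodeDecodeError ({exc.reason})")
        return False
    return True


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "example.cfg"
    path.write_text(CONTENT, encoding="utf-8")  # the file on disk is UTF-8

    # What a bare read_text() amounts to when the locale encoding is GBK.
    gbk_ok = decodes_as(path, "gbk")
    # The one-line patch from the report: name the encoding explicitly.
    utf8_ok = decodes_as(path, "utf-8")
    roundtrip = path.read_text(encoding="utf-8") == CONTENT
```

The em dash is three bytes in UTF-8; the GBK decoder reads bytes in pairs, so the third byte gets paired with the following space, which is never a valid GBK trail byte.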

@dmerejkowsky
Collaborator

Hum. I need to think about this one.

I think tbump uses .encode() and .decode() without special arguments everywhere. In theory, tbump should use the default encoding of the platform it is running on and work out of the box everywhere, regardless of how your source files are encoded.
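For reference, the sketch below prints the encoding that `open()`, `Path.read_text()` and `Path.write_text()` fall back to when no `encoding` is given; it varies by machine (e.g. `cp936` under a Simplified Chinese Windows locale). Note that `str.encode()` and `bytes.decode()` behave differently: with no argument they always use UTF-8, so the platform dependence comes from text-mode file I/O rather than from those calls.

```python
import codecs
import locale

# Encoding used by open(), Path.read_text() and Path.write_text() when no
# `encoding` argument is given. It depends on the machine and locale, e.g.
# "cp936" on a Simplified Chinese Windows install; UTF-8 mode (-X utf8) forces UTF-8.
file_io_default = locale.getpreferredencoding(False)

# Python treats "cp936" as an alias of its "gbk" codec, which is why the
# traceback mentions 'gbk'.
cp936_codec = codecs.lookup("cp936").name

# str.encode() / bytes.decode() without arguments ignore the locale and use UTF-8.
sample = "caf\u00e9"
encode_default_is_utf8 = sample.encode() == sample.encode("utf-8")
decode_default_is_utf8 = sample.encode("utf-8").decode() == sample

print("file I/O default encoding:", file_io_default)
print("cp936 handled by codec:", cp936_codec)
print("encode()/decode() default to UTF-8:", encode_default_is_utf8 and decode_default_is_utf8)
```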

That being said, maybe I'm wrong. In that case, we should use encode() and decode() with the utf-8 encoding explicitly set everywhere and document that tbump only works for UTF-8 encodings.
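If tbump did standardize on UTF-8, one way to avoid missing a call site would be to route all file access through a pair of helpers. The names `read_lines`/`write_lines` and the `ENCODING` constant below are hypothetical, not part of tbump's code; the sketch also always ends the file with a single newline, which a real implementation would need to reconcile with the file's original line endings.

```python
import tempfile
from pathlib import Path
from typing import List

# Hypothetical single source of truth for the encoding (not tbump's actual API).
ENCODING = "utf-8"


def read_lines(path: Path) -> List[str]:
    """Read a file as UTF-8 and split it into lines without line endings."""
    return path.read_text(encoding=ENCODING).splitlines(keepends=False)


def write_lines(path: Path, lines: List[str]) -> None:
    """Write lines back as UTF-8, joined with newlines (always adds a final newline)."""
    path.write_text("\n".join(lines) + "\n", encoding=ENCODING)


# Usage: a round trip through the helpers keeps non-ASCII text intact.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "demo.txt"
    write_lines(demo, ['version = "1.2.3"', "# caf\u00e9"])
    demo_lines = read_lines(demo)
    demo_bytes = demo.read_bytes()
```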

At any rate, I'm pretty sure that just patching this one line is not enough.
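To see how many places would need the same change, a rough heuristic (Python 3.8+) is to scan the source with `ast` for builtin `open()`, `.read_text()` and `.write_text()` calls that pass no encoding. This is a sketch: it matches by name only, so it will miss aliased calls and may flag unrelated methods that happen to share a name. On Python 3.10+, running the test suite with `python -X warn_default_encoding` (PEP 597) reports such calls at runtime as `EncodingWarning`.

```python
import ast
from typing import List, Tuple

# Heuristic only: builtin open(), Path.read_text() and Path.write_text(), mapped
# to the positional index of their `encoding` parameter.
ENCODING_POSITION = {"open": 3, "read_text": 0, "write_text": 1}


def _is_binary_mode(call: ast.Call) -> bool:
    """True if open()'s mode argument is a string literal containing 'b'."""
    mode = call.args[1] if len(call.args) > 1 else None
    for kw in call.keywords:
        if kw.arg == "mode":
            mode = kw.value
    return (
        isinstance(mode, ast.Constant)
        and isinstance(mode.value, str)
        and "b" in mode.value
    )


def calls_without_encoding(source: str) -> List[Tuple[int, str]]:
    """Return (line, name) for calls that fall back to the platform encoding."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        if isinstance(func, ast.Name) and func.id == "open":
            name = "open"
        elif isinstance(func, ast.Attribute) and func.attr in ("read_text", "write_text"):
            name = func.attr
        else:
            continue
        if any(kw.arg == "encoding" for kw in node.keywords):
            continue  # encoding passed by keyword
        if len(node.args) > ENCODING_POSITION[name]:
            continue  # encoding passed positionally, e.g. read_text("utf8")
        if name == "open" and _is_binary_mode(node):
            continue  # binary mode does no decoding
        findings.append((node.lineno, name))
    return sorted(findings)


SAMPLE = '''
old_lines = file_path.read_text().splitlines(keepends=False)
patched = file_path.read_text("utf8")
file_path.write_text(new_text)
with open(path) as f: pass
with open(path, "rb") as f: pass
'''
print(calls_without_encoding(SAMPLE))  # → [(2, 'read_text'), (4, 'write_text'), (5, 'open')]
```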

What do you think?

@cgestes

cgestes commented Mar 30, 2021 via email

@yucongo

yucongo commented May 21, 2021

I'm not too sure, but I patched that line and everything seems to work fine.

@dmerejkowsky
Collaborator

Yeah but you should not have to!

It would be good to reproduce the issue and figure out the root cause, but I don't have access to a Windows machine that uses the GBK encoding...

I don't want to merge a patch that hard-codes utf-8 without understanding all the implications.
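Two implications worth checking before hard-coding UTF-8, sketched with illustrative byte strings: a file that really is saved in GBK would stop decoding, and a UTF-8 file saved with a byte-order mark (as some Windows editors do) would keep the BOM as a leading U+FEFF character unless the `utf-8-sig` codec is used.

```python
# 1. A file genuinely saved in GBK no longer decodes once UTF-8 is hard-coded.
gbk_bytes = "\u7248\u672c = 1.2.3\n".encode("gbk")  # "版本" ("version") in GBK
try:
    gbk_bytes.decode("utf-8")
    gbk_as_utf8_fails = False
except UnicodeDecodeError:
    gbk_as_utf8_fails = True

# 2. A UTF-8 file with a byte-order mark: "utf-8" keeps the BOM as U+FEFF,
#    while "utf-8-sig" strips it on read.
bom_bytes = b"\xef\xbb\xbf" + b'version = "1.2.3"\n'
with_utf8 = bom_bytes.decode("utf-8")
with_utf8_sig = bom_bytes.decode("utf-8-sig")

print("GBK file fails under UTF-8:", gbk_as_utf8_fails)
print("BOM kept with utf-8:", with_utf8.startswith("\ufeff"))
print("BOM stripped with utf-8-sig:", not with_utf8_sig.startswith("\ufeff"))
```

A leftover U+FEFF matters if the first line of the file is what gets matched against a version pattern.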

Let's ask for help.


4 participants