Handle invalid UTF8 start bytes #69

Open
jrgfogh opened this issue Apr 25, 2024 · 1 comment

Comments

@jrgfogh

jrgfogh commented Apr 25, 2024

My build fails because my source code contains invalid UTF-8 start bytes.
I don't know how the files got corrupted, but it seems like the kind of thing you would want a lint tool to fix, since my editors have no trouble reading the files.

Here is an example error message:

Processing 5 files: ./sw/lazy_init.h, ./sw/propagate_const.h, ./tests/lazy_init_tests.cpp, ./tests/propagate_const_tests.cpp, ./tests/gtest_unwarn.h
run-clang-format.py: error: ./tests/propagate_const_tests.cpp: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 69: invalid start byte
Traceback (most recent call last):
  File "/run-clang-format.py", line 122, in run_clang_format_diff_wrapper
    ret = run_clang_format_diff(args, file)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/run-clang-format.py", line 188, in run_clang_format_diff
    errs = list(proc_stderr.readlines())
                ^^^^^^^^^^^^^^^^^^^^^^^
  File "<frozen codecs>", line 322, in decode
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 69: invalid start byte

The failing code can be found here:
https://github.com/jrgfogh/small_wrappers/tree/909a477e92cf955ba24bd2b062f45e32c67f9644

@SteffenL
Contributor

I don't know what's going on with your files. I cloned your repository, checked out commit 909a477e92cf955ba24bd2b062f45e32c67f9644, and inspected each header/source file with a hex editor, but I couldn't find any 0xff bytes in ./tests/propagate_const_tests.cpp or in any of the other header/source files. Without inspecting them any further, I would say the files are most likely valid, which is why your editors have no trouble with them.
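For anyone who wants to repeat that check without a hex editor, here is a minimal Python sketch. The helper names `find_byte_offsets` and `scan_files` are made up for illustration and are not part of run-clang-format.py; the sketch only assumes the files can be read as raw bytes.

```python
from pathlib import Path


def find_byte_offsets(data: bytes, value: int = 0xFF) -> list[int]:
    """Return every offset in `data` where the byte equals `value`."""
    return [i for i, b in enumerate(data) if b == value]


def scan_files(paths):
    """Map each path to the offsets of 0xff bytes found in it.

    Files that contain no 0xff byte are left out of the result.
    (Hypothetical helper, for illustration only.)
    """
    hits = {}
    for p in paths:
        offsets = find_byte_offsets(Path(p).read_bytes())
        if offsets:
            hits[str(p)] = offsets
    return hits
```

Calling `scan_files(["./tests/propagate_const_tests.cpp"])` from the repository root should return an empty dict if the file really contains no 0xff bytes.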

While invalid UTF-8 is detectable, it can't really be corrected automatically, because a tool has no way of knowing which characters the bad bytes were meant to be, and you do want your source code to be interpreted correctly. A fatal error is therefore the desired behavior.
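To illustrate the difference between detecting and correcting, here is a small Python sketch. `first_invalid_utf8` is a hypothetical helper, not something from run-clang-format.py; it relies only on the standard strict decoder and the attributes of `UnicodeDecodeError`.

```python
def first_invalid_utf8(data: bytes):
    """Return (offset, reason) for the first invalid UTF-8 sequence, or None.

    Hypothetical helper for illustration; uses Python's strict UTF-8 decoder.
    """
    try:
        data.decode("utf-8")  # errors="strict" is the default
    except UnicodeDecodeError as exc:
        return exc.start, exc.reason
    return None


# Detection is precise: the decoder reports where the bad byte is and why.
sample = b"x" * 69 + b"\xff"
print(first_invalid_utf8(sample))

# "Correction" is lossy: the best a decoder can do is substitute U+FFFD,
# which discards whatever the original byte was supposed to mean.
print(repr(b"a\xffb".decode("utf-8", errors="replace")))
```

Decoding with `errors="replace"` avoids the exception, but it silently changes the content, which is not what you want for source code.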

Unfortunately this issue isn't actionable from my point of view.
