Alternate file encodings throw UnicodeDecodeError #102
Comments
Thanks for raising the issue. I have not worked with files with alternate encodings before, so I will have a look and see if I can reproduce and fix this tomorrow! |
Strange, I just tried to reproduce it but was not able to. I added the following file and ran deptry:
deptry successfully parsed this file and concluded that So I think there is also a system-specific issue here? Maybe to avoid this error on all systems, we need to detect the encoding and specify it explicitly while reading, as in open('filename', encoding="ISO-8859-1").
But then we would need to detect the file encoding first. Anyway, I do not have a lot of knowledge about encodings, so this might take me some time. Would also be good if I can find a way to reproduce this on my laptop. I will dive deeper into this issue tomorrow. |
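For Python source files specifically, the standard library can already do this detection: `tokenize.open` looks for a PEP 263 coding declaration (or a UTF-8 BOM) and opens the file with that encoding, falling back to UTF-8. A minimal sketch, where `read_source` is a hypothetical helper name and not part of deptry:

```python
import tokenize
from pathlib import Path


def read_source(path: Path) -> str:
    """Read a Python source file using its declared encoding."""
    # tokenize.open() inspects the first two lines for a coding
    # declaration (PEP 263) or a UTF-8 BOM, then opens the file in
    # text mode with the detected encoding (UTF-8 if none is found).
    with tokenize.open(path) as f:
        return f.read()
```

This only helps when the file declares its encoding (or is valid UTF-8); a file with no declaration and a non-UTF-8 encoding would still fail to decode.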
Maybe you are using Windows, where ISO-8859-1 can be an assumed encoding? System:
|
I'm using macOS 12.3.1 and Python 3.9. I think the issue should now be solved in release 0.4.6. From this version on, deptry tries to identify the file encoding before reading it using |
I've updated to 0.4.6. The new error is UnicodeDecodeError: 'charmap' codec can't decode byte 0x90 in position 1498: character maps to <undefined>. Here is the stack trace:
|
Sorry that the implemented solution did not solve your problem. It seems that chardet identifies an incorrect encoding for the file. I guess the only possible solution left is to catch this error and log a warning to the user that the specific file will be omitted while scanning for imports, since AFAIK there is no other way to identify the encoding. |
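One plausible mechanism for the 'charmap' error, sketched under the assumption that the detector guessed a Windows code page such as cp1252 for a file that is really UTF-8: byte 0x90 is unassigned in cp1252, and it occurs in the UTF-8 encoding of many emoji. The sample line below is invented for illustration and is not taken from the reporter's file.

```python
# A UTF-8 encoded source line containing an emoji (hypothetical sample).
source_bytes = "greeting = '🐍'\n".encode("utf-8")

# U+1F40D encodes to the four bytes f0 9f 90 8d in UTF-8.
assert b"\x90" in source_bytes

error = None
try:
    # Decoding with a wrongly guessed single-byte code page fails,
    # because cp1252 has no character assigned to byte 0x90.
    source_bytes.decode("cp1252")
except UnicodeDecodeError as exc:
    error = exc
    print(exc)

# Decoding with the file's real encoding works.
text = source_bytes.decode("utf-8")
```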
@sbywater Would it be possible for you to create a reproducible example? So far I have failed to reproduce the error. I am thinking of implementing the following:
Which would look as follows.
But I fail to write a unit test without being able to reproduce the error first. |
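The idea discussed above (catch the decoding error, log a warning, and leave the file out of the scan) might be sketched roughly as follows. This is an illustration only, not the actual deptry patch; `read_source_or_skip` and the warning text are hypothetical.

```python
import logging
from pathlib import Path
from typing import Optional


def read_source_or_skip(path: Path, encoding: str = "utf-8") -> Optional[str]:
    """Return the file's text, or None if it cannot be decoded."""
    try:
        return path.read_text(encoding=encoding)
    except UnicodeDecodeError:
        # Hypothetical warning text: tell the user the file is omitted
        # from the import scan instead of aborting the whole run.
        logging.warning(
            "Could not decode %s; skipping it while scanning for imports.", path
        )
        return None
```

A caller would then skip any file for which the helper returns `None`. A unit test for this path does not need the reporter's file: writing a few bytes that are invalid in the chosen encoding to a temporary file is enough to trigger the warning branch.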
I can clarify now: the original problem file no longer throws an error. However, under 0.4.6 a file that worked before now throws the UnicodeDecodeError. The problem file does not declare a file encoding, and includes this code:
Let me know if you'd like me to create a new issue for this. Your proposed patch looks like a good solution. Here is a verbose stack trace...
|
Weirdly enough, a file with the line I have decided to release Could you try with |
Added a PR with a unit test for the warning logging when a file has encoding-issues: #106 |
I believe this is fixed with the aforementioned PR |
I agree that this is now fixed. |
I'm getting this same Unicode emoji issue on deptry 0.12.0, Windows 10, Python 3.11.0; the file with just |
Describe the bug
Python files that declare an alternate encoding throw a UnicodeDecodeError:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position n: invalid continuation byte
To Reproduce
Steps to reproduce the behavior:
# -*- coding: iso-8859-15 -*-
my_string = 'é'
Expected behavior
These files should be parsed correctly.
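The failure can be reproduced outside deptry with the standard library alone. The sketch below writes the two lines above as ISO-8859-15 bytes and then reads them back as UTF-8, which is an assumption about how the file was being read at the time; the temporary file name is arbitrary.

```python
import tempfile
from pathlib import Path

# In ISO-8859-15, 'é' is the single byte 0xe9.
source = "# -*- coding: iso-8859-15 -*-\nmy_string = 'é'\n"

failure = None
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "latin.py"
    path.write_bytes(source.encode("iso-8859-15"))

    try:
        path.read_text(encoding="utf-8")
    except UnicodeDecodeError as exc:
        # 0xe9 opens a multi-byte UTF-8 sequence, but the next byte
        # (the closing quote) is not a valid continuation byte.
        failure = exc
        print(exc)
```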