UnicodeDecodeError: 'gbk' codec can't decode byte 0x91 in position 2: illegal multibyte sequence #5
I haven't run into this problem myself. You could check the encoding of your dataset.
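One quick way to check the encoding is to try decoding the file's raw bytes with each candidate encoding in turn. The sketch below assumes the candidates are UTF-8 and GBK, the two encodings discussed in this thread. `guess_encoding` and `guess_file_encoding` are hypothetical helpers, not part of Graph4CNER. A clean decode only shows that the bytes are *valid* in that encoding, not that it is the one the file was written in. UTF-8 is tried first because it rejects many byte sequences that GBK would accept.

```python
from typing import Iterable, Optional


def guess_encoding(raw: bytes,
                   candidates: Iterable[str] = ("utf-8", "gbk")) -> Optional[str]:
    """Return the first candidate encoding that decodes `raw` without errors.

    Hypothetical helper: a clean decode proves the bytes are valid in that
    encoding, not that it is the encoding the file was actually written in.
    """
    for enc in candidates:
        try:
            raw.decode(enc)
        except UnicodeDecodeError:
            continue  # invalid in this encoding; try the next candidate
        return enc
    return None  # no candidate fits; the file may be damaged


def guess_file_encoding(path: str) -> Optional[str]:
    """Read the file as raw bytes (no decoding at all) and guess its encoding."""
    with open(path, "rb") as f:
        return guess_encoding(f.read())
```

For example, `guess_encoding("中文 O\n".encode("gbk"))` returns `'gbk'`, because the GBK bytes for 中 are not valid UTF-8.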
I've documented the data format in the README.
OK, I'll take a look.
My data comes from the gold-horse project, and I never changed the file encoding.
The weiboNER_2nd_conll data at the link I gave needs to be preprocessed into the format shown in the README, for example:

一 O
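The data I have is already in your README format; the encoding probably got corrupted when I unzipped the files.
You could leave your email address, and I'll send you the Weibo data that can be shared publicly.
OK, thank you. My email address is [email protected]
It was indeed an encoding problem with the dataset.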
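My guess is that you're running this on Windows, while the author's code was presumably run on Linux. Datasets are usually UTF-8 encoded; the default encoding on Chinese-locale Windows is GBK, whereas on Linux it is UTF-8. So on Windows you need to specify the encoding explicitly when opening the file for reading.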
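A minimal sketch of that fix, assuming the dataset files are UTF-8 encoded: the line from the traceback further down, `in_lines = open(input_file, 'r').readlines()` in `read_instance`, would become `open(input_file, 'r', encoding='utf-8')`. The standalone function below, `read_utf8_lines` (a hypothetical name), does the same inside a `with` block so the file is closed afterwards. The demo writes a one-line sample in the `一 O` style and reads it back.

```python
import os
import tempfile
from typing import List


def read_utf8_lines(input_file: str) -> List[str]:
    """Read a dataset file as UTF-8, independent of the OS default encoding.

    Unless Python's UTF-8 mode is enabled, open() without `encoding=` uses the
    locale's preferred encoding (GBK/cp936 on Chinese-locale Windows), which
    is what raises "'gbk' codec can't decode byte ...".
    """
    with open(input_file, "r", encoding="utf-8") as f:
        return f.readlines()


# Demo: write a tiny UTF-8 sample as raw bytes, then read it back as text.
fd, sample_path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as f:
    f.write("一 O\n".encode("utf-8"))
lines = read_utf8_lines(sample_path)
os.remove(sample_path)
```

Enabling UTF-8 mode (`python -X utf8`, or setting `PYTHONUTF8=1`) is another way to change the default without editing the code.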
@zhangdddong That's the right answer.
A catch-all fix:
File "D:\pycode\Graph4CNER\utils\functions.py", line 16, in read_instance
in_lines = open(input_file, 'r').readlines()
UnicodeDecodeError: 'gbk' codec can't decode byte 0x91 in position 2: illegal multibyte sequence
Replacing 'r' with 'rb' instead raises AttributeError: 'int' object has no attribute 'isdigit'
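Switching to 'rb' avoids the decode error only by not decoding at all. Each line then becomes a `bytes` object, and in Python 3 iterating over or indexing `bytes` yields integers, which have no `.isdigit()`. The sketch below reproduces this with a hypothetical digit-normalizing helper, `normalize_word`, assuming the reader calls `.isdigit()` on each character somewhere downstream. The real call site in Graph4CNER may differ. Decoding the bytes gives back `str` characters. Opening the file in text mode with `encoding='utf-8'` is simpler still.

```python
def normalize_word(word):
    """Hypothetical helper: replace every digit character with '0'."""
    new_word = ""
    for char in word:
        # For str, `char` is a 1-character str; for bytes, it is an int,
        # and int.isdigit does not exist -> AttributeError.
        if char.isdigit():
            new_word += "0"
        else:
            new_word += char
    return new_word


raw_line = "2019年 O\n".encode("utf-8")  # what a line looks like in "rb" mode
word_bytes = raw_line.split()[0]         # b'2019\xe5\xb9\xb4'

error_message = ""
try:
    normalize_word(word_bytes)
except AttributeError as err:
    error_message = str(err)  # "'int' object has no attribute 'isdigit'"

# Decoding back to str fixes it: each character is a str again.
word = raw_line.decode("utf-8").split()[0]
normalized = normalize_word(word)  # '0000年'
```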