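Please confirm the following notes:
The current latest version is: hanlp-1.7.5. The version I am using is: hanlp-1.7.5
I git-cloned pyhanlp directly to my machine and ran ngram_segment.py from Ch03. It returns a 1-gram frequency of 2 and a 2-gram frequency of 0. The Java version's output is correct: the term frequency of 【商品】 is 2 and the frequency of 【商品@和】 is 1.
I did not modify the code.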
print(CoreDictionary.getTermFrequency("商品"))
print(CoreBiGramTableDictionary.getBiFrequency("商品", "和"))
Expected output: a 1-gram frequency of 2 and a 2-gram frequency of 1
Actual output: a 1-gram frequency of 2 and a 2-gram frequency of 0
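My guess was that this is an encoding problem. I checked, and when training runs on Windows, the output file "my_cws_model.ngram.txt" is encoded in GB2312. After converting the file to UTF-8 and deleting my_cws_model.ngram.txt.table.bin, I ran the code again and got the correct result.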
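The workaround described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the helper name `reencode_ngram_file` is hypothetical and not part of pyhanlp, the source encoding defaults to the GB2312 reported in this thread (on some Windows setups `gbk` or `gb18030` may be needed instead), and the cache file is assumed to sit next to the text file with the `.table.bin` suffix mentioned above.

```python
from pathlib import Path


def reencode_ngram_file(txt_path, source_encoding="gb2312", cache_suffix=".table.bin"):
    """Rewrite an n-gram text file as UTF-8 and drop its stale binary cache.

    Hypothetical helper for illustration; it is not a pyhanlp API.
    """
    txt_path = Path(txt_path)

    # Decode with the encoding the file was actually written in,
    # then write the same text back as UTF-8 (newlines are preserved).
    text = txt_path.read_bytes().decode(source_encoding)
    txt_path.write_bytes(text.encode("utf-8"))

    # The cache was built from the mis-encoded text, so remove it
    # to force a rebuild on the next load.
    cache_path = Path(str(txt_path) + cache_suffix)
    if cache_path.exists():
        cache_path.unlink()
    return cache_path
```

A call such as `reencode_ngram_file("my_cws_model.ngram.txt")` would be run once before re-running ngram_segment.py. Running it on a file that is already UTF-8 would likely raise a decode error rather than corrupt the file, since the bytes are decoded strictly.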
I see that DictionaryMaker.java passes a UTF-8 argument when saving the model:
BufferedWriter bw = new BufferedWriter(new OutputStreamWriter(IOUtil.newOutputStream(path), "UTF-8"));
but NGramDictionaryMaker.java saves the model without a UTF-8 argument:
BufferedWriter bw = new BufferedWriter(new OutputStreamWriter(IOUtil.newOutputStream(path)));
I hope the author can add one.
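To illustrate why a missing charset argument could produce a 2-gram frequency of 0 while the 1-gram frequency stays correct, here is a small Python sketch. It does not use HanLP at all: the dictionary below is a toy stand-in for the bigram table, the functions `round_trip` and `bigram_frequency` are hypothetical, and it assumes the file is written in a GBK-family platform default and read back as UTF-8.

```python
def round_trip(key, write_encoding, read_encoding="utf-8"):
    """Encode a key as a writer would, then decode it as a UTF-8 reader would."""
    raw = key.encode(write_encoding)
    # errors="replace" mimics a lenient reader that substitutes invalid bytes.
    return raw.decode(read_encoding, errors="replace")


def bigram_frequency(stored_key, frequency, first, second):
    """Look up 'first@second' in a toy one-entry table (not HanLP's structure)."""
    table = {stored_key: frequency}
    return table.get(f"{first}@{second}", 0)


key = "商品@和"

# Written with a GBK platform default, read as UTF-8: the key is garbled.
print(bigram_frequency(round_trip(key, "gbk"), 1, "商品", "和"))    # prints 0

# Written explicitly as UTF-8, read as UTF-8: the key survives.
print(bigram_frequency(round_trip(key, "utf-8"), 1, "商品", "和"))  # prints 1
```

This matches the symptom in the report: the 1-gram dictionary is saved with an explicit "UTF-8" argument and looks up correctly, while the n-gram file depends on the platform default.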
NGramDictionaryMaker and related classes default to UTF-8 encoding, fix #1320
511b978
Thanks for the feedback. If you run into similar problems in the future, please feel free to report them.
6b31f02