Importing a user dictionary with custom parts of speech #626

Closed
AllyW opened this issue Sep 14, 2017 · 2 comments

AllyW commented Sep 14, 2017

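Version

The current latest version is: 1.3.4
The version I am using is: portable-1.3.4

My question

According to issue 243:

Before loading a user-defined dictionary file, is it possible to first call CustomNatureUtility.addNature("新词性") and then use AhoCorasickDoubleArrayTrieSegment.loadDictionary to import a user-defined dictionary file that contains custom parts of speech?
I did the following: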

CustomNatureUtility.addNature("新词性1");
CustomNatureUtility.addNature("新词性2");
AhoCorasickDoubleArrayTrieSegment segment = new AhoCorasickDoubleArrayTrieSegment()
                .loadDictionary(HanLP.Config.CustomDictionaryPath[1]);

This produced the following error:
java.lang.IllegalArgumentException: No enum constant com.hankcs.hanlp.corpus.tag.Nature.新词性2

Tracing through the code, I found that after the calls above, only 新词性1 had been added to the part-of-speech constants; 新词性2 was missing.
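
For readers unfamiliar with this message: its wording matches what the JDK's Enum.valueOf throws when asked for a name that is not among an enum's constants, which fits the observation that 新词性2 was never registered. The following is only a minimal sketch of that lookup behavior. DemoNature, lookupMessage and the ASCII constant names are hypothetical stand-ins for illustration; this is not HanLP's Nature class or its registration code.

```java
public class EnumLookupDemo {
    // Hypothetical stand-in enum; NOT HanLP's com.hankcs.hanlp.corpus.tag.Nature.
    enum DemoNature { begin, newNature1 }

    // Returns null when the name resolves to a constant,
    // otherwise the message of the IllegalArgumentException that was thrown.
    static String lookupMessage(String name) {
        try {
            Enum.valueOf(DemoNature.class, name);
            return null;
        } catch (IllegalArgumentException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // A name that exists among the constants resolves normally.
        System.out.println(lookupMessage("newNature1"));
        // A name that was never registered fails with a "No enum constant" message.
        System.out.println(lookupMessage("newNature2"));
    }
}
```

In this sketch the second lookup fails only because the name is absent from the enum, which is the same symptom reported above for 新词性2.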

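If I add only 新词性1 and the dictionary file contains only 新词性1, the segmentation results are correctly tagged with the user-defined 新词性1. When I add two new parts of speech, it no longer works. Has anyone run into a similar problem? Thanks.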

AllyW (Author) commented Sep 14, 2017

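If "新词性1" and "新词性2" are appended directly after begin in the Nature enum in src/main/java/com/hankcs/hanlp/corpus/tag/Nature.java, then AhoCorasickDoubleArrayTrieSegment can load a user-defined dictionary file that uses 新词性1 and 新词性2. So the problem appears to be on the Nature side: adding 新词性2 does not succeed.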

hankcs (Owner) commented Sep 17, 2017

Thanks for the feedback. This has been fixed; please refer to the commit above.
If there are still problems, feel free to reopen the issue.

hankcs closed this as completed Sep 17, 2017
hankcs added the bug label Sep 17, 2017