
My word2vec model is 1 GB; no matter how much memory I allocate to it, it is extremely slow and unusable. #1304

Closed

yangnianen opened this issue Oct 17, 2019 · 1 comment


yangnianen commented Oct 17, 2019

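My word2vec model is 1 GB, and I find that calling the docVectorModel.nearest method is far too slow. How can I speed up prediction with an already-trained model? It currently takes more than 20 seconds, which is really frustrating; it is simply unusable.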

hankcs added a commit that referenced this issue


          WordVectorModel supports custom Map types: #1304

3c214ec

Owner

hankcs commented Oct 20, 2019

There are two efficiency bottlenecks:

Word lookup. You can pass in whichever Map implementation you consider faster:

HanLP/src/main/java/com/hankcs/hanlp/mining/word2vec/WordVectorModel.java

Line 39 in 3c214ec

* @param storage an empty Map (HashMap, etc.)
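To illustrate the idea of letting the caller supply the lookup structure, here is a minimal sketch. It does not reproduce HanLP's actual constructors: the class `MapBackedVectorStore` and its methods are hypothetical, and vectors are plain `float[]` arrays rather than HanLP's `Vector` type. The factory method shows one cheap tweak, pre-sizing a `HashMap` for the expected vocabulary so it does not rehash repeatedly while a large model is loaded.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical illustration (not HanLP's API): a word-vector store whose
 * backing Map is supplied by the caller, so the lookup structure can be
 * swapped without touching the rest of the code.
 */
class MapBackedVectorStore {
    private final Map<String, float[]> storage;

    /** @param storage an empty Map (HashMap etc.) chosen by the caller */
    MapBackedVectorStore(Map<String, float[]> storage) {
        if (!storage.isEmpty()) {
            throw new IllegalArgumentException("storage must be empty");
        }
        this.storage = storage;
    }

    void put(String word, float[] vector) {
        storage.put(word, vector);
    }

    /** Returns the vector for the word, or null if it is not in the vocabulary. */
    float[] vector(String word) {
        return storage.get(word);
    }

    int size() {
        return storage.size();
    }

    /** Pre-sizes a HashMap for the expected vocabulary to avoid rehashing during loading. */
    static MapBackedVectorStore withExpectedVocabulary(int vocabularySize) {
        // HashMap resizes once size exceeds capacity * 0.75, so reserve enough buckets up front.
        int capacity = (int) Math.ceil(vocabularySize / 0.75);
        return new MapBackedVectorStore(new HashMap<>(capacity));
    }
}
```

Any other `Map<String, float[]>` (for example a primitive-specialised or off-heap map from a third-party library) could be passed to the constructor instead; which one is actually faster depends on the vocabulary size and should be measured.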
Vector dot products. You can try overriding

HanLP/src/main/java/com/hankcs/hanlp/mining/word2vec/AbstractVectorModel.java

Line 96 in 7c7e202

private List<Map.Entry<K, Float>> nearest(K key, Vector vector, int size)

to compute the dot products with multiple threads.
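A rough sketch of what a multithreaded `nearest` could look like, written as a standalone class rather than an actual override of HanLP's `AbstractVectorModel`. The class `ParallelNearest`, its `String[]`/`float[][]` layout, and the assumption that all vectors are already L2-normalised (so the dot product equals cosine similarity) are hypothetical choices for illustration. The return type mirrors the `List<Map.Entry<K, Float>>` shape of the method quoted above.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.stream.IntStream;

/**
 * Hypothetical sketch (not HanLP's code): nearest-neighbour search whose
 * per-word dot products are spread across threads with a parallel stream.
 * Vectors are assumed to be unit length, so dot product == cosine similarity.
 */
class ParallelNearest {
    private final String[] words;
    private final float[][] vectors;

    ParallelNearest(String[] words, float[][] vectors) {
        this.words = words;
        this.vectors = vectors;
    }

    static float dot(float[] a, float[] b) {
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    /** Returns up to {@code size} words most similar to {@code query}, skipping {@code key} itself. */
    List<Map.Entry<String, Float>> nearest(String key, float[] query, int size) {
        float[] scores = new float[words.length];
        // Parallel phase: each index writes only its own slot, so no locking is needed.
        IntStream.range(0, words.length).parallel()
                .forEach(i -> scores[i] = dot(query, vectors[i]));

        // Sequential phase: keep the `size` best candidates in a min-heap ordered by score.
        PriorityQueue<Integer> heap =
                new PriorityQueue<Integer>(Comparator.comparingDouble((Integer i) -> scores[i]));
        for (int i = 0; i < words.length; i++) {
            if (words[i].equals(key)) continue;
            heap.offer(i);
            if (heap.size() > size) heap.poll(); // drop the current worst candidate
        }

        List<Map.Entry<String, Float>> result = new ArrayList<>(heap.size());
        while (!heap.isEmpty()) {
            int i = heap.poll();
            result.add(new AbstractMap.SimpleImmutableEntry<>(words[i], scores[i]));
        }
        Collections.reverse(result); // highest similarity first
        return result;
    }
}
```

Only the dot products run in parallel (on the common ForkJoinPool); the top-k selection stays single-threaded with a bounded heap, which avoids sharing a heap between threads and costs O(n log k) rather than a full sort.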

hankcs closed this as completed

hankcs added the improvement label

hankcs added a commit that referenced this issue


          WordVectorModel supports custom Map types: #1304

77632b3
