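Please confirm the following:
The current latest version is: 1.7.5. The version I am using is: 1.7.5.
When extracting key phrases from an article, I found that every phrase's score was NaN. Tracing it further, the cause is that the left entropy or the right entropy of every word is 0.
Call path: MutualInformationEntropyPhraseExtractor.extractPhrase(text, size) -> occurrence.compute()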
    package com.hankcs.hanlp.corpus.occurrence;

    public class Occurrence
    {
        ...
        /**
         * Input is complete; run the computation
         */
        public void compute()
        {
            entrySetPair = triePair.entrySet();
            double total_mi = 0;
            double total_le = 0;
            double total_re = 0;
            for (Map.Entry<String, PairFrequency> entry : entrySetPair)
            {
                PairFrequency value = entry.getValue();
                value.mi = computeMutualInformation(value);
                value.le = computeLeftEntropy(value);
                value.re = computeRightEntropy(value);
                total_mi += value.mi;
                total_le += value.le;
                total_re += value.re;
            }

            for (Map.Entry<String, PairFrequency> entry : entrySetPair)
            {
                PairFrequency value = entry.getValue();
                // The problem is in the statement below: when total_le or total_re is 0, score becomes NaN.
                // I am not very familiar with left/right entropy, so I am not sure whether this workaround is acceptable:
                // add a sufficiently small number to the denominators, for example:
                // value.score = value.mi / total_mi + value.le / (total_le+0.0001)+ value.re / (total_re+0.0001);
                value.score = value.mi / total_mi + value.le / total_le+ value.re / total_re;   // normalize
                value.score *= entrySetPair.size();
            }
        }
    }
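For illustration, here is a minimal standalone sketch of one way to guard the division. It is not the fix from the referenced commit, and it does not use HanLP classes: `SafeScoreSketch`, `safeRatio`, and `scores` are hypothetical names, and plain arrays stand in for the `PairFrequency` fields. The assumption is that a term whose total is 0 should simply contribute 0 to the score. In Java, `0.0 / 0.0` on doubles evaluates to NaN rather than throwing, which is why the scores silently become NaN when every pair has zero left (or right) entropy.

```java
public class SafeScoreSketch {

    /**
     * Returns value / total, or 0.0 when total is 0.
     * Hypothetical helper for this sketch, not a HanLP method.
     */
    static double safeRatio(double value, double total) {
        return total == 0.0 ? 0.0 : value / total;
    }

    /**
     * Mirrors the scoring loop from the issue, with guarded divisions.
     * mi, le, re hold the mutual information, left entropy and right
     * entropy of each pair (same length, same order).
     */
    static double[] scores(double[] mi, double[] le, double[] re) {
        double totalMi = 0, totalLe = 0, totalRe = 0;
        for (int i = 0; i < mi.length; i++) {
            totalMi += mi[i];
            totalLe += le[i];
            totalRe += re[i];
        }
        double[] score = new double[mi.length];
        for (int i = 0; i < mi.length; i++) {
            score[i] = safeRatio(mi[i], totalMi)
                     + safeRatio(le[i], totalLe)
                     + safeRatio(re[i], totalRe);
            score[i] *= mi.length; // same scaling by the number of pairs as in the issue
        }
        return score;
    }

    public static void main(String[] args) {
        System.out.println(0.0 / 0.0); // prints NaN: the unguarded case
        // Every left entropy is 0, so total_le is 0 as well.
        double[] s = scores(new double[]{1.0, 3.0},
                            new double[]{0.0, 0.0},
                            new double[]{0.5, 0.5});
        System.out.println(s[0] + " " + s[1]); // prints 1.5 2.5
    }
}
```

Compared with adding a small constant such as 0.0001 to the denominators, an explicit zero check leaves the scores unchanged whenever the totals are non-zero. Both approaches make the affected term 0 when its total is 0.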
7dd79cc
Thanks for the feedback. This has been fixed; please refer to the commit above. If the problem persists, feel free to reopen the issue.
Fix division-by-zero error in the information entropy computation, fix #1366
ad99040