gensim 0.12.1 doc2vec.
When clip_start and clip_end are passed to the most_similar function, the results have keys counted from the start of the dataset rather than from clip_start.
To reproduce, call most_similar with:
clip_start=5, clip_end=10, topn=5.
The results should be items with keys 5, 6, 7, 8, 9.
Instead, it returns items with keys 0, 1, 2, 3, 4.
Look at this part of the most_similar function in doc2vec.py:
dists = dot(self.doctag_syn0norm[clip_start:clip_end], mean)
if not topn:
    return dists
best = matutils.argsort(dists, topn=topn + len(all_docs), reverse=True)
# ignore (don't return) docs from the input
result = [(self._key_index(sim), float(dists[sim])) for sim in best if sim not in all_docs]
return result[:topn]
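The issue is that self._key_index(sim) is called with sim, but sim comes from best, which holds positions in dists. Since dists is computed over the slice doctag_syn0norm[clip_start:clip_end], those positions are relative to clip_start, not to the full array, so position 0 actually refers to document clip_start.
Changing this line solves the issue:
result = [(self._key_index(sim + clip_start), float(dists[sim])) for sim in best if sim not in all_docs]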
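The offset problem can be shown without gensim. The following is only a minimal sketch in plain Python, not the library's code: top_keys, its add_offset flag, and the one-hot vectors are hypothetical names and data made up for illustration, and the exclusion of input docs (all_docs) is left out.

```python
def dot(u, v):
    """Plain dot product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def top_keys(vectors, query, clip_start, clip_end, topn, add_offset):
    """Rank the rows in vectors[clip_start:clip_end] by similarity to query.

    Hypothetical helper that mimics the slicing pattern from the issue.
    """
    # Similarities are computed over the slice only, so position 0 in
    # `dists` corresponds to row `clip_start` of the full list.
    dists = [dot(row, query) for row in vectors[clip_start:clip_end]]
    # Positions within the slice, best first.
    best = sorted(range(len(dists)), key=lambda i: dists[i], reverse=True)[:topn]
    # Without the offset, slice positions are reported as if they were
    # indices into the full list.
    offset = clip_start if add_offset else 0
    return [(sim + offset, dists[sim]) for sim in best]


# 12 one-hot "document vectors": row i has 1.0 in column i.
vectors = [[1.0 if j == i else 0.0 for j in range(12)] for i in range(12)]
query = vectors[7]  # the best match should be document 7

buggy = top_keys(vectors, query, 5, 10, 5, add_offset=False)
fixed = top_keys(vectors, query, 5, 10, 5, add_offset=True)

print(sorted(key for key, _ in buggy))  # [0, 1, 2, 3, 4]
print(sorted(key for key, _ in fixed))  # [5, 6, 7, 8, 9]
print(fixed[0])                         # (7, 1.0)
```

Without the offset, the best match is reported as key 2, which is the position of document 7 inside the slice. If all_docs holds indices into the full array (an assumption about the surrounding code), the `sim not in all_docs` test would probably need the same offset.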
Please review