AssertionError: sparse documents must not contain any explicit zero entries and the similarity matrix S must satisfy x^T * S * x > 0 for any nonzero bag-of-words vector x. #2105
@Witiko, can you have a look?
Hey @DennisCologne, I'm sorry to say that I am the author of the code that is giving you trouble. What Gensim and Python versions are you using? I can run the above code just fine with the PyPI version of Gensim (3.4.0) and Python 3.5:
>>> sims
[(6, 0.8305764039419705),
(7, 0.7257781024707816),
(5, 0.5584027708699971),
(0, 0.43455470767273646),
(8, 0.4082457402348116),
(1, 0.3028528215099456),
(3, 0.09251811314306692),
(4, 0.07636744554253587),
(2, 0.04509321490371689)]
Hi @Witiko, thank you for your answer. I am actually using Python 2.7.14 with Gensim 3.4.0. After further investigation, I found that the matrix-vector multiplication returns a negative value even though all of the values in both the matrix and the vector are positive. But you are right: I just tried it in my Python 3.6 environment, and there it works fine. Thanks again for the quick reply.
Best,
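The assertion message names two conditions on a bag-of-words vector x and the term similarity matrix S. A minimal sketch of evaluating x^T * S * x and checking both conditions, assuming a pure-Python representation chosen only for illustration (documents as lists of (term_id, weight) pairs, S as a dict of dicts); `quadratic_form` and `check_assertion_conditions` are hypothetical helpers, not Gensim's internals. When every weight in x and every entry of S is nonnegative, each summand x_i * S_ij * x_j is nonnegative, so a negative total like the one reported above cannot be the result of exact arithmetic on those values.

```python
def quadratic_form(x, S):
    """Compute x^T * S * x for a sparse bag-of-words vector x.

    x: list of (term_id, weight) pairs (illustrative format).
    S: dict mapping term_id -> {term_id: similarity} (illustrative format).
    """
    total = 0.0
    for i, x_i in x:
        row = S.get(i, {})
        for j, x_j in x:
            # Missing entries of the sparse matrix count as zero similarity
            total += x_i * row.get(j, 0.0) * x_j
    return total


def check_assertion_conditions(x, S):
    """Mirror the two conditions named in the assertion message."""
    if any(weight == 0 for _, weight in x):
        raise ValueError("explicit zero entry in sparse document")
    value = quadratic_form(x, S)
    if not value > 0:
        raise ValueError("x^T * S * x = %r is not positive" % value)
    return value


# Toy 3-term similarity matrix with ones on the diagonal (hypothetical values)
S = {
    0: {0: 1.0, 1: 0.5},
    1: {0: 0.5, 1: 1.0, 2: 0.2},
    2: {1: 0.2, 2: 1.0},
}
x = [(0, 1.0), (2, 2.0)]
print(check_assertion_conditions(x, S))  # 5.0
```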
Hey @DennisCologne, this is definitely interesting, but I can't seem to reproduce your problem even with Python 2.7 and Gensim 3.4.0. Can you find a pair of document vectors
ping @DennisCologne, please provide the information needed to reproduce the error (as requested in #2105 (comment))
ping @DennisCologne
Similar issue with SoftCosineSimilarity.
ping @Witiko
I fail to see how this is related to the current issue, which should have been closed long ago due to the original poster's inactivity and the migration of the related code in Gensim 3.7.
AssertionError + SoftCosineSimilarity = not related?
The assertion error in this issue is supposed to come from the code in the pre-3.7
Traceback (most recent call last):
What is wrong with this code that makes SoftCosineSimilarity reject it? I tried to follow the tutorial...
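For context on where a condition like x^T * S * x > 0 matters: SoftCosineSimilarity ranks documents by the soft cosine measure, which divides x^T * S * y by sqrt(x^T * S * x) * sqrt(y^T * S * y). A minimal pure-Python sketch of that formula, assuming the same illustrative formats as before (documents as (term_id, weight) pairs, S as a dict of dicts); `bilinear_form` and `soft_cosine` are hypothetical names, not Gensim's implementation.

```python
import math


def bilinear_form(x, S, y):
    """Compute x^T * S * y for sparse bag-of-words vectors x and y."""
    total = 0.0
    for i, x_i in x:
        row = S.get(i, {})
        for j, y_j in y:
            total += x_i * row.get(j, 0.0) * y_j
    return total


def soft_cosine(x, y, S):
    """Soft cosine similarity of two bag-of-words vectors under term similarity S."""
    norm_x = bilinear_form(x, S, x)
    norm_y = bilinear_form(y, S, y)
    # The square roots below need positive arguments, which matches the
    # x^T * S * x > 0 condition stated in the assertion message.
    assert norm_x > 0 and norm_y > 0
    return bilinear_form(x, S, y) / (math.sqrt(norm_x) * math.sqrt(norm_y))


# Two single-word documents whose words are 0.5-similar (hypothetical values)
S_soft = {0: {0: 1.0, 1: 0.5}, 1: {0: 0.5, 1: 1.0}}
identity = {0: {0: 1.0}, 1: {1: 1.0}}
print(soft_cosine([(0, 1.0)], [(1, 1.0)], S_soft))    # 0.5
print(soft_cosine([(0, 1.0)], [(1, 1.0)], identity))  # 0.0, plain cosine
```

With the identity matrix, the measure reduces to the ordinary cosine similarity, so documents without shared words score 0; the off-diagonal entries of S are what let related but different words contribute.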
For some reason, your word embeddings do not have the
I am using Gensim's Word2Vec to generate w2v_model.
Your issue above can be resolved by calling
I cannot reproduce your other issue, i.e.:
>>> from gensim.corpora import Dictionary
>>> from gensim.models.word2vec import Word2Vec
>>> from gensim.test.utils import common_texts
>>>
>>> model = Word2Vec(common_texts, size=20, min_count=1)
>>> dictionary = Dictionary(common_texts)
>>> model.wv.similarity_matrix(dictionary)
<12x12 sparse matrix of type '<type 'numpy.float32'>'
with 68 stored elements in Compressed Sparse Column format>
Can you run the above code without issue?
Yes, I can.
Now, a few steps further, for:
Please try the following:
>>> from gensim.similarities import SparseTermSimilarityMatrix
>>>
>>> similarity_matrix = SparseTermSimilarityMatrix(termsim_index, dictionary)
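The general idea behind a sparse term similarity matrix can be sketched in plain Python: every term has similarity 1.0 with itself, and each term's nearest neighbours from a similarity index fill in a few off-diagonal entries. This is a simplified illustration, not Gensim's SparseTermSimilarityMatrix; `build_term_similarity_matrix`, `toy_most_similar`, the `max_neighbours` cap and the neighbour table are all hypothetical, and the real class may differ in how it orders terms, limits entries and keeps the matrix symmetric.

```python
def build_term_similarity_matrix(dictionary, most_similar, max_neighbours=2):
    """Turn a term similarity index into a sparse dict-of-dicts matrix.

    dictionary: maps term -> term_id (assumed, like a token-to-id mapping).
    most_similar: hypothetical callable(term, topn) -> list of (term, similarity).
    max_neighbours: hypothetical cap on neighbours looked up per term.
    """
    # Start from the identity: every term is fully similar to itself
    matrix = {term_id: {term_id: 1.0} for term_id in dictionary.values()}
    for term, term_id in dictionary.items():
        for other, similarity in most_similar(term, topn=max_neighbours):
            other_id = dictionary.get(other)
            if other_id is None or other_id == term_id:
                continue  # skip out-of-dictionary terms and the term itself
            # Store both (i, j) and (j, i) so the matrix stays symmetric
            matrix[term_id][other_id] = similarity
            matrix[other_id][term_id] = similarity
    return matrix


# Hypothetical precomputed neighbours standing in for word embeddings
NEIGHBOURS = {
    "computer": [("system", 0.6), ("response", 0.4)],
    "system": [("computer", 0.6), ("user", 0.3)],
    "user": [("system", 0.3)],
    "response": [("computer", 0.4)],
}


def toy_most_similar(term, topn):
    return NEIGHBOURS.get(term, [])[:topn]


dictionary = {"computer": 0, "system": 1, "user": 2, "response": 3}
S_terms = build_term_similarity_matrix(dictionary, toy_most_similar)
print(S_terms[0])  # {0: 1.0, 1: 0.6, 3: 0.4}
```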
similarity_matrix = SparseTermSimilarityMatrix(termsim_index, dictionary)  # construct similarity matrix
Maybe the problem is caused by terms with underscores, like 'chemical_element' or 'cabinet_minister'?
I cannot reproduce your issue with new embeddings:
>>> from gensim.corpora import Dictionary
>>> from gensim.models.keyedvectors import WordEmbeddingSimilarityIndex
>>> from gensim.models.word2vec import Word2Vec
>>> from gensim.similarities import SparseTermSimilarityMatrix
>>> from gensim.test.utils import common_texts
>>>
>>> model = Word2Vec(common_texts, size=20, min_count=1)
>>> model.wv.most_similar(positive=['computer'], topn=2)
[('response', 0.38100379705429077), ('minors', 0.3752439618110657)]
>>>
>>> termsim_index = WordEmbeddingSimilarityIndex(model.wv)
>>> dictionary = Dictionary(common_texts)
>>> similarity_matrix = SparseTermSimilarityMatrix(termsim_index, dictionary)
>>> similarity_matrix
<gensim.similarities.termsim.SparseTermSimilarityMatrix object at 0x7f822abc3d10>
Judging by the error message,
For common_texts, the output is:
Can you please try with the embeddings that throw the
With my text, it fails even earlier:
As you can see on line 1400 in the error message above, Therefore, can you please print the result of
Thank you for your patience :)
This seems pretty iterable to me.
Does my text cause an error on your computer?
Let's try to closely imitate the call on line 1400. Can you please print the result of the following:
>>> termsim_index.kwargs
>>> termsim_index.keyedvectors
>>> most_similar = termsim_index.keyedvectors.most_similar(positive=['chemical_element'], topn=100)
>>> most_similar
>>> type(most_similar)
>>> hasattr(most_similar, '__iter__')
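An iterability check alone can be misleading: a flat sequence of scores is iterable too, it just does not yield (term, similarity) pairs. A minimal sketch, assuming (as the diagnosis further down this thread suggests) that the failing call unpacks each item of the most_similar result into two names; `collect`, `pairs` and `flat` are hypothetical, and plain lists stand in for the NumPy array.

```python
# Stand-in for the documented return type: a list of (term, similarity) pairs
pairs = [("response", 0.381), ("minors", 0.375)]
# Stand-in for a flat array of raw scores, one per vocabulary word
flat = [-0.118, 0.322, -0.029]


def collect(result):
    """Unpack (term, similarity) pairs, as a caller iterating over most_similar would."""
    return {term: similarity for term, similarity in result}


# Both objects pass an "is it iterable?" check...
print(hasattr(pairs, "__iter__"), hasattr(flat, "__iter__"))  # True True
print(collect(pairs))
# ...but only the list of pairs can be unpacked item by item
try:
    collect(flat)
except TypeError as error:
    print("TypeError:", error)
```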
I can reproduce this with your text, and I am investigating.
The issue is that the
>>> from gensim.models.word2vec import Word2Vec
>>> from gensim.test.utils import common_texts
>>>
>>> model = Word2Vec(common_texts, size=20, min_count=1)
>>> model.wv.most_similar(positive=['computer'], topn=2)
[('response', 0.38100379705429077), ('minors', 0.3752439618110657)]
>>> model.wv.most_similar(positive=['computer'], topn=0)
array([-0.1180886 , 0.32174808, -0.02938104, -0.21145007, 0.37524396,
-0.23777878, 0.99999994, -0.01436211, 0.36708638, -0.09770551,
0.05963777, 0.3810038 ], dtype=float32)
This is undocumented behavior, which can be fixed by removing lines 554 and 555 in
>>> model.wv.most_similar(positive=['computer'], topn=0)
[]
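The root cause can be reduced to a few lines. A hypothetical sketch, assuming the removed lines were a falsy check on topn that returned the raw score array early (the thread does not quote them): in Python, `not 0` is True, so `topn=0` takes the same path as "no limit". The fixed variant is one possible design, not necessarily the patch in Gensim: it reserves `None` for "return all scores" and otherwise slices, so `topn=0` yields an empty list, matching the patched output above. Both functions are hypothetical, and plain lists stand in for NumPy arrays.

```python
def most_similar_buggy(similarities, topn=10):
    """Hypothetical reduction of the bug: `if not topn` treats 0 like None."""
    if not topn:
        return similarities  # raw scores, not (index, score) pairs
    ranked = sorted(enumerate(similarities), key=lambda pair: pair[1], reverse=True)
    return ranked[:topn]


def most_similar_fixed(similarities, topn=10):
    """Only None means 'return everything'; topn=0 yields an empty list."""
    if topn is None:
        return similarities
    ranked = sorted(enumerate(similarities), key=lambda pair: pair[1], reverse=True)
    return ranked[:topn]


scores = [0.1, 0.9, 0.4]
print(most_similar_buggy(scores, topn=0))  # [0.1, 0.9, 0.4]
print(most_similar_fixed(scores, topn=0))  # []
print(most_similar_fixed(scores, topn=2))  # [(1, 0.9), (2, 0.4)]
```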
The patches are now available in #2356. Thank you for your patience in helping us discover the bug, and sorry for the trouble. 😉
I have the following code:
And it gives me the following error:
What could be the reasons for it, and how can I resolve it?
It seems as though your
Hello there,
Maybe you can help me out with this real quick. I cannot run any of your examples: not the one from https://radimrehurek.com/gensim/similarities/docsim.html, nor the one from this repo. All of them raise the following AssertionError.
This is not working (other similarity measures of this module work fine):
Neither is this one from the repo (I followed all previous steps):
Thanks in advance. I have been trying to get this to run for two days now, but nothing works.
Best,
Dennis