cPickle error when saving similarity matrix #7

Closed
salmoni opened this issue Mar 1, 2011 · 2 comments

salmoni commented Mar 1, 2011

Note: this applies only to similarities.Similarity, not to similarities.MatrixSimilarity or similarities.SparseMatrixSimilarity.

After creating a similarities.Similarity index, attempting to save it fails with a cPickle error.

Traceback (most recent call last):
  File "/Users/alan/Projects/LSIA/docs/code/Similarities01.py", line 45, in <module>
    Q = querier(corpus)
  File "/Users/alan/Projects/LSIA/docs/code/Similarities01.py", line 15, in __init__
    self.index.save(self.workdir+'/ops/sims.index')
  File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/gensim-0.7.7-py2.6.egg/gensim/utils.py", line 120, in save
    cPickle.dump(self, f, protocol=-1) # -1 to use the highest available protocol, for efficiency
PicklingError: Can't pickle <class 'gensim.interfaces.TransformedCorpus'>: attribute lookup gensim.interfaces.TransformedCorpus failed

Gensim 0.7.7 on OSX.

@piskvorky
Owner

Hmm, not sure if that's a bug or a feature. The Similarity class "indexes" the corpus you give it by simply keeping a reference to it; unlike MatrixSimilarity and SparseMatrixSimilarity, it doesn't realize all the vectors in memory. So if the corpus you pass to it doesn't support pickling, neither does Similarity.
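To illustrate the point about references, here is a minimal sketch in modern Python (the thread itself is about Python 2.6 and cPickle). `LazyIndex` and `StreamedCorpus` are hypothetical stand-ins, not gensim classes; the sketch only assumes that an index holding a reference to its corpus can be pickled exactly when the corpus can.

```python
import pickle


class LazyIndex:
    """Hypothetical index that only keeps a reference to its corpus."""

    def __init__(self, corpus):
        self.corpus = corpus  # no vectors are copied or realized here


class StreamedCorpus:
    """Hypothetical corpus wrapping a generator, which pickle rejects."""

    def __init__(self, docs):
        self._stream = (doc for doc in docs)  # generators cannot be pickled

    def __iter__(self):
        return self._stream


def can_pickle(obj):
    """Return True if obj survives pickle.dumps, False otherwise."""
    try:
        pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)
        return True
    except (pickle.PicklingError, TypeError, AttributeError):
        return False


# A plain list of sparse vectors pickles fine, so the index does too.
list_backed = LazyIndex([[(0, 1.0)], [(1, 0.5)]])

# The index inherits the pickling failure of the corpus it references.
stream_backed = LazyIndex(StreamedCorpus([[(0, 1.0)]]))

print(can_pickle(list_backed), can_pickle(stream_backed))
```

An index that copies the vectors into its own matrix would not have this dependency, which matches the distinction drawn above between Similarity and the two matrix-based classes.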

Anyway, I guess in your case the problem is simpler. You're storing a transformed corpus, like lsi[old_corpus], which is picklable. I changed the code to support this directly; the new version is in the develop branch, so your code should work now. The commit is 950f53a; it's just 10 lines of code.
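For readers puzzled by the "attribute lookup ... failed" wording in the traceback: pickle stores classes by name and has to find them again through their module. The thread does not say what caused the lookup to fail in gensim 0.7.7, so the following is only a sketch of one common way to trigger this family of errors, using hypothetical class names and modern Python rather than cPickle.

```python
import pickle


def make_local_instance():
    # A class defined inside a function has no module-level name,
    # so pickle cannot look it up when serializing an instance.
    class LocalCorpus:
        pass

    return LocalCorpus()


class ModuleLevelCorpus:
    """Same empty class, but reachable by name at module level."""


def try_pickle(obj):
    """Return 'ok' on success, else the name of the exception raised."""
    try:
        pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)
        return "ok"
    except (pickle.PicklingError, AttributeError) as exc:
        return type(exc).__name__


local_result = try_pickle(make_local_instance())
module_result = try_pickle(ModuleLevelCorpus())

print("local class:", local_result)
print("module-level class:", module_result)
```

The exact exception type for the local class varies between Python versions, which is why the sketch catches both.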

I'll have to sit down and think about the design of the more general case, though. Maybe I should change the interface so that all corpora must support un/pickling?
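As a rough idea of what "supporting un/pickling" could mean for a corpus that holds an unpicklable resource, here is a sketch using Python's standard `__getstate__` / `__setstate__` hooks. `LineCorpus` is a hypothetical class invented for this example, not part of gensim, and the file format (one whitespace-separated document per line) is an assumption.

```python
import os
import pickle
import tempfile


class LineCorpus:
    """Hypothetical corpus: one whitespace-separated document per line."""

    def __init__(self, path):
        self.path = path
        self._fh = open(path)  # open file handles cannot be pickled

    def __iter__(self):
        self._fh.seek(0)
        for line in self._fh:
            yield line.split()

    def __getstate__(self):
        # Drop the file handle; keep only what is needed to reopen it.
        state = self.__dict__.copy()
        del state["_fh"]
        return state

    def __setstate__(self, state):
        # Restore attributes, then reopen the file from the stored path.
        self.__dict__.update(state)
        self._fh = open(self.path)

    def close(self):
        self._fh.close()


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "corpus.txt")
    with open(path, "w") as f:
        f.write("human interface\ngraph trees\n")

    corpus = LineCorpus(path)
    restored = pickle.loads(pickle.dumps(corpus))

    original_docs = list(corpus)
    restored_docs = list(restored)

    corpus.close()
    restored.close()

print(original_docs == restored_docs)
```

The trade-off is that the restored corpus depends on the file still existing at the same path, so the pickle is not self-contained.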

@piskvorky
Owner

I'm closing this issue; Alan, let me know if you encounter problems applying the changes or if something else comes up.

This issue was closed.