-
-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update 'documents_columns' param passed to 'Sparse2Corpus' #1704
Conversation
I'm glad to see you @chinmayapancholi13, have you returned to us 😃 ? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, but please, add tests based on issue
gensim/sklearn_api/ldamodel.py
Outdated
@@ -57,7 +57,7 @@ def fit(self, X, y=None): | |||
Calls gensim.models.LdaModel | |||
""" | |||
if sparse.issparse(X): | |||
corpus = matutils.Sparse2Corpus(X) | |||
corpus = matutils.Sparse2Corpus(X, False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Named parameter better, for clarity. (here and elsewhere)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Done now.
@@ -177,7 +177,7 @@ def testCSRMatrixConversion(self): | |||
newmodel.fit(sarr) | |||
bow = [(0, 1), (1, 2), (2, 0)] | |||
transformed_vec = newmodel.transform(bow) | |||
expected_vec = numpy.array([0.35367903, 0.64632097]) | |||
expected_vec = numpy.array([0.12843782, 0.87156218]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why this change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think because now this script interprets the input differently -> different model -> different result, but looks suspicious, I agree.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@piskvorky As Ivan correctly pointed out, since the input matrix used for training (by fit
method) is being interpreted differently after making this change, the model changes and thus, the output of transform
changes as well.
@menshikh-iv Right now, among the unit-tests, we have a function And should we add separate unit-tests to ensure that input matrix is being inpterpreted in the right manner or just having a function like the current version of Also, I'm sorry for the delayed response. I am still a little busy in some university projects so it might take a few more weeks to resume the work here. 😄 |
I dont think that additional tests is needed here (all changes exactly similar), LGTM for me, @piskvorky wdyt? |
Up to you. Just checking: no default gensim behaviour was changed here, right? Only the wrapper. |
@piskvorky you are correct (only wrapper). |
(piskvorky#1704) * updated param passed to 'Sparse2Corpus' * updated unit test * updated 'partial_fit' and passed named params
Fixes issue #1676.
cc: @mmunozm @menshikh-iv