
Fix deprecated parameters in D2VTransformer and W2VTransformer. Fix #1937 #1945

Merged
merged 45 commits into piskvorky:develop on Mar 12, 2018

Conversation

@morsdor (Contributor) commented Mar 1, 2018

I tried modifying @swierh's code a little and added a bit of what he recommended.

@menshikh-iv (Contributor):

@MritunjayMohitesh, next time please push all code to one PR (no need to create another one for fixes).

@morsdor (Contributor, Author) commented Mar 1, 2018

Sorry about that. I will do so next time; it was actually my first time.

@morsdor morsdor closed this Mar 1, 2018
@menshikh-iv menshikh-iv reopened this Mar 1, 2018
@menshikh-iv (Contributor):

@MritunjayMohitesh I already closed the previous PR; let's continue here.

@@ -404,7 +400,7 @@ def __init__(self, documents=None, dm_mean=None, dm=1, dbow_words=0, dm_concat=0
self.train(
documents, total_examples=self.corpus_count, epochs=self.epochs,
start_alpha=self.alpha, end_alpha=self.min_alpha, callbacks=callbacks)

Review comment (Contributor):

This file is not relevant to the PR; please remove it from the PR.

"""

- def __init__(self, sentences=None, size=100, alpha=0.025, window=5, min_count=5,
+ def __init__(self, sentences=None, size=None, vector_size=100, alpha=0.025, window=5, min_count=5,
Review comment (Contributor):

These changes are not relevant either; please remove them.

trim_rule=None, sorted_vocab=1, batch_words=10000):
"""
Sklearn wrapper for Word2Vec model. See gensim.models.Word2Vec for parameter details.
"""
if iter is not None:
Review comment (Contributor):

Does it make sense to rename the arguments here? You can simply pass the correct variants to the "original" class, and that is enough to avoid the deprecation problems.

CC: @manneshiva
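
A minimal sketch of this suggestion (a hypothetical reconstruction, assuming gensim 3.x, where Doc2Vec deprecates `size` in favor of `vector_size` and `iter` in favor of `epochs`): the wrapper keeps its old public argument names and renames them only where it calls the original class, so no new constructor parameters or extra deprecation handling are needed.

    from gensim import models
    from gensim.models.doc2vec import TaggedDocument

    class D2VTransformer(object):
        def __init__(self, size=100, iter=5, alpha=0.025, min_count=5):
            # Public sklearn-style names stay exactly as they were.
            self.size = size
            self.iter = iter
            self.alpha = alpha
            self.min_count = min_count

        def fit(self, X, y=None):
            # X is an iterable of token lists; tag each document with its index.
            d2v_sentences = [TaggedDocument(words, [i]) for i, words in enumerate(X)]
            self.gensim_model = models.Doc2Vec(
                documents=d2v_sentences,
                vector_size=self.size,  # renamed only here, at the call site
                epochs=self.iter,       # likewise for the deprecated `iter`
                alpha=self.alpha, min_count=self.min_count,
            )
            return self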

@menshikh-iv (Contributor):

ping @MritunjayMohitesh

@morsdor (Contributor, Author) commented Mar 11, 2018

I can't understand why Travis CI is failing. Some help would be appreciated.

@@ -239,9 +239,9 @@ class TaggedDocument(namedtuple('TaggedDocument', 'words tags')):
and `tags` (a list of tokens). Tags may be one or more unicode string
tokens, but typical practice (which will also be most memory-efficient) is
for the tags list to include a unique integer id as the only tag.

Review comment (Contributor):

Unrelated change (a leading space on an empty line). Please revert all changes to doc2vec.py and Travis will pass successfully.

self.null_word = null_word
self.trim_rule = trim_rule
self.sorted_vocab = sorted_vocab
self.batch_words = batch_words

Review comment (Contributor):

leading spaces again

def fit(self, X, y=None):
"""
Fit the model according to the given training data.
Calls gensim.models.Word2Vec
"""
self.gensim_model = models.Word2Vec(
- sentences=X, size=self.size, alpha=self.alpha,
+ sentences=X, size=self.vector_size, alpha=self.alpha,
Review comment (Contributor):

Looks like you should change `size` to `vector_size` here, and the same for `iter`, to avoid the DeprecationWarning.
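
For context, one quick way to surface the warning that issue #1937 is about (a sketch, assuming gensim 3.x behavior, where Doc2Vec still accepts the old names but warns):

    import warnings
    from gensim.models import Doc2Vec
    from gensim.models.doc2vec import TaggedDocument

    docs = [TaggedDocument(["hello", "world"], [0])]
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        # Deprecated names: gensim 3.x maps size -> vector_size and iter -> epochs.
        Doc2Vec(documents=docs, size=10, iter=1, min_count=1)
    print([str(w.message) for w in caught])  # expect deprecation messages here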

- def __init__(self, size=100, alpha=0.025, window=5, min_count=5, max_vocab_size=None, sample=1e-3, seed=1,
- workers=3, min_alpha=0.0001, sg=0, hs=0, negative=5, cbow_mean=1, hashfxn=hash, iter=5, null_word=0,
- trim_rule=None, sorted_vocab=1, batch_words=10000):
+ def __init__(self, size=None, vector_size=100, alpha=0.025, window=5, min_count=5, max_vocab_size=None, sample=1e-3,
Review comment (Contributor):

No changes to w2vmodel.py are needed (this deprecation is only for d2v); this is also an unrelated change, please revert this file.

- docvecs_mapfile=None, comment=None, trim_rule=None, size=100, alpha=0.025, window=5, min_count=5,
- max_vocab_size=None, sample=1e-3, seed=1, workers=3, min_alpha=0.0001, hs=0, negative=5, cbow_mean=1,
- hashfxn=hash, iter=5, sorted_vocab=1, batch_words=10000):
+ docvecs_mapfile=None, comment=None, trim_rule=None, size=None, vector_size=100, alpha=0.025, window=5,
Review comment (Contributor):

No need to add new arguments/attributes or to raise more deprecation warnings; simply change the argument names in fit when you pass them to Doc2Vec. That's the whole fix here.

@@ -72,11 +79,11 @@ def fit(self, X, y=None):
documents=d2v_sentences, dm_mean=self.dm_mean, dm=self.dm,
dbow_words=self.dbow_words, dm_concat=self.dm_concat, dm_tag_count=self.dm_tag_count,
docvecs=self.docvecs, docvecs_mapfile=self.docvecs_mapfile, comment=self.comment,
- trim_rule=self.trim_rule, size=self.size, alpha=self.alpha, window=self.window,
+ trim_rule=self.trim_rule, size=self.vector_size, alpha=self.alpha, window=self.window,
Review comment (Contributor):

All the changes in your PR should be in two lines of the fit function; that's all you need here.
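
A quick way to verify the end state (a sketch, assuming gensim 3.x with this fix applied): fitting through the wrapper's old `size`/`iter` names should no longer emit a DeprecationWarning.

    import warnings
    from gensim.sklearn_api import D2VTransformer

    texts = [["human", "interface", "computer"], ["graph", "minors", "survey"]]
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        model = D2VTransformer(size=10, iter=1, min_count=1).fit(texts)
    deprecations = [w for w in caught if issubclass(w.category, DeprecationWarning)]
    print(len(deprecations))  # expect 0 once fit passes vector_size/epochs
    vectors = model.transform(texts)  # one 10-dimensional vector per document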

@menshikh-iv (Contributor):

@MritunjayMohitesh Hooray 🔥, this looks like the needed change; let's wait for CI.

@menshikh-iv changed the title from "Fix deprecated parameters in D2VTransformer and W2VTransformer . Fix #1937" to "Fix deprecated parameters in D2VTransformer and W2VTransformer. Fix #1937" on Mar 12, 2018
@morsdor (Contributor, Author) commented Mar 12, 2018 via email

@menshikh-iv (Contributor):

@MritunjayMohitesh this shouldn't be passed to the w2v class (it has no such parameter). The tests passed correctly (probably @swierh was running tests against an old version of your PR); right now everything works fine.

@menshikh-iv menshikh-iv merged commit dc97c51 into piskvorky:develop Mar 12, 2018
@menshikh-iv (Contributor):

Thanks @MritunjayMohitesh, congrats on the first-time contribution 🥇!
