Throw exception if load() is called on instance rather than the class #889
I believe a more compact fix, with less meta-method-juggling in multiple places, would be possible and desirable here. Can the load/load_word2vec_format methods just detect if their first parameter ( |
@gojomo I'd have liked that too, but it appears that the method cannot tell whether it was called on the class or on an instance once the call is made.
When it's called on an instance, isn't the first parameter ( |
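The point under discussion can be checked with a small sketch. It assumes load is declared as a classmethod; `Model` is a hypothetical stand-in and not gensim code. Under that assumption, Python passes the class as the first parameter whether the call goes through the class or through an instance, which is why the method body alone cannot tell the two apart.

```python
class Model(object):
    """Hypothetical stand-in for Word2Vec; not gensim code."""

    @classmethod
    def load(cls, fname):
        # Return whatever Python passed as the first parameter.
        return cls, fname


via_class = Model.load("model.bin")
via_instance = Model().load("model.bin")

# Both calls receive the class itself, never the instance.
assert via_class[0] is Model
assert via_instance[0] is Model
```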
def load(self, *args, **kwargs):
    logger.warn('Load was called on instance. Calling on class instead')
    Word2Vec.load(*args, **kwargs)
This is not expected behaviour.
Expected behaviour is to print warning but still run the instance method.
I'd suggest that the best option would be to raise an error: this method doesn't modify an instance in-place, so the "one and only one right way to do it" is to call it via the class.
@gojomo when calling on instance,
Move methods inside the instance. Link to code showing the first parameter.
@@ -452,6 +453,9 @@ def __init__(
        self.total_train_time = 0
        self.sorted_vocab = sorted_vocab
        self.batch_words = batch_words

        self.load = methodize(load, self)
Did you mean self.load = methodize(self.load, self)? It is very confusing to assign self.load and then define def load.
This was done on purpose. Please see here for reference. MethodType binds the method to the calling instance. What I'm doing here is something like overloading load based on whether it is called on instance or class. So the argument load to methodize is the name of the function that is to be bound.
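The binding described above can be illustrated with a minimal sketch. This is not the PR's methodize implementation, which is not shown in full here; `Model` and `_load_on_instance` are hypothetical names. It relies on two standard Python behaviours: `types.MethodType(func, instance)` returns a method bound to that instance, and an attribute stored in the instance dict takes precedence over a classmethod of the same name.

```python
import types


class Model(object):
    """Hypothetical stand-in for Word2Vec; not gensim code."""

    def __init__(self):
        # The instance attribute shadows the classmethod for this instance only.
        self.load = types.MethodType(_load_on_instance, self)

    @classmethod
    def load(cls, fname):
        return "class load: %s" % fname


def _load_on_instance(self, fname):
    # Hypothetical instance-level handler; `self` is the bound instance.
    return "instance load on %s: %s" % (type(self).__name__, fname)


on_class = Model.load("model.bin")       # resolved on the class
on_instance = Model().load("model.bin")  # resolved on the instance dict
```

The same name therefore dispatches differently depending on where it is looked up, which is the "overloading" effect described in the comment.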
@@ -1869,3 +1873,15 @@ def __iter__(self):
        model.accuracy(args.accuracy)

    logger.info("finished running %s", program)

def methodize(func, instance):
These methods are defined outside of the class. Maybe that is why you see no difference in calling these methods and class methods? Can you link to a gist of the code you use to test the first parameter?
methods are defined outside of the class
This was done on purpose. Please see here for reference. MethodType binds the method to the calling instance. What I'm doing here is something like overloading load based on whether it is called on instance or class. So the argument load to methodize is the name of the function that is to be bound.
My latest build is failing on Python 2 for some reason, I'm working on that right now. Once that's done, I'll write the tests and merge it back with develop.
Aha, sorry, I thought an
This sort of warning would be a nice convenience for explaining a common user error that has been asked about a bunch of times (and probably many more times where the user eventually figured it out for themselves), but it wouldn't justify much extra code or meta-method complexity. How about if instead there's a generic utility function (taking any number of positional/keyword params) that just throws an error to the effect of "this method name should only be called on the class object", and that function is installed on instances under the relevant method names? That could be a smaller and clearer way to catch the error.
@gojomo you're right, this is a very inefficient way of doing things. What if I use |
This approach seems a bit more focused. But my concern is still complicating the code with advanced meta-features for such a minor convenience API warning. (Separate observations: as |
@gojomo any solution we find for this will complicate the code to some extent. We've exhausted a lot of possible options. Clearly, though, this approach uses far fewer meta-features than the others.
This is a valid concern. Do you have any recommendations?
It's the latter – I was editing in a different branch which was up to date with master. I'd forgotten about that. |
The suggestion in my 20161002 comment might be minimal in terms of line-count and magic. Roughly (not tested – I think this would work):
(in Word2Vec/etc)
Though perhaps also, if iterating over the
(Essentially, this would be a slightly-slimmer refinement of your first approach.) |
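The snippet from that comment did not survive on this page, so the following is an independent sketch of the idea as described in the thread, not the commenter's code. It assumes a generic function that accepts any arguments and raises, installed on each instance under the class-only method name. `Model` is a hypothetical stand-in for Word2Vec, and the choice of `AttributeError` is an assumption.

```python
def call_on_class_only(*args, **kwargs):
    """Hypothetical guard: reject calls made through an instance."""
    raise AttributeError("This method should be called on the class object.")


class Model(object):
    """Hypothetical stand-in for Word2Vec; not gensim code."""

    def __init__(self):
        # A plain function stored in the instance dict is not bound,
        # so it receives only the caller's arguments (no `self`).
        self.load = call_on_class_only

    @classmethod
    def load(cls, fname):
        # Stand-in for real loading: just build a fresh instance.
        return cls()


loaded = Model.load("model.bin")  # fine: resolved on the class

try:
    loaded.load("model.bin")  # resolved on the instance, so the guard runs
    raised = False
except AttributeError:
    raised = True
```

Because the guard lives in the instance dict, the classmethod itself stays untouched and no metaclass or descriptor machinery is needed.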
modified: gensim/test/test_word2vec.py
Changes look good to me, as soon as tests are sensible and pass. Thanks for patiently sticking with this! |
modified: gensim/test/test_word2vec.py
@@ -217,6 +217,10 @@ def any2unicode(text, encoding='utf8', errors='strict'):
    return unicode(text, encoding, errors=errors)
to_unicode = any2unicode

def call_on_class_only(*args, **kwargs):
    """Warns when load methods are called on instance"""
    logger.warn('This method should be called on a class object.')
We don't want to be failing quietly (or, if logging not configured, in complete silence), ignoring the user's explicit request.
Exception is more appropriate.
Yes, I should have put an exception in my example, so that a coder making this error knows to change their calling code.
@Dust0x please fix.
@@ -432,6 +432,8 @@ def __init__(

        """

        self.load = call_on_class_only
But this will leave the attribute function unbound, leading to potential confusion / serialization issues. Proper way would be to bind it properly with types.MethodType or somesuch.
If the serialization tests pass, and the exception message is understandable if it's ever called (by mistake), I don't see what other clarity would come from binding the method. That'd just make it look more like an instance-specific operation... which it specifically is not.
@piskvorky, that is what I did at the beginning. Apart from the points @gojomo raises, there are some issues with the way types.MethodType is handled in Python versions 2 and 3.
        self.assertTrue(warning in str(l))

    @log_capture()
    def testLoadWord2VecOnClassWarning(self):
why is this second test necessary? The first test above is enough.
Yes, you're right. I'll remove that.
        load_on_instance = doc2vec.Doc2Vec()
        model = load_on_instance.load(testfile())
        warning = "Load methods should be called on a class object."
        self.assertTrue(warning in str(l))
What is this l, where does it come from? Not a good name for a variable.
Unfortunately it seems to be the 'usual' name for the LogCapture objects provided by the testfixtures package's @log_capture decorator. (See for example: https://testfixtures.readthedocs.io/en/latest/logging.html#the-decorator) But that would seem to require an l parameter be added to the method, which I don't see here, so I'm a bit baffled that this could work.
No, of course it wouldn't work. Sorry, I was trying something out; you weren't supposed to see this. I should've done this in a separate branch.
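The underlying issue is that the capture object has to be bound to a name inside the test before it can be inspected. As a standard-library analogue (this uses unittest's assertLogs, not the testfixtures decorator from the PR), the sketch below binds the capture explicitly with a `with` statement. The logger name "example" and the function `warn_on_instance_load` are hypothetical.

```python
import logging
import unittest

logger = logging.getLogger("example")


def warn_on_instance_load():
    # Hypothetical function that emits the warning under test.
    logger.warning("Load methods should be called on a class object.")


class TestLoadWarning(unittest.TestCase):
    def test_warning_is_captured(self):
        # `captured` is bound explicitly here, so its origin is unambiguous.
        with self.assertLogs("example", level="WARNING") as captured:
            warn_on_instance_load()
        expected = "Load methods should be called on a class object."
        self.assertTrue(any(expected in line for line in captured.output))


suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestLoadWarning)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

A descriptive name such as `captured` also addresses the naming objection raised about `l`.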
        binary_model = word2vec.Word2Vec()
        load_on_instance = binary_model.load_word2vec_format(testfile(), binary=True)
        warning = "Load methods should be called on a class object."
        self.assertTrue(warning in str(l))
Ditto: what is l?
Sorry for the delay. Requesting review.
Addresses one out of four issues in #692