Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Throw exception if load() is called on instance rather than the class #889

Merged
merged 42 commits into from
Dec 22, 2016
Merged

Throw exception if load() is called on instance rather than the class #889

merged 42 commits into from
Dec 22, 2016

Conversation

aayux
Copy link
Contributor

@aayux aayux commented Sep 26, 2016

Addresses one out of four issues in #692

@aayux aayux changed the title Raise warning if calling load() on an instance rather than the class [WIP] Raise warning if calling load() on an instance rather than the class Sep 26, 2016
@gojomo
Copy link
Collaborator

gojomo commented Sep 27, 2016

I believe a more compact fix, with less meta-method-juggling in multiple places, would be possible and desirable here. Can the load/load_word2vec_format methods just detect is their first parameter (cls) isn't the expected class-object?

@aayux
Copy link
Contributor Author

aayux commented Sep 27, 2016

@gojomo I'd have liked that too, but it appears that the method does not differentiate between class or self once the call is made.

@gojomo
Copy link
Collaborator

gojomo commented Sep 27, 2016

When it's called on an instance, isn't the first parameter (cls) an instance rather than the class-object?


def load(self, *args, **kwargs):
logger.warn('Load was called on instance. Calling on class instead')
Word2Vec.load(*args, **kwargs)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not expected behaviour.
Expected behaviour is to print warning but still run the instance method.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd suggest that best would be to error – this method doesn't modify an instance in-place, so the "one and only one right way to do it" is to call it is via the class.

@aayux
Copy link
Contributor Author

aayux commented Sep 27, 2016

@gojomo when calling on instance, classmethod takes the calling instance's class as the hidden parameter.

cls is of type class-object: as a call to type (cls) will show.

Copy link
Contributor

@tmylk tmylk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move methods inside the instance. Link to code showing the first parameter.

@@ -452,6 +453,9 @@ def __init__(
self.total_train_time = 0
self.sorted_vocab = sorted_vocab
self.batch_words = batch_words

self.load = methodize(load, self)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you mean self.load = methodize(self.load, self) ?
it is very confusing to assign self.load and then define def load

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was done on purpose. Please see here for reference.

MethodType binds the method to the calling instance. What I'm doing here is something like overloading load based on whether it is called on instance or class.

So the argument load to methodize is the name of the function that is to be bound.

@@ -1869,3 +1873,15 @@ def __iter__(self):
model.accuracy(args.accuracy)

logger.info("finished running %s", program)

def methodize(func, instance):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These methods are defined outside of the class. Maybe that is why you see no difference in calling these methods and class methods? Can you link to a gist of the code you use to test the first parameter?

Copy link
Contributor Author

@aayux aayux Sep 30, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

methods are defined outside of the class

This was done on purpose. Please see here for reference.

MethodType binds the method to the calling instance. What I'm doing here is something like overloading load based on whether it is called on instance or class.

So the argument load to methodize is the name of the function that is to be bound.

Copy link
Contributor Author

@aayux aayux Sep 30, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My latest build is failing on Python 2 for some reason, I'm working on that right now. Once that's done, I'll write the tests and merge it back with develop.

@gojomo
Copy link
Collaborator

gojomo commented Oct 3, 2016

Aha, sorry, I thought an obj.func_name() invocation always put obj in as the 1st argument.

This sort of warning would be a nice convenience for explaining a common user error that's been asked-about up a bunch of times (and probably many more times where the user eventually figured it out for themselves)... but it wouldn't justify much extra code or meta-method complexity.

How about if instead, there's a generic utility function (taking any number of positional/keyword params) that just throws an error to the effect of, "this method name should only be called on the class object" – and that function is installed on instances in the relevant method names? That could be a smaller and clearer way to catch the error.

@tmylk tmylk added bug Issue described a bug difficulty medium Medium issue: required good gensim understanding & python skills labels Oct 4, 2016
@aayux
Copy link
Contributor Author

aayux commented Oct 24, 2016

@gojomo you're right, this is a very inefficient way of doing things.

What if I use inspect to get the calling frame and then create a utility that basically parses through the string to figure out if the function was called using Word2Vec.load etc.?

@gojomo
Copy link
Collaborator

gojomo commented Oct 24, 2016

This approach seems a bit more focused. But my concern is still complicating the code with advanced meta-features for such a minor convenience API warning.

(Separate observations: as SaveLoad might even now be used elsewhere, having a list of acceptable-known-subclasses coded inside its method is suboptimal – it shouldn't have to know all its subclass names. Also, at the moment post-upstream-merge, this PR diff is showing other unrelated changes as well – maybe a Github issue or your catchup-merge wasn't to current state of develop?)

@aayux
Copy link
Contributor Author

aayux commented Oct 25, 2016

@gojomo any solution we find for this will require complicating of the code to some extent. We've exhausted a lot of possible options.

Clearly, though, this approach has much fewer meta-features as compared to others.

as SaveLoad might even now be used elsewhere, having a list of acceptable-known-subclasses coded inside its method is suboptimal – it shouldn't have to know all its subclass names.

This is a valid concer. Do you have any recommendations?

maybe a Github issue or your catchup-merge wasn't to current state of develop?

It's the latter – I was editing in a different branch which was up to date with master. I'd forgotten about that.

@gojomo
Copy link
Collaborator

gojomo commented Oct 25, 2016

The suggestion in my 20161002 comment might be minimal in terms of line-count and magic. Roughly (not tested – I think this would work):

def call_on_class_only(*args, **kwargs):
    logger.warn('this method name should only be called on class objects')

(in Word2Vec/etc)

def __init__(...):
    ...
    self.load = call_on_class_only
    ...

Though perhaps also, if iterating over the @classmethods is easy, there'd be a utility function to auto-perform this replacement, so the impact in Word2Vec.__init__ (etc) would just be:

...
hide_class_only_methods(self)
...

(Essentially, this would be a slightly-slimmer refinement of your first approach.)

	modified:   gensim/test/test_word2vec.py
	modified:   gensim/test/test_word2vec.py
	modified:   gensim/test/test_word2vec.py
@gojomo
Copy link
Collaborator

gojomo commented Oct 31, 2016

Changes look good to me, as soon as tests are sensible and pass. Thanks for patiently sticking with this!

	modified:   gensim/test/test_word2vec.py
	modified:   gensim/test/test_word2vec.py
@@ -217,6 +217,10 @@ def any2unicode(text, encoding='utf8', errors='strict'):
return unicode(text, encoding, errors=errors)
to_unicode = any2unicode

def call_on_class_only(*args, **kwargs):
"""Warns when load methods are called on instance"""
logger.warn('This method should be called on a class object.')
Copy link
Owner

@piskvorky piskvorky Nov 3, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't want to be failing quietly (or, if logging not configured, in complete silence), ignoring the user's explicit request.

Exception is more appropriate.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I should have put an exception in my example, so that a coder making this error knows to change their calling code.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Dust0x please fix.

@@ -432,6 +432,8 @@ def __init__(

"""

self.load = call_on_class_only
Copy link
Owner

@piskvorky piskvorky Nov 3, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But this will leave the attribute function unbound, leading to potential confusion / serialization issues.

Proper way would be to bind it properly with types.MethodType or somesuch.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the serialization tests pass, and the exception message is understandable if it's ever called (by mistake), I don't see what other clarity would come from binding the method. That'd just make it look more like an instance-specific operation... which it specifically is not.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@piskvorky, that is what I did at the beginning. Apart from the points @gojomo raises, there are some issues with the way types.MethodTypes is handled in Python versions 2 and 3.

self.assertTrue(warning in str(l))

@log_capture()
def testLoadWord2VecOnClassWarning(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this second test necessary? The first test above is enough.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you're right. I'll remove that.

load_on_instance = doc2vec.Doc2Vec()
model = load_on_instance.load(testfile())
warning = "Load methods should be called on a class object."
self.assertTrue(warning in str(l))
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this l, where does it come from?

Not a good name for a variable.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately it seems to be the 'usual' name for the LogCapture objects provided by the testfixtures package @log_capture decorator. (See for example: https://testfixtures.readthedocs.io/en/latest/logging.html#the-decorator) But, that would seem to require an l parameter be added to the method, which I don't see here... so a bit baffled that this could work.

Copy link
Contributor Author

@aayux aayux Dec 3, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No of course it wouldn't work. Sorry, I was trying something you weren't supposed to see this. I should've done this in a separate branch.

binary_model = word2vec.Word2Vec()
load_on_instance = binary_model.load_word2vec_format(testfile(), binary=True)
warning = "Load methods should be called on a class object."
self.assertTrue(warning in str(l))
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dtto -- what is l?

@@ -217,6 +217,10 @@ def any2unicode(text, encoding='utf8', errors='strict'):
return unicode(text, encoding, errors=errors)
to_unicode = any2unicode

def call_on_class_only(*args, **kwargs):
"""Warns when load methods are called on instance"""
logger.warn('This method should be called on a class object.')
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Dust0x please fix.

@aayux aayux changed the title [WIP] Raise warning if calling load() on an instance rather than the class Throw exception if load is called on instance rather than the class Dec 4, 2016
@aayux aayux changed the title Throw exception if load is called on instance rather than the class Throw exception if load() is called on instance rather than the class Dec 4, 2016
@aayux
Copy link
Contributor Author

aayux commented Dec 4, 2016

  • Method call_on_class_only now raises AttributeError as requested.

  • Tests for load on instance completed.

  • CI passed.

Sorry for the delay. Request to review.

@tmylk tmylk merged commit 804825e into piskvorky:develop Dec 22, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Issue described a bug difficulty medium Medium issue: required good gensim understanding & python skills
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants