Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove direct access to properties moved to KeyedVectors #1147

Merged
merged 5 commits into from
Feb 16, 2017

Conversation

tmylk
Copy link
Contributor

@tmylk tmylk commented Feb 14, 2017

Following up on deprecation warnings added in #980

@gojomo
Copy link
Collaborator

gojomo commented Feb 21, 2017

@tmylk - the class comments still reference load_word2vec_format (etc), with now-invalid examples-of-use, and there's still the call_on_class_only shim that's now unneeded.

@tmylk
Copy link
Contributor Author

tmylk commented Feb 21, 2017

Agree, also the method should throw a "Deprecated, please use KeyedVectors" exception because it confuses a lot of user and it is not clear where did this function go.

@piskvorky
Copy link
Owner

piskvorky commented Mar 3, 2017

I'm getting lost too. Why not just return what the user asked for, instead of raising an exception? Don't we have that information?

@tmylk
Copy link
Contributor Author

tmylk commented Mar 3, 2017

After removing the direct access now there is only one way left to access the attribute. It makes things simpler and uniform across various embeddings. Do you agree?
There used to be two ways. We raised deprecation warnings for 2 months on one of the ways to access it. Now it became a deprecation exception.

@piskvorky
Copy link
Owner

It's fine if the "recommended way" changes to something else, no problem with that.

But raising exceptions, breaking backward compatibility, where we could actually serve the request no problem, sounds strange. I don't remember the reasoning any more, but it looks plain malicious to me. Why would we do that?

@tmylk
Copy link
Contributor Author

tmylk commented Mar 3, 2017

Sorry, I don't understand the question. One way to access an attribute is better than two ways. The reason we added KeyedVectors class is described in the release notes. After adding KeyedVectors there were two ways to do things so we needed to deprecate one of them.

@gojomo
Copy link
Collaborator

gojomo commented Mar 9, 2017

As I recall the reasoning:

There would be somewhat of a surprise/violating-expectations problem if Word2Vec.load_word2vec_format() still worked, but now returned a new, different type (KeyedVectors). Maybe that'd still be OK, but it seems a little 'off'.

If it still returned a Word2Vec instance that was only-good-for-lookups (and thus really only functionally-equivalent to KeyedVectors), that'd have maintained expectations, but also continued the somewhat messy conflation of 'improted vectors' and 'trainable model' that the refactoring was meant to address. So that path was appropriate for a while – and maybe the deprecation period should have been longer – but at some point the eventual change would still be "stop calling this through Word2Vec, because what you're loading isn't necessarily logically linked with the internals of this class".

@piskvorky
Copy link
Owner

piskvorky commented Mar 10, 2017

If I understand correctly, continuing to support attribute access like model.syn0 costs us nothing (we just alias a few variables, we have all the necessary information).

On the other hand, not supporting it costs a lot to the users, who are confused about this (seemingly) useless obfuscation of model.wv.syn0.

Weighing the costs/benefits, I don't think removing the support is a good choice. It feels capricious.

But maybe such breaking changes will force users to upgrade :-)

@tmylk
Copy link
Contributor Author

tmylk commented Mar 12, 2017

The cost of cleanly supporting both model.syn0 and model.wv.syn0 would have been a change to Save/Load API to make ignore a class variable as suggested by @jayantj .

Reasons why this change is required:

  1. In 0.13.4.1 syn0 and others were saved twice - once in word2vec save() and another time in KeyedVectors save_specials. Ignoring it in word2vec and not keyedvectors is impossible because ignore is not a class variable.

  2. The warnings in save for syn0norm direct access had to be silenced in a not so pretty way in Disable warnings of direct access to KeyedVectors save and load of Word2vec/Doc2vec #1072

Another minor cost is that bugs like #1173 would be harder to catch where "the vectors are not loaded into the syn0 attribute of the KeyedVector instance, and are instead loaded into the syn0 attribute of the Word2Vec instance".

I avoided the change to Save/Load API because it affects all the models and thus a high risk.

The deprecation seemed a better choice because it only affects word2vec and the users were warned with Warnings for 2 months. It was a learning experience in how much attention users pay to warnings. And they pay less attention than I wish they did. :) In hindsight, in this release instead of a complete code path deprecation, the warning should have become an Exception suggesting to use wv.syn0, like #1107 for load_word2vec_format. Removing the Exception code path in the next release would not have required a major version bump because the working API remained the same.

There are also advantages of deprecation as @gojomo wrote above: better API for a minimal read-only model and uniform code between word2vec and other embeddings.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants