-
-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fixing PR 1326 and providing some tests for unicode wiki corpora #1333
Conversation
gensim/models/word2vec.py
Outdated
@@ -1112,7 +1112,7 @@ def update_weights(self): | |||
# randomize the remaining words | |||
for i in xrange(len(self.wv.syn0), len(self.wv.vocab)): | |||
# construct deterministic seed from word AND seed argument | |||
newsyn0[i-len(self.wv.syn0)] = self.seeded_vector(self.wv.index2word[i] + str(self.seed)) | |||
newsyn0[i-len(self.wv.syn0)] = self.seeded_vector(utils.to_unicode(self.wv.index2word[i]) + str(self.seed)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this change inside word2vec.py
, and similar below, absolutely necessary? It seems if the caller has provided unicode strings in their corpus, the contents of index2word()
will already be unicode. But it seems unless seeded_vector()
is otherwise throwing an exception, it really doesn't matter whether this string-used-for-deterministic-randomization is unicode or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought about the same, though it seemed nice to be explicit about this. But you're right - once the problem in the corpus is addressed, this is not causing issues anymore.
I've reverted the changes.
Guys, I'd appreciate if someone hints me on how to debug this job: https://travis-ci.org/RaRe-Technologies/gensim/jobs/233524516 It seems that for some reason on py2.7 an existing test takes too long to complete. Given this one is unchanged, I'm inclined to think it's a build-related hiccup, but.. then again - I'm not used to CI Travis and I'm not sure on how to respin the build without changing the pull request.. Thoughts/hints on how to proceed? |
gensim/test/test_wikicorpus.py
Outdated
First unicode article in this sample is | ||
1) папа | ||
""" | ||
if sys.version_info < (2, 7, 0): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As we dropped support for 2.6 , we can also remove this line.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I'll clean this up too
@alekol This test is known to be flaky on Travis. It hasn't been investigated. Restarted the build and hope it passes. |
@tmylk How can I restart a build without making fake commits? |
It requires admin privileges to the project unfortunately so there isn't a way. |
Ok, any idea how to run these tests locally or debug them? For some reason on py 2.7 one of the tests seems to take too long (> 10min) and I really have no idea how to troubleshoot this.. |
@alekol you can run specific test with this command |
I hit the restart on the timed-out test, and it appears to be hanging on the Has this been a problem before this fix? If so, it's a shame it's throwing uncertainty into other fixes. It may be related to the attempt to use multiprocessing (in a Python process on a minimal build container that's already thick with prior test state). It may be a good idea to explicitly specify |
@gojomo Thanks for the hint! Seems that this was indeed the cause as applying it resulted in a test pass! :) Being the github noob that I am, what're the next steps to get this into develop? I expect that someone with sufficient privileges will do the merge or should I? |
Glad to hear! At this point it'd be up to a project committer to merge the PR. I'll leave that to @tmylk , but everything looks good to me. (One-line fix, test case, tweak to make existing tests less hang-prone.) |
Thank you @alekol |
A few things:
This is my first pull request, so.. I may have missed smth :)