Custom glove vectors throw tuple index out of range error #1831
Comments
I'm experiencing the same issue.
Then I loaded it into spaCy:
The error happens if I try to read Informations
|
I think the problem is in |
having the same issue |
Thanks for the report, especially @Lankey22 for the suggestion. Perhaps we need this in
    if self.data.ndim == 1:
        self.data = self.data.reshape((self.data.size//width, width))
If so, the following mitigation should work for now until the next version:
    nlp = spacy.load('pt')
    nlp.vocab.vectors.from_glove('/path/to/vectors')
    if nlp.vocab.vectors.data.ndim == 1:
        nlp.vocab.vectors.data = nlp.vocab.vectors.data.reshape((nlp.vocab.vectors.data.size//width, width))
You'll need to know the width of the vectors you're loading. |
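The reshape in the mitigation above can be wrapped in a small helper that also checks that the width actually divides the data. This is only a sketch using NumPy: `ensure_2d` is a hypothetical name, not part of spaCy, and the width of 50 is assumed purely for illustration.

```python
import numpy as np


def ensure_2d(data, width):
    """Return `data` as a (rows, width) array. Hypothetical helper, not a spaCy API."""
    if data.ndim != 1:
        return data  # already has a row/column shape, leave it alone
    if width <= 0 or data.size % width != 0:
        raise ValueError(
            "cannot split %d values into rows of width %d" % (data.size, width)
        )
    return data.reshape((data.size // width, width))


# Stand-in for the flat array that ends up in nlp.vocab.vectors.data.
flat = np.arange(200, dtype="float32")
table = ensure_2d(flat, 50)  # 50 is an assumed vector width
print(table.shape)
```

Failing loudly when the size is not a multiple of the width is a design choice here: a silent integer division would otherwise make `reshape` raise a less helpful error, or hide the fact that the wrong width was passed.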
I also came across this issue and I'm using the same workaround.
I find it odd that from_glove uses numpy.fromfile. The documentation states that tofile and fromfile are not suitable for data storage: https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.fromfile.html If you used np.load instead, it would load a 2D array if the data was stored as such; np.fromfile always loads a 1D array. I'm not 100% sure how GloVe's binary format is stored, but I would expect a 2D array. I'm loading word2vec embeddings myself and I saved the conversion as a 2D array.
Another thing that strikes me is that the documentation states that the dtype in the file format should be either 'f' or 'd'. That means any file read in this manner will get flattened by np.ascontiguousarray, because neither equals the string 'float32'. After flattening it would get reshaped again to a 2D array. Relevant line is here: Line 311 in 2e7391e
I might have made some wrong assumptions, but it seems to me that this code is not running as efficiently as it could. It would be great to hear why certain choices were made. I love working with spaCy and hope it becomes even better in the future :) |
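The difference between the two NumPy round trips mentioned above can be seen in a small sketch. It uses a made-up 4x3 array and temporary files, and says nothing about how GloVe itself lays out its binary output.

```python
import os
import tempfile

import numpy as np

# Made-up data standing in for a small vector table (4 rows, width 3).
vectors = np.arange(12, dtype="float32").reshape((4, 3))

with tempfile.TemporaryDirectory() as tmp:
    raw_path = os.path.join(tmp, "vectors.bin")
    npy_path = os.path.join(tmp, "vectors.npy")

    vectors.tofile(raw_path)    # raw bytes only, no shape or dtype header
    np.save(npy_path, vectors)  # .npy format records shape and dtype

    from_raw = np.fromfile(raw_path, dtype="float32")
    from_npy = np.load(npy_path)

print(from_raw.shape)  # (12,)
print(from_npy.shape)  # (4, 3)

# The raw round trip only recovers the table if the width is known.
restored = from_raw.reshape((-1, 3))
```

The raw file carries no shape information, which is why the workaround in this thread needs the vector width supplied by hand.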
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs. |
I tried loading in some custom glove vectors using the demo provided here:
https://github.com/stanfordnlp/GloVe/blob/master/demo.sh
I then made a directory called vectors with a vectors.50.d.bin inside as well as a vectors.txt
However, when I use the code below I get an IndexError: tuple index out of range
Info about spaCy