Output size smaller than original #13
Hi @zharenkov, hi @mfaruqui,
Line #44 in the code writes each float with only 4 digits after the decimal point. If the total number of words in the input and output is the same, this is fine.
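A minimal sketch of why shorter float formatting alone shrinks a text embedding file. It assumes line #44 formats each value with something like `'%.4f'` and that the input stores six decimal digits per value; the word, the vector, and the helper `line_size` are hypothetical, not taken from retrofit.py or the GloVe file.

```python
def line_size(word, values, fmt):
    """Bytes needed for one 'word v1 v2 ... vN' line using the given float format."""
    text = word + " " + " ".join(fmt % v for v in values) + "\n"
    return len(text.encode("utf-8"))

# Hypothetical 300-d vector whose values carry six decimal digits.
values = [-0.082752] * 300

orig = line_size("the", values, "%.6f")   # each value written as "-0.082752" (9 chars)
trunc = line_size("the", values, "%.4f")  # each value written as "-0.0828" (7 chars)
print(orig, trunc)  # 3004 2404
```

Under these assumptions each line is about 20% smaller even though no words are lost; the actual saving depends on how many digits the input file really uses.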
Thanks for the answer, @mfaruqui! I figured out it was due to words in the original file appearing in both uppercase and lowercase, while the retrofitted embeddings are all lowercase.
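A hedged sketch of how lowercasing on load reduces the vocabulary: if words are lowercased and stored in a dictionary, casing variants such as `The` and `the` collapse into one entry, and the later line overwrites the earlier one. The function `load_lowercased` and the toy lines are hypothetical; this illustrates the effect described above, not retrofit.py's exact code.

```python
def load_lowercased(lines):
    """Map each lowercased word to its vector; later casing variants overwrite earlier ones."""
    vectors = {}
    for line in lines:
        parts = line.strip().split()
        if not parts:
            continue
        word = parts[0].lower()
        vectors[word] = [float(x) for x in parts[1:]]
    return vectors

# Hypothetical input with casing variants of the same word.
lines = ["The 0.1 0.2", "the 0.3 0.4", "Paris 0.5 0.6", "paris 0.7 0.8", "cat 0.9 1.0"]
vecs = load_lowercased(lines)
print(len(lines), len(vecs))  # 5 3
```

Comparing the number of input lines with the number of distinct lowercased words in the original file gives a quick estimate of how many vectors disappear this way.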
I'm losing around 3% of vectors when retrofitting.
To elaborate on my issue and clarify: the 56 vectors themselves aren't missing, but they're missing dimensions!
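One way to locate vectors with missing dimensions is to count the fields on every line and list the words whose vector length differs from the expected dimensionality. This is a minimal sketch assuming a whitespace-separated `word v1 ... vN` text format; `dimension_report` is a hypothetical helper, and the toy lines stand in for a real file (you could pass an open file object instead).

```python
from collections import Counter

def dimension_report(lines, expected_dim=300):
    """Return (Counter of vector lengths, words whose length != expected_dim)."""
    lengths = Counter()
    bad = []
    for line in lines:
        parts = line.rstrip("\n").split()
        if not parts:
            continue  # skip blank lines
        dim = len(parts) - 1  # first field is the word itself
        lengths[dim] += 1
        if dim != expected_dim:
            bad.append(parts[0])
    return lengths, bad

# Hypothetical 3-d example with one short vector.
lines = ["cat 0.1 0.2 0.3", "dog 0.4 0.5", "sun 0.6 0.7 0.8"]
lengths, bad = dimension_report(lines, expected_dim=3)
print(dict(lengths), bad)  # {3: 2, 2: 1} ['dog']
```

Running the same check on both the input and the retrofitted output shows whether the short vectors were already malformed in the input or were produced during retrofitting.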
Hi @mfaruqui,
I'm passing GloVe's 840B.300d embedding file to retrofit.py. Its size is about 5.5 GB, but the resulting file is only 3.7 GB (with both the WordNet and the paraphrase lexicons). Is this the expected behaviour? If so, can you please explain why the size decreases so significantly?
Thanks!