Respect encoding when reading binary keyed vectors #3309

Commits on Mar 20, 2022

  1. Respect encoding when reading binary keyed vectors

    The current implementation fails to read binary-format keyed vectors
    whose words are encoded as iso-8859-1. An example of such a file can
    be found in the TurkuNLP Finnish embeddings:
    
    http://dl.turkunlp.org/finnish-embeddings/finnish_s24_skgram.bin
    
    This file is trivial to load once the encoding is passed through to
    the vector loading function. It is also logical that when a user calls
    
    KeyedVectors.load_word2vec_format(filename, binary=True, encoding='iso-8859-1')
    
    the library should load the file assuming that the matrix is in
    binary format and the words are encoded as iso-8859-1.
    alhoo committed Mar 20, 2022
    c59a412
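
The failure described in the commit message above can be illustrated with a minimal, standard-library-only sketch. It is not gensim's implementation: `write_word2vec_binary` and `read_word2vec_binary` are hypothetical helpers, and the layout they assume (an ASCII header line, then each word terminated by a space and followed by little-endian float32 values) is a simplified reading of the word2vec binary format. The point is only that the word bytes must be decoded with the encoding they were written in.

```python
import io
import struct


def write_word2vec_binary(vectors, encoding):
    """Serialize {word: [floats]} into a word2vec-style binary blob (hypothetical helper)."""
    dim = len(next(iter(vectors.values())))
    buf = io.BytesIO()
    # Header: "<vocab_size> <vector_size>\n" in plain ASCII.
    buf.write(f"{len(vectors)} {dim}\n".encode("ascii"))
    for word, vec in vectors.items():
        # The word is stored as raw bytes in the chosen encoding, then a space.
        buf.write(word.encode(encoding) + b" ")
        # The vector is stored as little-endian float32 values, then a newline.
        buf.write(struct.pack(f"<{dim}f", *vec))
        buf.write(b"\n")
    return buf.getvalue()


def read_word2vec_binary(data, encoding="utf8", errors="strict"):
    """Parse the blob back, decoding each word with `encoding` (hypothetical helper)."""
    stream = io.BytesIO(data)
    vocab_size, dim = map(int, stream.readline().split())
    result = {}
    for _ in range(vocab_size):
        raw = bytearray()
        while True:
            ch = stream.read(1)
            if ch == b"":
                raise EOFError("unexpected end of input while reading a word")
            if ch == b" ":
                break
            if ch != b"\n":  # skip the newline left over from the previous entry
                raw.extend(ch)
        # This is the step the encoding argument has to reach.
        word = raw.decode(encoding, errors=errors)
        result[word] = list(struct.unpack(f"<{dim}f", stream.read(4 * dim)))
    return result


if __name__ == "__main__":
    blob = write_word2vec_binary({"äiti": [1.0, 2.0], "yö": [0.5, -0.5]}, "iso-8859-1")
    print(read_word2vec_binary(blob, encoding="iso-8859-1"))
    try:
        read_word2vec_binary(blob, encoding="utf8")
    except UnicodeDecodeError as err:
        print("utf8 decoding failed:", err)
```

Decoding with the matching encoding recovers the words, while the default `utf8` raises `UnicodeDecodeError` on the first non-ASCII word. The matrix bytes are read by fixed length, so they are unaffected by the choice of text encoding.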

Commits on Apr 15, 2022

  1. Update keyedvectors.py

    piskvorky committed Apr 15, 2022
    998074e
  2. 662e380
  3. 17c6cf0