word2vec model for lexical substitution (finding good substitutions for words in context)

Mchristos/lexsub

lexsub: context-sensitive word substitutions using Word2Vec

Word sense disambiguation, that is, choosing among the possible senses of a word given the sentence it appears in, is a fundamental problem in NLP. However, it assumes a fixed, universal set of "meanings" to choose from. A more natural and more practical task is lexical substitution: finding a good replacement for a word in context. For example, in the sentence "She went to the bar last night", we know bar means pub, even though the word has other meanings, such as a chocolate bar or a ban/restriction on something.

This repository uses a Word2Vec embedding trained on the Google News corpus (made available here and through the gensim library) to rank candidate word substitutions by how well they fit the context of the sentence.
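To make the idea of ranking by context concrete, here is a minimal sketch. It is not this repository's implementation: it assumes one simple scoring rule (the mean cosine similarity between a candidate and each context word), and the names `cosine`, `rank_candidates`, and `toy_vectors` are hypothetical. The hand-made 3-dimensional vectors stand in for the real 300-dimensional Google News embeddings so the example runs without any downloads.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_candidates(candidates, context_words, vectors):
    """Order candidates by mean cosine similarity to the context words.

    `vectors` is any mapping from word to vector. Words missing from
    the mapping are skipped. This scoring rule is an assumption made
    for illustration, not necessarily what lexsub does.
    """
    scored = []
    for cand in candidates:
        if cand not in vectors:
            continue
        sims = [cosine(vectors[cand], vectors[w])
                for w in context_words if w in vectors]
        if sims:
            scored.append((sum(sims) / len(sims), cand))
    scored.sort(reverse=True)  # best-fitting candidate first
    return [cand for _, cand in scored]


# Toy vectors, invented for this example only.
toy_vectors = {
    "drink": [0.9, 0.1, 0.0],
    "night": [0.7, 0.2, 0.1],
    "pub": [0.8, 0.2, 0.0],
    "chocolate": [0.1, 0.9, 0.1],
    "ban": [0.0, 0.1, 0.9],
}

ranking = rank_candidates(["pub", "chocolate", "ban"],
                          ["drink", "night"], toy_vectors)
print(ranking)  # ['pub', 'chocolate', 'ban']
```

Because the function only relies on `word in vectors` and `vectors[word]`, a gensim `KeyedVectors` object could be passed in place of the toy dictionary.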

Setup

  1. Download the Google News word vectors from here and make sure you have the gensim package installed.
  2. Make sure you've installed nltk (Natural Language Toolkit) and have downloaded the Lin thesaurus and WordNet corpora by running the following in a Python console: import nltk; nltk.download('lin_thesaurus'); nltk.download('wordnet')

Example Usage

from lexsub import LexSub
from gensim.models import KeyedVectors

# Load the pre-trained Google News vectors (binary word2vec format)
word2vec_path = "/path/to/GoogleNews-vectors-negative300.bin"
vectors = KeyedVectors.load_word2vec_format(word2vec_path, binary=True)
ls = LexSub(vectors, candidate_generator='lin')

# The target is the word to substitute, followed by its part-of-speech tag
sentence = "She had a drink at the bar"
target = "bar.n"
result = ls.lex_sub(target, sentence)
print(result)
# ['bars', 'pub', 'tavern', 'nightclub', 'restaurant']
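The target in the example above is written as `bar.n`, a word followed by a dot and a part-of-speech tag (here `n`, presumably noun). The sketch below shows one way to split such a string into its two parts. The helper `split_target` is hypothetical and only illustrates the format; it is not necessarily how lexsub parses its input.

```python
def split_target(target):
    """Split a 'word.pos' target such as 'bar.n' into (word, pos).

    Hypothetical helper for illustration. It splits on the last dot
    so that the part-of-speech tag is always the final component.
    """
    word, sep, pos = target.rpartition(".")
    if not sep or not word or not pos:
        raise ValueError(f"expected 'word.pos', got {target!r}")
    return word, pos


word, pos = split_target("bar.n")
print(word, pos)  # bar n
```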
