Submission to the SemEval 2020 Task 1: Unsupervised Lexical Semantic Change Detection

Apsod/sensecluster

Sensecluster

Submission to SemEval-2020 Task 1: Unsupervised Lexical Semantic Change Detection

The system embeds each occurrence of a target word with XLM-R (`xlmr.large`), clusters the resulting contextualized embeddings with k-means++, and treats the cluster assignments as a direct proxy for word senses.
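
As a rough illustration of the clustering idea, here is a minimal pure-Python sketch on toy 2-D vectors. It is not the code in `cluster.py`: the function names, the toy data, the choice of `k=2`, and the use of Jensen-Shannon divergence to compare the per-corpus sense distributions are all assumptions made for this example.

```python
import math
import random
from collections import Counter


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeanspp_init(points, k, rng):
    """k-means++ seeding: each new centre is drawn with probability
    proportional to its squared distance from the nearest chosen centre."""
    centres = [rng.choice(points)]
    while len(centres) < k:
        weights = [min(sq_dist(p, c) for c in centres) for p in points]
        centres.append(rng.choices(points, weights=weights, k=1)[0])
    return centres


def kmeans(points, k, iters=20, seed=0):
    """Lloyd's algorithm with k-means++ seeding; returns one label per point."""
    rng = random.Random(seed)
    centres = kmeanspp_init(points, k, rng)
    labels = []
    for _ in range(iters):
        # Assignment step: nearest centre for every point.
        labels = [min(range(k), key=lambda j: sq_dist(p, centres[j])) for p in points]
        # Update step: move each centre to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centre if a cluster is empty
                centres[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def sense_distribution(labels, k):
    """Relative frequency of each cluster id, read here as a 'sense'."""
    counts = Counter(labels)
    return [counts[j] / len(labels) for j in range(k)]


def jensen_shannon(p, q):
    """Jensen-Shannon divergence in bits (hypothetical change score)."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return (kl(p, m) + kl(q, m)) / 2


# Toy "embeddings" of one target word in two corpora (made-up numbers).
corpus1 = [[0.0, 0.1], [0.1, 0.0], [0.0, 0.0], [0.1, 0.1]]
corpus2 = [[0.0, 0.2], [0.2, 0.0], [5.0, 5.1], [5.1, 5.0]]

# Cluster the pooled occurrences, then split the labels back per corpus.
labels = kmeans(corpus1 + corpus2, k=2)
labels1, labels2 = labels[: len(corpus1)], labels[len(corpus1):]

dist1 = sense_distribution(labels1, 2)
dist2 = sense_distribution(labels2, 2)
score = jensen_shannon(dist1, dist2)
print(dist1, dist2, score)
```

In this toy setup the second corpus contains a group of occurrences far from everything in the first corpus, so its sense distribution differs and the divergence is positive. Cluster ids are arbitrary, so only the shape of the distributions is meaningful, not which id is which.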

To run:

# install requirements found in requirements.txt using conda or pip

# Extract the contexts for the given target words
# This populates the directory with LANGUAGE_CORPUS.ctx files.
python mk_contexts.py /path/to/test_data_public

# Run XLMR to construct embeddings for each occurrence
# This reads the LANGUAGE_CORPUS.ctx files and creates LANGUAGE_CORPUS.emb files.
python embed.py

# Run clustering on the contextualized embeddings
# This reads the LANGUAGE_CORPUS.emb files and populates the answer/ directory. 
python cluster.py
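
If it is convenient to run the three steps in one go, a small driver along the following lines could be used. This is only a sketch: `pipeline_commands` and `run_pipeline` are hypothetical helpers that do not exist in the repository, and the script is assumed to be started from the repository root so that the three scripts are found.

```python
import subprocess
import sys


def pipeline_commands(data_dir):
    """The three steps listed above, in order, as argument lists."""
    return [
        [sys.executable, "mk_contexts.py", data_dir],  # writes LANGUAGE_CORPUS.ctx files
        [sys.executable, "embed.py"],                  # reads .ctx, writes .emb
        [sys.executable, "cluster.py"],                # reads .emb, fills answer/
    ]


def run_pipeline(data_dir, runner=subprocess.run):
    """Run each step in turn; with check=True a failing step raises
    CalledProcessError, so later steps never see incomplete inputs."""
    for cmd in pipeline_commands(data_dir):
        runner(cmd, check=True)
```

Calling `run_pipeline("/path/to/test_data_public")` would then be equivalent to typing the three commands by hand. The `runner` parameter exists only so the command order can be checked without launching anything.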

References

@inproceedings{schlechtweg2020semeval,
title = "{S}em{E}val-2020 {T}ask 1: {U}nsupervised {L}exical {S}emantic {C}hange {D}etection",
author = "Schlechtweg, Dominik and McGillivray, Barbara and Hengchen, Simon and Dubossarsky, Haim and Tahmasebi, Nina",
booktitle = "To appear in Proceedings of the 14th International Workshop on Semantic Evaluation",
year = "2020",
address = "Barcelona, Spain",
publisher = "Association for Computational Linguistics"}
