Sensecluster

Submission to SemEval-2020 Task 1: Unsupervised Lexical Semantic Change Detection

The system embeds each occurrence of the target words using XLM-R (xlmr.large), clusters the resulting contextualized embeddings using k-means++, and treats the resulting cluster assignments as a direct proxy for word senses.
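The clustering idea can be illustrated with a minimal, standard-library sketch. This is not the code in cluster.py: the function names (`kmeans_pp_init`, `kmeans`, `sense_counts`), the toy 2-D vectors standing in for XLM-R embeddings, the choice of two clusters, and the per-corpus counting of cluster labels are all assumptions made for illustration only.

```python
import random
from collections import Counter


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_init(points, k, rng):
    """k-means++ seeding: each new center is sampled with probability
    proportional to its squared distance from the nearest chosen center.
    Assumes the data contains at least k distinct points."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        d2 = [min(sq_dist(p, c) for c in centers) for p in points]
        centers.append(rng.choices(points, weights=d2, k=1)[0])
    return centers


def kmeans(points, k, iters=20, seed=0):
    """Lloyd iterations starting from k-means++ seeds.
    Returns one cluster label per point (from the last assignment step)."""
    rng = random.Random(seed)
    centers = kmeans_pp_init(points, k, rng)
    labels = []
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        labels = [min(range(k), key=lambda j: sq_dist(p, centers[j]))
                  for p in points]
        # Update step: each center moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def sense_counts(labels, corpus_ids):
    """Count how often each cluster (read as a sense) occurs per corpus."""
    counts = {}
    for label, corpus in zip(labels, corpus_ids):
        counts.setdefault(corpus, Counter())[label] += 1
    return counts


# Toy stand-ins for contextualized embeddings of one target word.
embeddings = [[0.0, 0.0], [0.2, 0.1], [0.1, 0.3],
              [0.3, 0.2], [10.0, 10.0], [10.2, 9.9]]
# Hypothetical corpus tags: which corpus each occurrence came from.
corpus_ids = ["corpus1", "corpus1", "corpus1",
              "corpus2", "corpus2", "corpus2"]

labels = kmeans(embeddings, k=2)
counts = sense_counts(labels, corpus_ids)
print(counts)
```

In this toy data, all occurrences from the first corpus fall into one cluster while the second corpus is split across two, which is the kind of difference in sense usage the cluster assignments are meant to expose.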

To run:

# Install the requirements listed in requirements.txt using conda or pip

# Extract the contexts for the given target words
# This populates the directory with LANGUAGE_CORPUS.ctx files.
python mk_contexts.py /path/to/test_data_public

# Run XLM-R to construct embeddings for each occurrence
# This reads the LANGUAGE_CORPUS.ctx files and creates LANGUAGE_CORPUS.emb files.
python embed.py

# Run clustering on the contextualized embeddings
# This reads the LANGUAGE_CORPUS.emb files and populates the answer/ directory. 
python cluster.py

References

@inproceedings{schlechtweg2020semeval,
title = "{S}em{E}val-2020 {T}ask 1: {U}nsupervised {L}exical {S}emantic {C}hange {D}etection",
author = "Schlechtweg, Dominik and McGillivray, Barbara and Hengchen, Simon and Dubossarsky, Haim and Tahmasebi, Nina",
booktitle = "To appear in Proceedings of the 14th International Workshop on Semantic Evaluation",
year = "2020",
address = "Barcelona, Spain",
publisher = "Association for Computational Linguistics"}
