HyperLex

HyperLex is a gold standard resource for measuring and evaluating how well semantic models capture graded or soft lexical entailment (also known as the type-of, is-a, or hypernymy-hyponymy relation) rather than semantic similarity or relatedness. It quantifies the extent of the semantic category membership and lexical entailment (LE) relation.

HyperLex provides 2,616 word pairs (2,163 noun pairs and 453 verb pairs), each rated on a scale from 0 to 6 in response to the question "To what degree is X a type of Y?" Here are some examples:

Pair                        Rating
girl / person               5.91
citizen / person            5.18
person / citizen            3.10
idol / person               2.57
plant / animal              0.08
to talk / to communicate    5.55
to pray / to communicate    2.90
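
The ratings are directional: citizen / person and person / citizen receive different scores. The sketch below illustrates this with the example rows above and shows one way a model's scores could be compared against the human ratings. Using Spearman's rank correlation as the metric is an assumption of this sketch (see the paper for the evaluation protocol actually used), and `MODEL_SCORES` holds invented numbers, not the output of any real model.

```python
# Example rows from the table above: (X, Y, human rating on the 0-6 scale).
GOLD = [
    ("girl", "person", 5.91),
    ("citizen", "person", 5.18),
    ("person", "citizen", 3.10),
    ("idol", "person", 2.57),
    ("plant", "animal", 0.08),
    ("to talk", "to communicate", 5.55),
    ("to pray", "to communicate", 2.90),
]

# Hypothetical model scores, one per row of GOLD, invented for illustration.
MODEL_SCORES = [0.9, 0.8, 0.4, 0.5, 0.1, 0.7, 0.6]


def ranks(values):
    """Return 1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(xs, ys):
    """Pearson correlation; undefined (division by zero) for constant input."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def spearman(xs, ys):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    return pearson(ranks(xs), ranks(ys))


def asymmetry(gold, x, y):
    """Rating of 'x is a type of y' minus rating of 'y is a type of x'."""
    lookup = {(a, b): rating for a, b, rating in gold}
    return lookup[(x, y)] - lookup[(y, x)]


gold_ratings = [rating for _, _, rating in GOLD]
rho = spearman(MODEL_SCORES, gold_ratings)
print(f"Spearman rho on the example rows: {rho:.3f}")
print(f"citizen/person minus person/citizen: {asymmetry(GOLD, 'citizen', 'person'):.2f}")
```

Seven rows are far too few for a meaningful correlation; the point is only to show the shape of the comparison.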

HyperLex covers many of the normed word types in the USF free-association database and provides annotated examples of different WordNet-based lexical relations (hyponymy-hypernymy at different levels, co-hyponymy, synonymy, antonymy, meronymy-holonymy, and no-relation). It also contains examples at different concreteness levels.

Download

Download HyperLex by clicking here.

All design details are described in the following paper. Please cite it if you use HyperLex in your own work:

HyperLex: A Large-Scale Evaluation of Graded Lexical Entailment
Ivan Vulić, Daniela Gerz, Douwe Kiela, Felix Hill, and Anna Korhonen. Computational Linguistics, volume 43, number 4, pages 781-835, 2017.
[pdf] [bib]

The provided archive includes the full HyperLex dataset, noun and verb subsets, as well as two different data splits (random and lexical) into training, development and test data. Please see the accompanying readme file for the file formats and further details.
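
As a rough illustration of how the two splits might differ, the sketch below assumes that a lexical split keeps the training and test vocabularies disjoint, while a random split allows the same word to appear in both. That reading is an assumption here; the paper and the accompanying readme file are authoritative. The helper names and the toy split are hypothetical and reuse rows from the examples above rather than the actual split files.

```python
def vocabulary(pairs):
    """Collect every word that occurs on either side of a rated pair."""
    return {word for x, y, _ in pairs for word in (x, y)}


def shared_words(train, test):
    """Words that occur in both the training and the test pairs."""
    return vocabulary(train) & vocabulary(test)


# Toy data for illustration only; not the real HyperLex splits.
train = [("girl", "person", 5.91), ("idol", "person", 2.57)]
test_random_like = [("citizen", "person", 5.18)]   # reuses "person"
test_lexical_like = [("plant", "animal", 0.08)]    # no word seen in training

print(shared_words(train, test_random_like))
print(shared_words(train, test_lexical_like))
```

An empty intersection is what this sketch would expect from a lexical split under the stated assumption; a non-empty one indicates that a model could score well partly by memorising individual words.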

HyperLex in Other Languages and Cross-Lingual HyperLex

Similar repositories for three other languages (German, Italian, Croatian) based on the original English HyperLex are also available. You can download multilingual and cross-lingual HyperLex by clicking here.

The multilingual and cross-lingual extensions of the original HyperLex dataset are described in the following paper. Please cite it if you use the data in your own work:

Multilingual and Cross-Lingual Graded Lexical Entailment
Ivan Vulić, Simone Paolo Ponzetto, and Goran Glavaš. Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (ACL 2019), pages 4963-4974, 2019.
[pdf] [bib]

Contact

Please contact the first author (Ivan Vulić) if you have any questions that are not addressed in the referenced papers or the accompanying repo README files.