This project is dedicated to the corpora described in the paper:
@inproceedings{brasoveanu2020conll,
author = {Adrian M.P. Bra{\c{s}}oveanu and Albert Weichselbraun and Lyndon J.B. Nixon},
title = {In Media Res: A Corpus for Evaluating Named Entity Linking with Creative Works},
booktitle = {CoNLL 2020},
publisher = {ACL},
language = {English},
year = {2020},
pages = {355--364},
month = {november},
date = {19-20},
url = {https://www.aclweb.org/anthology/2020.conll-1.28}
}
A video of the slides presented at EMNLP / CoNLL 2020 is available here
The guidelines offered to the annotators are available in PDF format.
The corpora are available in CSV format.
The document fields have the following meaning:
- docno: document id
- doctype: the partition the document belongs to
- text: the content of the document
The annotation fields have the following meaning:
- docno: id of the document that contains the mention
- surfaceform: the entity mention's surface form
- type: entity type
- link: DBpedia link for the entity
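
Because both files share the docno field, annotations can be attached to their documents with a simple join. The following is a minimal sketch, not part of the released tooling: it assumes comma-delimited files whose header rows use exactly the field names listed above, and the function names `group_annotations` and `load_corpus` are hypothetical.

```python
import csv


def group_annotations(doc_rows, ann_rows):
    """Attach each annotation row to its document via the shared docno field.

    Both arguments are iterables of dicts (for example csv.DictReader objects).
    The column names are assumed to match the field names in this README.
    """
    docs = {}
    for row in doc_rows:
        docs[row["docno"]] = {
            "doctype": row["doctype"],
            "text": row["text"],
            "annotations": [],
        }
    for row in ann_rows:
        doc = docs.get(row["docno"])
        if doc is None:
            # Skip annotations whose document is not in the document file.
            continue
        doc["annotations"].append(
            {
                "surfaceform": row["surfaceform"],
                "type": row["type"],
                "link": row["link"],
            }
        )
    return docs


def load_corpus(documents_path, annotations_path):
    """Read the two CSV files and return documents keyed by docno."""
    with open(documents_path, newline="", encoding="utf-8") as doc_file, \
            open(annotations_path, newline="", encoding="utf-8") as ann_file:
        return group_annotations(csv.DictReader(doc_file), csv.DictReader(ann_file))
```

Calling `load_corpus` with the paths of a document file and its annotation file returns a dictionary keyed by docno, where each entry holds the doctype, the text, and the list of annotations for that document. If the files use a different delimiter or different header names, the `csv.DictReader` calls and the keys need adjusting.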
The 3 lenses described in the paper are available for each partition.
- TSV
- JSON (upon request)
Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)