In Media Res

This project is dedicated to the corpora described in the paper:

@article{brasoveanu2020conll,
  author = {Adrian M.P. Bra{\c{s}}oveanu and Albert Weichselbraun and Lyndon J.B. Nixon},
  title = {In Media Res: A Corpus for Evaluating Named Entity Linking with Creative Works},
  booktitle = {CoNLL 2020}, 
  publisher = {ACL},
  language = {English},
  year = {2020},
  pages = {355-364},
  month = {november},
  date = {19-20},
  url = {https://www.aclweb.org/anthology/2020.conll-1.28}
}

A video of the slides presented at EMNLP / CoNLL 2020 is available here

ANNOTATION GUIDELINES

The guidelines offered to the annotators are available in PDF format.

CORPORA

The corpora are available in CSV format.

The document fields have the following meaning:

  • docno - document id
  • doctype - the partition of the document
  • text - the content of the document

The annotation fields have the following meaning:

  • docno - document id
  • surfaceform - the entity mention's surface form
  • type - entity type
  • link - DBpedia link for the entity
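
The document and annotation fields share the docno key, so annotations can be attached to their documents by grouping on it. The following Python sketch illustrates one way to do this. The inline sample rows, the function name load_corpus and the comma delimiter are assumptions made for illustration only; the actual files may use a different delimiter (for example tabs) and will contain different values.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample rows using the field names listed above;
# these are NOT taken from the real corpus.
DOCUMENTS_CSV = """docno,doctype,text
d1,example-partition,Some text mentioning Example Work.
d2,example-partition,Another document.
"""

ANNOTATIONS_CSV = """docno,surfaceform,type,link
d1,Example Work,ExampleType,http://dbpedia.org/resource/Example
"""


def load_corpus(documents_file, annotations_file, delimiter=","):
    """Return a dict mapping docno to its doctype, text and annotations."""
    # Group annotation rows by the document they belong to.
    annotations = defaultdict(list)
    for row in csv.DictReader(annotations_file, delimiter=delimiter):
        annotations[row["docno"]].append(
            {
                "surfaceform": row["surfaceform"],
                "type": row["type"],
                "link": row["link"],
            }
        )

    # Attach the grouped annotations to each document row.
    corpus = {}
    for row in csv.DictReader(documents_file, delimiter=delimiter):
        corpus[row["docno"]] = {
            "doctype": row["doctype"],
            "text": row["text"],
            "annotations": annotations.get(row["docno"], []),
        }
    return corpus


corpus = load_corpus(io.StringIO(DOCUMENTS_CSV), io.StringIO(ANNOTATIONS_CSV))
for docno, doc in corpus.items():
    print(docno, doc["doctype"], len(doc["annotations"]))
```

To read real files instead of the inline samples, pass open file handles (opened with newline="" as the csv module recommends) and set delimiter="\t" if the files turn out to be tab-separated.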

The three lenses described in the paper are available for each partition.

DATA FORMATS

  • TSV
  • JSON (upon request)

LICENSE

Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)