A document corpus for evaluating Named Entity Recognition and Linking (NER/NEL) systems on detection of entities which are of type Creative Work.

modultechnology/in_media_res

In Media Res

This project is dedicated to the corpora described in the paper:

@article{brasoveanu2020conll,
  author = {Adrian M.P. Bra{\c{s}}oveanu and Albert Weichselbraun and Lyndon J.B. Nixon},
  title = {In Media Res: A Corpus for Evaluating Named Entity Linking with Creative Works},
  booktitle = {CoNLL 2020}, 
  publisher = {ACL},
  language = {English},
  year = {2020},
  pages = {355-364},
  month = {november},
  date = {19-20},
  url = {https://www.aclweb.org/anthology/2020.conll-1.28}
}

A video of the slides presented at EMNLP / CoNLL 2020 is available here

ANNOTATION GUIDELINES

The guidelines offered to the annotators are available in PDF format.

CORPORA

The corpora are available in CSV format.

The document fields have the following meaning:

docno - document id
doctype - the partition of the document
text - the content of the document

The annotation fields have the following meaning:

docno - document id
surfaceform - the entity mention's surface form
type - entity type
link - DBpedia link for the entity
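
The field lists above suggest that documents and annotations can be joined on docno. The following is a minimal sketch of that join, not the project's own loading code: the sample rows, the doctype and type values, and the helper names read_rows and group_annotations are all hypothetical, and the tab delimiter is an assumption that may need to be changed to a comma depending on the file.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample rows using the documented field names.
# The values ("train", "CreativeWork", the DBpedia link) are illustrative only.
DOCS = (
    "docno\tdoctype\ttext\n"
    "d1\ttrain\tAlice watched Casablanca last night.\n"
    "d2\ttest\tNo creative works are mentioned here.\n"
)
ANNS = (
    "docno\tsurfaceform\ttype\tlink\n"
    "d1\tCasablanca\tCreativeWork\thttp://dbpedia.org/resource/Casablanca_(film)\n"
)


def read_rows(handle, delimiter="\t"):
    """Read a delimited file with a header row into a list of dicts."""
    return list(csv.DictReader(handle, delimiter=delimiter))


def group_annotations(docs, anns):
    """Attach each document's annotations to it, keyed by docno."""
    by_doc = defaultdict(list)
    for ann in anns:
        by_doc[ann["docno"]].append(ann)
    return {
        doc["docno"]: {"doc": doc, "annotations": by_doc.get(doc["docno"], [])}
        for doc in docs
    }


# With real files, replace io.StringIO(...) with open(path, newline="", encoding="utf-8").
docs = read_rows(io.StringIO(DOCS))
anns = read_rows(io.StringIO(ANNS))
corpus = group_annotations(docs, anns)

for docno, entry in corpus.items():
    mentions = [a["surfaceform"] for a in entry["annotations"]]
    print(docno, entry["doc"]["doctype"], mentions)
```

Documents without annotations keep an empty list, so every document in a partition remains visible to an evaluation script.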

The 3 lenses described in the paper are available for each partition.

DATA FORMATS

  • TSV
  • JSON (upon request)

LICENSE

Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
