Dataset Card

NER datasets shared under the TNER organization on the Hugging Face Hub.

| Dataset | Alias (link) | Domain | Size (train/valid/test) | Language | Entity Types |
|---|---|---|---|---|---|
| Ontonotes5 | `tner/ontonotes5` | News, Blog, Dialogue | 59,924/8,528/8,262 | en | 18 |
| CoNLL2003 | `tner/conll2003` | News | 14,041/3,250/3,453 | en | 4 |
| BioNLP2004 | `tner/bionlp2004` | Biochemical | 16,619/1,927/3,856 | en | 5 |
| BioCreative V CDR | `tner/bc5cdr` | Biomedical | 5,228/5,330/5,865 | en | 2 |
| FIN | `tner/fin` | Financial News | 1,014/303/150 | en | 4 |
| MIT Movie | `tner/mit_movie_trivia` | Movie Review | 6,816/1,000/1,953 | en | 12 |
| MIT Restaurant | `tner/mit_restaurant` | Restaurant Review | 6,900/760/1,521 | en | 8 |
| WNUT2017 | `tner/wnut2017` | Twitter, Reddit, StackExchange, YouTube | 2,395/1,009/1,287 | en | 6 |
| BTC | `tner/btc` | Twitter | 1,014/303/150 | en | 3 |
| Tweebank NER | `tner/tweebank_ner` | Twitter | 1,639/710/1,201 | en | 4 |
| TTC | `tner/ttc`, `tner/ttc_dummy` | Twitter | 9,995/500/1,477 | en | 3 |
| TweetNER7 | `tner/tweetner7` | Twitter | 7,111/576/2,807 (*see the dataset page) | en | 7 |
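
As a rough illustration of how the aliases and the size column might be used, here is a minimal Python sketch. The helper names `parse_split_sizes` and `load_by_alias` are hypothetical and not part of TNER. The sketch assumes each alias is a Hugging Face Hub repository id that the `datasets` library can load with `load_dataset`; depending on the installed `datasets` version and how each repository is packaged, extra arguments may be needed, so treat the loading part as unverified.

```python
from typing import Dict


def parse_split_sizes(size: str) -> Dict[str, int]:
    """Parse a 'train/valid/test' cell such as '14,041/3,250/3,453'."""
    # Drop a trailing note such as "(*see the dataset page)" before splitting.
    counts = size.split("(")[0].strip().split("/")
    if len(counts) != 3:
        raise ValueError(f"expected three '/'-separated counts, got: {size!r}")
    names = ("train", "valid", "test")
    # Remove thousands separators before converting to int.
    return {name: int(c.replace(",", "").strip()) for name, c in zip(names, counts)}


def load_by_alias(alias: str):
    """Hypothetical helper. Assumption: `alias` (e.g. 'tner/conll2003') is a
    Hub repository id loadable with the `datasets` library."""
    # Imported lazily so the parsing helper works without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(alias)


# Size of CoNLL2003 as listed in the table above.
sizes = parse_split_sizes("14,041/3,250/3,453")
total = sum(sizes.values())
print(sizes, total)
```

Calling `load_by_alias("tner/conll2003")` requires network access and the `datasets` package, which is why it is not invoked in the sketch itself.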

Multilingual datasets are listed below.

| Dataset | Alias (link) | Domain | Languages | Entity Types |
|---|---|---|---|---|
| WikiANN (Panx) | `tner/wikiann` | Wikipedia | 160+ | 3 |
| WikiNeural | `tner/wikineural` | Wikipedia | 9 | 16 |
| MultiNERD | `tner/multinerd` | Wikipedia, WikiNews | 16 | 18 |