NER datasets shared under the TNER organization on the Hugging Face Hub.
Dataset | Alias (link) | Domain | Size (train/valid/test) | Language | Entity Size |
---|---|---|---|---|---|
Ontonotes5 | tner/ontonotes5 | News, Blog, Dialogue | 59,924/8,528/8,262 | en | 18 |
CoNLL2003 | tner/conll2003 | News | 14,041/3,250/3,453 | en | 4 |
BioNLP2004 | tner/bionlp2004 | Biochemical | 16,619/1,927/3,856 | en | 5 |
BioCreative V CDR | tner/bc5cdr | Biomedical | 5,228/5,330/5,865 | en | 2 |
FIN | tner/fin | Financial News | 1,014/303/150 | en | 4 |
MIT Movie | tner/mit_movie_trivia | Movie Review | 6,816/1,000/1,953 | en | 12 |
MIT Restaurant | tner/mit_restaurant | Restaurant Review | 6,900/760/1,521 | en | 8 |
WNUT2017 | tner/wnut2017 | Twitter, Reddit, StackExchange, YouTube | 2,395/1,009/1,287 | en | 6 |
BTC | tner/btc | | 1,014/303/150 | en | 3 |
Tweebank NER | tner/tweebank_ner | | 1,639/710/1,201 | en | 4 |
TTC | tner/ttc, tner/ttc_dummy | | 9,995/500/1,477 | en | 3 |
TweetNER7 | tner/tweetner7 | | 7,111/576/2,807 (*see the dataset page) | en | 7 |
Multilingual datasets are listed below.
Dataset | Alias (link) | Domain | Language | Entity Size |
---|---|---|---|---|
WikiANN (Panx) | tner/wikiann | Wikipedia | 160+ | 3 |
WikiNeural | tner/wikineural | Wikipedia | 9 | 16 |
MultiNERD | tner/multinerd | Wikipedia, WikiNews | 16 | 18 |
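
The aliases in the tables above are Hugging Face Hub dataset identifiers, so they can in principle be passed to `load_dataset` from the `datasets` library. The following is a minimal sketch, not part of the original listing. The helper names `dataset_url`, `parse_split_sizes`, and `load_tner_dataset` are hypothetical, and the loader assumes `datasets` is installed and the Hub is reachable. Depending on the installed `datasets` version, extra arguments may be needed, and multilingual datasets may require a language configuration; the dataset page is the place to check.

```python
from typing import Dict


def dataset_url(alias: str) -> str:
    """Return the Hub dataset page for an alias such as "tner/conll2003"."""
    return f"https://huggingface.co/datasets/{alias.strip()}"


def parse_split_sizes(cell: str) -> Dict[str, int]:
    """Turn a size cell like "14,041/3,250/3,453" into per-split counts."""
    # Drop any trailing note such as "(*see the dataset page)".
    counts = cell.split("(")[0].strip().split("/")
    train, valid, test = (int(c.replace(",", "")) for c in counts)
    return {"train": train, "valid": valid, "test": test}


def load_tner_dataset(alias: str):
    """Download a dataset by alias (assumes `pip install datasets` and network access)."""
    # Imported lazily so the helpers above work without the library installed.
    from datasets import load_dataset

    return load_dataset(alias)


# Example call (downloads data, so it is left commented out):
# data = load_tner_dataset("tner/conll2003")
```

`parse_split_sizes` is only a convenience for comparing the sizes listed in the table against what is actually downloaded.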