A collection of multilingual sentiment datasets grouped into 3 classes: positive, neutral, and negative.
Most multilingual sentiment datasets are either 2-class (positive or negative), 5-class ratings of product reviews (e.g. the Amazon multilingual dataset), or multi-class emotion datasets. However, for an average person, positive, negative, and neutral classes often suffice and are more straightforward to perceive and annotate. A positive/negative classification is also too naive, since a large share of real-world text is neutral in sentiment. Furthermore, most multilingual sentiment datasets are dominated by Western languages (e.g. English, German) and don't include Asian languages (e.g. Malay, Indonesian).
For emotion-related datasets, I group the negative emotions into the negative class and the positive emotions into the positive class. For ratings datasets, I assign 1-star reviews to the negative class, 3-star reviews to the neutral class, and 5-star reviews to the positive class.
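The mapping described above can be sketched roughly as follows. This is only an illustration, not the repository's actual preprocessing code: the function names are hypothetical, the emotion grouping shown (using the IndoNLU EmoT labels) is an assumed split, and returning `None` for 2-star and 4-star reviews is an assumption, since the text does not say how they are handled.

```python
from typing import Optional

# Assumption: an illustrative grouping of the IndoNLU (EmoT) labels.
# The grouping actually used for each dataset may differ.
NEGATIVE_EMOTIONS = {"anger", "fear", "sadness"}
POSITIVE_EMOTIONS = {"happy", "love"}


def rating_to_sentiment(stars: int) -> Optional[str]:
    """Map a star rating to a sentiment class (1 -> negative, 3 -> neutral, 5 -> positive)."""
    mapping = {1: "negative", 3: "neutral", 5: "positive"}
    # 2-star and 4-star reviews are not covered by the description;
    # this sketch returns None so the caller can decide what to do with them.
    return mapping.get(stars)


def emotion_to_sentiment(emotion: str) -> Optional[str]:
    """Map an emotion label to a sentiment class, or None if it is not grouped."""
    label = emotion.strip().lower()
    if label in NEGATIVE_EMOTIONS:
        return "negative"
    if label in POSITIVE_EMOTIONS:
        return "positive"
    return None
```

For example, `rating_to_sentiment(5)` yields `"positive"`, while `emotion_to_sentiment("surprise")` yields `None` because that label is in neither assumed group.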
Disclaimer: All credit goes to the respective dataset owners; this repository is merely an aggregation of the datasets.

Dataset name | Language | Source | No. of texts | Classes |
---|---|---|---|---|
IndoNLU (EmoT) | Indonesian | | train: 3521 val: 440 test: 440 | anger, fear, happy, love, sadness |
IndoNLU (SmSA) | Indonesian | Online platforms | train: 11000 val: 1260 test: 500 | positive, negative, neutral |
IndoNLU (CASA) | Indonesian | Automobile platforms | train: 810 val: 90 test: 180 | positive, negative, neutral (6 aspects) |
IndoNLU (HoASA) | Indonesian | Hotel reviews | train: 2283 val: 285 test: 286 | positive, negative, neutral, positive-negative (10 aspects) |
Multilingual Amazon Reviews | English, Japanese, German, French, Chinese, Spanish | Amazon | For each language: train: 200,000 val: 5,000 test: 5,000 | 1 star, 2 star, 3 star, 4 star, 5 star |
GoEmotions | English | | 211225 | admiration, amusement, anger, annoyance, approval, caring, confusion, curiosity, desire, disappointment, disapproval, disgust, embarrassment, excitement, fear, gratitude, grief, joy, love, nervousness, optimism, pride, realization, relief, remorse, sadness, surprise |
Offenseval Dravidian | Tamil-English, Malayalam-English, Kannada-English | Social media | Tamil: train: 35139 val: 4388; Malayalam: train: 16010 val: 1999; Kannada: train: 6217 val: 777 | Not_offensive, Offensive_Untargetede, Offensive_Targeted_Insult_Individual, Offensive_Targeted_Insult_Group, Offensive_Targeted_Insult_Other, not-{lang} |
SemEval-2018 Task 1: Affect in Tweets | English, Arabic, Spanish | | English: train: 6838 val: 886 test: 3259; Spanish: train: 3561 val: 679 test: 2854; Arabic: train: 2278 val: 585 test: 1518 | anger, anticipation, disgust, fear, joy, love, optimism, pessimism, sadness, surprise, trust |
Emotion | English | | train: 16000 val: 2000 test: 2000 | anger, anticipation, disgust, fear, joy, sadness, surprise, and trust |
IMDB | English | Movies | train: 25000 test: 25000 | positive, negative |
Amazon Polarity | English | Amazon | train: 3600000 test: 400000 | positive, negative |
Yelp Reviews | English | Yelp | train: 650000 test: 50000 | 1 star, 2 star, 3 star, 4 star, 5 star |
Yelp Polarity | English | Yelp | train: 560,000 test: 38,000 | positive, negative |