Identifying and Categorizing Offensive Language in Social Media
Team Name : JU_ETCE_17_21
System Description Paper : https://www.aclweb.org/anthology/S19-2118
- Sub-task A: Offensive language identification
- Sub-task B: Automatic categorization of offense types
- Sub-task C: Offense target identification
@inproceedings{mukherjee-etal-2019-ju,
title = "{JU}{\_}{ETCE}{\_}17{\_}21 at {S}em{E}val-2019 Task 6: Efficient Machine Learning and Neural Network Approaches for Identifying and Categorizing Offensive Language in Tweets",
author = "Mukherjee, Preeti and
Pal, Mainak and
Banerjee, Somnath and
Naskar, Sudip Kumar",
booktitle = "Proceedings of the 13th International Workshop on Semantic Evaluation",
month = jun,
year = "2019",
address = "Minneapolis, Minnesota, USA",
publisher = "Association for Computational Linguistics",
url = "https://www.aclweb.org/anthology/S19-2118",
doi = "10.18653/v1/S19-2118",
pages = "662--667",
abstract = "This paper describes our system submissions as part of our participation (team name: JU{\_}ETCE{\_}17{\_}21) in the SemEval 2019 shared task 6: {``}OffensEval: Identifying and Categorizing Offensive Language in Social Media{''}. We participated in all the three sub-tasks: i) Sub-task A: offensive language identification, ii) Sub-task B: automatic categorization of offense types, and iii) Sub-task C: offense target identification. We employed machine learning as well as deep learning approaches for the sub-tasks. We employed Convolutional Neural Network (CNN) and Recursive Neural Network (RNN) Long Short-Term Memory (LSTM) with pre-trained word embeddings. We used both word2vec and Glove pre-trained word embeddings. We obtained the best F1-score using CNN based model for sub-task A, LSTM based model for sub-task B and Logistic Regression based model for sub-task C. Our best submissions achieved 0.7844, 0.5459 and 0.48 F1-scores for sub-task A, sub-task B and sub-task C respectively.",
}