This project aims to provide an API for removing bad words and improving the overall experience of online communication.
- Detecting bad words and substituting them with food names
- Detecting the toxicity of the comment/text
Bad word detection and substitution:
a. Cleaning the text with Texthero (tokenization, punctuation removal)
b. POS tagging
c. Rule-based removal of bad words based on POS tags
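
The bad-word steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the word lists, the set of replaceable tags, and the function names `clean_and_tokenize` and `substitute_bad_words` are all hypothetical, and a plain regex stands in for Texthero's cleaning. The POS tags are hard-coded so the example runs without any downloads; in practice they might come from a tagger such as `nltk.pos_tag(tokens, tagset="universal")`.

```python
import re

# Hypothetical word lists, for illustration only.
BAD_WORDS = {"damn", "crap"}
FOOD_NAMES = ["pancake", "noodle", "muffin"]

# Universal POS tags for which a flagged word is replaced (an assumed rule).
REPLACEABLE_TAGS = {"NOUN", "VERB", "ADJ", "ADV", "X"}


def clean_and_tokenize(text):
    """Lowercase the text, strip punctuation, and split on whitespace."""
    no_punct = re.sub(r"[^\w\s]", "", text.lower())
    return no_punct.split()


def substitute_bad_words(tagged_tokens):
    """Replace flagged tokens with food names, cycling through FOOD_NAMES."""
    result = []
    used = 0
    for token, tag in tagged_tokens:
        if token in BAD_WORDS and tag in REPLACEABLE_TAGS:
            result.append(FOOD_NAMES[used % len(FOOD_NAMES)])
            used += 1
        else:
            result.append(token)
    return result


tokens = clean_and_tokenize("Well, damn! This is crap.")
# Hard-coded tags standing in for the output of a real POS tagger.
tags = ["ADV", "X", "DET", "VERB", "NOUN"]
cleaned = substitute_bad_words(list(zip(tokens, tags)))
print(" ".join(cleaned))
```

Keeping the tagger outside `substitute_bad_words` means the rule can be tested on its own and the tagger swapped later without touching the substitution logic.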
Toxicity detection:
a. Preprocessing the text for the BERT model (tokenization with the BERT tokenizer)
b. Adding layers on top of BERT
c. Fine-tuning on the dataset
d. Validation
e. Predicting for a single query
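
To make the tokenization step more concrete, here is a small sketch of the WordPiece scheme that BERT tokenizers use: greedy longest-match-first splitting into sub-word pieces, wrapped in `[CLS]` and `[SEP]` and padded to a fixed length. The tiny `VOCAB`, the `max_len` value, and the function names are assumptions made for illustration. A real pipeline would presumably load a pretrained tokenizer with a vocabulary of roughly 30,000 entries, and would also split punctuation before applying WordPiece.

```python
# Toy vocabulary, assumed for illustration only.
VOCAB = {
    "[PAD]", "[UNK]", "[CLS]", "[SEP]",
    "you", "are", "so", "rude", "##ly", "un", "##kind",
}


def wordpiece(word, vocab):
    """Greedy longest-match-first split of one word into sub-word pieces."""
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        piece = None
        # Shrink the candidate from the right until it is in the vocabulary.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # marks a continuation piece
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            return ["[UNK]"]  # no valid split: the whole word is unknown
        pieces.append(piece)
        start = end
    return pieces


def encode(text, vocab, max_len=10):
    """Return padded tokens and the matching attention mask."""
    tokens = ["[CLS]"]
    for word in text.lower().split():
        tokens.extend(wordpiece(word, vocab))
    tokens = tokens[: max_len - 1] + ["[SEP]"]  # truncate, then close the sequence
    attention_mask = [1] * len(tokens) + [0] * (max_len - len(tokens))
    tokens = tokens + ["[PAD]"] * (max_len - len(tokens))
    return tokens, attention_mask


tokens, mask = encode("You are so rudely unkind", VOCAB)
print(tokens)
print(mask)
```

The attention mask tells the model which positions hold real tokens and which are padding, so layers added on top of BERT can ignore the padded tail.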
python -m nltk.downloader universal_tagset
python -m spacy download en
Show Output
'Boolean Questions': ['Is sachin ramesh tendulkar the highest run scorer in '
'cricket?',
'Is sachin ramesh tendulkar the highest run scorer in '
'cricket?',
'Is sachin tendulkar the highest run scorer in '
'cricket?']
a. spaCy en_core_web_sm
b. BERT
MIT