Overview

This project contains code for the Toxic Comment Classification Challenge on Kaggle.

The goal of the competition is to identify and classify toxic online comments.

Prerequisites

You need to install poetry before moving forward. Follow the instructions in this link.

git clone https://github.com/david1542/toxic-comments.git

poetry install

Authenticate with the Kaggle CLI. Follow these instructions.
Downgrade PyTorch to 1.12.1, since later versions run into CUDA driver mismatches (issue):

pip install torch==1.12.1+cu113 -f https://download.pytorch.org/whl/torch_stable.html
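After the downgrade, it can help to confirm which PyTorch build the environment actually picked up. The snippet below is a minimal sketch, not part of this repository: the helper name `matches_pinned_version` is hypothetical, and it only compares the release part of the version string against the pin above.

```python
import importlib.util

EXPECTED_TORCH = "1.12.1"  # the version pinned in the pip command above


def matches_pinned_version(version: str, expected: str = EXPECTED_TORCH) -> bool:
    """Compare the release part of a version string, ignoring a local tag such as '+cu113'."""
    return version.split("+", 1)[0] == expected


if __name__ == "__main__":
    # Only import torch if it is installed, so the check fails gracefully.
    if importlib.util.find_spec("torch") is None:
        print("torch is not installed in this environment")
    else:
        import torch

        print("torch version :", torch.__version__)
        print("CUDA build    :", torch.version.cuda)
        print("CUDA available:", torch.cuda.is_available())
        print("matches pin   :", matches_pinned_version(torch.__version__))
```

If "CUDA available" prints False on a GPU machine, the driver mismatch mentioned above is a likely cause.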

./scripts/download_data.sh

Hydra is used as the configuration manager. Run the train.py script and override any parameters on the command line:

python src/train.py training_args.learning_rate=1e-3 training_args.num_train_epochs=5

For more information about the available parameters, see configs/train.yaml.
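To illustrate what the dotted overrides in the command above do, here is a plain-Python sketch of the idea: each `key.path=value` argument replaces one nested entry of the default configuration. This is not Hydra's implementation, the function names are hypothetical, and the default values are placeholders rather than the real contents of configs/train.yaml.

```python
import ast
import copy
from typing import Any, Dict, List


def parse_value(raw: str) -> Any:
    """Turn '1e-3' into 0.001 and '5' into 5; leave anything else as a string."""
    try:
        return ast.literal_eval(raw)
    except (ValueError, SyntaxError):
        return raw


def apply_overrides(config: Dict[str, Any], overrides: List[str]) -> Dict[str, Any]:
    """Return a copy of config with each 'a.b=value' override applied."""
    result = copy.deepcopy(config)
    for item in overrides:
        dotted_key, raw_value = item.split("=", 1)
        *parents, leaf = dotted_key.split(".")
        node = result
        for name in parents:
            # Simplification: missing keys are created silently here,
            # which Hydra itself may not do.
            node = node.setdefault(name, {})
        node[leaf] = parse_value(raw_value)
    return result


# Placeholder defaults, NOT the real contents of configs/train.yaml.
defaults = {"training_args": {"learning_rate": 5e-5, "num_train_epochs": 3}}

merged = apply_overrides(
    defaults,
    ["training_args.learning_rate=1e-3", "training_args.num_train_epochs=5"],
)
print(merged)
```

The defaults stay untouched and only the overridden keys change, which is why a run can be reproduced from the YAML file plus the command-line arguments.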

Some nice articles that I've found while working on this problem:
