These are modified versions of the prepare_dataset and train scripts from https://github.com/pacman100/DHS-LLM-Workshop, used to fine-tune an LLM.
prepare_dataset.py
is used to create the dataset and upload it to Hugging Face. It is usually run on a local machine.
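As a rough illustration of the create-then-upload flow, here is a minimal sketch. It is not the actual contents of prepare_dataset.py: it assumes the dataset is built from local text files, and the names build_records and upload, the file suffixes, and the record fields are all hypothetical. The upload step relies on the `datasets` package and an authenticated Hugging Face account.

```python
from pathlib import Path


def build_records(root, suffixes=(".py", ".md")):
    """Collect text files under `root` into a list of {"path", "content"} dicts.

    Hypothetical helper: the suffix filter and field names are assumptions.
    """
    records = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            records.append({
                "path": path.relative_to(root).as_posix(),
                "content": path.read_text(encoding="utf-8", errors="ignore"),
            })
    return records


def upload(records, repo_id):
    """Push the records to the Hugging Face Hub as a dataset.

    Requires the `datasets` package and a logged-in Hugging Face token.
    """
    # Imported lazily so build_records can be used without `datasets` installed.
    from datasets import Dataset

    Dataset.from_list(records).push_to_hub(repo_id)
```

On a local machine this might be called as upload(build_records("path/to/sources"), "your-username/your-dataset"), where both arguments are placeholders.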
train.py
is used to train a given model on a given dataset. It is used in the Colab notebooks.
requirements.txt
lists the packages needed to train a model. It is used in the Colab notebooks.