
EasyDistPLMs

Introduction

We provide a simple, well-structured PyTorch example for fine-tuning pretrained language models (PLMs). You can build and run your own fine-tuning task with only minor changes to the code. We also provide several distributed-training approaches that require little change to the original code; check out the deepspeed or horovod branch to try them yourself.

Requirements

conda create -n torch_env python=3.9 pandas tqdm scikit-learn -y
conda activate torch_env
conda install pytorch cudatoolkit=11.3.1 -c pytorch -y
pip install transformers wandb

Train

  1. Download a transformers pretrained model's files (pytorch_model.bin, config.json, vocab.txt, ...) and put them in one directory, e.g. pretrained

  2. Customize a dataset in src/datasets.py. We provide IMDB and SNLI datasets as demos. For a sentence or sentence-pair classification task, all you need to do is inherit the SeqCLSDataset class and implement read_line / read_example according to your data format (see the sketch after this list)

  3. Create a labelspace file containing all labels, one label per line (see the example after this list)

  4. Edit scripts/train.sh (an illustrative version follows this list)

  5. (Optional) Pass --use_wandb and set wandb_key to enable logging with wandb.ai

  6. Activate the conda env and run it:

    bash scripts/train.sh
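
A minimal sketch of step 2, assuming a tab-separated sentence-pair format. The exact signatures and return shapes of read_line / read_example are assumptions here; consult the IMDB and SNLI demos in src/datasets.py for the real interface.

    # In src/datasets.py -- a hypothetical sentence-pair dataset.
    # The TSV layout and the exact hook signatures are assumptions;
    # mirror the provided IMDB / SNLI demos for the real interface.
    class MyPairDataset(SeqCLSDataset):
        def read_line(self, line):
            # Parse one raw line of the file: premise \t hypothesis \t label
            return line.rstrip("\n").split("\t")

        def read_example(self, fields):
            # Map parsed fields into the inputs the model expects.
            premise, hypothesis, label = fields
            return {"text_a": premise, "text_b": hypothesis, "label": label}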
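For step 3, the labelspace file for the SNLI demo would simply be:

    entailment
    neutral
    contradiction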
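For step 4, a hypothetical scripts/train.sh is sketched below. Every flag name here is an assumption; edit the real scripts/train.sh shipped with the repo and use the options its argument parser actually defines.

    #!/bin/bash
    # Illustrative only -- flag names are assumptions, not the repo's
    # real options; adapt the scripts/train.sh shipped with the repo.
    python src/train.py \
      --pretrained_dir pretrained \
      --dataset imdb \
      --labelspace labelspace.txt \
      --batch_size 32 \
      --lr 2e-5 \
      --num_epochs 3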

Debug

Fairseq provides a pdb wrapper that works across multiple processes, which we include as debugger.mp_pdb. You can insert from debugger.mp_pdb import pdb; pdb.set_trace() in our code to debug in real time. See common pdb usage at https://docs.python.org/3/library/pdb.html
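
For example, to break inside the training loop (the surrounding placement is up to you):

    # Anywhere in the training code -- the breakpoint works even when
    # the script runs with multiple worker processes.
    from debugger.mp_pdb import pdb
    pdb.set_trace()  # drops the calling process into an interactive debugger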
