Curating Datasets for Parameter Efficient Fine-tuning

This tutorial demonstrates the usage of NeMo Curator's Python API to curate a dataset for parameter-efficient fine-tuning (PEFT).

This tutorial uses the Enron Emails dataset, a collection of emails in which each record has a subject, a body, and a category (class label). We demonstrate various filtering and processing operations that can be applied to each record.
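
To make the record structure concrete, here is a minimal sketch of per-record filtering and processing written in plain Python. It does not use NeMo Curator's API. The `Email` class, the `clean_text`, `keep`, and `curate` helpers, the sample records, and the `min_body_words` threshold are all hypothetical and chosen only for illustration; only the three fields (subject, body, category) come from the dataset description above.

```python
import re
from dataclasses import dataclass


@dataclass
class Email:
    # Field names mirror the dataset description; the class itself is hypothetical.
    subject: str
    body: str
    category: str


def clean_text(text: str) -> str:
    """Collapse runs of whitespace into single spaces and trim the ends."""
    return re.sub(r"\s+", " ", text).strip()


def keep(email: Email, min_body_words: int = 5) -> bool:
    """Keep records that have a subject, a label, and a long-enough body.

    The word-count threshold is an illustrative assumption, not a value
    taken from the tutorial.
    """
    has_subject = bool(email.subject)
    has_label = bool(email.category)
    long_enough = len(email.body.split()) >= min_body_words
    return has_subject and has_label and long_enough


def curate(emails):
    """Clean every record first, then drop the ones that fail the filter."""
    cleaned = [
        Email(clean_text(e.subject), clean_text(e.body), e.category.strip())
        for e in emails
    ]
    return [e for e in cleaned if keep(e)]


# Made-up sample records, not taken from the Enron Emails dataset.
emails = [
    Email("Meeting  notes", "Please find the notes\nfrom today's meeting attached.", "business"),
    Email("", "Body without a subject line, which we drop here.", "personal"),
    Email("Hi", "Too short", "personal"),
]
curated = curate(emails)
```

In this sketch, cleaning runs before filtering so that the filter sees normalized text. The tutorial's actual pipeline may order and implement these steps differently.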

Walkthrough

For a detailed walkthrough of this tutorial, please see the following blog post:

Usage

After installing the NeMo Curator package, run the following command from the root of the repository:

python tutorials/peft-curation/main.py

By default, this tutorial uses at most 8 workers to run the curation pipeline. If you run into out-of-memory issues, reduce the number of workers by supplying the --n-workers=N argument, where N is the number of workers to spawn (for example, --n-workers=4).
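
For readers curious how such a flag is typically wired up, the following is a minimal sketch using Python's standard argparse module. It is an assumption about the general pattern, not a copy of the tutorial's main.py. Only the flag name --n-workers and the default of 8 come from the text above; the `build_parser` helper and the help string are hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser exposing a worker-count flag (hypothetical helper)."""
    parser = argparse.ArgumentParser(description="Illustrative worker-count option.")
    parser.add_argument(
        "--n-workers",
        type=int,
        default=8,  # default taken from the README text
        help="Number of workers to spawn.",
    )
    return parser


# Parse explicit argument lists so the sketch does not depend on sys.argv.
default_args = build_parser().parse_args([])
reduced_args = build_parser().parse_args(["--n-workers=4"])
```

argparse converts the dash in the flag name to an underscore, so the value is read as `args.n_workers`.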