DistillClassifier

About

DistillClassifier is a tool built on top of LLM-VM that uses LLMs to generate synthetic data for classification tasks. The generated data can then be used to distill an LLM's knowledge of a classification task into a much smaller, faster-to-run classification model.

This project was built for the ANARCHY October 2023 Hackathon. Check out ANARCHY on their GitHub and website.

Team Members:

Partho Das
Karan Janthe

Setup

clone the project from GitHub

git clone https://github.com/daspartho/DistillClassifier

`cd` into the project

cd DistillClassifier

install LLM-VM

git clone https://github.com/anarchy-ai/LLM-VM.git
cd LLM-VM
pip3 install .
cd ..

install python dependencies

pip3 install -r requirements.txt

create a `.env` file and set your OpenAI API key (if you want to use OpenAI models) and your Hugging Face Hub token (if you want to push the dataset to Hugging Face):

OPENAI_API_KEY=
HF_HUB_TOKEN=
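Before running a generation job, it can help to confirm that the keys you need are actually filled in. The following is a minimal sketch, not part of the project: `parse_env_text` and `missing_credentials` are hypothetical helper names, and the parser only handles plain `KEY=VALUE` lines (no quoting or `export` prefixes). How the project itself loads `.env` is not shown here.

```python
from pathlib import Path


def parse_env_text(text):
    """Parse simple KEY=VALUE lines; blank lines and # comments are skipped."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_credentials(values, use_openai=True, push_to_hub=False):
    """Return the names of needed variables that are absent or empty."""
    required = []
    if use_openai:
        required.append("OPENAI_API_KEY")  # only needed for OpenAI models
    if push_to_hub:
        required.append("HF_HUB_TOKEN")  # only needed when pushing the dataset
    return [name for name in required if not values.get(name)]


if __name__ == "__main__":
    env_path = Path(".env")
    text = env_path.read_text() if env_path.exists() else ""
    missing = missing_credentials(parse_env_text(text), push_to_hub=True)
    print("Missing or empty:", ", ".join(missing) if missing else "none")
```

Run it from the project root, next to the `.env` file.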

Run

You can run the tool from the command line like this:

python3 generation.py <columns> <n_examples> [-m <model>] [-f <filename>] [-r <repo>]

Arguments:

<columns>: Column information as a dictionary.
<n_examples>: Number of examples to be generated.
-m, --model: (Optional) Model name. Defaults to "chat_gpt".
-f, --filename: (Optional) Dataset filename. Defaults to "dataset.json".
-r, --repo: (Optional) HuggingFace repo ID. Defaults to "None".
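The argument list above maps naturally onto `argparse`. The sketch below illustrates that interface only; it is not the actual contents of `generation.py`. It assumes `<columns>` is passed as a JSON string and that the "None" default for `--repo` means Python's `None`.

```python
import argparse
import json


def build_parser():
    """Build a parser mirroring the documented arguments (illustrative only)."""
    parser = argparse.ArgumentParser(
        description="Generate a synthetic classification dataset."
    )
    # Assumption: the columns dictionary is supplied as a JSON string.
    parser.add_argument("columns", type=json.loads,
                        help="Column information as a dictionary.")
    parser.add_argument("n_examples", type=int,
                        help="Number of examples to be generated.")
    parser.add_argument("-m", "--model", default="chat_gpt",
                        help="Model name.")
    parser.add_argument("-f", "--filename", default="dataset.json",
                        help="Dataset filename.")
    # Assumption: the documented "None" default is Python's None.
    parser.add_argument("-r", "--repo", default=None,
                        help="HuggingFace repo ID.")
    return parser


# Parse an explicit argument list so the example does not depend on sys.argv.
args = build_parser().parse_args(
    ['{"text": "some text", "label": "its class"}', "25"]
)
print(args.columns, args.n_examples, args.model, args.filename, args.repo)
```

Using `type=json.loads` means a malformed dictionary is rejected by the parser with a usage error rather than failing later during generation.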

Example:

python3 generation.py '{"text": "either spoiler or not spoiler text", "label": "if text is spoiler or not"}' 25 -m 'chat_gpt' -f 'dataset.json' -r 'spoiler_or_not'
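Quoting a dictionary by hand on the command line is easy to get wrong. As a rough sketch, the same command can be assembled from Python so that the columns dictionary is serialized with `json.dumps`. `build_command` is a hypothetical helper, not part of the project, and it assumes `generation.py` accepts the columns as a JSON string, as in the example above.

```python
import json
import shlex


def build_command(columns, n_examples, model=None, filename=None, repo=None):
    """Assemble the generation.py command line as a list of arguments."""
    cmd = ["python3", "generation.py", json.dumps(columns), str(n_examples)]
    if model:
        cmd += ["-m", model]
    if filename:
        cmd += ["-f", filename]
    if repo:
        cmd += ["-r", repo]
    return cmd


columns = {
    "text": "either spoiler or not spoiler text",
    "label": "if text is spoiler or not",
}
cmd = build_command(columns, 25, model="chat_gpt",
                    filename="dataset.json", repo="spoiler_or_not")

# shlex.join produces a shell-safe string you can paste into a terminal.
print(shlex.join(cmd))
```

The list form can also be passed directly to `subprocess.run(cmd, check=True)` from the project directory, which avoids shell quoting altogether.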

or run the `demo.py` file directly:

python3 demo.py

example output dataset:

demo_dataset.json

demo dataset on huggingface
