Note: This was forked from https://github.com/Nutlope/finetune and modified to include the Autoblocks AI SDK.

Fine-tuning Llama-3 on your own data

This repo gives you the code to fine-tune Llama-3 on your own data. In this example, we fine-tune on 500 examples from TIGER-Lab's MathInstruct dataset. LLMs are known to struggle with complex multi-step math problems, so we fine-tune an LLM on a sample of these problems and see how well it does.
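To get a feel for the data before transforming it, you can inspect a record directly. This is a minimal sketch, assuming the Hugging Face datasets library (not part of this repo); the field names come from the public TIGER-Lab/MathInstruct dataset and may differ from what the scripts here expect:

```python
# Peek at MathInstruct (assumes `pip install datasets`; not part of this repo).
from datasets import load_dataset

ds = load_dataset("TIGER-Lab/MathInstruct", split="train")
# Each record pairs a math problem ("instruction") with a worked solution ("output").
print(ds[0])
```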

We'll go through data cleaning, uploading the dataset, fine-tuning Llama-3-8B on it, and then running evals to compare accuracy against the base model. Fine-tuning happens on Together and costs $5 at current pricing. Evals happen on Autoblocks AI.

Fine-tuning Llama-3 on MathInstruct

  1. Make an account at Together AI and save your API key as an environment variable called TOGETHER_API_KEY.
  2. Install the Together AI Python library by running pip install together.
  3. Install the Autoblocks AI Python library by running pip install autoblocksai.
  4. Make an account with Autoblocks AI and save your API key as an environment variable called AUTOBLOCKS_API_KEY.
  5. Run 1-transform.py to clean the data and get it into a format Together accepts (see the first sketch after this list).
  6. Run 2-finetune.py to upload the dataset and start the fine-tuning job on Together (see the second sketch).
  7. Run npx autoblocks testing exec -- python3 3-eval.py to evaluate the fine-tuned model using the Autoblocks CLI (see the third sketch).
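
For step 5, the idea is to pair each problem with its solution and write the result out as JSONL. The sketch below only illustrates the shape of the transform (1-transform.py is the source of truth); it assumes Together's fine-tuning endpoint accepts JSONL with a single "text" field, and the prompt template is illustrative:

```python
# Sketch of the data-cleaning step; 1-transform.py is the source of truth.
# Assumes Together's fine-tuning endpoint accepts JSONL with a "text" field.
import json

from datasets import load_dataset

# Take the first 500 training examples, matching what ships with this repo.
ds = load_dataset("TIGER-Lab/MathInstruct", split="train").select(range(500))

with open("mathinstruct_500.jsonl", "w") as f:
    for row in ds:
        # The instruction/response template here is illustrative, not the
        # exact format the repo's script uses.
        text = f"### Instruction:\n{row['instruction']}\n\n### Response:\n{row['output']}"
        f.write(json.dumps({"text": text}) + "\n")
```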
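For step 6, uploading the file and kicking off the job looks roughly like this. This is a sketch assuming the current together Python SDK surface; 2-finetune.py has the real parameters:

```python
# Sketch of the fine-tuning step; 2-finetune.py is the source of truth.
# Reads TOGETHER_API_KEY from the environment.
from together import Together

client = Together()

# Upload the transformed JSONL produced in step 5.
uploaded = client.files.upload(file="mathinstruct_500.jsonl")

# Start a fine-tuning job on the Llama-3-8B base model.
job = client.fine_tuning.create(
    training_file=uploaded.id,
    model="meta-llama/Meta-Llama-3-8B",
)
print(job.id)  # poll this job ID in the Together dashboard or via the SDK
```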
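For step 7, 3-eval.py defines test cases and evaluators with the Autoblocks testing SDK, and the Autoblocks CLI executes and reports them. Below is a hypothetical minimal shape, not the repo's actual eval: the class and function names are from the autoblocksai package as we understand it, and the test case, evaluator, and model call are placeholders (3-eval.py is the source of truth):

```python
# Hypothetical shape of an Autoblocks test suite; 3-eval.py is the source of truth.
import dataclasses

from autoblocks.testing.models import BaseTestCase, BaseTestEvaluator, Evaluation
from autoblocks.testing.run import run_test_suite
from autoblocks.testing.util import md5


@dataclasses.dataclass
class MathTestCase(BaseTestCase):
    problem: str
    expected_answer: str

    def hash(self) -> str:
        # Stable hash so Autoblocks can track this test case across runs.
        return md5(self.problem)


class ContainsAnswer(BaseTestEvaluator):
    id = "contains-answer"

    def evaluate_test_case(self, test_case: MathTestCase, output: str) -> Evaluation:
        # Score 1 if the expected answer appears in the model's output.
        return Evaluation(score=1 if test_case.expected_answer in output else 0)


def run_model(test_case: MathTestCase) -> str:
    # The real script calls the fine-tuned model on Together here;
    # a canned answer keeps this sketch self-contained.
    return "4"


run_test_suite(
    id="mathinstruct-eval",
    test_cases=[MathTestCase(problem="What is 2 + 2?", expected_answer="4")],
    evaluators=[ContainsAnswer()],
    fn=run_model,
)
```

Run it through the CLI as in step 7 (npx autoblocks testing exec -- python3 3-eval.py) so results are reported to Autoblocks.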

Results

Note: This repo contains 500 problems for training, but we fine-tuned our model on 207k problems.

After fine-tuning Llama-3-8B on 207k math problems from the MathInstruct dataset, we ran 1,000 unseen math problems through each model to compare accuracy. Here were the results:

  • Base model (Llama-3-8B): 47.2%
  • Fine-tuned model (Llama-3-8B): 65.2%
  • Top OSS model (Llama-3-70B): 64.2%
  • Top proprietary model (GPT-4o): 71.4%
