
Releases: unslothai/unsloth

Phi 3.5

21 Aug 01:08

Phi 3.5 is here!

Try it out here: https://colab.research.google.com/drive/1lN6hPQveB_mHSnTOYifygFcrO8C1bxq4?usp=sharing

What's Changed

New Contributors

Full Changelog: July-Mistral-2024...August-2024

Llama 3.1 Support

23 Jul 20:42

Llama 3.1 Support

Excited to announce that Unsloth makes finetuning Llama 3.1 2.1x faster with 60% less VRAM! Read more in our release post: https://unsloth.ai/blog/llama3-1

We uploaded a Google Colab notebook to finetune Llama 3.1 (8B) on a free Tesla T4: Llama 3.1 (8B) Notebook. We also have a new UI on Google Colab for chatting with your Llama 3.1 Instruct models which uses our own 2x faster inference engine.

Run UI Preview

We created a new chat UI using Gradio where users can upload and chat with their Llama 3.1 Instruct models online for free on Google Colab.
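If you'd rather chat with your model directly in a notebook instead of the Gradio UI, here is a minimal sketch of the faster inference path (the model name and prompt below are placeholders, not part of this release):

from unsloth import FastLanguageModel

# Placeholder 4bit upload; swap in your own finetuned checkpoint
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit",
    max_seq_length = 2048,
    load_in_4bit = True,
)
FastLanguageModel.for_inference(model) # enable Unsloth's faster inference mode

messages = [{"role": "user", "content": "Hello!"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt = True, return_tensors = "pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens = 64)
print(tokenizer.decode(outputs[0], skip_special_tokens = True))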

We uploaded 4bit bitsandbytes quants here: https://huggingface.co/unsloth
To finetune Llama 3.1, please update Unsloth:

pip uninstall unsloth -y
pip install --upgrade --force-reinstall --no-cache-dir git+https://github.com/unslothai/unsloth.git

July-Mistral-2024

19 Jul 15:37

Mistral NeMo, Ollama & CSV support

See https://unsloth.ai/blog/mistral-nemo for more details. 4-bit pre-quantized weights are at https://huggingface.co/unsloth.

Our Colab finetuning notebook (2x faster, 60% less VRAM) is here, and our Kaggle notebook is here.


Export to Ollama & CSV Support

To use it, create and customize your chat template with a dataset, and Unsloth will automatically export the finetune to Ollama, including automatic Modelfile creation. We also created a 'Step-by-Step Tutorial on How to Finetune Llama-3 and Deploy to Ollama'. Check out our Ollama Llama-3 Alpaca and CSV/Excel Ollama Guide notebooks.

Unlike regular chat templates that use 3 columns, Ollama simplifies the process with just 2 columns: instruction and output. And with Ollama, you can save, run, and deploy your finetuned models locally on your own device.
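As a rough sketch of the export step itself (the save directory, model name, and quantization method below are illustrative choices, not fixed by this release), the finetune can be saved to GGUF and then registered with Ollama:

# Save the finetuned model to GGUF; Unsloth also writes the Ollama Modelfile
model.save_pretrained_gguf("model", tokenizer, quantization_method = "q4_k_m")

# Then, on your own machine (the model name is up to you):
#   ollama create unsloth_finetune -f ./model/Modelfile
#   ollama run unsloth_finetune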

Train on Completions, Not Inputs

We now support training only on the output tokens and not the inputs, which can increase accuracy. Try it with:

from trl import SFTTrainer
from transformers import TrainingArguments, DataCollatorForSeq2Seq

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    # DataCollatorForSeq2Seq pads the labels alongside the inputs, which the masking needs
    data_collator = DataCollatorForSeq2Seq(tokenizer = tokenizer),
    ...
    args = TrainingArguments(
        ...
    ),
)

# Mask the instruction tokens so the loss is computed only on the responses
from unsloth.chat_templates import train_on_responses_only
trainer = train_on_responses_only(trainer)

RoPE Scaling for all models

We now allow you to finetune Gemma 2, Mistral, Mistral NeMo, Qwen2 and more models with "unlimited" context lengths via RoPE linear scaling in Unsloth. Coupled with our 4x longer context support, Unsloth can handle extremely long contexts!
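As a minimal sketch (the checkpoint and sequence length are just examples), requesting a max_seq_length longer than the model's native context is all that is needed; Unsloth applies the RoPE scaling for you:

from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/gemma-2-9b-bnb-4bit",
    max_seq_length = 16384, # beyond Gemma 2's native 8192 context, so RoPE linear scaling is applied
    load_in_4bit = True,
)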

New Docs!

Introducing our new documentation site, which brings all the most important information about Unsloth together in one place. If you'd like to contribute, please contact us! Docs: https://docs.unsloth.ai/

Update instructions

Please update Unsloth on local machines via the commands below (on Colab and Kaggle, just refresh and reload the notebook):

pip uninstall unsloth -y
pip install --upgrade --force-reinstall --no-cache-dir git+https://github.com/unslothai/unsloth.git

2x faster Gemma 2

03 Jul 22:02

Gemma 2 support

We now support Gemma 2! It's 2x faster and uses 63% less VRAM than HF+FA2!
We have a Gemma 2 (9B) notebook here: https://colab.research.google.com/drive/1vIrqH5uYDQwsJ4-OO3DErvuv4pBgVwk4?usp=sharing

To use Gemma 2, please update Unsloth:

pip uninstall unsloth -y
pip install --upgrade --force-reinstall --no-cache-dir git+https://github.com/unslothai/unsloth.git

Head over to our blog post: https://unsloth.ai/blog/gemma2 for more details.

We uploaded 4bit quants for 4x faster downloading to:

https://huggingface.co/unsloth/gemma-2-9b-bnb-4bit

https://huggingface.co/unsloth/gemma-2-27b-bnb-4bit

https://huggingface.co/unsloth/gemma-2-9b-it-bnb-4bit

https://huggingface.co/unsloth/gemma-2-27b-it-bnb-4bit
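As a small sketch (the sequence length is an arbitrary example), these pre-quantized checkpoints load directly with load_in_4bit, so only the 4bit weights need to be downloaded:

from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/gemma-2-9b-it-bnb-4bit", # pre-quantized upload from the list above
    max_seq_length = 2048,
    load_in_4bit = True,
)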

Continued pretraining

You can now do continued pretraining with Unsloth. See https://unsloth.ai/blog/contpretraining for more details!

Continued pretraining is 2x faster and uses 50% less VRAM than HF + FA2 QLoRA. We offload embed_tokens and lm_head to disk to save VRAM!

You can now simply include both in target_modules, as shown below:

from unsloth import FastLanguageModel

# `model` is the base model returned by FastLanguageModel.from_pretrained(...)
model = FastLanguageModel.get_peft_model(
    model,
    r = 128, # Choose any number > 0 ! Suggested 8, 16, 32, 64, 128
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",
                      "embed_tokens", "lm_head",], # Add embed_tokens & lm_head for continued pretraining
    lora_alpha = 32,
    lora_dropout = 0, # Supports any, but = 0 is optimized
    bias = "none",    # Supports any, but = "none" is optimized
    # [NEW] "unsloth" uses 30% less VRAM, fits 2x larger batch sizes!
    use_gradient_checkpointing = "unsloth", # True or "unsloth" for very long context
    random_state = 3407,
    use_rslora = True,  # We support rank stabilized LoRA
    loftq_config = None, # And LoftQ
)

We also allow two learning rates: one for the embedding matrices and another for the LoRA adapters:

from unsloth import is_bfloat16_supported # typically used for the fp16/bf16 flags in the elided args
from unsloth import UnslothTrainer, UnslothTrainingArguments

trainer = UnslothTrainer(
    # Pass model, tokenizer, train_dataset, etc. as you would for SFTTrainer
    args = UnslothTrainingArguments(
        ....
        learning_rate = 5e-5,            # learning rate for the LoRA adapters
        embedding_learning_rate = 5e-6,  # smaller learning rate for embed_tokens / lm_head
    ),
)

We also share a free Colab to finetune Mistral v3 to learn Korean (you can select any language you like) using Wikipedia and the Aya Dataset: https://colab.research.google.com/drive/1tEd1FrOXWMnCU9UIvdYhs61tkxdMuKZu?usp=sharing

And we're sharing our free Colab notebook for continued pretraining for text completion: https://colab.research.google.com/drive/1ef-tab5bhkvWmBOObepl1WgJvfvSzn5Q?usp=sharing

What's Changed

New Contributors

Full Changelog: https://github.com/unslothai/unsloth/commits/June-2024