---
title: "Unsloth"
description: "Hyper-optimized QLoRA finetuning for single GPUs"
---
### Overview
Unsloth provides hand-written, optimized kernels for LLM finetuning that slightly improve speed and reduce VRAM
usage over standard industry baselines.
### Installation
The following installs Unsloth from source and pins xformers to an older release, as Unsloth is incompatible with
the latest versions of some upstream libraries.
```bash
pip install --no-deps "unsloth @ git+https://github.com/unslothai/unsloth.git"
pip install --no-deps --force-reinstall xformers==0.0.26.post1
```
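To sanity-check the result, the one-liner below simply imports both packages and prints the installed xformers version; it assumes the rest of Unsloth's dependencies are already present in your Axolotl environment:

```bash
# Verify that unsloth imports cleanly and the xformers pin took effect
python -c "import unsloth, xformers; print(xformers.__version__)"
```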
### Using Unsloth with Axolotl
Axolotl exposes a few configuration options to try out Unsloth and get most of the performance gains.
Our Unsloth integration is currently limited to the following model architectures:
- llama
The following options are specific to LoRA finetuning and cannot be used for multi-GPU finetuning:
```yaml
unsloth_lora_mlp: true
unsloth_lora_qkv: true
unsloth_lora_o: true
```
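For orientation, here is a minimal sketch of where these flags sit in a full config; the base model and LoRA hyperparameters are illustrative placeholders, not recommendations:

```yaml
# Illustrative single-GPU LoRA config (placeholder model and hyperparameters)
base_model: NousResearch/Llama-2-7b-hf
adapter: lora
lora_r: 16
lora_alpha: 32
lora_target_modules:
  - q_proj
  - k_proj
  - v_proj
  - o_proj
  - gate_proj
  - up_proj
  - down_proj

# Unsloth kernel patches for the LoRA path
unsloth_lora_mlp: true
unsloth_lora_qkv: true
unsloth_lora_o: true
```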
These options are composable and can be used with multi-GPU finetuning:
```yaml
unsloth_cross_entropy_loss: true
unsloth_rms_norm: true
unsloth_rope: true
```
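Since these patches are multi-GPU safe, they can ride along with a standard distributed launch. A sketch, where `config.yml` is a placeholder for a config containing the options above:

```bash
# Standard Axolotl entry point; accelerate handles the multi-GPU launch
accelerate launch -m axolotl.cli.train config.yml
```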
### Limitations
- Single GPU only; i.e., no multi-GPU support
- No DeepSpeed or FSDP support (both require multi-GPU)
- LoRA and QLoRA support only; no full finetunes or FP8 support
- Limited model architecture support: Llama, Phi, Gemma, and Mistral only
- No MoE support