Performance Comparison
hoshi-hiyouga edited this page Apr 15, 2024 · 5 revisions
Method | Bits | TGS | VRAM | Speed |
---|---|---|---|---|
HF | 16 | 2,392 | 18GB | 100% |
HF+FA2 | 16 | 2,954 | 17GB | 123% |
Unsloth+FA2 | 16 | 4,007 | 16GB | 168% |
HF | 4 | 2,415 | 9GB | 101% |
Unsloth+FA2 | 4 | 3,726 | 7GB | 160% |
Method | Bits | TGS | VRAM | Speed |
---|---|---|---|---|
HF | 16 | 2,155 | 29GB | 100% |
HF+FA2 | 16 | 2,556 | 28GB | 119% |
Unsloth+FA2 | 16 | 3,400 | 27GB | 158% |
- TGS: tokens per GPU per second
- Model: LLaMA2-7B
- Batch size: 4
- Gradient accumulation: 2
- LoRA rank: 8
- LoRA modules: all
- Max length: 1024
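As a rough sketch of how the Speed column can be read, the snippet below recomputes it from the TGS values of the second table, assuming Speed is each method's TGS divided by the 16-bit HF baseline and rounded to a whole percent. It also derives an upper bound on tokens per optimizer step from the settings listed above, assuming every sequence fills the maximum length. The function and variable names are illustrative and are not taken from the benchmark scripts.

```python
# TGS values copied from the second table above (tokens per GPU per second).
tgs = {"HF": 2155, "HF+FA2": 2556, "Unsloth+FA2": 3400}


def relative_speed(tgs_by_method, baseline="HF"):
    """Return each method's TGS as a rounded percentage of the baseline's TGS.

    Assumption: the Speed column is defined relative to the 16-bit HF row.
    """
    base = tgs_by_method[baseline]
    return {method: round(100 * value / base) for method, value in tgs_by_method.items()}


speed = relative_speed(tgs)
for method, pct in speed.items():
    print(f"{method}: {pct}%")

# Settings from the list above. This is an upper bound per GPU, reached only
# if every sequence is padded or packed to the maximum length (an assumption).
batch_size, grad_accum, max_length = 4, 2, 1024
tokens_per_step = batch_size * grad_accum * max_length
print(f"Tokens per optimizer step (upper bound): {tokens_per_step}")
```

Under these assumptions the recomputed percentages match the Speed column of the second table.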
VRAM by sequence length | 1,024 | 2,048 | 4,096 | 8,192 | 16,384 | 32,768 | 65,536 | 100,000 |
---|---|---|---|---|---|---|---|---|
FlashAttention2 | 6GB | 7GB | 9GB | 12GB | 19GB | 32GB | OOM | OOM |
Unsloth+FA2 | 5GB | 6GB | 7GB | 8GB | 10GB | 16GB | 25GB | 37GB |
TGS by sequence length | 1,024 | 2,048 | 4,096 | 8,192 | 16,384 | 32,768 | 65,536 | 100,000 |
---|---|---|---|---|---|---|---|---|
FlashAttention2 | 2,295 | 2,741 | 2,926 | 3,128 | 3,542 | 2,216 | OOM | OOM |
Unsloth+FA2 | 2,556 | 3,178 | 3,413 | 3,632 | 4,050 | 2,456 | 1,820 | 1,202 |
Improvement | 111% | 116% | 117% | 116% | 114% | 111% | - | - |
- TGS: tokens per GPU per second
- GPU: NVIDIA A100 40GB * 1
- Model: LLaMA2-7B
- Batch size: 1
- Gradient accumulation: 4
- LoRA rank: 8
- LoRA modules: all
- Quantization bit: 4
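The Improvement row can be reproduced from the two TGS rows. The following is a minimal sketch, assuming Improvement is the Unsloth+FA2 TGS divided by the FlashAttention2 TGS, rounded to a whole percent, and left undefined wherever the FlashAttention2 run went out of memory. The names used here are hypothetical and chosen only for illustration.

```python
# Column headers and TGS rows copied from the table above; None marks an OOM run.
lengths = [1024, 2048, 4096, 8192, 16384, 32768, 65536, 100000]
fa2_tgs = [2295, 2741, 2926, 3128, 3542, 2216, None, None]
unsloth_tgs = [2556, 3178, 3413, 3632, 4050, 2456, 1820, 1202]


def improvement(baseline, candidate):
    """Return candidate/baseline as a rounded percentage, or None if either run hit OOM."""
    if baseline is None or candidate is None:
        return None
    return round(100 * candidate / baseline)


improvements = [improvement(b, c) for b, c in zip(fa2_tgs, unsloth_tgs)]

for length, pct in zip(lengths, improvements):
    label = "n/a (baseline OOM)" if pct is None else f"{pct}%"
    print(f"{length:>7,}: {label}")
```

The two longest columns have no improvement figure because there is no FlashAttention2 throughput to compare against.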