
[Templates] Fine-tuning template 04 OOMs due to GPU memory leak when using LoRA + V100s #40714

Closed
ArturNiederfahrenhorst opened this issue Oct 26, 2023 · 6 comments · Fixed by #40940
Labels: bug (Something that is supposed to be working; but isn't), llm, P1 (Issue that should be fixed within a few weeks)

Comments

ArturNiederfahrenhorst (Contributor) commented Oct 26, 2023

What happened + What you expected to happen

Fine-tuning template 04 OOMs under specific circumstances.

1. Start an AWS or GCP V100 node.
2. Deploy the template.
3. Run `./run_llama_ft.sh --size=7b --lora`.

This OOMs after a while once GPU memory fills up. GPU memory usage grows linearly with the number of training steps, strongly suggesting a memory leak.
Attempts to reproduce this on AWS p4de or g5 instances failed, so the issue appears to be specific to V100s.

The screenshot shows GPU memory usage (orange) for runs with different context lengths (8, 4, 1, in that order) with LoRA on a V100. The final run, with batch size 1, grows to roughly 100% before crashing.
[Screenshot 2023-10-25 at 10:36:51]

For comparison, this is how it looks on some A100s, and how it should look: no linear increase, just a flat GPU memory curve (orange).
[Screenshot 2023-10-26 at 11:16:26]
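
To double-check whether the growth is in allocated tensors (a genuine leak) or only in PyTorch's caching allocator, a minimal per-step logging sketch like the one below can be dropped into the training loop. This assumes plain PyTorch; the helper name and call site are illustrative and not part of the template.

```python
import torch

def log_gpu_memory(step: int) -> None:
    # Allocated = memory held by live tensors; reserved = memory held by
    # PyTorch's caching allocator. A leak shows up as allocated memory
    # climbing by a roughly constant amount every step.
    allocated_mib = torch.cuda.memory_allocated() / 1024**2
    reserved_mib = torch.cuda.memory_reserved() / 1024**2
    print(f"step={step} allocated={allocated_mib:.1f} MiB reserved={reserved_mib:.1f} MiB")

# Illustrative call site inside a training loop:
# for step, batch in enumerate(dataloader):
#     loss = training_step(batch)
#     log_gpu_memory(step)
```

If the allocated figure climbs linearly with the step count, that matches the behavior in the V100 screenshot above.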

Versions / Dependencies

master
https://github.com/ray-project/ray/tree/master/doc/source/templates/04_finetuning_llms_with_deepspeed

Reproduction script

.

Issue Severity

None

ArturNiederfahrenhorst added the bug (Something that is supposed to be working; but isn't), P1.5 (Issues that will be fixed in a couple releases; bumped once all P1s are cleared), and llm labels on Oct 26, 2023
kouroshHakha changed the title from "[Templates] Fine-tuning template 04 OOMs when using LoRA + V100s" to "[Templates] Fine-tuning template 04 OOMs due to memory leak when using LoRA + V100s" on Oct 26, 2023
kouroshHakha added the P1 (Issue that should be fixed within a few weeks) label and removed the P1.5 label on Oct 26, 2023
kouroshHakha changed the title from "[Templates] Fine-tuning template 04 OOMs due to memory leak when using LoRA + V100s" to "[Templates] Fine-tuning template 04 OOMs due to GPU memory leak when using LoRA + V100s" on Oct 26, 2023
kouroshHakha (Contributor) commented:

Some related issues found online:
microsoft/DeepSpeed#3002
microsoft/DeepSpeed#3378
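
For what it's worth, one generic mitigation sometimes tried for step-wise GPU memory growth is to explicitly release cached allocator blocks between steps or epochs. This is only a sketch of a common workaround, not the fix proposed in the DeepSpeed threads above, and it will not free tensors that are still referenced:

```python
import gc

import torch

def release_cached_gpu_memory() -> None:
    # Drop unreachable Python objects first so their CUDA tensors can be freed,
    # then return cached blocks to the driver. Useful mainly to tell allocator
    # fragmentation apart from a genuine reference leak.
    gc.collect()
    torch.cuda.empty_cache()
```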

mak-454 commented Nov 1, 2023

+1, we are facing the same issue on a node with 4 A10 GPUs (AWS g5.12xlarge).

woshiyyya (Member) commented:

@mak-454 Did you use Ray? Or LoRA + Deepspeed only?

mak-454 commented Nov 2, 2023

@woshiyyya I am using https://github.com/ray-project/ray/blob/master/doc/source/templates/04_finetuning_llms_with_deepspeed/finetune_hf_llm.py and running it with the --lora option.
A few more details:
- Setup: single-node AWS g5.12xlarge with 4 A10 GPUs
- Batch size: 4
- ND: 4
- Block size: 512
- LoRA config and DeepSpeed config from https://github.com/ray-project/ray/blob/master/doc/source/templates/04_finetuning_llms_with_deepspeed
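
For readers unfamiliar with the setup, a LoRA configuration along these lines might look like the sketch below. It uses Hugging Face PEFT; the rank, alpha, dropout, target modules, and model name are illustrative assumptions, not values copied from the template's configs (those live in the directory linked above).

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Illustrative values only; the template ships its own LoRA and DeepSpeed configs.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)

base_model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # Only the LoRA adapter weights should be trainable.
```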

mak-454 commented Nov 2, 2023

@woshiyyya It seems to run fine and GPU utilization seems to be stable after commenting out the line.

woshiyyya (Member) commented:

@mak-454 Interesting! Let us repro it on our side and get back to you.
