[Templates] Fine-tuning template 04 OOMs due to GPU memory leak when using LoRA + V100s #40714
Comments
Some related issues found online:
+1, we are facing the same issue on a node with 4x A10 GPUs (AWS g5.12xlarge).
@mak-454 Did you use Ray? Or LoRA + DeepSpeed only?
@woshiyyya I am using https://github.com/ray-project/ray/blob/master/doc/source/templates/04_finetuning_llms_with_deepspeed/finetune_hf_llm.py
@woshiyyya It seems to be going fine and GPU utilization seems to be stable after commenting out the line.
@mak-454 Interesting! Let us repro it on our side and get back to you.
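One common cause of linear per-step GPU memory growth in PyTorch training loops (offered here only as a hedged illustration; it is not confirmed to be the cause in this template) is accumulating the loss tensor itself, which keeps autograd history alive across steps. A minimal sketch of the pattern, where `model`, `optimizer`, and `dataloader` are illustrative names and not taken from finetune_hf_llm.py:

```python
import torch

# Illustrative loop fragment; `model`, `optimizer`, and `dataloader` are
# assumed to exist and are not taken from the template.
running_loss = 0.0
for step, batch in enumerate(dataloader):
    optimizer.zero_grad()
    loss = model(**batch).loss
    loss.backward()
    optimizer.step()

    # Leaky pattern: `running_loss += loss` keeps a reference to each step's
    # autograd history, so GPU memory grows with the number of steps.
    # Converting to a Python float before accumulating avoids that.
    running_loss += loss.item()
```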
What happened + What you expected to happen
Fine-tuning template 04 OOMs under specific circumstances.
Start an AWS or GCP V100 node.
Deploy the template.
Run ./run_llama_ft.sh --size=7b --lora. This will OOM after a while once GRAM (GPU memory) fills up. GRAM usage grows linearly with training steps, strongly suggesting a memory leak.
Attempts to reproduce this on AWS p4de or g5 instances failed, so this appears to be more or less specific to V100s.
Screenshot shows GRAM usage (orange) on runs with different context lengths (8, 4, 1, in that order) with LoRA on a V100. (The final run, with batch size 1, grows to roughly 100% before crashing.)
This is how it looks on some A100s, and how it should look: no linear increase, just a flat GRAM curve (orange).
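One way to quantify the growth (a hedged sketch, not part of the template) is to log PyTorch's CUDA memory counters once per training step; a roughly linear increase in allocated memory across steps is consistent with a leak rather than allocator fragmentation.

```python
import torch

def log_gpu_memory(step: int) -> None:
    """Print allocated and reserved CUDA memory (GiB) for the current device."""
    allocated = torch.cuda.memory_allocated() / 2**30
    reserved = torch.cuda.memory_reserved() / 2**30
    print(f"step {step}: allocated={allocated:.2f} GiB, reserved={reserved:.2f} GiB")

# Call log_gpu_memory(step) once per step inside the training loop. On the
# failing V100 runs, `allocated` would be expected to climb steadily with the
# step count, while on the healthy A100 runs it should stay roughly flat.
```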
Versions / Dependencies
master
https://github.com/ray-project/ray/tree/master/doc/source/templates/04_finetuning_llms_with_deepspeed
Reproduction script
See the reproduction steps under "What happened" above.
Issue Severity
None