
try to fix Zero3 Memory Leak following @tohtana idea #363

Closed · wants to merge 9 commits

Conversation

@dumpmemory (Contributor) commented Apr 24, 2023

Here I am following @tohtana's modification from microsoft/DeepSpeed#3002 to fix #161. It worked with DeepSpeed 0.9.1 and torch 2.0. Thanks to @tohtana for the help.
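For context, this is roughly the call pattern the discussion is about: a LoRA-style Linear whose forward routes the (possibly transposed) base weight through F.linear on every call, which is where the ZeRO-3 memory growth from #161 was reported to show up. This is a minimal, self-contained sketch for illustration only; the class and helper names are assumptions for the example, not the actual peft code or the diff in this PR.

```python
# Illustrative sketch of the F.linear call pattern discussed in this PR
# (names like LoraLinearSketch are hypothetical, not peft identifiers).
import torch
import torch.nn as nn
import torch.nn.functional as F


def transpose(weight: torch.Tensor, fan_in_fan_out: bool) -> torch.Tensor:
    # Transpose the base weight when it is stored fan-in/fan-out
    # (e.g. GPT-2's Conv1D layers), mirroring the usual LoRA handling.
    return weight.T if fan_in_fan_out else weight


class LoraLinearSketch(nn.Linear):
    def __init__(self, in_features, out_features, r=8, fan_in_fan_out=False):
        super().__init__(in_features, out_features)
        self.fan_in_fan_out = fan_in_fan_out
        self.lora_A = nn.Linear(in_features, r, bias=False)
        self.lora_B = nn.Linear(r, out_features, bias=False)

    def forward(self, x):
        # The call site under discussion: every forward passes the base
        # weight through F.linear. Under DeepSpeed ZeRO-3 this is where the
        # gathered parameters were reportedly not being released.
        result = F.linear(x, transpose(self.weight, self.fan_in_fan_out), bias=self.bias)
        result = result + self.lora_B(self.lora_A(x))
        return result
```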

@HuggingFaceDocBuilderDev commented Apr 24, 2023

The documentation is not available anymore as the PR was closed or merged.

@pacman100 (Contributor) left a comment


Hello @dumpmemory, great work getting this issue solved on the DeepSpeed side and raising the fix here. Could you apply the fix to all the places in lora and adalora where F.linear is used? That would solve the issue in all places.

@aashay96 commented
When will this be deployed?

@dumpmemory (Contributor, Author) commented

> Hello @dumpmemory, great work getting this issue solved on the DeepSpeed side and raising the fix here. Could you apply the fix to all the places in lora and adalora where F.linear is used? That would solve the issue in all places.

Cool, I will. Thanks.

@dumpmemory (Contributor, Author) commented

@pacman100, please help me check it. I have replaced F.linear in all the places.
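For anyone who wants to verify the fix locally, one rough way to watch for the growth reported in #161 is to log allocated CUDA memory at a fixed step interval during a LoRA + ZeRO-3 run and check that it plateaus instead of climbing. This is an illustrative helper, not part of this PR; the `engine` and `dataloader` names in the usage comment are hypothetical.

```python
# Illustrative memory-growth check for a LoRA + ZeRO-3 training run.
import torch


def log_gpu_memory(step: int, interval: int = 50) -> None:
    """Print allocated/reserved CUDA memory every `interval` steps."""
    if not torch.cuda.is_available() or step % interval != 0:
        return
    allocated = torch.cuda.memory_allocated() / 2**20
    reserved = torch.cuda.memory_reserved() / 2**20
    print(f"step {step}: allocated={allocated:.1f} MiB, reserved={reserved:.1f} MiB")


# Inside a training loop (hypothetical DeepSpeed engine and dataloader):
# for step, batch in enumerate(dataloader):
#     loss = engine(**batch).loss
#     engine.backward(loss)
#     engine.step()
#     log_gpu_memory(step)
```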

@pacman100 (Contributor) left a comment


Thank you @dumpmemory for iterating, LGTM! 🤗

Could you run `make style` and `make quality` to fix the quality issues?

@dumpmemory (Contributor, Author) commented
> Thank you @dumpmemory for iterating, LGTM! 🤗
>
> Could you run `make style` and `make quality` to fix the quality issues?

Yes, I will. I will also test the new commits on the DeepSpeed side. Thanks again.

@pacman100 (Contributor) commented

Hello @dumpmemory, there are still some code quality issues. Please resolve them to move ahead with the PR.

@pacman100 (Contributor) commented

Hello, is this PR still required, given that the DeepSpeed team has fixed it in their codebase?

@dumpmemory (Contributor, Author) commented

This PR is no longer required.

@dumpmemory closed this May 10, 2023
Development

Successfully merging this pull request may close these issues.

GPT2 Training GPU Memory Increase with LoRA and Zero 3
4 participants