
Merge LoCo with Zero++ #6730

Open
wants to merge 1 commit into base: master

Conversation

XingyuXie

Integration of LoCo Method into ZeRO++

Overview

This PR integrates the LoCo method, as outlined in this paper, into the ZeRO++ framework of DeepSpeed. The key enhancement is applying error-feedback compensation to the 4-bit quantized gradients before communication. This improves pre-training loss without additional time overhead, though it requires extra GPU memory; the size of this increase depends on model size and training configuration.
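
For readers unfamiliar with error feedback, the sketch below shows the core idea in plain PyTorch: the quantization residual from one step is added back onto the next step's gradient before it is quantized again. The toy 4-bit quantizer and the `ErrorFeedbackCompressor` class here are simplified stand-ins for illustration only, not the fused ZeRO++/LoCo kernels added by this PR.

```python
import torch

def quantize_4bit(x: torch.Tensor):
    """Toy symmetric per-tensor 4-bit quantization (levels -7..7)."""
    scale = x.abs().max().clamp(min=1e-8) / 7.0
    q = torch.clamp(torch.round(x / scale), -7, 7).to(torch.int8)
    return q, scale

def dequantize_4bit(q: torch.Tensor, scale: torch.Tensor):
    return q.to(torch.float32) * scale

class ErrorFeedbackCompressor:
    """Keeps a per-tensor residual so the error introduced by quantization
    is fed back into the next step's gradient instead of being lost."""
    def __init__(self):
        self.residual = {}

    def compress(self, name: str, grad: torch.Tensor):
        # Compensate with the residual left over from the previous step.
        compensated = grad + self.residual.get(name, torch.zeros_like(grad))
        q, scale = quantize_4bit(compensated)
        # New residual = whatever this step's quantization discarded.
        self.residual[name] = compensated - dequantize_4bit(q, scale)
        return q, scale

# Usage: compress a gradient right before it is communicated.
compressor = ErrorFeedbackCompressor()
grad = torch.randn(1024)
q, scale = compressor.compress("layer0.weight", grad)
```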

Experimental Results

We conducted pre-training experiments using the Llama2 architecture, adjusting the number of layers and hidden size. The experiments included:

  • A smaller-scale model with 0.8B parameters trained on 30B tokens.
  • A larger-scale model with 8B parameters trained on 5B tokens.

The training data was sampled from Redpajama-V2.

Findings:

  • Smaller Models (0.8B parameters): Significant gains were observed when applying the LoCo method.
  • Larger Models (8B parameters): The gains were present but less pronounced. This could be due to:
    1. Relatively smaller data volume.
    2. Lower pre-training loss for larger models, making significant improvements harder to achieve.

However, even a smaller pre-training loss gap in larger models can translate to meaningful gains in downstream tasks.

Example Script

For reference, the run.sh script used for the 8B-parameter, 5B-token experiment is attached. The experiment was conducted on the DeepSpeed-Megatron platform.
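
For context, ZeRO++ quantized-gradient communication is enabled through the `zero_optimization` section of the DeepSpeed config. The snippet below is only a sketch built as a Python dict using existing ZeRO++ options; the LoCo-specific option added by this PR is intentionally omitted, since its exact name and defaults should be taken from the PR diff rather than assumed here.

```python
# Sketch of a DeepSpeed config enabling ZeRO++ quantized gradient communication.
# zero_quantized_gradients / zero_quantized_weights / zero_hpz_partition_size
# are existing ZeRO++ options; the LoCo toggle from this PR is not shown.
ds_config = {
    "train_micro_batch_size_per_gpu": 4,
    "bf16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,
        "zero_quantized_gradients": True,   # 4-bit gradient communication (qgZ)
        "zero_quantized_weights": True,     # quantized weight all-gather (qwZ)
        "zero_hpz_partition_size": 8,       # hierarchical partitioning (hpZ)
    },
}
```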

Acknowledgments

Special thanks to @GuanhuaWang for ongoing communication and guidance throughout this work.


We appreciate your consideration of this PR and welcome any feedback or questions!

@XingyuXie
Author

@XingyuXie please read the following Contributor License Agreement (CLA). If you agree with the CLA, please reply with the following information.

@microsoft-github-policy-service agree [company="{your company}"]

Options:

  • (default - no company specified) I have sole ownership of intellectual property rights to my Submissions and I am not making Submissions in the course of work for my employer.
@microsoft-github-policy-service agree
  • (when company given) I am making Submissions in the course of work for my employer (or my employer has intellectual property rights in my Submissions by contract or applicable law). I have permission from my employer to make Submissions and enter into this Agreement on behalf of my employer. By signing below, the defined term “You” includes me and my employer.
@microsoft-github-policy-service agree company="Microsoft"

Contributor License Agreement

@microsoft-github-policy-service agree

@loadams requested review from GuanhuaWang and removed the review request for awan-10 on November 12, 2024.