
Remove ds_config scheduler params to prevent deepspeed from creating a scheduler for ref_model #2224

Open · wants to merge 5 commits into main
Conversation

Ben-Schneider-code (Contributor) commented Oct 11, 2024

What does this PR do?

Fixes #2154

My opinion is that popping the scheduler params off the deepcopied ds_config is the cleanest way to solve this: it prevents DeepSpeed from attempting to build a scheduler for the engine, which is fine because any returned scheduler is ignored anyway (a sketch of the idea follows below). I explored other avenues, such as using the DeepSpeed inference engine instead, but its different config format would make that a nightmare.
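
A minimal sketch of the idea, not the PR's actual diff: strip_scheduler_params is a hypothetical helper name, the config values are illustrative, and running it assumes deepspeed is installed in a DeepSpeed-capable environment.

import copy

import deepspeed
import torch.nn as nn

def strip_scheduler_params(ds_config: dict) -> dict:
    """Deepcopy the config and drop the "scheduler" section so that
    deepspeed.initialize() will not try to build an LR scheduler for an
    inference-only (reference) engine."""
    config = copy.deepcopy(ds_config)
    config.pop("scheduler", None)  # no-op if the key is absent
    return config

# Illustrative inputs: a frozen reference model and a config whose
# scheduler section would otherwise be consumed by DeepSpeed.
ref_model = nn.Linear(8, 8).eval()
ds_config = {
    "train_batch_size": 1,
    "scheduler": {"type": "WarmupLR", "params": {"warmup_num_steps": 100}},
}

# Any optimizer/scheduler returned in the tuple is ignored, since the
# reference model is never trained.
engine, *_ = deepspeed.initialize(
    model=ref_model,
    config=strip_scheduler_params(ds_config),
)

Because the helper deepcopies first, the trainer's own config (used for the policy model, which does need a scheduler) is left untouched.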

Other:

  • Removed a redundant None check: ref_model is already None-checked before _prepare_deepspeed is called:
if self.ref_model is None:
    # No explicit reference model: DPO must fall back on PEFT adapters
    # or precomputed reference log-probs.
    if not (self.is_peft_model or self.precompute_ref_log_probs):
        raise ValueError(
            "No reference model and model is not a Peft model. Try setting `precompute_ref_log_probs=True`"
        )
    if args.sync_ref_model:
        raise ValueError(
            "You currently cannot use `ref_model=None` with TR-DPO method. Please provide `ref_model`."
        )
else:
    # ref_model is guaranteed non-None on this branch, so the None check
    # inside _prepare_deepspeed was redundant.
    if self.is_deepspeed_enabled:
        self.ref_model = self._prepare_deepspeed(self.ref_model)
    else:
        self.ref_model = self.accelerator.prepare_model(self.ref_model, evaluation_mode=True)

Also, even if _prepare_deepspeed ends up getting used elsewhere, passing model=None to deepspeed.initialize will hit an assert anyway (see the sketch after this list).

  • Added a comment to clarify that _prepare_deepspeed can only be used on reference models that won't be trained
  • Fixed a typo
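
For completeness, a hypothetical repro of that assert (assumes deepspeed is installed; the config value is illustrative):

import deepspeed

# deepspeed.initialize requires a model, so a None model never reaches
# the scheduler-building logic in the first place.
try:
    deepspeed.initialize(model=None, config={"train_batch_size": 1})
except AssertionError as err:
    print(f"deepspeed rejected model=None: {err}")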

@qgallouedec sorry for the delay on such a simple fix.

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline, Pull Request section?
  • Was this discussed/approved via a GitHub issue? Please add a link to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the documentation guidelines.
  • Did you write any new necessary tests?

Successfully merging this pull request may close these issues:

  • Handling of "auto" in deepspeed config causes crash under Zero3 (#2154)