Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deprecate horovod in favor of torch.distributed #171

Merged
merged 4 commits into from
Jun 3, 2022

Conversation

vwxyzjn
Copy link
Contributor

@vwxyzjn vwxyzjn commented Jun 3, 2022

Follow up with #165 #158

We did a benchmark with isaacgymenvs and torch.distributed shows consistently better scaling performance in AllegroHand.

Notable change - I disable the stats syncing, which does not seem to be that important to average across all workers at every step.

image

You can test it out with

torchrun --standalone --nnodes=1 --nproc_per_node=2 runner.py --train --file rl_games/configs/ppo_cartpole.yaml

CC @ViktorM @markelsanz14

@Denys88
Copy link
Owner

Denys88 commented Jun 3, 2022

@vwxyzjn Ill take a look.
Strange but kl divergence syncing was needed to set right LR. Or you mean it is enough to calc it on the rank=0 gpu?

But overall looks awesome and much easier to use :) it takes a few hours of pain to install horovod for average person :)

@vwxyzjn
Copy link
Contributor Author

vwxyzjn commented Jun 3, 2022

Thank you @Denys88

Strange but kl divergence syncing was needed to set right LR. Or you mean it is enough to calc it on the rank=0 gpu?

Yes, it should be enough to call it on rank=0 GPU, al least in the case of isaacgymenvs when thousands of envs are available

should_exit_t = torch.tensor(should_exit).float()
self.hvd.broadcast_value(should_exit_t, 'should_exit')
should_exit = should_exit_t.bool().item()
# if self.multi_gpu:
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've found why I had this code: #95

@ViktorM ViktorM merged commit c9575bd into Denys88:master Jun 3, 2022
@vwxyzjn vwxyzjn mentioned this pull request Jun 19, 2022
2 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants