
[RLlib] Use PyTorch vectorized max() and sum() in SampleBatch.__init__ when possible #28388

Merged
4 commits merged into ray-project:master on Jan 12, 2023

Conversation

@cassidylaidlaw (Contributor) commented on Sep 8, 2022

Why are these changes needed?

Currently, there are two lines in SampleBatch.__init__ where the Python builtins sum and max are used on seq_lens:

  1. self.max_seq_len = max(seq_lens_)
  2. self.count = sum(self[SampleBatch.SEQ_LENS])

However, if seq_lens is a PyTorch tensor on the GPU, this is incredibly slow. I believe this is because each element of seq_lens has to be fetched independently from the GPU when iterating over the tensor for sum and max. Thus, I have changed the two lines to use seq_lens_.max().item() and self[SampleBatch.SEQ_LENS].sum().item() for PyTorch tensors, as sketched below.
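
For illustration, here is a minimal sketch of the idea (the helper functions and names below are illustrative, not the actual SampleBatch.__init__ code): only genuine torch tensors take the vectorized path, so the reduction happens on the device and only a single scalar is transferred back to the host.

import torch

def _max_seq_len(seq_lens):
    # Hypothetical helper, not RLlib's actual code: one vectorized reduction
    # on the device, then a single scalar transfer via .item().
    if torch.is_tensor(seq_lens):
        return seq_lens.max().item()
    # Lists / numpy arrays keep the original Python builtin path.
    return max(seq_lens)

def _count(seq_lens):
    if torch.is_tensor(seq_lens):
        return seq_lens.sum().item()
    return sum(seq_lens)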

This significantly speeds up training with models that require state. For instance, consider the following training run:

rllib train --run PPO --env CartPole-v0 \
--config '{"train_batch_size": 2000, "sgd_minibatch_size": 1000, "num_sgd_iter": 100, "framework": "torch", "num_workers": 10, "num_gpus": 1, "model": {"use_lstm": true}, "rollout_fragment_length": 1}' \
--stop '{"training_iteration": 1}'

Before this PR, it takes 22.7 s to run, but after only 13.5 s. If we look at result["timers"]["learn_time_ms"], we can also see that it is 8042 before the PR but 3162 after. Thus in some cases we're getting more than a 2x speedup for SGD!
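
To see where the time goes, here is a standalone micro-benchmark (not part of the PR, and it assumes a CUDA device is available) that compares the builtin sum, which fetches every element from the GPU one at a time, against a single vectorized reduction:

import time
import torch

seq_lens = torch.randint(1, 20, (4096,), device="cuda")

torch.cuda.synchronize()
t0 = time.perf_counter()
slow = sum(seq_lens)          # builtin: iterates the tensor, one GPU->host fetch per element
torch.cuda.synchronize()
t1 = time.perf_counter()
fast = seq_lens.sum().item()  # vectorized: one reduction on the device, one scalar transfer
torch.cuda.synchronize()
t2 = time.perf_counter()

print(f"builtin sum: {t1 - t0:.4f} s, tensor.sum().item(): {t2 - t1:.4f} s")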

Related issue number

I haven't opened an issue.

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

stale bot commented on Oct 29, 2022

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.

  • If you'd like to keep this open, just leave any comment, and the stale label will be removed.

stale bot added the stale label on Oct 29, 2022.
@cassidylaidlaw (Contributor, Author) commented

It looks like the only tests that are failing in CI are ones that are flaky. Can somebody review this?

stale bot removed the stale label on Nov 18, 2022.
@cassidylaidlaw (Contributor, Author) commented

Updated the PR and tests seem to be passing again. Can anyone look over and/or merge? @sven1977 @gjoliver @avnishn @ArturNiederfahrenhorst @smorad @maxpumperla @kouroshHakha @krfricke

@kouroshHakha (Contributor) left a comment

Sounds reasonable to me.

@cassidylaidlaw (Contributor, Author) commented

Another bump—will this be merged soon?

stale bot commented on Dec 31, 2022

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.

  • If you'd like to keep this open, just leave any comment, and the stale label will be removed.

stale bot added the stale label on Dec 31, 2022.
@cassidylaidlaw (Contributor, Author) commented

Is there anything I can do to get this merged? @kouroshHakha

@kouroshHakha (Contributor) commented on Jan 8, 2023

cc @gjoliver @sven1977

stale bot removed the stale label on Jan 8, 2023.
@gjoliver (Member) commented

Cool, looks like an awesome change for torch.
Does this make things slower for TF? Just want to have an idea.

gjoliver merged commit 2bc8837 into ray-project:master on Jan 12, 2023.
AmeerHajAli pushed a commit that referenced this pull request on Jan 12, 2023.
@cassidylaidlaw (Contributor, Author) commented

It shouldn't make things any slower in TF. The PyTorch vectorized max/sum are only used if the tensors are from torch; otherwise the code runs the same as before the PR, as illustrated below.
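
As a quick illustrative check (a hypothetical _count helper mirroring the sketch above, not RLlib code), anything that is not a torch tensor falls through to the builtin, so the TF and numpy code paths are untouched:

import numpy as np
import torch

def _count(seq_lens):
    # Only torch tensors hit the vectorized branch; lists, numpy arrays,
    # and TF tensors are summed exactly as before the PR.
    if torch.is_tensor(seq_lens):
        return seq_lens.sum().item()
    return sum(seq_lens)

print(_count([2, 3, 4]))                 # builtin path -> 9
print(_count(np.array([2, 3, 4])))       # builtin path -> 9
print(_count(torch.tensor([2, 3, 4])))   # vectorized torch path -> 9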
