[RLlib; Offline RL] Offline performance cleanup. #47731
Conversation
…'. This was initialized at each iteration and slowed down our 'OfflineData' sampling. In addition, tuned all Offline examples for the changes made. Signed-off-by: simonsays1980 <[email protected]>
…e added an option for users to materialize the dataset if needed and enough memory is available. Signed-off-by: simonsays1980 <[email protected]>
Signed-off-by: simonsays1980 <[email protected]>
…ialization and reinitialization of iterators in single-learner mode. Furthermore, changed after-mapping batch size to 1 b/c rows are then 'MultiAgentBatches' of 'train_batch_size_per_learner' environment steps each. In addition, added two further options to 'AlgorithmConfig' such that users can control memory usage and performance. Signed-off-by: simonsays1980 <[email protected]>
…alue function output from 'forward_train' and did therefore not train the value function. Signed-off-by: simonsays1980 <[email protected]>
…d and is already converted to a generator we need to rebuild it. Signed-off-by: simonsays1980 <[email protected]>
@@ -204,6 +204,12 @@ def add_rllib_example_script_args(
        help="How many (tune.Tuner.fit()) experiments to execute - if possible in "
        "parallel.",
    )
    parser.add_argument(
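The hunk above is cut off before the body of the new argument. Purely as a hypothetical illustration (the actual flag name and help text added by this PR are not visible in this excerpt), a new example-script flag would be registered roughly like this:

```python
# Hypothetical sketch only; the real argument added in this PR is truncated above.
parser.add_argument(
    "--materialize-data",
    action="store_true",
    help="Whether to materialize the raw offline dataset in memory "
    "(trades memory for sampling speed).",
)
```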
👍
# Define the config for Behavior Cloning.
config = (
    BCConfig()
    .environment(
        env="WrappedALE/Pong-v5",
        # TODO (sven): Does this have any influence in connectors?
Great point! You are right and this setting is NOT propagated to the connectors. Not relevant for Pong as its rewards are all 1 anyways, but for other Atari benchmarks, this could matter.
Thanks for the clarification. There is actually another one:

    # TODO (sven): Has this any influence in the connectors?
    actions_in_input_normalized=True,

Does this have an influence - or should it? It is not recognized, yet, in the offline API.
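For context, here is a minimal sketch of where that flag sits, assuming the long-standing `AlgorithmConfig.offline_data()` signature; the input path below is purely hypothetical:

```python
from ray.rllib.algorithms.bc import BCConfig

config = (
    BCConfig()
    .environment(env="WrappedALE/Pong-v5")
    .offline_data(
        input_="s3://my-bucket/pong-offline-data",  # hypothetical path
        # TODO (sven): Has this any influence in the connectors?
        # As discussed above, this flag is currently not recognized by the
        # new offline API / ConnectorV2 pipelines.
        actions_in_input_normalized=True,
    )
)
```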
Co-authored-by: Sven Mika <[email protected]> Signed-off-by: simonsays1980 <[email protected]>
Co-authored-by: Sven Mika <[email protected]> Signed-off-by: simonsays1980 <[email protected]>
Signed-off-by: simonsays1980 <[email protected]>
Approved! Thanks @simonsays1980 for this awesome PR. :)
Signed-off-by: simonsays1980 <[email protected]>
…ded anonymous filesystem to RLUnplugged example. Signed-off-by: simonsays1980 <[email protected]>
Signed-off-by: simonsays1980 <[email protected]>
…ation of advantages. Signed-off-by: simonsays1980 <[email protected]>
Signed-off-by: ujjawal-khare <[email protected]>
Why are these changes needed?

The `map_batches` call in offline RL training used to be very slow for reasons that were unclear. This PR proposes multiple changes to the offline data pipeline that boost performance by a large factor. These changes are (see the sketch right after this list):

- Adds `materialize_data` (default is `False`) such that users can control memory usage.
- Adds `materialize_mapped_data` (default is `True`) such that users can control memory usage. This materialization applies the `OfflinePreLearner` to the raw data a priori and can be used by algorithms that do not have connector pipelines (`ConnectorV2` pipelines) that need an up-to-date `RLModule` and/or states (e.g. `BC` or `CQL`).
- Changes the after-mapping batch size to 1 in the `map_batches` call, because rows now contain `MultiAgentBatch`es with `train_batch_size_per_learner` environment steps each.
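As a rough illustration of how the two new options are meant to be used, here is a minimal sketch, assuming they are exposed via `AlgorithmConfig.offline_data()` like the other offline settings; the input path and batch size are hypothetical:

```python
from ray.rllib.algorithms.bc import BCConfig

config = (
    BCConfig()
    .environment(env="CartPole-v1")
    .offline_data(
        input_="local:///tmp/cartpole-offline-data",  # hypothetical path
        # Keep the raw dataset out of memory (new option, default False).
        materialize_data=False,
        # Materialize the data *after* the OfflinePreLearner mapping; suitable
        # for BC or CQL, whose connector pipelines do not need an up-to-date
        # RLModule or its states (new option, default True).
        materialize_mapped_data=True,
    )
    .training(train_batch_size_per_learner=1024)
)
```

With `materialize_mapped_data=True`, the `OfflinePreLearner` mapping is applied once up front rather than on every pass over the data, so the per-iteration `map_batches` work is not repeated.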
In addition, it fixes an important error in `MARWIL`'s loss which ignored training the value function.

These changes lead to enormous performance boosts:

- `CartPole-v1` with `BC` in single-learner mode below 7 secs (multi-learner mode < 12 secs).
- `CartPole-v1` with `MARWIL` in single-learner mode below 50 secs (multi-learner mode < 217 secs).
- `Pendulum-v1` with `CQL` in single-learner mode below 311 secs (multi-learner mode < 116 secs).

Related issue number
Checks

- I've signed off every commit (`git commit -s`) in this PR.
- I've run `scripts/format.sh` to lint the changes in this PR.
- If I've added a new method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file.