[RLlib; Offline RL] Offline performance cleanup. #47731
Conversation
…'. This was initialized at each iteration and slowed down our 'OfflineData' sampling. In addition, tuned all Offline examples for the changes made. Signed-off-by: simonsays1980 <[email protected]>
…e added an option for users to materialize the dataset if needed and enough memory is available. Signed-off-by: simonsays1980 <[email protected]>
Signed-off-by: simonsays1980 <[email protected]>
…ialization and reinitialization of iterators in single-learner mode. Furthermore, changed after-mapping batch size to 1 b/c rows are then 'MultiAgentBatches' of 'train_batch_size_per_learner' environment steps each. In addition, added two further options to 'AlgorithmConfig' such that users can control memory usage and performance. Signed-off-by: simonsays1980 <[email protected]>
…alue function output from 'forward_train' and did therefore not train the value function. Signed-off-by: simonsays1980 <[email protected]>
…d and is already converted to a generator we need to rebuild it. Signed-off-by: simonsays1980 <[email protected]>
@@ -204,6 +204,12 @@ def add_rllib_example_script_args(
        help="How many (tune.Tuner.fit()) experiments to execute - if possible in "
        "parallel.",
    )
    parser.add_argument(
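The hunk above is cut off before the body of the new argument. Purely as a hypothetical illustration (the actual flag name and help text added by this PR are not visible in this excerpt), a new example-script flag would be registered roughly like this:

```python
# Hypothetical sketch only; the real argument added in this PR is truncated above.
parser.add_argument(
    "--materialize-data",
    action="store_true",
    help="Whether to materialize the raw offline dataset in memory "
    "(trades memory for sampling speed).",
)
```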
👍
# Define the config for Behavior Cloning.
config = (
    BCConfig()
    .environment(
        env="WrappedALE/Pong-v5",
        # TODO (sven): Does this have any influence in connectors?
Great point! You are right and this setting is NOT propagated to the connectors. Not relevant for Pong as its rewards are all 1 anyways, but for other Atari benchmarks, this could matter.
Thanks for the clarification. There is actually another one:

    # TODO (sven): Has this any influence in the connectors?
    actions_in_input_normalized=True,

Does this have an influence - or should it? It is not recognized, yet, in the offline API.
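For context, here is a minimal sketch of where that flag sits, assuming the long-standing `AlgorithmConfig.offline_data()` signature; the input path below is purely hypothetical:

```python
from ray.rllib.algorithms.bc import BCConfig

config = (
    BCConfig()
    .environment(env="WrappedALE/Pong-v5")
    .offline_data(
        input_="s3://my-bucket/pong-offline-data",  # hypothetical path
        # TODO (sven): Has this any influence in the connectors?
        # As discussed above, this flag is currently not recognized by the
        # new offline API / ConnectorV2 pipelines.
        actions_in_input_normalized=True,
    )
)
```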
Co-authored-by: Sven Mika <[email protected]> Signed-off-by: simonsays1980 <[email protected]>
Co-authored-by: Sven Mika <[email protected]> Signed-off-by: simonsays1980 <[email protected]>
Signed-off-by: simonsays1980 <[email protected]>
Approved! Thanks @simonsays1980 for this awesome PR. :)
Signed-off-by: simonsays1980 <[email protected]>
…ded anonymous filesystem to RLUnplugged example. Signed-off-by: simonsays1980 <[email protected]>
Signed-off-by: simonsays1980 <[email protected]>
…ation of advantages. Signed-off-by: simonsays1980 <[email protected]>
Signed-off-by: ujjawal-khare <[email protected]>
Why are these changes needed?

The `map_batches` call in offline RL training used to be very slow for reasons that were unclear. This PR proposes multiple changes to the offline data pipeline that boost performance by a large factor. These changes are (see the sketch right after this list):

- Adds `materialize_data` (default is `False`) such that users can control memory usage.
- Adds `materialize_mapped_data` (default is `True`) such that users can control memory usage. This materialization applies the `OfflinePreLearner` to the raw data a priori and can be used by algorithms that do not have connector pipelines (`ConnectorV2` pipelines) that need an up-to-date `RLModule` and/or states (e.g. `BC` or `CQL`).
- Changes the after-mapping batch size to 1 in the `map_batches` call, because rows now contain `MultiAgentBatch`es with `train_batch_size_per_learner` environment steps each.
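As a rough illustration of how the two new options are meant to be used, here is a minimal sketch, assuming they are exposed via `AlgorithmConfig.offline_data()` like the other offline settings; the input path and batch size are hypothetical:

```python
from ray.rllib.algorithms.bc import BCConfig

config = (
    BCConfig()
    .environment(env="CartPole-v1")
    .offline_data(
        input_="local:///tmp/cartpole-offline-data",  # hypothetical path
        # Keep the raw dataset out of memory (new option, default False).
        materialize_data=False,
        # Materialize the data *after* the OfflinePreLearner mapping; suitable
        # for BC or CQL, whose connector pipelines do not need an up-to-date
        # RLModule or its states (new option, default True).
        materialize_mapped_data=True,
    )
    .training(train_batch_size_per_learner=1024)
)
```

With `materialize_mapped_data=True`, the `OfflinePreLearner` mapping is applied once up front rather than on every pass over the data, so the per-iteration `map_batches` work is not repeated.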
In addition, it fixes an important error in `MARWIL`'s loss which ignored training the value function.

These changes lead to enormous performance boosts:

- `CartPole-v1` with `BC` in single-learner mode below 7 secs (multi-learner mode < 12 secs).
- `CartPole-v1` with `MARWIL` in single-learner mode below 50 secs (multi-learner mode < 217 secs).
- `Pendulum-v1` with `CQL` in single-learner mode below 311 secs (multi-learner mode < 116 secs).

Related issue number
Checks

- I've signed off every commit (`git commit -s`) in this PR.
- I've run `scripts/format.sh` to lint the changes in this PR.
- If I've added a new method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file.