[RLlib] Cleanup examples folder (vol 30): BC pretraining, then PPO finetuning (new API stack with RLModule checkpoints). #47838
Conversation
Awesome example. Maybe add the average time for reaching 450 points without pretraining, to show the advantage of pretraining.
best_result = results.get_best_result(metric_key)
rl_module_checkpoint = (
    Path(best_result.checkpoint.path)
    / COMPONENT_LEARNER_GROUP
Nice!!!
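For readers following along, here is a rough sketch (not necessarily the exact code in this example) of how such an RLModule checkpoint path is assembled and the pretrained module restored on the new API stack. It assumes the component-name constants exported by ray.rllib.core and the Checkpointable-based RLModule.from_checkpoint(); `results` and `metric_key` come from the Tune run quoted above:

```python
from pathlib import Path

from ray.rllib.core import (
    COMPONENT_LEARNER,
    COMPONENT_LEARNER_GROUP,
    COMPONENT_RL_MODULE,
    DEFAULT_MODULE_ID,
)
from ray.rllib.core.rl_module.rl_module import RLModule

# From the Tune run quoted above.
best_result = results.get_best_result(metric_key)

# Drill down from the Algorithm checkpoint root to the single (default)
# RLModule subcomponent.
rl_module_checkpoint = (
    Path(best_result.checkpoint.path)  # Algorithm checkpoint root dir.
    / COMPONENT_LEARNER_GROUP          # -> LearnerGroup subcomponent
    / COMPONENT_LEARNER                # -> Learner subcomponent
    / COMPONENT_RL_MODULE              # -> (multi-)RLModule subcomponent
    / DEFAULT_MODULE_ID                # -> the default module
)

# Restore only the BC-pretrained RLModule from that sub-directory.
bc_module = RLModule.from_checkpoint(rl_module_checkpoint)
```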
| total time (s) | episode_return_mean | num_env_steps_trained_lifetime |
|----------------|---------------------|--------------------------------|
| 11.4828        | 250.5               | 42394                          |
Awesome, 11 seconds for 250 points. 51 iterations in 11 seconds is pretty fast.
| total time (s) | episode_return_mean | num_episodes_lifetime |
|----------------|---------------------|-----------------------|
| 32.7647        | 450.76              | 406                   |
Do we by any chance have a number for how long it takes to train PPO from zero to 450?
It's probably not much slower, if at all. But I think this is not the main point here (we know PPO learns CartPole super fast). The main goals here are to show that:
a) you can use simple custom models without having to subclass algo-specific RLModule classes (see the sketch below)!! <- this is huge and thanks to the new RLModule API concept.
b) it doesn't tank (catastrophic forgetting) after the transfer from BC to PPO :)
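To make (a) concrete, here is a minimal sketch of such a plain, non-algo-specific module, assuming the new API stack's TorchRLModule, Columns, and ValueFunctionAPI interfaces. The class name, layer sizes, and hard-coded CartPole dimensions are illustrative only and not the module used in this example:

```python
from torch import nn

from ray.rllib.core.columns import Columns
from ray.rllib.core.rl_module.apis import ValueFunctionAPI
from ray.rllib.core.rl_module.torch import TorchRLModule


class SimpleCartPoleModule(TorchRLModule, ValueFunctionAPI):
    """A plain TorchRLModule usable by BC (pretraining) and PPO (finetuning)."""

    def setup(self):
        # CartPole-v1 dims hard-coded for brevity: 4 obs dims, 2 actions.
        self._pi = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))
        self._vf = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 1))

    def _forward_inference(self, batch, **kwargs):
        # Action-distribution inputs (logits) for a Categorical distribution.
        return {Columns.ACTION_DIST_INPUTS: self._pi(batch[Columns.OBS])}

    def _forward_exploration(self, batch, **kwargs):
        return self._forward_inference(batch, **kwargs)

    def _forward_train(self, batch, **kwargs):
        return {Columns.ACTION_DIST_INPUTS: self._pi(batch[Columns.OBS])}

    # Needed by PPO's loss on the new API stack; BC simply ignores it.
    def compute_values(self, batch, embeddings=None):
        return self._vf(batch[Columns.OBS]).squeeze(-1)
```

Both the BC and the PPO config can then point to the same module class through their `rl_module()` config method (spec class names vary slightly across Ray versions), which is what makes the BC-to-PPO checkpoint hand-off in this example painless.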
@@ -857,6 +857,7 @@ def setup(self, config: AlgorithmConfig) -> None:
    env_steps_sampled=self.metrics.peek(
        NUM_ENV_STEPS_SAMPLED_LIFETIME, default=0
    ),
    rl_module_state=rl_module_state,
This was a bug!
@@ -1362,7 +1362,11 @@ def run_rllib_example_script_experiment(
    args.as_test = True

    # Initialize Ray.
    ray.init(num_cpus=args.num_cpus or None, local_mode=args.local_mode)
    ray.init(
Added reinit error ignore, in case one calls this utility function twice in an example script.
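Concretely, the updated call presumably looks roughly like this (ignore_reinit_error is a standard ray.init() argument; the exact kwargs in the utility may differ):

```python
import ray

# `args` is the example script's parsed argparse namespace.
ray.init(
    num_cpus=args.num_cpus or None,
    local_mode=args.local_mode,
    # Don't raise if Ray was already initialized by an earlier call to this
    # utility in the same process.
    ignore_reinit_error=True,
)
```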
Cleanup examples folder (vol 30): BC pretraining, then PPO finetuning (new API stack with RLModule checkpoints).
Why are these changes needed?

Related issue number

Checks
- I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
- I've run scripts/format.sh to lint the changes in this PR.
- If I've introduced a new method in Tune, I've added it in doc/source/tune/api/ under the corresponding .rst file.