[RLlib] Cleanup examples folder 04: Curriculum and checkpoint-by-custom-criteria examples moved to new API stack. #44706
Conversation
LGTM. Very happy about the curriculum example.
For debugging, use the following additional command line options
`--no-tune --num-env-runners=0`
which should allow you to set breakpoints anywhere in the RLlib code and
Works with Tune as well, but with `--local-mode`. :)
Absolutely! I'm always afraid we are going to get rid of Ray local mode at some point. Also, for any number of Learner workers > 0, local mode doesn't work (not sure why, actually).
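For reference, a minimal sketch of the two debugging setups discussed here. The script name is illustrative, the `--no-tune` and `--num-env-runners=0` flags come from the example's own argument parser, and `local_mode` is the standard `ray.init()` flag:

```python
# Option 1 (assumed script name, flags as discussed above): bypass Tune and run
# everything in the driver process, so breakpoints anywhere in RLlib are hit:
#
#   python checkpoint_by_custom_criteria.py --no-tune --num-env-runners=0
#
# Option 2: keep Tune in the loop but force Ray local mode, which also runs all
# tasks and actors serially in the driver process.
import ray

ray.init(local_mode=True)
```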
ckpt = results.get_best_result(metric=policy_loss_key, mode="min").checkpoint
print("Lowest pol-loss: {}".format(ckpt))
best_result = results.get_best_result(metric=policy_loss_key, mode="min")
ckpt = best_result.checkpoint
We could also ask here for the best checkpoint along the training path: `best_result.get_best_checkpoint(metric=policy_loss_key, mode="min")`.
Ah, cool, so `ckpt = best_result.checkpoint` returns the very last checkpoint only? And if the last one is not the best, it's better to do `best_result.get_best_checkpoint(metric=policy_loss_key, mode="min")`?
This actually doesn't seem to work well with nested keys. If I do `best_result.get_best_checkpoint(policy_loss_key, mode="min")`, I get:
RuntimeError: Invalid metric name ('info', 'learner', 'default_policy', 'learner_stats', 'policy_loss')! You may choose from the following metrics: dict_keys(['custom_metrics', 'episode_media', 'info', 'sampler_results', 'episode_reward_max', 'episode_reward_min', 'episode_reward_mean', 'episode_len_mean', 'episodes_this_iter', 'episodes_timesteps_total', 'policy_reward_min', 'policy_reward_max', 'policy_reward_mean', 'hist_stats', 'sampler_perf', 'num_faulty_episodes', 'connector_metrics', 'num_healthy_workers', 'num_in_flight_async_reqs', 'num_remote_worker_restarts', 'num_agent_steps_sampled', 'num_agent_steps_trained', 'num_env_steps_sampled', 'num_env_steps_trained', 'num_env_steps_sampled_this_iter', 'num_env_steps_trained_this_iter', 'num_env_steps_sampled_throughput_per_sec', 'num_env_steps_trained_throughput_per_sec', 'timesteps_total', 'num_steps_trained_this_iter', 'agent_timesteps_total', 'timers', 'counters', 'done', 'episodes_total', 'training_iteration', 'trial_id', 'date', 'timestamp', 'time_this_iter_s', 'time_total_s', 'pid', 'hostname', 'node_ip', 'config', 'time_since_restore', 'iterations_since_restore', 'perf', 'experiment_tag']).
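A possible workaround (not part of this PR): pick the checkpoint manually from `Result.best_checkpoints`, which is a list of `(Checkpoint, metrics)` tuples, and walk the nested metric key yourself. A hedged sketch, reusing the `best_result` object from the snippet above:

```python
# Manual selection over the best trial's saved checkpoints, for nested metric
# keys that Result.get_best_checkpoint() rejects.
def best_checkpoint_by_nested_metric(result, key_path, mode="min"):
    """Return the checkpoint of `result` whose metrics have the best value
    under the nested `key_path` (a tuple of dict keys)."""

    def lookup(metrics):
        for key in key_path:
            metrics = metrics[key]
        return metrics

    pick = min if mode == "min" else max
    # Result.best_checkpoints is a list of (Checkpoint, metrics_dict) tuples.
    ckpt, _ = pick(result.best_checkpoints, key=lambda cm: lookup(cm[1]))
    return ckpt


# Usage with the nested key from the error above:
# ckpt = best_checkpoint_by_nested_metric(
#     best_result,
#     ("info", "learner", "default_policy", "learner_stats", "policy_loss"),
#     mode="min",
# )
```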
ray.shutdown()
best_result = results.get_best_result(metric=vf_loss_key, mode="max")
ckpt = best_result.checkpoint
Here as well
param_space=config.to_dict(),
run_config=air.RunConfig(stop=stop, verbose=2),
run_rllib_example_script_experiment(
    base_config, args, stop=stop, success_metric={"task_solved": 1.0}
Very nice example 👍
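For context, a rough sketch of the new-stack example pattern this diff switches to. The helper names are assumed to match the other cleaned-up examples in `ray.rllib.utils.test_utils`, and the environment and stop values are illustrative only:

```python
from ray.rllib.algorithms.ppo import PPOConfig
from ray.rllib.utils.test_utils import (
    add_rllib_example_script_args,
    run_rllib_example_script_experiment,
)

parser = add_rllib_example_script_args()
args = parser.parse_args()

# Illustrative config; the real example builds its curriculum-capable env here.
base_config = PPOConfig().environment("CartPole-v1")
stop = {"training_iteration": 100}

# The helper wires up Tune (or a plain training loop with --no-tune), applies
# the stop criteria, and checks the custom "task_solved" metric for success.
run_rllib_example_script_experiment(
    base_config, args, stop=stop, success_metric={"task_solved": 1.0}
)
```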
Why are these changes needed?

Cleanup examples folder 04: Curriculum and checkpoint-by-custom-criteria examples moved to the new API stack.

Related issue number

Checks
- I've signed off every commit (`git commit -s`) in this PR.
- I've run `scripts/format.sh` to lint the changes in this PR.
- If I've added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file.