[RLlib] AlgorithmConfigs: Broad rollout; Example scripts #29700
Conversation
@@ -322,8 +322,10 @@ def __init__(
             **kwargs: Arguments passed to the Trainable base class.
         """

-        # Resolve possible dict into an AlgorithmConfig object.
+        # TODO: In the future, only support AlgorithmConfig objects here.
+        # Resolve possible dict into an AlgorithmConfig object as well as
According to the type descriptors in the function signature, we don't accept dicts here anymore!
(If we still do, we should send a deprecation warning?)
True! Good point.
I still want to support it for a while, but yes, we should warn.
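For illustration, a minimal sketch of how such a resolution-plus-warning could look (the from_dict classmethod is an assumption here, not confirmed by this PR; deprecation_warning is RLlib's existing helper):

import copy

from ray.rllib.algorithms.algorithm_config import AlgorithmConfig
from ray.rllib.utils.deprecation import deprecation_warning

def resolve_config(config):
    # Resolve a legacy config dict into an AlgorithmConfig object.
    # Plain dicts keep working for now, but emit a deprecation warning
    # so users migrate to AlgorithmConfig objects.
    if isinstance(config, dict):
        deprecation_warning(
            old="Algorithm(config=[some dict])",
            new="Algorithm(config=[AlgorithmConfig object])",
            error=False,
        )
        config = AlgorithmConfig.from_dict(copy.deepcopy(config))
    return config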
- algo = ppo.PPO(config=ppo_config, env=CorrelatedActionsEnv)
+ # Have to specify this here as we are working with a generic AlgorithmConfig
+ # object, not a specific one (e.g. PPOConfig).
+ config.algo_class = args.run
cool!
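For reference, a sketch of the pattern under discussion (illustrative only; the env name and --run value are placeholders, and whether a bare AlgorithmConfig trains well depends on the chosen algorithm's defaults):

from ray.rllib.algorithms.algorithm_config import AlgorithmConfig

# A generic config (not PPOConfig, DQNConfig, etc.), e.g. when the
# algorithm is only chosen at runtime via a CLI arg like --run=PPO.
config = (
    AlgorithmConfig()
    .environment(env="CartPole-v1")
    .framework("tf2")
)
# Have to specify this here as we are working with a generic
# AlgorithmConfig object, not a specific one (e.g. PPOConfig).
config.algo_class = "PPO"  # e.g. args.run

algo = config.build()  # build() resolves the string via the registry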
rllib/examples/custom_logger.py
- # Run with tracing enabled for tfe/tf2.
- "eager_tracing": args.framework in ["tfe", "tf2"],
+ .framework(args.framework, eager_tracing=args.framework in ["tfe", "tf2"])
Maybe we can start removing tfe from examples?
Done in a different PR: #29755
rllib/examples/fractional_gpus.py
- # Set this to > 1 for multi-GPU learning.
- "num_gpus": args.num_gpus,
+ .environment(
+     GPURequiringEnv if args.num_gpus_per_worker > 0.0 else "CartPole-v0"
Can we upgrade to v1? If I'm not mistaken, CartPole-v0 doesn't exist anymore in recent gym releases.
Done in a separate PR. Feel like this shouldn't be in here. CartPole-v1 might indeed behave slightly differently, so we've got to be careful not to break any tuned examples.
I think it just has another reward scale (v0 caps episodes at 200 steps, v1 at 500), so we need to adjust tests that depend on it.
looks amazing! (couple of optional ideas/questions to consider)
if isinstance(self.algo_class, str):
    algo_class = get_algorithm_class(self.algo_class)

return algo_class(
I feel like this would be a good place to always create a deepcopy and freeze it, right?
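A rough sketch of that suggestion inside build() (assuming AlgorithmConfig has a freeze() method; illustrative only, not the exact PR code):

import copy

from ray.rllib.algorithms.registry import get_algorithm_class

def build(self, env=None, logger_creator=None):
    # Resolve a string algo_class via the registry (as in the diff above).
    algo_class = self.algo_class
    if isinstance(algo_class, str):
        algo_class = get_algorithm_class(algo_class)

    # Reviewer suggestion: work on a frozen deep copy, so the running
    # Algorithm cannot be mutated via the original config object.
    config_copy = copy.deepcopy(self)
    config_copy.freeze()

    return algo_class(config=config_copy, env=env, logger_creator=logger_creator)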
        evaluation_duration_unit="episodes",
    )
)
config.simple_optimizer = True
Why do we need to set this?
"model": {"custom_model": "eager_model"}, | ||
"framework": "tf2", | ||
} | ||
.resources(num_gpus=int(os.environ.get("RLLIB_NUM_GPUS", "0"))) |
I think, since we have config objects now, we should default to num_gpus=None, check for RLLIB_NUM_GPUS when freezing the config object, and set num_gpus=0 if it is still None. That would make this not-very-pretty and super redundant line unnecessary.
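A hedged sketch of that idea (the helper name and its placement are made up; the point is resolving None only at freeze/validation time):

import os

def _resolve_num_gpus(num_gpus):
    # Precedence: explicit user value > RLLIB_NUM_GPUS env var > 0.
    if num_gpus is not None:
        return num_gpus  # User set it explicitly; leave as-is.
    return int(os.environ.get("RLLIB_NUM_GPUS", "0"))

With that in place, example scripts could drop the redundant .resources(num_gpus=...) boilerplate line entirely.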
lgtm! What a huge PR, and I found nothing that could justify a request for changes! I can approve again when tests are green :)
This PR introduces:
- Algorithm.get_default_config() methods.

Why are these changes needed?
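Per the PR title, this rolls the typed AlgorithmConfig API out broadly across RLlib's example scripts. A sketch of the before/after shape (illustrative; exact chainable method names per the RLlib version this PR targets):

from ray.rllib.algorithms.ppo import PPO, PPOConfig

# Old style: a raw config dict passed to the Algorithm/Trainer.
# config = {"env": "CartPole-v1", "framework": "tf2", "num_gpus": 0}

# New style: a typed, chainable AlgorithmConfig object.
config = (
    PPOConfig()
    .environment(env="CartPole-v1")
    .framework("tf2")
    .resources(num_gpus=0)
)
algo = config.build()
print(algo.train()["episode_reward_mean"])

# Per-algorithm defaults are now also available via classmethod:
# default_config = PPO.get_default_config()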
Related issue number
Checks
- I've signed off every commit (git commit -s) in this PR.
- I've run scripts/format.sh to lint the changes in this PR.