
[RLlib] AlgorithmConfigs: Broad rollout; Example scripts #29700

Merged

merged 52 commits on Oct 28, 2022

Conversation

sven1977
Contributor

@sven1977 sven1977 commented Oct 26, 2022

This PR introduces:

  • AlgorithmConfig objects being returned by all built-in RLlib Algorithm.get_default_config() methods.
  • Returning a dict here is still supported and covered by a new backward-compat test case.
  • Adds test cases for different AlgorithmConfig setups and translations.
  • Makes sure a specific algorithm (e.g. PPO) can even be built properly with a generic superclass AlgorithmConfig object (if no PPO-specific settings need to be changed).
  • Starts converting example scripts from old config dicts to using AlgorithmConfig objects.
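The builder pattern this PR rolls out can be modeled with a tiny stand-in, shown below. This is a simplified illustration, not the real RLlib API: the actual `AlgorithmConfig`/`PPOConfig` classes carry many more settings, and the method names here (`environment`, `framework`, `training`, `to_dict`) mirror the real fluent interface only in spirit.

```python
# Minimal stand-in for the fluent config pattern; NOT the real RLlib classes.
class AlgorithmConfig:
    def __init__(self):
        self.env = None
        self.framework_str = "tf"
        self.eager_tracing = False

    def environment(self, env):
        self.env = env
        return self  # returning self enables method chaining

    def framework(self, framework, eager_tracing=False):
        self.framework_str = framework
        self.eager_tracing = eager_tracing
        return self

    def to_dict(self):
        # Backward compat: the object can still be flattened to a legacy dict.
        return {
            "env": self.env,
            "framework": self.framework_str,
            "eager_tracing": self.eager_tracing,
        }


class PPOConfig(AlgorithmConfig):
    """An algorithm-specific subclass adding its own hyperparameters."""

    def __init__(self):
        super().__init__()
        self.lr = 5e-5

    def training(self, lr=None):
        if lr is not None:
            self.lr = lr
        return self


config = PPOConfig().environment("CartPole-v0").framework("tf2", eager_tracing=True)
legacy = config.to_dict()
```

The key property the PR relies on is that a generic `AlgorithmConfig` and an algorithm-specific subclass share one chainable interface, while `to_dict()` keeps the old dict-based code paths working.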

Why are these changes needed?

Related issue number

Checks

  • I've signed off every commit (using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

Signed-off-by: sven1977 <[email protected]>
# Conflicts:
#	rllib/policy/policy.py
@@ -322,8 +322,10 @@ def __init__(
**kwargs: Arguments passed to the Trainable base class.
"""

# Resolve possible dict into an AlgorithmConfig object.
# TODO: In the future, only support AlgorithmConfig objects here.
# Resolve possible dict into an AlgorithmConfig object as well as
Contributor

According to the type descriptors in the function signature, we don't accept dicts here anymore!

Contributor

(If we still do, we should send a deprecation warning?)

Contributor Author

True! Good point.

Contributor Author

I still want to support it for a while, but yes, we should warn.
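The behavior agreed on here (keep accepting dicts for a while, but warn) can be sketched as below. This is an illustrative pattern, not the actual RLlib code: `resolve_config`, `update_from_dict`, and the warning text are hypothetical names for the sketch.

```python
import warnings


class AlgorithmConfig:
    """Toy stand-in for RLlib's AlgorithmConfig."""

    def __init__(self):
        self.settings = {}

    def update_from_dict(self, d):
        self.settings.update(d)
        return self


def resolve_config(config):
    """Accept either an AlgorithmConfig or a legacy dict (deprecated)."""
    if isinstance(config, dict):
        warnings.warn(
            "Passing a config dict is deprecated; use AlgorithmConfig instead.",
            DeprecationWarning,
        )
        return AlgorithmConfig().update_from_dict(config)
    return config


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    cfg = resolve_config({"lr": 1e-4})  # legacy dict path: works, but warns
```

Passing an actual `AlgorithmConfig` instance goes through silently; only the legacy dict path emits the `DeprecationWarning`.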

algo = ppo.PPO(config=ppo_config, env=CorrelatedActionsEnv)
# Have to specify this here as we are working with a generic AlgorithmConfig
# object, not a specific one (e.g. PPOConfig).
config.algo_class = args.run
Contributor

cool!

# Run with tracing enabled for tfe/tf2.
"eager_tracing": args.framework in ["tfe", "tf2"],
.framework(args.framework, eager_tracing=args.framework in ["tfe", "tf2"])
Contributor

maybe we can start removing tfe from examples?

Contributor Author

Done in a different PR:
#29755

# Set this to > 1 for multi-GPU learning.
"num_gpus": args.num_gpus,
.environment(
GPURequiringEnv if args.num_gpus_per_worker > 0.0 else "CartPole-v0"
Contributor

Can we upgrade to v1? If I'm not mistaken, this doesn't exist anymore in recent gym releases.

Contributor Author

Done in a separate PR. Feels like this shouldn't be in here. CartPole-v1 might indeed behave slightly differently, so we've got to be careful not to break any tuned examples.

Contributor

I think it just has another reward scale, so we need to adjust tests that depend on it.

Contributor

@maxpumperla maxpumperla left a comment

looks amazing! (couple of optional ideas/questions to consider)

…_configs_next_steps_2

# Conflicts:
#	rllib/algorithms/algorithm.py

…_configs_next_steps_2

# Conflicts:
#	rllib/examples/action_masking.py
#	rllib/examples/checkpoint_by_custom_criteria.py
#	rllib/examples/custom_logger.py
#	rllib/examples/inference_and_serving/policy_inference_after_training.py
#	rllib/examples/inference_and_serving/policy_inference_after_training_with_attention.py
#	rllib/examples/vizdoom_with_attention_net.py
#	rllib/tests/test_supported_spaces.py
if isinstance(self.algo_class, str):
algo_class = get_algorithm_class(self.algo_class)

return algo_class(
Contributor

I feel like this would be a good place to always create a deep copy and freeze it, right?
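The deep-copy-and-freeze idea suggested here can be sketched with a small stand-in (hypothetical names; not the actual RLlib implementation): `build`-time copying means later edits to the user's config object cannot leak into an already-constructed algorithm, and freezing rejects mutation of the copy itself.

```python
import copy


class AlgorithmConfig:
    """Toy config that can be frozen against further mutation."""

    def __init__(self):
        self.num_workers = 2
        self._frozen = False

    def __setattr__(self, key, value):
        # Once frozen, reject any further attribute mutation.
        if getattr(self, "_frozen", False):
            raise AttributeError("Cannot mutate a frozen AlgorithmConfig.")
        super().__setattr__(key, value)

    def freeze(self):
        object.__setattr__(self, "_frozen", True)


class Algorithm:
    def __init__(self, config):
        # Deep-copy and freeze, so later edits to the user's object
        # cannot affect this (possibly already running) instance.
        self.config = copy.deepcopy(config)
        self.config.freeze()


user_config = AlgorithmConfig()
algo = Algorithm(user_config)
user_config.num_workers = 8  # still allowed: only the user's copy changes
```

Mutating `algo.config` after construction raises an `AttributeError`, while the user's original object stays freely editable.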

evaluation_duration_unit="episodes",
)
)
config.simple_optimizer = True
Contributor

Why do we need to set this?

"model": {"custom_model": "eager_model"},
"framework": "tf2",
}
.resources(num_gpus=int(os.environ.get("RLLIB_NUM_GPUS", "0")))
Contributor

I think, since we have config objects now, we should default to num_gpus=None, check for RLLIB_NUM_GPUS when freezing the config object, and set num_gpus=0 if num_gpus is still None. That would make this not-very-pretty and super redundant line unnecessary.
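The late-resolution idea proposed in this comment could look roughly like the sketch below (a hypothetical illustration, not RLlib code): `None` marks "unset by the user", and the environment variable is consulted only at freeze time.

```python
import os


class AlgorithmConfig:
    """Toy config demonstrating freeze-time resolution of num_gpus."""

    def __init__(self):
        # None means "the user did not set num_gpus explicitly".
        self.num_gpus = None

    def freeze(self):
        # Late resolution: consult RLLIB_NUM_GPUS only if num_gpus was
        # left unset, falling back to 0 when the env var is absent too.
        if self.num_gpus is None:
            self.num_gpus = int(os.environ.get("RLLIB_NUM_GPUS", "0"))
        return self


os.environ["RLLIB_NUM_GPUS"] = "2"
auto_cfg = AlgorithmConfig().freeze()      # picks up the env var

explicit_cfg = AlgorithmConfig()
explicit_cfg.num_gpus = 1                  # explicit setting wins
explicit_cfg.freeze()
```

With this scheme, example scripts would no longer need the per-script `num_gpus=int(os.environ.get("RLLIB_NUM_GPUS", "0"))` boilerplate.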

Contributor

@ArturNiederfahrenhorst ArturNiederfahrenhorst left a comment

lgtm! what a huge PR - and I found nothing that could justify a request for changes! I can approve again when tests are green :)

@sven1977 sven1977 merged commit 5af66e6 into ray-project:master Oct 28, 2022
WeichenXu123 pushed a commit to WeichenXu123/ray that referenced this pull request Dec 19, 2022
@sven1977 sven1977 deleted the algo_configs_next_steps_2 branch June 2, 2023 20:18