[RLlib] Preparatory PR for multi-agent multi-GPU learner (alpha-star style) #03 #21652
Conversation
Minor comments. LMK what you think :)
rllib/agents/ars/ars.py
self.validate_config(config)

# Generate `self.env_creator` callable to create an env instance.
self._get_env_creator_from_env_id(self._env_id)
For whatever reason, this function isn't available on master, and I also didn't find its definition in this PR's diff, only when checking out this branch. Strange :(
Could be a mistake by me when splitting my local branch (which contained more changes).
fixed.
rllib/agents/es/es.py
self.validate_config(config)

# Generate `self.env_creator` callable to create an env instance.
self._get_env_creator_from_env_id(self._env_id)
Could we change the name of this function to _set_env_creator_from_env_id? This function sets the self.env_creator attribute, but it doesn't return anything. That, or the function could return the env creator and the caller could set the attribute itself:

self.env_creator = self._get_env_creator_from_env_id(self._env_id)
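A minimal sketch of the two alternatives, assuming a plain gym-based env creator (hypothetical code, not the actual RLlib implementation):

import gym  # assumption: plain gym envs, just for illustration


class TrainerSketch:
    """Hypothetical stand-in for Trainer; not the real RLlib class."""

    def __init__(self, env_id):
        self._env_id = env_id
        # Option A: name reflects the side effect (sets the attribute, returns nothing).
        self._set_env_creator_from_env_id(self._env_id)
        # Option B: keep the "get" name, return the callable, and assign at the call site:
        # self.env_creator = self._get_env_creator_from_env_id(self._env_id)

    def _set_env_creator_from_env_id(self, env_id):
        # Sets `self.env_creator` as a side effect; no return value.
        self.env_creator = lambda env_ctx=None: gym.make(env_id)

    def _get_env_creator_from_env_id(self, env_id):
        # Pure getter: returns the callable instead of setting anything.
        return lambda env_ctx=None: gym.make(env_id)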
thanks for the catch. I'll check. ...
done
very nice cleanup PR.
a couple of minor questions.
on a high level, maybe break this into multiple PRs in the future, so simple changes like config param renaming can be merged much faster.
thanks.
policy_mapping_fn=policy_mapping_fn,
policies_to_train=policies_to_train,
)
worker.add_policy(**kwargs)
now that this is a 1-line statement, why bother with the inline fn?
maybe cleaner to just call worker.add_policy(**kwargs) directly below.
We need to pass a function/callable to foreach_worker() below anyway, so it's better to share that code.
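Roughly, the shared callable looks like this (paraphrased sketch, not the verbatim code from this PR; add_policy_to_all_workers and its arguments are made up for illustration):

def add_policy_to_all_workers(workers, **kwargs):
    # workers: a WorkerSet-like object exposing foreach_worker();
    # kwargs: the arguments for add_policy() (policy_id, policy_cls, ...).
    def fn(worker):
        # Same one-line logic for the local worker and every remote worker.
        worker.add_policy(**kwargs)

    # foreach_worker() needs a callable anyway, so the inline fn is shared
    # instead of duplicating the add_policy() call in two places.
    workers.foreach_worker(fn)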
I see, got it.
if workers:
    workers.stop()
# Stop all optimizers.
if hasattr(self, "optimizer") and self.optimizer:
why get rid of this?
Seemed like really outdated code. Trainers don't have self.optimizer (anymore); only ES and ARS do, and those two optimizers do NOT have a stop() method, so this would actually produce errors here. If users want their Trainers to have a self.optimizer, they can just override Trainer.cleanup() and implement the necessary logic.
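For example, a user Trainer that manages its own optimizer could do something like this (hypothetical sketch; the optimizer attribute and its stop() method are user-defined, not RLlib built-ins):

from ray.rllib.agents.trainer import Trainer


class MyTrainerWithOptimizer(Trainer):
    """Hypothetical user Trainer that keeps its own self.optimizer."""

    def cleanup(self):
        # Stop the user-managed optimizer first, if one was created.
        optimizer = getattr(self, "optimizer", None)
        if optimizer is not None and hasattr(optimizer, "stop"):
            optimizer.stop()
        # Then run the default cleanup (stops workers, etc.).
        super().cleanup()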
Preparatory PR for multi-agent multi-GPU learner (alpha-star style) #3

- Use min_time_s_per_reporting instead of min_iter_time_s (deprecated previously); see the sketch below.
- Use Trainer.setup() instead of Trainer._init() (deprecated previously) and rename self._workers to self.workers to match all other algos in RLlib.

Why are these changes needed?
Related issue number

Checks
- I've run scripts/format.sh to lint the changes in this PR.