[RLlib; Docs overhaul] Docstring cleanup: Trainer, trainer_template, Callbacks. #19758
Conversation
rllib/agents/callbacks.py
Outdated
env_index: Obsoleted: The ID of the environment, which the
    episode belongs to.
If this is obsolete, we can remove it?
done
overall a very nice cleanup.
some suggestions.
thanks.
rllib/agents/trainer.py
Outdated
All RLlib trainers extend this base class, e.g., the A3CTrainer implements
the A3C algorithm for single and multi-agent training.
Trainers contain a WorkerSet (self.workers), normally used to generate
this is really concise, but I am not sure users would understand remote and local workers if they start here.
should we say something like:
Trainers contain a WorkerSet (self.workers). A WorkerSet is normally composed of a single local worker (self.workers.local_worker(), used to compute and apply learning updates) and multiple remote workers (self.workers.remote_workers(), used to generate environment samples in parallel).
+1
done
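For illustration, a minimal sketch of the WorkerSet layout described in the suggested wording (assuming an already built Trainer instance named `trainer`; API names as in RLlib at the time of this PR):

```python
# Minimal sketch, assuming `trainer` is an already built Trainer instance.
workers = trainer.workers                  # the Trainer's WorkerSet
local_worker = workers.local_worker()      # computes and applies learning updates
remote_workers = workers.remote_workers()  # list of Ray actor handles that
                                           # generate env samples in parallel
                                           # (may be empty if num_workers=0)
```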
rllib/agents/trainer.py
Outdated
Trainer objects retain internal model state between calls to train(), so
you should create a new trainer instance for each training session.
Each worker (remote and local) contains a full PolicyMap (1 Policy
for single-agent training, 1 or more policies for multi-agent training).
should we add one quick sentence about how the policies are synced during training? Like:
Each worker ... for multi-agent training). For most algorithms, policies are synced automatically using Ray remote calls during training.
done
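A hedged usage sketch tying the two points above together (one Trainer per training session, state persisting across train() calls; config values are illustrative only):

```python
from ray.rllib.agents.ppo import PPOTrainer

# One fresh Trainer per training session; internal model state
# (weights, timestep counters, etc.) persists across train() calls.
trainer = PPOTrainer(env="CartPole-v0", config={"num_workers": 2})
for _ in range(10):
    result = trainer.train()
# For most algorithms, updated policy weights are pushed from the local
# worker to the remote workers via Ray remote calls inside train().
```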
0 for local only.

@PublicAPI
def train(self) -> ResultDict:
    """Overrides super.train to synchronize global vars."""
just to confirm, we only moved things around here, right? There are no actual logic changes?
Correct, I wanted to make sure that the order of the methods as they show up in the reference documentation makes more sense: they are now sorted by importance, with private methods, helper methods, and deprecated methods moved toward the end.
@@ -1284,7 +1096,7 @@ def compute_actions(
     Returns:
         any: The computed action if full_fetch=False, or
         tuple: The full output of policy.compute_actions() if
-            full_fetch=True or we have an RNN-based Policy.
+        full_fetch=True or we have an RNN-based Policy.
I think we should keep the 4-char indentation, since this line is still about the tuple type?
Yeah, but it messes up the Sphinx autoclass output.
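For context, the two renderings under discussion, shown as a sketch (the PR keeps the continuation at the same level because the extra 4-char indent makes Sphinx's Google-style docstring parser treat it as a nested block):

```python
# Sketch of the two docstring variants discussed above.
#
# 4-char continuation indent (mis-rendered by Sphinx autoclass):
#     tuple: The full output of policy.compute_actions() if
#         full_fetch=True or we have an RNN-based Policy.
#
# Same-level continuation (as chosen in this PR):
#     tuple: The full output of policy.compute_actions() if
#     full_fetch=True or we have an RNN-based Policy.
```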
@@ -1503,7 +1305,7 @@ def export_policy_model(self,
                         export_dir: str,
                         policy_id: PolicyID = DEFAULT_POLICY_ID,
                         onnx: Optional[int] = None):
-    """Export policy model with given policy_id to local directory.
+    """Exports policy model with given policy_id to a local directory.
what is the format of the exported model if the onnx param is None?
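A hedged usage sketch for the question above (per RLlib's docs of that era, onnx=None exports the model in the framework's native format, while an int value selects the ONNX opset; paths are illustrative):

```python
# Hedged sketch; assumes `trainer` is a built Trainer instance.
# onnx=None: export in the framework's native format
# (e.g., a TensorFlow SavedModel directory for tf policies).
trainer.export_policy_model("/tmp/my_policy")

# onnx=<int>: export as an ONNX model using that opset version.
trainer.export_policy_model("/tmp/my_policy_onnx", onnx=11)
```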
rllib/agents/trainer.py
Outdated
self.__setstate__(extra_data)

@override(Trainable)
def log_result(self, result: ResultDict):
can we explain what log_result does?
I have no idea :D
I'll add the docstring
+1
Well, actually, this overrides (super) Trainable's method, so we shouldn't add a docstring here. I'll add some comments.
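For readers following along, a rough sketch of what the override does (based on the RLlib code at the time; treat as illustrative, not authoritative):

```python
@override(Trainable)
def log_result(self, result: ResultDict):
    # Sketch of Trainer.log_result: forward the result to the user's
    # callbacks first, then let Tune's Trainable.log_result do the
    # actual logging.
    self.callbacks.on_train_result(trainer=self, result=result)
    super().log_result(result)
```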
"""Pre-evaluation callback.""" | ||
pass | ||
|
||
@DeveloperAPI |
should this live at the top right after init() maybe?
done
the policy class or None. If None is returned, will use
`default_policy` (which must be provided then).
validate_env: Optional callable to validate the generated environment
    (only on worker=0).
can we expand a little?
I would be wondering why we only validate on worker=0?
Great catch, this validation happens on all workers.
The local worker may not even have an env (by default it doesn't if we have >0 remote workers!).
done
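Following the clarified behavior, a hedged sketch of a validate_env callable (its signature in build_trainer takes the created env and its EnvContext and should raise on invalid envs; the function name and check here are hypothetical):

```python
from ray.rllib.env.env_context import EnvContext

# Hypothetical validator: runs on every worker that creates an env;
# raising here aborts setup with a descriptive error.
def validate_my_env(env, env_context: EnvContext) -> None:
    if not hasattr(env, "observation_space"):
        raise ValueError(
            f"Env on worker {env_context.worker_index} has no "
            "observation_space!")
```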
…_docstrings_cleanup_trainer
Hey @gjoliver, thanks for the review. I addressed all requested changes.
Thanks. One last question, but this looks much better.
the policy class or None. If None is returned, will use
`default_policy` (which must be provided then).
validate_env: Optional callable to validate the generated environment
    (only on worker=0).
I don't see update here?
Oh, no, sorry, forgot to address the
Docstring cleanup: Trainer, trainer_template, Callbacks.
Why are these changes needed?
Related issue number
Checks
I've run scripts/format.sh to lint the changes in this PR.