
[RLlib; Docs overhaul] Docstring cleanup: Trainer, trainer_template, Callbacks. #19758

Merged · 4 commits · Oct 27, 2021

Conversation

@sven1977 (Contributor) commented Oct 26, 2021

Docstring cleanup: Trainer, trainer_template, Callbacks.

  • Clean up docstrings.
  • Remove type hints from the Args lists (already in the signature).
  • Remove type hints from the Returns sections (already in the signature).
  • Add docstrings where missing.
  • Re-sort some class methods by importance; private, helper, and deprecated methods moved toward the end.
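The cleanup style described above can be illustrated with a short sketch. The function and its arguments are invented for illustration (they are not from this PR); the point is that type information lives only in the signature, while Args/Returns carry plain descriptions:

```python
# Illustrative docstring style: type hints in the signature only,
# not repeated in the Args list or the Returns section.

def compute_discounted_return(rewards: list, gamma: float = 0.99) -> float:
    """Computes the discounted return of a reward sequence.

    Args:
        rewards: The sequence of per-step rewards.
        gamma: The discount factor in [0, 1].

    Returns:
        The discounted sum of rewards.
    """
    ret = 0.0
    for r in reversed(rewards):
        ret = r + gamma * ret
    return ret
```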


Checks

  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

Comment on lines 59 to 60
env_index: Obsoleted: The ID of the environment, which the
episode belongs to.
Contributor:

If this is obsolete, we can remove it?

Contributor Author:

done

@gjoliver (Member) left a comment:

overall a very nice cleanup.
some suggestions.
thanks.


All RLlib trainers extend this base class, e.g., the A3CTrainer implements
the A3C algorithm for single and multi-agent training.
Trainers contain a WorkerSet (self.workers), normally used to generate
Member:

this is really concise, but I am not sure users would understand remote and local workers if they start here.
should we say something like:

Trainers contain a WorkerSet (self.workers). A WorkerSet is normally composed of a single local worker (self.workers.local_worker(), used to compute and apply learning updates) and multiple remote workers (self.workers.remote_workers(), used to generate environment samples in parallel).

Contributor Author:

+1

Contributor Author:

done
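The local/remote worker split discussed in this thread can be sketched schematically. This is a toy stand-in, not the actual RLlib WorkerSet implementation; the class and method names mirror the docstring's description:

```python
# Toy stand-in illustrating the WorkerSet structure described above.
# Not the real RLlib API: real remote workers are Ray actors.

class RolloutWorker:
    def __init__(self, index: int):
        self.index = index  # 0 = local worker, >0 = remote workers

class WorkerSet:
    def __init__(self, num_remote_workers: int):
        self._local = RolloutWorker(0)
        self._remotes = [RolloutWorker(i + 1) for i in range(num_remote_workers)]

    def local_worker(self) -> RolloutWorker:
        # Used to compute and apply learning updates.
        return self._local

    def remote_workers(self) -> list:
        # Used to generate environment samples in parallel.
        return self._remotes

workers = WorkerSet(num_remote_workers=2)
```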

Trainer objects retain internal model state between calls to train(), so
you should create a new trainer instance for each training session.
Each worker (remotes and local) contains a full PolicyMap (1 Policy
for single-agent training, 1 or more policies for multi-agent training).
Member:

should we add one quick sentence about how the policies are synced during training? Like:

Each worker ... for multi-agent training). For most algorithms, policies are synced automatically using Ray remote calls during training.

Contributor Author:

done
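The automatic policy syncing mentioned in this thread amounts to broadcasting the local worker's policy weights to the remote workers after each update. A toy illustration follows; real RLlib does this via Ray remote calls on actor handles, whereas plain Python objects stand in here:

```python
# Toy sketch of weight syncing: after a learning update on the local
# worker, its weights are broadcast to every remote worker.
# (Assumed simplification: weights are a plain dict, not tensors.)

class Worker:
    def __init__(self):
        self.weights = {"w": 0.0}

    def set_weights(self, weights: dict):
        self.weights = dict(weights)

local = Worker()
remotes = [Worker() for _ in range(3)]

# A learning update changes the local weights ...
local.weights["w"] = 1.5

# ... which are then synced out to all remote workers.
for w in remotes:
    w.set_weights(local.weights)
```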

0 for local only.
@PublicAPI
def train(self) -> ResultDict:
"""Overrides super.train to synchronize global vars."""

Member:

just to confirm, we only moved things around here right? there is no actual logic changes?

Contributor Author:

Correct. I wanted to make sure the order of the methods, as they show up in the reference documentation, makes more sense, so they are now sorted by importance; private, helper, and deprecated methods moved toward the end.

@@ -1284,7 +1096,7 @@ def compute_actions(
Returns:
any: The computed action if full_fetch=False, or
tuple: The full output of policy.compute_actions() if
full_fetch=True or we have an RNN-based Policy.
full_fetch=True or we have an RNN-based Policy.
Member:

I think we should keep the 4-char indentation, since this line is still about the tuple type?

Contributor Author:

Yeah, but it messes up the sphinx autoclass output.
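The trade-off discussed here is how Sphinx's Google-style docstring parser treats continuation lines in a Returns section: extra indentation can make it render the line as a nested field. A hedged sketch of the flat style the diff adopts (the function is illustrative only, not the real compute_actions):

```python
# Illustrative only: a "Returns" section written as one flat description,
# which renders cleanly under sphinx autoclass, instead of nesting
# per-type entries ("any:", "tuple:") with deeper indentation.

def compute_actions(full_fetch: bool = False):
    """Computes actions for the given observations.

    Returns:
        The computed action if full_fetch=False, or the full output of
        policy.compute_actions() if full_fetch=True or the policy is
        RNN-based.
    """
    return None
```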

@@ -1503,7 +1305,7 @@ def export_policy_model(self,
export_dir: str,
policy_id: PolicyID = DEFAULT_POLICY_ID,
onnx: Optional[int] = None):
"""Export policy model with given policy_id to local directory.
"""Exports policy model with given policy_id to a local directory.
Member:

what is the format of the exported model if the onnx param is None?
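The reviewer's question can be made concrete with a toy stand-in for the quoted export_policy_model signature. The branching shown (a native framework-format export when onnx is None, an ONNX export at the given opset otherwise) is an assumption made for illustration, not confirmed RLlib behavior, and the return value is invented:

```python
# Hypothetical sketch of the two export paths implied by the signature.
# Returns a fake "export descriptor" string purely for illustration.

def export_policy_model(export_dir: str,
                        policy_id: str = "default_policy",
                        onnx=None) -> str:
    if onnx is None:
        # Assumed: export in the framework's native format.
        return f"{export_dir}/{policy_id}:native"
    # Assumed: export as ONNX at the given opset version.
    return f"{export_dir}/{policy_id}:onnx-opset-{onnx}"
```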

self.__setstate__(extra_data)

@override(Trainable)
def log_result(self, result: ResultDict):
Member:

can we explain what log_result does?

Contributor Author:

I have no idea :D
I'll add the docstring
+1

Contributor Author:

Well, actually, this overrides (super) Trainable's method, so we shouldn't add a docstring here. I'll add some comments.
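The point about not duplicating docstrings on overridden methods relates to how Python tooling resolves documentation: inspect.getdoc() falls back to the base class when an override has no docstring of its own (and Sphinx can do the same, depending on configuration). A minimal sketch with stand-in class names:

```python
import inspect

class Trainable:
    def log_result(self, result: dict):
        """Logs a result dict (base-class behavior)."""

class Trainer(Trainable):
    # No docstring here on purpose: the base class's docstring still
    # applies and is picked up by inspect.getdoc() via the MRO.
    def log_result(self, result: dict):
        super().log_result(result)

doc = inspect.getdoc(Trainer.log_result)
```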

"""Pre-evaluation callback."""
pass

@DeveloperAPI
Member:

should this live at the top right after init() maybe?

Contributor Author:

done

the policy class or None. If None is returned, will use
`default_policy` (which must be provided then).
validate_env: Optional callable to validate the generated environment
(only on worker=0).
Member:

Can we expand a little? I would be wondering why we only validate on worker=0.

Contributor Author:

Great catch, this validation happens on all workers.
The local worker may not even have an env (by default it doesn't if we have >0 remote workers!).

Contributor Author:

done
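A hedged sketch of what a validate_env callable might look like. The check performed here is invented for illustration (the real hook simply receives the generated env, and per the discussion above it runs on all workers that actually have an env, not only worker=0):

```python
# Hypothetical validate_env callable: raises if the generated environment
# is missing attributes a trainer would need. The attribute list is an
# assumption for illustration only.

def validate_env(env) -> None:
    for attr in ("observation_space", "action_space", "reset", "step"):
        if not hasattr(env, attr):
            raise ValueError(f"Env is missing required attribute: {attr}")

class DummyEnv:
    observation_space = None
    action_space = None
    def reset(self): ...
    def step(self, action): ...

validate_env(DummyEnv())  # passes silently
```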

Member:

I don't see the update here?

@sven1977 (Contributor Author):
Hey @gjoliver , thanks for the review. I addressed all requested changes.

@gjoliver (Member) left a comment:

Thanks. One last question, but this looks much better.

the policy class or None. If None is returned, will use
`default_policy` (which must be provided then).
validate_env: Optional callable to validate the generated environment
(only on worker=0).
Member:

I don't see the update here?

@sven1977 sven1977 merged commit 80eeb13 into ray-project:master Oct 27, 2021
@sven1977 (Contributor Author):

Oh, no, sorry, forgot to address the validate_env question. Will put this into a follow-up PR (many more doc-overhaul ones to come ;) ).

sven1977 added two commits that referenced this pull request Oct 27, 2021
sven1977 added a commit that referenced this pull request Oct 28, 2021
…ainer_template, Callbacks. (#19758)" (#19806)"

This reverts commit 4a82d3e.

3 participants