-
Notifications
You must be signed in to change notification settings - Fork 5.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[RLlib] New API stack: (Multi)RLModule overhaul vol 05 (deprecate Specs, SpecDict, TensorSpec). #47915
[RLlib] New API stack: (Multi)RLModule overhaul vol 05 (deprecate Specs, SpecDict, TensorSpec). #47915
Conversation
Signed-off-by: sven1977 <[email protected]>
Signed-off-by: sven1977 <[email protected]>
Signed-off-by: sven1977 <[email protected]>
…odule_do_over_bc_default_module_05_deprecate_specs
Signed-off-by: sven1977 <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. A lot of code will be gone which means: less maintenance costs - this is good! I have some worries in regard to error identifiability. We might want to test this.
|
||
@override(RLModule) | ||
def output_specs_train(self) -> SpecDict: | ||
return [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we leave these as a kind of comment/docstring in the forward
method? Maybe we do this for all default algos such that we, new RLlib team members and users always know what is returned (received).
@@ -203,15 +116,6 @@ def __init__(self, config: FreeLogStdMLPHeadConfig) -> None: | |||
self.register_buffer("log_std_clip_param_const", self.log_std_clip_param) | |||
|
|||
@override(Model) | |||
def get_input_specs(self) -> Optional[Spec]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let us check with tests, if missing a needed input or output is well traceable and the error can be well understood.
…odule_do_over_bc_default_module_05_deprecate_specs
Signed-off-by: sven1977 <[email protected]>
Signed-off-by: sven1977 <[email protected]>
…cs, SpecDict, TensorSpec). (ray-project#47915) Signed-off-by: ujjawal-khare <[email protected]>
…cs, SpecDict, TensorSpec). (ray-project#47915) Signed-off-by: ujjawal-khare <[email protected]>
…cs, SpecDict, TensorSpec). (ray-project#47915) Signed-off-by: ujjawal-khare <[email protected]>
…cs, SpecDict, TensorSpec). (ray-project#47915) Signed-off-by: ujjawal-khare <[email protected]>
…cs, SpecDict, TensorSpec). (ray-project#47915) Signed-off-by: ujjawal-khare <[email protected]>
…cs, SpecDict, TensorSpec). (ray-project#47915) Signed-off-by: ujjawal-khare <[email protected]>
…cs, SpecDict, TensorSpec). (ray-project#47915) Signed-off-by: ujjawal-khare <[email protected]>
…cs, SpecDict, TensorSpec). (ray-project#47915) Signed-off-by: ujjawal-khare <[email protected]>
…cs, SpecDict, TensorSpec). (ray-project#47915) Signed-off-by: ujjawal-khare <[email protected]>
New API stack: (Multi)RLModule overhaul vol 05 (deprecate Specs, SpecDict, TensorSpec).
Arguments for this move are:
[B, ...]
for non-RNN model components and[B, T, ...]
for RNN-based model components. We don't need specs for this.** All
_forward_inference
AND_forward_exploration
should either return ACTION_DIST_INPUTS and/or ACTIONS keys. The behavior for RLlib in terms of action sampling in the EnvRunners is already well described in the docs.**
_forward_train
must always be properly aligned with the algo's Learner's loss function, so this one remains tricky, regardless of having specs or not. The long-term solution should be what we have already started with the RLModule APIs: The algo's Learner determines, which APIs are hard requirements for any RLModule (custom or RLlib-default) to bring to the table in order to be learnt by the algo, for example: PPO requires all RLModules to implement theValueFunctionAPI
. We should expand this algo-agnostic and spec independent pattern to all the other algos as well (DQN/SAC/DreamerV3, etc..).Why are these changes needed?
Related issue number
Checks
git commit -s
) in this PR.scripts/format.sh
to lint the changes in this PR.method in Tune, I've added it in
doc/source/tune/api/
under thecorresponding
.rst
file.