[RLlib] DQN Rainbow Learner with new API stack (and EnvRunner API). #43196
Conversation
…tributional loss and simple Q-loss. Double-Q implemented next. Signed-off-by: Simon Zehnder <[email protected]>
EnvRunner API
Signed-off-by: Simon Zehnder <[email protected]>
from ray.rllib.utils.metrics import LAST_TARGET_UPDATE_TS, NUM_TARGET_UPDATES
from ray.rllib.utils.typing import ModuleID

# Now, this is double defined: In `SACRLModule` and here. I would keep it here
I think it's fine to leave them here as duplicates for better readability.
IMO, it's really not good to cross-import between different algorithms. Anywhere else, this would be ok to share.
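For illustration, keeping the dictionary keys local to the DQN RL Module instead of importing them from `SACRLModule` might look like the sketch below; `QF_PREDS` and the string values are assumptions, while `QF_TWIN_PREDS` is the key that appears in the diff further down:

# Hypothetical module-level constants kept local to the DQN RL Module,
# duplicated from the SAC definitions rather than cross-imported.
QF_PREDS = "qf_preds"            # outputs of the online Q-head (assumed name)
QF_TWIN_PREDS = "qf_twin_preds"  # outputs of the twin Q-head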
    timestep=timestep,
)

# TODO (Sven): APPO uses `config.target_update_frequency`. Can we
👍 Will revisit this once we rework APPO/IMPALA on the new stack in a few days.
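As a rough sketch (not the PR's actual code), a timestep-based target update using the metric keys imported above could look like this; the `self._last_update_ts` / `self.metrics` bookkeeping and the exact method name are assumptions, while `target_network_update_freq` is the usual DQN config attribute:

from ray.rllib.utils.metrics import LAST_TARGET_UPDATE_TS, NUM_TARGET_UPDATES

def _maybe_update_target_networks(self, module_id, config, timestep) -> None:
    # Sketch: sync the target net only every `target_network_update_freq`
    # env steps, tracked per module (assumed bookkeeping dicts).
    last_ts = self._last_update_ts.get(module_id, 0)
    if timestep - last_ts >= config.target_network_update_freq:
        self._update_module_target_networks(module_id, config)
        self._last_update_ts[module_id] = timestep
        # Record when and how often the target nets were synced.
        self.metrics[module_id][LAST_TARGET_UPDATE_TS] = timestep
        self.metrics[module_id][NUM_TARGET_UPDATES] = (
            self.metrics[module_id].get(NUM_TARGET_UPDATES, 0) + 1
        )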
def _update_module_target_networks(
    self, module_id: ModuleID, config: DQNConfig
) -> None:
    """Updates the target Q network(s) of a module.
Same here, I don't think we need the docstring at all here. Move all the descriptions below into the super (abstract) method.
To keep the code consistent, let's apply the same design to the module definitions (see the sketch below):
- Parent class holds the docstrings
- Child classes have no docstrings for inherited methods
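A minimal sketch of that convention; the class names here are assumptions, not the PR's actual classes:

import abc

class TargetNetworkLearner(abc.ABC):
    """Hypothetical parent learner that owns the documented API."""

    @abc.abstractmethod
    def _update_module_target_networks(self, module_id, config) -> None:
        """Updates the target Q network(s) of a module.

        Applies Polyak averaging for the update.
        """

class DQNRainbowLearner(TargetNetworkLearner):
    def _update_module_target_networks(self, module_id, config) -> None:
        # No docstring: the abstract parent method documents the contract.
        ...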
@@ -182,8 +182,6 @@ def compute_loss_for_module(
    if self.config.twin_q:
        q_twin_selected = fwd_out[QF_TWIN_PREDS]

        # TODO (simon): Implement twin Q.
awesome!
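For context, a twin-Q TD loss along these lines could look like the sketch below; the `q_target` argument, the Huber loss choice, and the 0.5 weighting are assumptions rather than the PR's exact code, and the dict keys mirror `QF_TWIN_PREDS` from the diff above:

import torch
import torch.nn.functional as F

def twin_q_td_loss(fwd_out, q_target, qf_key="qf_preds", qf_twin_key="qf_twin_preds"):
    # Both Q-heads regress onto the same (frozen) TD target; their losses
    # are averaged so that neither head dominates the gradient.
    q_selected = fwd_out[qf_key]
    q_twin_selected = fwd_out[qf_twin_key]
    loss_main = F.huber_loss(q_selected, q_target)
    loss_twin = F.huber_loss(q_twin_selected, q_target)
    return 0.5 * (loss_main + loss_twin)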
@@ -323,6 +319,12 @@ def _update_module_target_networks(
    """Updates the target Q network(s) of a module.

    Applies Polyak averaging for the update.
Same here, move this description to the super (abstract) method and remove the docstring here entirely.
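For reference, Polyak averaging is just a soft, parameter-wise blend of online and target weights. A plain-PyTorch sketch (the `tau` handling and parameter pairing are assumptions, not RLlib's actual learner code):

import torch

@torch.no_grad()
def polyak_update(target_net: torch.nn.Module, online_net: torch.nn.Module, tau: float) -> None:
    # target <- tau * online + (1 - tau) * target, applied parameter-wise.
    for t_param, o_param in zip(target_net.parameters(), online_net.parameters()):
        t_param.mul_(1.0 - tau).add_(tau * o_param)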
LGTM! Just a few comment-related nits!
Awesome work, @simonsays1980!
EnvRunner API
…menets. Signed-off-by: Simon Zehnder <[email protected]>
Signed-off-by: Simon Zehnder <[email protected]>
…to dqn-rainbow-learner Signed-off-by: Simon Zehnder <[email protected]>
Why are these changes needed?
We are moving DQN Rainbow over to the new stack using the RLModule API and the EnvRunner API. This PR introduces the learners for DQN Rainbow on the new stack.

Related issue number
Closes #37777
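For context, enabling the new API stack for DQN around the time of this PR looked roughly like the sketch below; the `_enable_new_api_stack` flag was experimental and has since been replaced, so treat the exact flag and values as assumptions and check the current RLlib docs:

from ray.rllib.algorithms.dqn import DQNConfig

config = (
    DQNConfig()
    .environment("CartPole-v1")
    # Experimental switch for the RLModule/Learner + EnvRunner stack at the time.
    .experimental(_enable_new_api_stack=True)
    .training(target_network_update_freq=500)
)
algo = config.build()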
Checks
- I've signed off every commit (i.e., used git commit -s) in this PR.
- I've run scripts/format.sh to lint the changes in this PR.
- If I added a method in Tune, I've added it in doc/source/tune/api/ under the corresponding .rst file.