[RLlib; Offline RL] Implement twin-Q net option for CQL. #47105

simonsays1980 · 2024-08-13T10:33:55Z

Why are these changes needed?

This PR proposes the double-Q (twin-Q) trick for CQL to stabilize training. More specifically:

- It modifies the loss computation to also include loss terms for the twin Q-value function.
- It stores these additional loss terms to run a backward pass with the optimizer for the twin Q-value networks.

Related issue number

Checks

- I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
- I've run scripts/format.sh to lint the changes in this PR.
- I've included any doc changes needed for https://docs.ray.io/en/master/.
  - I've added any new APIs to the API Reference. For example, if I added a
    method in Tune, I've added it in doc/source/tune/api/ under the
    corresponding .rst file.
- I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
  - Unit tests
  - Release tests
  - This PR is not tested :(

Signed-off-by: simonsays1980 <[email protected]>

sven1977 · 2024-08-13T14:24:35Z

rllib/algorithms/cql/torch/cql_torch_learner.py

@@ -90,8 +92,9 @@ def compute_loss_for_module(
                # Use the actions sampled from the current policy.
                Columns.ACTIONS: actions_curr,
            }
+            # Note, if `twin_q` is `True`, `compute_q_values` computes the minimum


sven1977 · 2024-08-13T14:25:40Z

rllib/algorithms/cql/torch/cql_torch_learner.py

+        if config.twin_q:
+            td_error += torch.abs(q_twin_selected, q_selected_target)
+            # Rescale the TD error
+            td_error += 0.5


Should this be * 0.5?

Great catch :D This should be multiplied by 0.5.
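For readers following this thread, here is a minimal sketch of why the rescaling has to be a multiplication. It uses plain Python floats instead of tensors so it runs without PyTorch; the function names and the sample values are hypothetical and not part of the PR.

```python
def td_error_per_sample(q, q_target, q_twin=None):
    """Absolute TD error for one sample; averaged over both critics if a twin exists."""
    td_error = abs(q - q_target)
    if q_twin is not None:
        # Add the twin critic's absolute TD error against the same target.
        td_error += abs(q_twin - q_target)
        # Rescale: multiplying by 0.5 turns the sum into a mean.
        td_error *= 0.5
    return td_error


def td_error_shifted_by_half(q, q_target, q_twin):
    """What the originally submitted `+= 0.5` does: a constant shift, not a rescale."""
    td_error = abs(q - q_target) + abs(q_twin - q_target)
    td_error += 0.5
    return td_error


# Hypothetical values: the two critics are off by 2.0 and 1.0 respectively.
averaged = td_error_per_sample(1.0, 3.0, q_twin=4.0)
shifted = td_error_shifted_by_half(1.0, 3.0, q_twin=4.0)
```

With these values the averaged error stays on the same scale as a single-critic TD error, while the shifted variant is larger than either individual error.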

sven1977 · 2024-08-13T14:26:48Z

rllib/algorithms/cql/torch/cql_torch_learner.py

@@ -144,15 +150,24 @@ def compute_loss_for_module(
        # Calculate the TD error.
        td_error = torch.abs(q_selected - q_selected_target)
        # TODO (simon): Add the Twin TD error
+        if config.twin_q:
+            td_error += torch.abs(q_twin_selected, q_selected_target)


Should this be torch.minimum?

Sorry, I meant: should this be torch.abs(q_twin_selected - q_selected_target)?
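Written out, the per-sample TD error this suggestion leads to would look roughly as follows. The notation is mine, not the PR's: $Q_1$ stands for q_selected, $Q_2$ for q_twin_selected, and $y_i$ for q_selected_target. The factor of one half assumes the rescaling from the same hunk is applied as a multiplication.

```latex
\delta_i = \tfrac{1}{2}\Bigl( \bigl| Q_1(s_i, a_i) - y_i \bigr| + \bigl| Q_2(s_i, a_i) - y_i \bigr| \Bigr)
```

Without the twin network, the expression reduces to the existing $\delta_i = \bigl| Q_1(s_i, a_i) - y_i \bigr|$.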

sven1977 · 2024-08-13T14:26:55Z

rllib/algorithms/cql/torch/cql_torch_learner.py

@@ -144,15 +150,24 @@ def compute_loss_for_module(
        # Calculate the TD error.
        td_error = torch.abs(q_selected - q_selected_target)
        # TODO (simon): Add the Twin TD error


Remove this TODO.

Another good catch :)

sven1977 · 2024-08-13T14:30:59Z

rllib/algorithms/cql/torch/cql_torch_learner.py

+                * config.min_q_weight
+                * config.temperature
+            )
+            cql_twin_loss - (q_twin_selected.mean()) * config.min_q_weight


Should this be -=?

Could you check all the math also once more? I don't want to miss anything important here :)

Oh yes, good catch. I hope I just missed the = key and it's not my eyes :)
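To illustrate why the missing = matters: in Python, a bare `a - b` statement is evaluated and then discarded, so the loss is left unchanged. The sketch below uses plain floats and hypothetical names (`first_term` stands in for whatever the preceding lines of the hunk compute, which is truncated here), so it is not the learner's actual code.

```python
def cql_twin_loss_buggy(first_term, q_twin_mean, min_q_weight):
    cql_twin_loss = first_term
    # Bare expression statement: the difference is computed, then thrown away.
    cql_twin_loss - q_twin_mean * min_q_weight
    return cql_twin_loss


def cql_twin_loss_fixed(first_term, q_twin_mean, min_q_weight):
    cql_twin_loss = first_term
    # In-place subtraction: the weighted mean Q-value is actually removed.
    cql_twin_loss -= q_twin_mean * min_q_weight
    return cql_twin_loss


# Hypothetical numbers, chosen only to make the difference visible.
buggy = cql_twin_loss_buggy(2.0, 3.0, 0.5)
fixed = cql_twin_loss_fixed(2.0, 3.0, 0.5)
```

The buggy variant returns the first term untouched, so the weighted mean Q-value term would never reach the backward pass. No error is raised either way, which is why this kind of slip is easy to miss in review.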

sven1977

LGTM now. Thanks for this great PR and for double-checking, @simonsays1980.

Implemented twin Q-value networks in 'CQLTorchLearner'.

7d9439c

Signed-off-by: simonsays1980 <[email protected]>

simonsays1980 marked this pull request as ready for review August 13, 2024 12:24

simonsays1980 requested review from sven1977 and ArturNiederfahrenhorst as code owners August 13, 2024 12:24

sven1977 changed the title ~~[RLlib; Offline RL] - Implemented double Q in CQL.~~ [RLlib; Offline RL] Implement double-Q option for CQL. Aug 13, 2024

sven1977 changed the title ~~[RLlib; Offline RL] Implement double-Q option for CQL.~~ [RLlib; Offline RL] Implement twin-Q net option for CQL. Aug 13, 2024

sven1977 reviewed Aug 13, 2024

simonsays1980 added 3 commits August 13, 2024 17:29

Added suggestions from @sven1977's review and made a second check of …

9e7e52a

…the math. Signed-off-by: simonsays1980 <[email protected]>

Fixed a tiny bug. Thanks @sven1977 for catching this.

1c5282c

Signed-off-by: simonsays1980 <[email protected]>

Merge branch 'master' into offline-cql-add-twin-q

23d4b94

sven1977 approved these changes Aug 13, 2024

sven1977 enabled auto-merge (squash) August 13, 2024 16:44

github-actions bot added the "go" label (add ONLY when ready to merge, run all tests) Aug 13, 2024

sven1977 merged commit dfc6fad into ray-project:master Aug 13, 2024
7 checks passed

simonsays1980 added a commit to simonsays1980/ray that referenced this pull request Aug 14, 2024

[RLlib; Offline RL] Implement twin-Q net option for CQL. (ray-project…

38d8178

…#47105)

simonsays1980 added a commit to simonsays1980/ray that referenced this pull request Aug 15, 2024

[RLlib; Offline RL] Implement twin-Q net option for CQL. (ray-project…

d0679b7

…#47105)
