[RLlib; Offline RL] Implement CQL algorithm logic in new API stack. #47000
Conversation
…d it. Made some smaller typing changes in learners and added a tuned example for CQL in the new API stack.
…oss and switched in actor loss from selected actions to sampled actions from the current policy.
…ven1977.
rllib/BUILD (outdated)
    name = "learning_tests_pendulum_cql",
    main = "tuned_examples/cql/pendulum_cql.py",
    tags = ["team:rllib", "exclusive", "learning_tests", "torch_only", "learning_tests_cartpole", "learning_tests_discrete", "learning_tests_pytorch_use_all_core"],
    size = "medium",
size="large" is better, no?
Yeah, maybe. It takes around 50 iterations.
Will tune this in another PR.
rllib/algorithms/cql/cql_learner.py (outdated)
    # Add a metric to keep track of training iterations to
    # determine when switching the actor loss from behavior
    # cloning to SAC.
    self.metrics.log_value(
        (ALL_MODULES, TRAINING_ITERATION), float("nan"), window=1
    )
This can all go, b/c we now use the `default=0` arg.
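The cleanup suggested here relies on reads taking a default value instead of pre-registering a NaN. A minimal stand-in (a hypothetical, simplified API, not RLlib's actual MetricsLogger) sketches the idea:

```python
# Minimal stand-in for a metrics logger, illustrating why pre-registering a
# NaN value becomes unnecessary once reads accept a `default` argument.
# Hypothetical simplified API; RLlib's real MetricsLogger is more involved.

class MiniMetrics:
    def __init__(self):
        self._values = {}

    def log_value(self, key, value):
        # Record the latest value under `key`.
        self._values[key] = value

    def peek(self, key, default=None):
        # Return the logged value, or `default` if the key was never logged.
        return self._values.get(key, default)

metrics = MiniMetrics()
# No need to pre-register: metrics.log_value(key, float("nan")) up front.
iters = metrics.peek(("all_modules", "training_iteration"), default=0)
```

Reading with `default=0` returns 0 before the first `log_value` call, so the NaN-seeding block in `build()` can be dropped entirely.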
rllib/algorithms/cql/cql_learner.py (outdated)
    # We need to call the `super()`'s `build` method here to have the variables
    # for `alpha` and the target entropy defined.
    super().build()
I feel like it's better to add the `NEXT_OBS` LearnerConnector piece here, what do you think?
The `CQLLearner` is the component that has this (hard) requirement (meaning it won't work w/o this connector piece), so it should take care of adding it here.
Similar to how it would check whether the used `RLModule` fits a certain API, if required.
I agree that the Learner is the component that needs the connectors. On the other side, it uses the algorithm's `build_learner_connector` to build its connector pipeline. So, I am a bit unsure where we should generally pack it.
In this specific case I needed to add it in `build_learner_connector` b/c the class inheritance was somehow avoiding it otherwise (i.e. the `CQLLearner` had it, but the `TorchCQLLearner` did not).
@@ -320,7 +320,7 @@ def get_default_policy_class(
    @override(Algorithm)
    def training_step(self) -> ResultDict:
        if self.config.enable_env_runner_and_connector_v2:
-           return self._training_step_new_stack()
+           return self._training_step_new_api_stack()
Thanks for clarifying this! Always better to be expressive :)
@@ -278,7 +278,7 @@ def compute_loss_for_module(

    @override(DQNRainbowTorchLearner)
    def compute_gradients(
-       self, loss_per_module: Dict[str, TensorType], **kwargs
+       self, loss_per_module: Dict[ModuleID, TensorType], **kwargs
👍
@@ -442,7 +442,7 @@ def configure_optimizers_for_module(
    @OverrideToImplementCustomLogic
    @abc.abstractmethod
    def compute_gradients(
-       self, loss_per_module: Dict[str, TensorType], **kwargs
+       self, loss_per_module: Dict[ModuleID, TensorType], **kwargs
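The tightened hint replaces the bare `str` key type with the more expressive `ModuleID` alias. A minimal sketch (assuming, as in RLlib, that `ModuleID` is a `str`-based alias; `TensorType` is stood in for by `float` here for illustration):

```python
from typing import Dict

ModuleID = str      # alias identifying one RLModule inside a multi-module setup
TensorType = float  # stand-in for an actual framework tensor type

def compute_gradients(loss_per_module: Dict[ModuleID, TensorType], **kwargs):
    # Summing per-module losses is only a placeholder for real gradient logic;
    # the point is the signature: keys are module IDs, not arbitrary strings.
    return sum(loss_per_module.values())

total = compute_gradients({"default_policy": 1.5, "opponent": 0.5})
```

At runtime nothing changes (the alias is still a `str`), but type checkers and readers now see what the dict keys mean.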
👍
    stop = {
        f"{EVALUATION_RESULTS}/{ENV_RUNNER_RESULTS}/{EPISODE_RETURN_MEAN}": -700.0,
If we change the test to "size=large", would the return get a bit higher?
I haven't tried this. Will do.
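The stop criterion keys into the nested result dict via a flattened, slash-separated path. A sketch of how that key is assembled (the literal constant values below are assumptions matching RLlib's result-dict layout, not copied from the source):

```python
# Assumed values of RLlib's result-dict constants; treat these as
# illustrative stand-ins, not authoritative definitions.
EVALUATION_RESULTS = "evaluation"
ENV_RUNNER_RESULTS = "env_runners"
EPISODE_RETURN_MEAN = "episode_return_mean"

# Stop once the mean evaluation episode return reaches -700.0. With
# size="large" (more training iterations), this threshold could likely be
# raised, i.e. made less negative, per the discussion above.
stop = {
    f"{EVALUATION_RESULTS}/{ENV_RUNNER_RESULTS}/{EPISODE_RETURN_MEAN}": -700.0,
}
key = next(iter(stop))
```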
    @override(AlgorithmConfig)
    def build_learner_connector(
        self,
See my comment in `CQLLearner`. I think we should move this there. Same as for DQN. Or, if there are good arguments to leave these in the Config classes, we should also change it in DQN/SAC.
As mentioned above, in rare cases the class inheritance avoids it being changed there, and changing it in `build_learner_connector` would then still work.
You are right that we need to unify the way of adding them in all algorithms (also MARWIL).
It just doesn't feel clean, doing it here. The config classes should contain as little (actually implemented) algo logic as possible and just store settings.
I know I sound like a broken record, but "separation of concerns" principle :)
- Whatever an algo-specific Learner must have, it should create in its `build()` method, for example a lr-schedule, or a kl-coeff-schedule, etc.
- Whatever an algo-specific Learner must have, it should check that it has in its `build()` method, e.g. "does my RLModule implement API xyz?"
- Algo configs should NOT implement any algo logic. If they still do somewhere for some algos, we should add a TODO to move the logic into a more appropriate location.

Suggestion:
Can we try creating a `CQLLearner` class that inherits from `SACLearner` and overrides the `build()` method just to insert that connector piece? Then do `class CQLTorchLearner(SACTorchLearner, CQLLearner):`. That should work, no?
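The suggested mixin pattern can be sketched with minimal stand-in classes (the class names mirror the discussion, but none of this is RLlib code). The key is that Python's C3 linearization places `CQLLearner` in the MRO of `CQLTorchLearner`, so its `build()` runs even though `CQLTorchLearner` inherits from `SACTorchLearner` first:

```python
# Stand-in class hierarchy demonstrating the proposed mixin and its MRO.
class Learner:
    def build(self):
        self.connector_pieces = []

class SACLearner(Learner):
    pass

class SACTorchLearner(SACLearner):
    pass

class CQLLearner(SACLearner):
    def build(self):
        # Cooperative super() call walks the MRO, so base setup still runs.
        super().build()
        # Insert the (hard) requirement, e.g. a NEXT_OBS connector piece.
        self.connector_pieces.append("add_next_obs_to_batch")

class CQLTorchLearner(SACTorchLearner, CQLLearner):
    pass

learner = CQLTorchLearner()
learner.build()
# MRO: CQLTorchLearner -> SACTorchLearner -> CQLLearner -> SACLearner -> Learner
mro_names = [cls.__name__ for cls in CQLTorchLearner.__mro__]
```

One caveat of this design: it only works as long as any `build()` override along the MRO (here, a future `SACTorchLearner.build()`) keeps calling `super().build()` cooperatively.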
Why are these changes needed?
This PR proposes the training logic for CQL in our new API stack, using `RLModule`s, the Learner API, the Offline RL API, and the EnvRunner API. CQL on the new stack uses `OfflineData` to read and batch training data, and `RLModule`s together with the Learner API to define and train a policy. It inherits most of its model logic from SAC, as it implements the entropy version of CQL. To add `NEXT_OBS` to the batch, it overrides the `AlgorithmConfig`'s `build_learner_connector` and thereby proposes a new form of how the learner connector should be modified (in contrast to adding more connectors to the learner's pipeline in the learner's `build` method).
Furthermore, this PR adds a "tuned example" using `Pendulum-v1` that shows how CQL can be used on the new API stack.
This PR is part of a sequence of PRs proposed and coming.
Related issue number
Relates to #46969 and closes #37779
Checks
- I've signed off every commit (by using the -s flag, i.e., `git commit -s`) in this PR.
- I've run `scripts/format.sh` to lint the changes in this PR.
- I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file.