[RLlib; Offline RL] Add support to directly read from episodes. #46865
Conversation
Commits:
…matted commit merely for securing the work.
…' and 'MARWILTorchPolicy', fixed imports and tested MARWIL on non-recurrent policies.
…unction.
… to 'OfflineData'. Set return to reach higher for tuned example.
… in linting and building.
…nectors request finalized episodes.
…g as this was giving an error when 'MARWILOfflinePreLearner' tried to call a value function unneeded by BC. Deprecated hybrid stack.
…tting 'beta=0.0'.
…. BC depends now fully on MARWIL.
# TODO (simon): episodes are only needed for logging here.
return {"batch": [batch]}

def _compute_gae_from_episodes(
Any chance we can share this code with PPO's?
Create a utility function in the PPO folder (e.g. algorithms/ppo/utils.py), then import and use this same function for MARWIL.
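A minimal sketch of what such a shared utility could look like, assuming the module path algorithms/ppo/utils.py from the suggestion above; the function name, signature, the episode accessors, and the assumption that each value-prediction array includes a bootstrap value are illustrative only, not RLlib's actual implementation:

# Hypothetical contents of ray/rllib/algorithms/ppo/utils.py: one shared
# GAE helper that both the PPO and MARWIL learners could import instead of
# each keeping its own copy of the logic.
import numpy as np


def compute_gae_from_episodes(episodes, value_predictions, gamma=0.99, lambda_=0.95):
    """Computes per-episode GAE advantages and value targets.

    Args:
        episodes: List of (assumed finalized) SingleAgentEpisode objects.
        value_predictions: One np.ndarray per episode of length
            len(episode) + 1, i.e. including a bootstrap value for the final
            observation (0.0 if the episode terminated).
        gamma: Discount factor.
        lambda_: GAE lambda.

    Returns:
        Two lists of np.ndarrays (advantages, value targets), one per episode.
    """
    all_advantages, all_value_targets = [], []
    for episode, vf_preds in zip(episodes, value_predictions):
        # `get_rewards()` is assumed to return the per-timestep rewards.
        rewards = np.asarray(episode.get_rewards(), dtype=np.float32)
        # One-step TD errors.
        deltas = rewards + gamma * vf_preds[1:] - vf_preds[:-1]
        # Standard backward GAE recursion.
        advantages = np.zeros_like(deltas)
        last = 0.0
        for t in reversed(range(len(deltas))):
            last = deltas[t] + gamma * lambda_ * last
            advantages[t] = last
        all_advantages.append(advantages)
        all_value_targets.append(advantages + vf_preds[:-1])
    return all_advantages, all_value_targets

Both the PPO and MARWIL learners could then import this one helper rather than keeping two copies of the GAE code.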
That's the question I have: Do we really need the complete code? Or can we get rid of either the batch or the episodes in there when considering MARWIL and the offline learning pipeline?
Commits:
…epcrecated. Moved to old stack as it uses policies.
…he learner from MARWIL.
LGTM now! Thanks for this PR, @simonsays1980!
Why are these changes needed?
The Offline RL API in the new stack will soon support recording capabilities (see #46818) that allow directly storing RLlib's episodes (i.e., SingleAgentEpisodes). In regard to this upcoming change, this PR proposes a direct episode-reading option for the OfflinePreLearner that can skip the _map_to_episodes step in the map_batches data pipeline and therefore increase training speed.
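A minimal sketch of the idea, assuming rows were recorded as pickled SingleAgentEpisodes under an "episode" column and a boolean switch on the pre-learner mapping function; the flag name, column name, and serialization format are hypothetical stand-ins, not this PR's actual implementation:

# If the offline data already holds recorded episodes, the mapping function
# can deserialize them directly and skip the `_map_to_episodes` conversion.
import pickle


def rows_to_episodes(batch, read_episodes_directly=False):
    """Turns one Ray Data batch (a dict of columns) into a list of episodes."""
    if read_episodes_directly:
        # Data was recorded episode-wise: just deserialize each row and skip
        # the `_map_to_episodes` conversion entirely.
        return [pickle.loads(serialized) for serialized in batch["episode"]]
    # Otherwise keep the existing behavior: build episodes from column data
    # (observations, actions, rewards, ...), as `_map_to_episodes` does today.
    raise NotImplementedError("Column-to-episode conversion not sketched here.")

Skipping that conversion step is where the training-speed gain described above would come from.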
Related issue number
Relates to #46818
Checks
I've signed off every commit (git commit -s) in this PR.
I've run scripts/format.sh to lint the changes in this PR.
If I added a method in Tune, I've added it in doc/source/tune/api/ under the corresponding .rst file.