[RLlib] Gymnasium/Gym0.26.x support (new `Env.reset()/step()/seed()/render()` APIs). #28369

sven1977 · 2022-09-08T06:40:05Z

Gymnasium 0.26.3 has been released with major changes to the default settings for environments. A custom gym.Env (now: gymnasium.Env) subclass is one of the most important entry points into RLlib for our users.

To read more about gymnasium, go here: https://github.com/Farama-Foundation/Gymnasium

RLlib should therefore support the new APIs going forward (e.g. Env.reset() now returns obs AND infos, and Env.step() returns terminated and truncated flags instead of the old done flag).
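For illustration, here is a minimal sketch of an env that follows the new reset()/step() signatures. It is a plain Python class so it runs without any dependency; a real env would subclass gymnasium.Env and define observation and action spaces. The class name CountingEnv and its goal/max_steps parameters are made up for this example.

```python
from typing import Any, Dict, Optional, Tuple


class CountingEnv:
    """Toy env (hypothetical) that follows the new reset()/step() signatures."""

    def __init__(self, goal: int = 3, max_steps: int = 5):
        self.goal = goal
        self.max_steps = max_steps
        self._pos = 0
        self._t = 0

    def reset(
        self,
        *,
        seed: Optional[int] = None,
        options: Optional[Dict[str, Any]] = None,
    ) -> Tuple[int, Dict[str, Any]]:
        # This toy env is deterministic, so seed and options are accepted
        # but unused. A stochastic env would re-seed its RNG here.
        self._pos = 0
        self._t = 0
        # New API: return the observation AND an info dict.
        return self._pos, {}

    def step(self, action: int) -> Tuple[int, float, bool, bool, Dict[str, Any]]:
        self._pos += action
        self._t += 1
        # terminated: the task itself ended (goal reached).
        terminated = self._pos >= self.goal
        # truncated: the episode was cut off by a step limit only.
        truncated = (not terminated) and self._t >= self.max_steps
        reward = 1.0 if terminated else 0.0
        return self._pos, reward, terminated, truncated, {}


env = CountingEnv(goal=3, max_steps=5)
obs, info = env.reset(seed=42)
obs, reward, terminated, truncated, info = env.step(1)
```

Keeping terminated and truncated separate inside the env is what lets downstream code distinguish a real terminal state from a time limit.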

Users who are still using the old gym.Env APIs in their classes should either rewrite those classes to abide by the new API or use the provided wrappers (gymnasium.wrappers.EnvCompatibility or ray.rllib.env.wrappers.multi_agent_env_compatibility.py::MultiAgentEnvCompatibility).
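To make the idea of such a wrapper concrete, here is a hand-rolled sketch of an old-to-new adapter. It is NOT the code of the wrappers named above; the class names OldStyleEnv and OldToNewAPI are hypothetical. It assumes the legacy env marks time-limit endings with the "TimeLimit.truncated" info key, as the old gym TimeLimit wrapper did.

```python
class OldStyleEnv:
    """Stand-in for a legacy env: reset() -> obs, step() -> (obs, reward, done, info)."""

    def __init__(self, time_limit: int = 3):
        self.time_limit = time_limit
        self.t = 0

    def reset(self):
        self.t = 0
        return 0

    def step(self, action):
        self.t += 1
        done = self.t >= self.time_limit
        # Assumed legacy convention: flag time-limit endings in the info dict.
        info = {"TimeLimit.truncated": True} if done else {}
        return self.t, 0.0, done, info


class OldToNewAPI:
    """Hypothetical adapter exposing the new signatures on top of an old-API env."""

    def __init__(self, env):
        self.env = env

    def reset(self, *, seed=None, options=None):
        # Old envs were seeded through seed(); forward the seed if that exists.
        if seed is not None and hasattr(self.env, "seed"):
            self.env.seed(seed)
        # Old reset() returned only obs; add an empty info dict.
        return self.env.reset(), {}

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        # Split the single done flag into terminated vs truncated.
        truncated = bool(info.get("TimeLimit.truncated", False))
        terminated = bool(done) and not truncated
        return obs, reward, terminated, truncated, info


env = OldToNewAPI(OldStyleEnv(time_limit=2))
obs, info = env.reset(seed=1)
obs, reward, terminated, truncated, info = env.step(0)
```

An env that ends an episode with done=True and no truncation marker is reported as terminated, which matches how this PR interprets the old done flag.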

A detailed error message is provided if users are still on the old gym package or are using old-API gymnasium.Env subclasses.

This PR:

- Replaces all import gym and related statements with import gymnasium as gym.
- Alters all RLlib env APIs (BaseEnv, VectorEnv, MultiAgentEnv, etc.) to be fully compatible with the new gymnasium APIs. For example, VectorEnv.reset_at() now returns obs AND infos and takes optional seed and options arguments.
- Addresses the related pettingzoo, minigrid, Atari, etc. updates as well. For example, all Atari experiments now use the new "ALE/" prefix in their env setting, e.g. config.environment("ALE/Pong-v5", frameskip=1) as an equivalent to PongNoFrameskip-v4.
- Interprets the old done flag the same way as the new terminated flag (e.g. in loss functions, DONES has been replaced with TERMINATED). The new truncated flag is collected as well and is available in all train batches (even though it is mostly ignored). This should improve our loss math: we currently set, e.g., Q-values to 0.0 even when an episode is only truncated (CartPole-v1 after 500 ts) but not really terminated!
- The config settings horizon, soft_horizon and no_done_at_end have all been deprecated and must no longer be used. Instead, users should implement the proper logic in their environments: use the gymnasium.wrappers.TimeLimit wrapper, properly return terminated vs truncated (instead of an indiscriminate done), and properly pick a new initial state after reset().
- A backward compatibility check (which checks that RLlib produces the correct error messages) has been added to catch usages of the gym package, as well as gymnasium Envs that still use the old APIs.
- Seeding is done via the Env.reset() method in gymnasium. RLlib RolloutWorkers make sure this is properly implemented (instead of pre-seeding an env via its seed() method, which has been deprecated). In future PRs, we will allow users to individually set seeds and options per episode (reset(self, *, seed=.., options=..)) via callbacks.
- This PR is already massive. No docs changes have been added so far, to avoid blowing things out of proportion. They will follow in follow-up PRs.
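To illustrate the loss-math point about terminated vs truncated, here is a minimal sketch of a one-step TD target that bootstraps from the next state's value unless the episode really terminated. The function name td_targets and its argument names are made up for this example; this is not RLlib's loss code.

```python
from typing import List, Sequence


def td_targets(
    rewards: Sequence[float],
    next_values: Sequence[float],
    terminateds: Sequence[bool],
    gamma: float = 0.99,
) -> List[float]:
    """One-step TD targets: r + gamma * V(s') unless the episode terminated.

    The truncated flag is deliberately NOT an input: a time-limit cut-off
    says nothing about the value of the next state, so we keep bootstrapping.
    """
    return [
        r + gamma * (0.0 if terminated else v_next)
        for r, v_next, terminated in zip(rewards, next_values, terminateds)
    ]


# Last step of a truncated episode (terminated=False): still bootstraps.
print(td_targets([1.0], [10.0], [False], gamma=0.5))  # [6.0]
# Last step of a truly terminated episode: no bootstrap.
print(td_targets([1.0], [10.0], [True], gamma=0.5))  # [1.0]
```

Zeroing the bootstrap term on an indiscriminate done flag would treat the first case like the second, which is the error described above for CartPole-v1 at 500 timesteps.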

Major TODOs before this can be merged:

- Fix env runner code: resetting, soft resetting, done -> terminated translation, etc.
- Seeding should work more transparently (which worker, which sub-env gets which seed?) and per episode (seeding now happens on reset, no longer in the env c'tor or seed() method).
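As a sketch of what seeding on reset looks like, the toy env below re-seeds its own RNG only when reset() receives a seed. The helper sub_env_seed shows one possible way to give every (worker, sub-env) pair a distinct seed. Both names are hypothetical, and the seed formula is an assumption for illustration, not the scheme RLlib uses.

```python
import random


class NoisyStartEnv:
    """Toy env (hypothetical) whose initial state depends on its RNG."""

    def __init__(self):
        self._rng = random.Random()
        self.state = 0.0

    def reset(self, *, seed=None, options=None):
        # Seeding happens here, not in the constructor or a seed() method.
        if seed is not None:
            self._rng = random.Random(seed)
        self.state = self._rng.uniform(-1.0, 1.0)
        return self.state, {}


def sub_env_seed(base_seed, worker_index, sub_env_index, envs_per_worker):
    """Assumed scheme: a distinct, reproducible seed per (worker, sub-env)."""
    return base_seed + worker_index * envs_per_worker + sub_env_index


env = NoisyStartEnv()
seed = sub_env_seed(base_seed=100, worker_index=2, sub_env_index=1, envs_per_worker=4)
first_obs, _ = env.reset(seed=seed)   # seeded episode
second_obs, _ = env.reset()           # continues the same RNG stream
```

Passing a seed only on the first reset and None afterwards keeps a run reproducible while still producing a different initial state each episode.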

Why are these changes needed?

Related issue number

Checks

I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
I've run scripts/format.sh to lint the changes in this PR.
I've included any doc changes needed for https://docs.ray.io/en/master/.
I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
Testing Strategy
- Unit tests
- Release tests
- This PR is not tested :(


sven1977 · 2022-09-08T06:40:55Z

@jsuarez5341 @jkterry1 ^

May take a while to get all test cases to pass and merge this.


avnishn

Do we need to add a config flag now called no reset on truncated?

The point of extra truncated signals is so that we can do what rllib does for SAC where we enable no reset on done.

We probably now need to do no reset on truncated.

avnishn · 2022-09-08T07:01:36Z

rllib/env/multi_agent_env.py

+    def reset(
+        self,
+        seed: Optional[int] = None,
+    ) -> Tuple[MultiAgentDict, MultiAgentDict]:


Doesn't reset also now support reset kwargs? We would need to add those if that's the case

You were right, this was added as options, just like in gymnasium. It's currently NOT supported for RLlib users (it won't break, but users cannot add any per-episode content to that kwargs dict).

I wanted to enable this in a follow-up PR.
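For reference, a multi-agent reset() that accepts both seed and options could look like the sketch below. It is a plain class with made-up names (TwoAgentEnv, the "start" options key), not the MultiAgentEnv code from this PR; MultiAgentDict is redefined locally to mirror the alias used in the diff above.

```python
from typing import Any, Dict, Optional, Tuple

# Local stand-in for the alias used in the diff above.
MultiAgentDict = Dict[str, Any]


class TwoAgentEnv:
    """Toy multi-agent env (hypothetical) with the new reset() signature."""

    def __init__(self):
        self.agents = ["agent_0", "agent_1"]

    def reset(
        self,
        *,
        seed: Optional[int] = None,
        options: Optional[Dict[str, Any]] = None,
    ) -> Tuple[MultiAgentDict, MultiAgentDict]:
        # options carries per-episode settings; "start" is a made-up key.
        options = options or {}
        start = options.get("start", 0)
        # One observation and one info dict per agent.
        obs = {agent: start for agent in self.agents}
        infos = {agent: {} for agent in self.agents}
        return obs, infos


env = TwoAgentEnv()
obs, infos = env.reset(seed=0, options={"start": 5})
```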

avnishn · 2022-09-08T07:04:05Z

rllib/env/multi_agent_env.py

        if not self.initialized:
+            # TODO(sven): Should we make it possible to pass in a seed here?


No, not necessary. Users should call reset themselves for that

But they can't. This logic is normally entirely encapsulated in our RolloutWorker/Sampler/EnvRunner logic, which starts the endless loop right away with a poll() call (NOT a reset).

oh gotcha. In that case I'd rather change it over there than over here.

avnishn · 2022-09-08T07:09:20Z

rllib/models/preprocessors.py

@@ -71,6 +71,7 @@ def check_shape(self, observation: Any) -> None:
                )
            try:
                if not self._obs_space.contains(observation):
+                    print()#TODO


Scratch work? Or on purpose?

still in progress.

avnishn · 2022-09-08T07:10:53Z

rllib/utils/pre_checks/env.py

@@ -209,6 +224,7 @@ def get_type(var):
        if not env.observation_space.contains(temp_sampled_next_obs):
            raise ValueError(error)
    _check_done(done)
+    _check_done(truncated)


Need to add flag to this function to change the error message to be about truncated or done.
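One way to do that, sketched with a hypothetical helper (check_bool_flag is not the actual _check_done in RLlib): pass the flag's name in so the error message names the offending value.

```python
def check_bool_flag(value, name="done"):
    """Raise a ValueError naming the flag if `value` is not a bool.

    Hypothetical helper. A real check would probably also accept
    numpy.bool_ values; that is left out to keep the sketch stdlib-only.
    """
    if not isinstance(value, bool):
        raise ValueError(
            f"Your step function must return a `{name}` flag that is a bool, "
            f"but got {value!r} of type {type(value).__name__}."
        )


check_bool_flag(True, name="terminated")
check_bool_flag(False, name="truncated")
```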


jsuarez5341 · 2022-09-08T15:38:43Z

@jsuarez5341 @jkterry1 ^

May take a while to get all test cases to pass and merge this.

From the latest gym announcements in case you want to get something working with older envs:

"[26] comes with number of breaking changes that in previous versions were turned off. For users wanting to use the new gym version but have old gym environments, we provide the EnvStepCompatibility wrapper and gym.make(..., apply_api_compatibility=True) to using these environments."


maxpumperla

approving, pending a question (likely atari)

maxpumperla · 2022-12-19T13:31:29Z

doc/source/ray-core/examples/plot_pong_example.ipynb

@@ -31,7 +31,7 @@
    "To run the application, first install some dependencies.\n",
    "\n",
    "```bash\n",
-    "pip install gym[atari]\n",
+    "pip install gymnasium[atari] gym==0.26.2\n",



maziarg · 2022-12-21T20:42:22Z

Thanks for working on this issue. Having RLlib support gymnasium is something I have been waiting for. Do we know roughly when you are planning to release version 2.3?


afennelly-mitre · 2023-03-07T23:14:58Z

@sven1977 I noticed that in ray/rllib/utils/pre_checks/env.py, within the check_gym_environments() method, there is the following snippet (see line 162 in 2703c56):

  # Raise warning if using new reset api introduces in gym 0.24
  reset_signature = inspect.signature(env.unwrapped.reset).parameters.keys()
  if any(k in reset_signature for k in ["seed", "return_info"]):
      if log_once("reset_signature"):
          logger.warning(
              "Your env reset() method appears to take 'seed' or 'return_info'"
              " arguments. Note that these are not yet supported in RLlib."
              " Seeding will take place using 'env.seed()' and the info dict"
              " will not be returned from reset."
          )

Is this still the case after this PR was merged, i.e., will seeding still take place using env.seed()? Thank you!

sven1977 added 4 commits September 7, 2022 13:15

wip

dee5dfd

Signed-off-by: sven1977 <[email protected]>

wip

64007e1

Signed-off-by: sven1977 <[email protected]>

wip.

a1f6346

Signed-off-by: sven1977 <[email protected]>

wip.

a9c2d79

Signed-off-by: sven1977 <[email protected]>

sven1977 requested review from gjoliver, avnishn, ArturNiederfahrenhorst, smorad, maxpumperla, kouroshHakha and krfricke as code owners September 8, 2022 06:40

sven1977 added 2 commits September 8, 2022 09:06

wip.

c4f0ee9

Signed-off-by: sven1977 <[email protected]>

Merge branch 'master' of https://github.com/ray-project/ray into gym_…

9ba3a91

…0_26_support

avnishn reviewed Sep 8, 2022


sven1977 changed the title ~~[RLlib] Gym 0.26 support.~~ [WIP; RLlib] Gym 0.26 support. Sep 8, 2022

sven1977 added 6 commits September 8, 2022 11:46

wip and LINT

c8db040

Signed-off-by: sven1977 <[email protected]>

wip

e2b1291

Signed-off-by: sven1977 <[email protected]>

wip

6b05a35

Signed-off-by: sven1977 <[email protected]>

Merge branch 'master' of https://github.com/ray-project/ray into gym_…

f0a8ca8

…0_26_support

wip

4485bcd

Signed-off-by: sven1977 <[email protected]>

wip

818270e

Signed-off-by: sven1977 <[email protected]>

sven1977 added 7 commits September 8, 2022 18:23

wip

d35acec

Signed-off-by: sven1977 <[email protected]>

Merge branch 'master' of https://github.com/ray-project/ray into gym_…

58db319

…0_26_support Signed-off-by: sven1977 <[email protected]> # Conflicts: # rllib/tests/test_multi_agent_env.py

fix

9b1444f

Signed-off-by: sven1977 <[email protected]>

wip

aee6c50

Signed-off-by: sven1977 <[email protected]>

wip

3bf99bb

Signed-off-by: sven1977 <[email protected]>

wip

0dd65c7

Signed-off-by: sven1977 <[email protected]>

wip

8963620

Signed-off-by: sven1977 <[email protected]>

sven1977 added 18 commits December 17, 2022 20:35

wip

ca584f0

Signed-off-by: sven1977 <[email protected]>

wip

0be4ee5

Signed-off-by: sven1977 <[email protected]>

wip

a89240c

Signed-off-by: sven1977 <[email protected]>

wip

26cacba

Signed-off-by: sven1977 <[email protected]>

wip

d93b7af

Signed-off-by: sven1977 <[email protected]>

Merge branch 'master' of https://github.com/ray-project/ray into gym_…

c26f074

…0_26_support Signed-off-by: sven1977 <[email protected]> # Conflicts: # rllib/evaluation/env_runner_v2.py

wip

3002248

Signed-off-by: sven1977 <[email protected]>

wip

d396f8e

Signed-off-by: sven1977 <[email protected]>

Merge branch 'master' of https://github.com/ray-project/ray into gym_…

420bcc1

…0_26_support

wip

f38c267

Signed-off-by: sven1977 <[email protected]>

wip

a491711

Signed-off-by: sven1977 <[email protected]>

wip

f92c824

Signed-off-by: sven1977 <[email protected]>

wip

eac65cc

Signed-off-by: sven1977 <[email protected]>

wip

1eba020

Signed-off-by: sven1977 <[email protected]>

wip

61f736e

Signed-off-by: sven1977 <[email protected]>

wip

eeeac13

Signed-off-by: sven1977 <[email protected]>

Merge branch 'master' of https://github.com/ray-project/ray into gym_…

c7ab8ce

…0_26_support

wip

3562b3f

Signed-off-by: sven1977 <[email protected]>

maxpumperla approved these changes Dec 19, 2022


sven1977 added 2 commits December 19, 2022 16:56

Merge branch 'master' of https://github.com/ray-project/ray into gym_…

2ad1f05

…0_26_support

wip

c7cff65

Signed-off-by: sven1977 <[email protected]>

sven1977 merged commit 8e680c4 into ray-project:master Dec 20, 2022

AmeerHajAli pushed a commit that referenced this pull request Jan 12, 2023

[RLlib] gymnasium support (new Env.reset()/step()/seed()/render() A…

7c0d9c2

…PIs). (#28369)

tamohannes pushed a commit to ju2ez/ray that referenced this pull request Jan 25, 2023

[RLlib] gymnasium support (new Env.reset()/step()/seed()/render() A…

40a1928

…PIs). (ray-project#28369) Signed-off-by: tmynn <[email protected]>

sven1977 deleted the gym_0_26_support branch June 2, 2023 20:19
