-
Notifications
You must be signed in to change notification settings - Fork 5.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[RLlib] Gymnasium/Gym0.26.x support (new Env.reset()/step()/seed()/render()
APIs).
#28369
Conversation
Signed-off-by: sven1977 <[email protected]>
Signed-off-by: sven1977 <[email protected]>
Signed-off-by: sven1977 <[email protected]>
Signed-off-by: sven1977 <[email protected]>
May take a while to get all test cases to pass and merge this. |
Signed-off-by: sven1977 <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to add a config flag now called no reset on truncated?
The point of extra truncated signals is so that we can do what rllib does for SAC where we enable no reset on done.
We probably now need to do no reset on truncated.
def reset( | ||
self, | ||
seed: Optional[int] = None, | ||
) -> Tuple[MultiAgentDict, MultiAgentDict]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't reset also now support reset kwargs? We would need to add those if that's the case
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your were right, this was added as options
, just like in gymnasium. It's currently NOT supported for RLlib users (won't break, but users cannot add any content per episode to that kwargs dict).
I wanted to enable this in a follow-up PR.
if not self.initialized: | ||
# TODO(sven): Should we make it possible to pass in a seed here? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, not necessary. Users should call reset themselves for that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But they can't. This logic is normally entirely encapsulated in our RolloutWorker/Sampler/EnvRunner logic, which starts the endless loop right away with a poll()
call (NOT a reset).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh gotcha. In that case I'd rather change it over there than over here.
rllib/models/preprocessors.py
Outdated
@@ -71,6 +71,7 @@ def check_shape(self, observation: Any) -> None: | |||
) | |||
try: | |||
if not self._obs_space.contains(observation): | |||
print()#TODO |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Scratch work? Or on purpose?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
still in progress.
rllib/utils/pre_checks/env.py
Outdated
@@ -209,6 +224,7 @@ def get_type(var): | |||
if not env.observation_space.contains(temp_sampled_next_obs): | |||
raise ValueError(error) | |||
_check_done(done) | |||
_check_done(truncated) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to add flag to this function to change the error message to be about truncated or done.
Signed-off-by: sven1977 <[email protected]>
Signed-off-by: sven1977 <[email protected]>
Signed-off-by: sven1977 <[email protected]>
Signed-off-by: sven1977 <[email protected]>
Signed-off-by: sven1977 <[email protected]>
From the latest gym announcements in case you want to get something working with older envs: "[26] comes with number of breaking changes that in previous versions were turned off. For users wanting to use the new gym version but have old gym environments, we provide the EnvStepCompatibility wrapper and gym.make(..., apply_api_compatibility=True) to using these environments." |
Signed-off-by: sven1977 <[email protected]>
…0_26_support Signed-off-by: sven1977 <[email protected]> # Conflicts: # rllib/tests/test_multi_agent_env.py
Signed-off-by: sven1977 <[email protected]>
Signed-off-by: sven1977 <[email protected]>
Signed-off-by: sven1977 <[email protected]>
Signed-off-by: sven1977 <[email protected]>
Signed-off-by: sven1977 <[email protected]>
Signed-off-by: sven1977 <[email protected]>
Signed-off-by: sven1977 <[email protected]>
Signed-off-by: sven1977 <[email protected]>
Signed-off-by: sven1977 <[email protected]>
Signed-off-by: sven1977 <[email protected]>
…0_26_support Signed-off-by: sven1977 <[email protected]> # Conflicts: # rllib/evaluation/env_runner_v2.py
Signed-off-by: sven1977 <[email protected]>
Signed-off-by: sven1977 <[email protected]>
Signed-off-by: sven1977 <[email protected]>
Signed-off-by: sven1977 <[email protected]>
Signed-off-by: sven1977 <[email protected]>
Signed-off-by: sven1977 <[email protected]>
Signed-off-by: sven1977 <[email protected]>
Signed-off-by: sven1977 <[email protected]>
Signed-off-by: sven1977 <[email protected]>
Signed-off-by: sven1977 <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
approving, pending a question (likely atari)
@@ -31,7 +31,7 @@ | |||
"To run the application, first install some dependencies.\n", | |||
"\n", | |||
"```bash\n", | |||
"pip install gym[atari]\n", | |||
"pip install gymnasium[atari] gym==0.26.2\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why both?
Signed-off-by: sven1977 <[email protected]>
Thanks for woking on this issue, having RLlib supporting gymnasium is something I have been waiting for. Do we know roughly when you are planning to release version 2.3? |
…PIs). (ray-project#28369) Signed-off-by: tmynn <[email protected]>
@sven1977 I noticed that in ray/rllib/utils/pre_checks/env.py, within the ray/rllib/utils/pre_checks/env.py Line 162 in 2703c56
# Raise warning if using new reset api introduces in gym 0.24
reset_signature = inspect.signature(env.unwrapped.reset).parameters.keys()
if any(k in reset_signature for k in ["seed", "return_info"]):
if log_once("reset_signature"):
logger.warning(
"Your env reset() method appears to take 'seed' or 'return_info'"
" arguments. Note that these are not yet supported in RLlib."
" Seeding will take place using 'env.seed()' and the info dict"
" will not be returned from reset."
) Is this still the case after this PR was merged, ie. will seeding still take place using |
Gymnasium 0.26.3 has been released with major changes in the by-default settings for environments. A custom gym.Env (now: gymnasium.Env) subclass is one of the most important entry points of RLlib users to our library.
To read more about gymnasium, go here: https://github.com/Farama-Foundation/Gymnasium
RLlib should therefore support the new APIs (e.g.
Env.reset()
now returns obs AND infos andEnv.step()
returnsterminated
andtruncated
flags, except for the olddone
one) going forward.Users that are still using the old gym.Env APIs in their classes should either rewrite those classes to abide to the new API or use the provided wrappers (gymnasium.wrappers.EnvCompatibility or
ray.rllib.env.wrappers.multi_agent_env_compatibility.py::MultiAgentEnvCompatibility
)A detailed error message is being provided if users are still on the old gym package or are using old-API gymnasium.Env subclasses.
This PR:
import gym
or related statements byimport gymnasium as gym
.VectorEnv.reset_at()
now returns obs AND infos as well as takes optionalseed
andoptions
arguments.env
setting the new "ALE/" prefix, e.g.config.environment("ALE/Pong-v5", frameskip=1)
for an equivalent toPongNoFrameskip-v4
.done
flag the same as the newterminated
flag (e.g. in loss functions, DONES has been replaced with TERMINATED). The newtruncated
flag is collected as well and available in all train batches (even though, it's mostly ignored). This should improve our loss math as we are currently e.g. setting Q-values to 0.0, even though an episode is only truncated (CartPole-v1 after 500 ts), but not really terminated!horizon
,soft_horizon
andno_done_at_end
have all been deprecated and must no longer be used. Instead, users should implement the proper logic in their environments, using thegymnasium.wrappers.TimeLimit
wrapper, properly returningterminated
vstruncated
(instead of an indiscriminatedone
), and properly picking a new initial state after reset().gym
package, but also gymnasium Envs that still use the old APIs.Env.reset()
method in gymnasium. RLlib RolloutWorkers make sure this is properly implemented (instead of pre-seeding an env via itsseed()
method, which has been deprecated). In future PRs, we will allow users to individually set seeds and options per episode (reset(self, *, seed=.., options=..)
) via callbacks.Major TODOs before this can be merged:
seed()
method anymore).Why are these changes needed?
Related issue number
Checks
git commit -s
) in this PR.scripts/format.sh
to lint the changes in this PR.