-
Notifications
You must be signed in to change notification settings - Fork 8.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Removing return_info argument to env.reset() and deprecated env.seed() function (reset now always returns info) #2962
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It mostly looks good, I just have a couple of questions on mainly on the environment checker and passive env checker
@@ -482,11 +446,8 @@ def observation(self, obs): | |||
|
|||
def reset(self, **kwargs): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could we change this to the actual reset parameters?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not necessarily against this, but some of our wrappers use this strategy to allow the wrapper to be agnostic to changes in the function definitions, such as https://github.com/openai/gym/blob/master/gym/wrappers/order_enforcing.py. Is this suggestion part of a broader goal of moving towards explicit argument type hinting for wrappers?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is a good. Lets not do it because it allows the wrappers to be partially backward compatible
@@ -281,7 +234,6 @@ def check_env(env: gym.Env, warn: bool = None, skip_render_check: bool = False): | |||
# ==== Check the reset method ==== | |||
check_reset_seed(env) | |||
check_reset_options(env) | |||
check_reset_info(env) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we replace this with a signature check?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added, but I also brought back check_reset_info. Open to feedback if the current tests seem redundant since there is some redundancy between test_reset_info and test_passive_env_reset_checker
@@ -126,53 +126,6 @@ def check_reset_seed(env: gym.Env): | |||
) | |||
|
|||
|
|||
def check_reset_info(env: gym.Env): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would we want to keep these with a signature check?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean if "return_info" is in the signature, check to see if the return_info argument is handled correctly? So it would be for legacy third part envs implementing return_info in their signature?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More like, if return info is in the signature then say that this feature is deprecate and the environment should return the obs and info
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done, I added a test for the signature check as well
@@ -250,15 +218,13 @@ def reset_wait( | |||
self, | |||
timeout: Optional[Union[int, float]] = None, | |||
seed: Optional[int] = None, | |||
return_info: bool = False, | |||
options: Optional[dict] = None, | |||
) -> Union[ObsType, Tuple[ObsType, List[dict]]]: | |||
"""Waits for the calls triggered by :meth:`reset_async` to finish and returns the results. | |||
|
|||
Args: | |||
timeout: Number of seconds before the call to `reset_wait` times out. If `None`, the call to `reset_wait` never times out. | |||
seed: ignored |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The seed is ignored, how to vector env seed their environments without seed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I looked into this, basically reset_wait doesn't use its arguments, since seeding is performed in reset_async, so I think we can safely remove the arguments in reset_wait for vector envs. I think this should be fixed in a separate PR though.
observation, info = env.reset(**kwargs) | ||
observations.append(observation) | ||
infos = self._add_info(infos, info, i) | ||
observation, info = env.reset(**kwargs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to use kwargs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
going to hold off on this unless specifically requested, in accordance with our discussion about kwargs on the other comment.
logger.warn( | ||
"Future gym versions will require that `Env.reset` can be passed `options` to allow the environment initialisation to be passed additional information." | ||
) | ||
|
||
# Checks the result of env.reset with kwargs | ||
result = env.reset(**kwargs) | ||
if kwargs.get("return_info", False) is True: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personally, I would like the passive environment checker to have as minimal errors as possible. Therefore, keep some of these checks as warnings that the results is tuple and length == 2
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made the tuple check into a warning.
@pseudo-rnd-thoughts Alright so I had a few questions about some of the comments so I left my own comments and I made changes to address the other ones and resolved them. Tests are passing locally so I don't know what the deal with CI/CD is. I think we should remove RandomNumberGenerator in a different PR, since it breaks space seeding tests when done naively. |
…ion assertion if found in reset signature
Alright passing tests and ready for final review. Definitely we need to be as careful as possible about this PR. It's going to lead to a lot of issues being created since it's a hard deprecation of a core feature. If anyone has ideas for where to add more deprecation warnings, I think that could make this easier on users. |
It occurs to me this will need a counterpart PR in the docs as well |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple more comments for the changes to the environment checker
@@ -73,7 +73,7 @@ def check_reset_seed(env: gym.Env): | |||
and signature.parameters["kwargs"].kind is inspect.Parameter.VAR_KEYWORD | |||
): | |||
try: | |||
obs_1 = env.reset(seed=123) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this is the first check in the environment checker, we should probably check that the output is a tuple with length 2 so users don't get a weird an explained error
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So now I call check_reset_return_type
before calling check_reset_seed
, in check_env
, so this should address that possibility.
gym/utils/passive_env_checker.py
Outdated
) | ||
|
||
if "options" not in signature.parameters and "kwargs" not in signature.parameters: | ||
if "options" not in signature.parameters or "kwargs" in signature.parameters: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why have we lost the not
for the kwargs check?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is actually an automatic merge error since my original fork for this commit is sort of old. Good catch. I'm reverting it.
gym/utils/env_checker.py
Outdated
) | ||
|
||
|
||
def check_reset_options(env: gym.Env): | ||
"""Check that the environment can be reset with options. | ||
def check_reset_info(env: gym.Env): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Im not sure this is needed and could probably be easily covered by the other functions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I split check_reset_info out into check_reset_return_type
and check_reset_return_info_deprecation
. Some behavior is redundant with the call to env_reset_passive_checker
, so we may want to stop calling the env_reset_passive_checker
in the standard env_checker
in a later PR. I think the other option would be to instrument env_reset_passive_checker
with an argument to put it into "assert mode," but I feel like that's out of scope for this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea for the changes, though Im not a fan of the assert mode as it feels like it could a lot more if statements
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me, do we want to add a check in the environment checker for seed
. I don't think it should raise an error, just raise a warning
gym/utils/env_checker.py
Outdated
UserWarning | ||
""" | ||
seed_fn = getattr(env, "seed", None) | ||
print(seed_fn) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove prints
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
tests/utils/test_env_checker.py
Outdated
assert callable(env.seed) | ||
check_seed_deprecation(env) | ||
|
||
with warnings.catch_warnings(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To check this works as intended we might want to add
with warnings.catch_warnings(record=True) as caught_warnings:
<test code>
assert len(caught_warnings) == 0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sounds good, I changed to this pattern
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Other than the print statement, LGTM
gym/utils/env_checker.py
Outdated
raise gym.error.Error( | ||
"The `reset` method does not provide an `options` or `**kwargs` keyword argument." | ||
if "return_info" in signature.parameters: | ||
raise AssertionError( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's better to make this a warning - it is an important information, but at the same time, in principle, it's not incorrect to implement a custom environment which takes extra optional arguments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good, I'm changing this to a UserWarning.
@@ -164,8 +136,8 @@ def step_wait(self): | |||
info, | |||
) = step_api_compatibility(env.step(action), True) | |||
if self._terminateds[i] or self._truncateds[i]: | |||
observation, info = env.reset() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this a mistake? In the old version, reset is after updating the info dict, now it's before, so I think it will record an incorrect final_observation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah this seems to be either a mistake or a merge error, so I'm reverting it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like a mistake rather than a merge error, and I also found the mistake in the async vector env. Tests break when I switch the order of these statements to the correct order, so I need to examine why the tests are wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I addressed this and now tests are passing. One issues with a naive swap of the order is that calling obs, info = env,reset()
overwrites the info
object where "final_observation" was saved. So I used a temp variable to write the observation from the last call to .step()
into the info
returned by the call to .reset()
tests/utils/test_env_checker.py
Outdated
|
||
|
||
def _reset_return_info_type(self, seed=None, return_info=False, options=None): | ||
def _deprecated_return_info(self, return_info=False): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What exactly is the purpose of this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(some comments/description, and type hints)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea is we want to throw a warning if return_info
is in the signature of an environment's reset function
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea, but these specific functions are extremely odd. Please at least annotate these tests so that it's clear to us 6 months from now what these functions are meant to do exactly, or perhaps restructure the tests so that it's more obvious.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah, I missed your second comment, yeah I can add some more explanation of the specific reason for the deprecation test, as well as type hints.
tests/utils/test_env_checker.py
Outdated
else: | ||
return self.observation_space.sample() | ||
def _reset_return_info_type(self, seed=None, options=None): | ||
return [1, 2] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still don't like that this is a random list of integers. Is this meant to emulate a reset output of an incorrectly implemented env? We need some commentary about this specific sequence of functions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test can be updated to
return [self.observation_space.sample(), {}]
I think this makes more sense
The point of the function is as a reset function where the type of the return is wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Are we sure about this breaking change? Is it a part of 0.26.0 or v1.0? If it is 0.26.0, I don't understand why we have merged this so early; in my opinion this should not be the new default yet because it will break most of the existing code. It'd be nice if such a breaking change can be introduced ONLY with MAJOR versions. My understanding is that the next release we're expecting is v0.26: #3056 |
Through gym v0.25, it was NEVER warned to users that It would be REALLY surprising if v0.26 suddenly changes the behavior and there is no way back. This doesn't make much sense, and I strongly suggest the new defaults should be a part of v1.0, and hopefully after some deprecation period. |
This has been modified to a full compatibility wrapper in #3066 (EnvCompatibility) |
Description
This PR removes the
return_info
argument soenv.reset
always returns a tuple of the form(obs, info)
whereinfo
is a dict which contains observation metadata.Additionally, this PR removes the
env.seed()
function, for both the core env and vector envs.Type of change
Please delete options that are not relevant.
Checklist:
pre-commit
checks withpre-commit run --all-files
(seeCONTRIBUTING.md
instructions to set it up)