Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Removing return_info argument to env.reset() and deprecated env.seed() function (reset now always returns info) #2962

Merged
merged 19 commits into from
Aug 23, 2022

Conversation

balisujohn
Copy link
Contributor

@balisujohn balisujohn commented Jul 13, 2022

Description

This PR removes the return_info argument so env.reset always returns a tuple of the form
(obs, info)where info is a dict which contains observation metadata.

Additionally, this PR removes the env.seed() function, for both the core env and vector envs.

Type of change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

Checklist:

  • I have run the pre-commit checks with pre-commit run --all-files (see CONTRIBUTING.md instructions to set it up)
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

@balisujohn balisujohn changed the title Removing deprecated return_info argument to env.reset and deprecated env.seed() function Removing return_info argument to env.reset and deprecated env.seed() function Jul 13, 2022
@balisujohn balisujohn changed the title Removing return_info argument to env.reset and deprecated env.seed() function Removing return_info argument to env.reset() and deprecated env.seed() function Jul 13, 2022
Copy link
Contributor

@pseudo-rnd-thoughts pseudo-rnd-thoughts left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It mostly looks good, I just have a couple of questions on mainly on the environment checker and passive env checker

gym/core.py Outdated Show resolved Hide resolved
@@ -482,11 +446,8 @@ def observation(self, obs):

def reset(self, **kwargs):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could we change this to the actual reset parameters?

Copy link
Contributor Author

@balisujohn balisujohn Aug 4, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not necessarily against this, but some of our wrappers use this strategy to allow the wrapper to be agnostic to changes in the function definitions, such as https://github.com/openai/gym/blob/master/gym/wrappers/order_enforcing.py. Is this suggestion part of a broader goal of moving towards explicit argument type hinting for wrappers?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is a good. Lets not do it because it allows the wrappers to be partially backward compatible

@@ -281,7 +234,6 @@ def check_env(env: gym.Env, warn: bool = None, skip_render_check: bool = False):
# ==== Check the reset method ====
check_reset_seed(env)
check_reset_options(env)
check_reset_info(env)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we replace this with a signature check?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added, but I also brought back check_reset_info. Open to feedback if the current tests seem redundant since there is some redundancy between test_reset_info and test_passive_env_reset_checker

@@ -126,53 +126,6 @@ def check_reset_seed(env: gym.Env):
)


def check_reset_info(env: gym.Env):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would we want to keep these with a signature check?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean if "return_info" is in the signature, check to see if the return_info argument is handled correctly? So it would be for legacy third part envs implementing return_info in their signature?

Copy link
Contributor

@pseudo-rnd-thoughts pseudo-rnd-thoughts Aug 4, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More like, if return info is in the signature then say that this feature is deprecate and the environment should return the obs and info

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done, I added a test for the signature check as well

@@ -250,15 +218,13 @@ def reset_wait(
self,
timeout: Optional[Union[int, float]] = None,
seed: Optional[int] = None,
return_info: bool = False,
options: Optional[dict] = None,
) -> Union[ObsType, Tuple[ObsType, List[dict]]]:
"""Waits for the calls triggered by :meth:`reset_async` to finish and returns the results.

Args:
timeout: Number of seconds before the call to `reset_wait` times out. If `None`, the call to `reset_wait` never times out.
seed: ignored
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The seed is ignored, how to vector env seed their environments without seed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I looked into this, basically reset_wait doesn't use its arguments, since seeding is performed in reset_async, so I think we can safely remove the arguments in reset_wait for vector envs. I think this should be fixed in a separate PR though.

observation, info = env.reset(**kwargs)
observations.append(observation)
infos = self._add_info(infos, info, i)
observation, info = env.reset(**kwargs)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to use kwargs?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

going to hold off on this unless specifically requested, in accordance with our discussion about kwargs on the other comment.

gym/utils/passive_env_checker.py Outdated Show resolved Hide resolved
gym/utils/passive_env_checker.py Outdated Show resolved Hide resolved
logger.warn(
"Future gym versions will require that `Env.reset` can be passed `options` to allow the environment initialisation to be passed additional information."
)

# Checks the result of env.reset with kwargs
result = env.reset(**kwargs)
if kwargs.get("return_info", False) is True:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally, I would like the passive environment checker to have as minimal errors as possible. Therefore, keep some of these checks as warnings that the results is tuple and length == 2

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made the tuple check into a warning.

@balisujohn
Copy link
Contributor Author

balisujohn commented Aug 4, 2022

@pseudo-rnd-thoughts Alright so I had a few questions about some of the comments so I left my own comments and I made changes to address the other ones and resolved them. Tests are passing locally so I don't know what the deal with CI/CD is. I think we should remove RandomNumberGenerator in a different PR, since it breaks space seeding tests when done naively.

@balisujohn
Copy link
Contributor Author

Alright passing tests and ready for final review. Definitely we need to be as careful as possible about this PR. It's going to lead to a lot of issues being created since it's a hard deprecation of a core feature. If anyone has ideas for where to add more deprecation warnings, I think that could make this easier on users.

@balisujohn
Copy link
Contributor Author

It occurs to me this will need a counterpart PR in the docs as well

Copy link
Contributor

@pseudo-rnd-thoughts pseudo-rnd-thoughts left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple more comments for the changes to the environment checker

@@ -73,7 +73,7 @@ def check_reset_seed(env: gym.Env):
and signature.parameters["kwargs"].kind is inspect.Parameter.VAR_KEYWORD
):
try:
obs_1 = env.reset(seed=123)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is the first check in the environment checker, we should probably check that the output is a tuple with length 2 so users don't get a weird an explained error

Copy link
Contributor Author

@balisujohn balisujohn Aug 12, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So now I call check_reset_return_type before calling check_reset_seed, in check_env, so this should address that possibility.

)

if "options" not in signature.parameters and "kwargs" not in signature.parameters:
if "options" not in signature.parameters or "kwargs" in signature.parameters:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why have we lost the not for the kwargs check?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is actually an automatic merge error since my original fork for this commit is sort of old. Good catch. I'm reverting it.

)


def check_reset_options(env: gym.Env):
"""Check that the environment can be reset with options.
def check_reset_info(env: gym.Env):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Im not sure this is needed and could probably be easily covered by the other functions

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I split check_reset_info out into check_reset_return_type and check_reset_return_info_deprecation. Some behavior is redundant with the call to env_reset_passive_checker , so we may want to stop calling the env_reset_passive_checker in the standard env_checker in a later PR. I think the other option would be to instrument env_reset_passive_checker with an argument to put it into "assert mode," but I feel like that's out of scope for this PR.

Copy link
Contributor

@pseudo-rnd-thoughts pseudo-rnd-thoughts Aug 12, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea for the changes, though Im not a fan of the assert mode as it feels like it could a lot more if statements

Copy link
Contributor

@pseudo-rnd-thoughts pseudo-rnd-thoughts left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, do we want to add a check in the environment checker for seed. I don't think it should raise an error, just raise a warning

UserWarning
"""
seed_fn = getattr(env, "seed", None)
print(seed_fn)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove prints

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

assert callable(env.seed)
check_seed_deprecation(env)

with warnings.catch_warnings():
Copy link
Contributor

@pseudo-rnd-thoughts pseudo-rnd-thoughts Aug 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To check this works as intended we might want to add

with warnings.catch_warnings(record=True) as caught_warnings:
    <test code>

assert len(caught_warnings) == 0

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sounds good, I changed to this pattern

Copy link
Contributor

@pseudo-rnd-thoughts pseudo-rnd-thoughts left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other than the print statement, LGTM

@balisujohn balisujohn changed the title Removing return_info argument to env.reset() and deprecated env.seed() function Removing return_info argument to env.reset() and deprecated env.seed() function (reset now always returns info) Aug 21, 2022
raise gym.error.Error(
"The `reset` method does not provide an `options` or `**kwargs` keyword argument."
if "return_info" in signature.parameters:
raise AssertionError(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's better to make this a warning - it is an important information, but at the same time, in principle, it's not incorrect to implement a custom environment which takes extra optional arguments.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good, I'm changing this to a UserWarning.

@@ -164,8 +136,8 @@ def step_wait(self):
info,
) = step_api_compatibility(env.step(action), True)
if self._terminateds[i] or self._truncateds[i]:
observation, info = env.reset()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a mistake? In the old version, reset is after updating the info dict, now it's before, so I think it will record an incorrect final_observation

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah this seems to be either a mistake or a merge error, so I'm reverting it.

Copy link
Contributor Author

@balisujohn balisujohn Aug 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like a mistake rather than a merge error, and I also found the mistake in the async vector env. Tests break when I switch the order of these statements to the correct order, so I need to examine why the tests are wrong.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I addressed this and now tests are passing. One issues with a naive swap of the order is that calling obs, info = env,reset() overwrites the info object where "final_observation" was saved. So I used a temp variable to write the observation from the last call to .step() into the info returned by the call to .reset()



def _reset_return_info_type(self, seed=None, return_info=False, options=None):
def _deprecated_return_info(self, return_info=False):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What exactly is the purpose of this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(some comments/description, and type hints)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea is we want to throw a warning if return_info is in the signature of an environment's reset function

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, but these specific functions are extremely odd. Please at least annotate these tests so that it's clear to us 6 months from now what these functions are meant to do exactly, or perhaps restructure the tests so that it's more obvious.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, I missed your second comment, yeah I can add some more explanation of the specific reason for the deprecation test, as well as type hints.

else:
return self.observation_space.sample()
def _reset_return_info_type(self, seed=None, options=None):
return [1, 2]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still don't like that this is a random list of integers. Is this meant to emulate a reset output of an incorrectly implemented env? We need some commentary about this specific sequence of functions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test can be updated to
return [self.observation_space.sample(), {}]
I think this makes more sense

The point of the function is as a reset function where the type of the return is wrong.

Copy link
Contributor

@pseudo-rnd-thoughts pseudo-rnd-thoughts left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@jkterry1 jkterry1 merged commit 3a8daaf into openai:master Aug 23, 2022
@wookayin
Copy link
Contributor

wookayin commented Sep 4, 2022

Are we sure about this breaking change? Is it a part of 0.26.0 or v1.0? If it is 0.26.0, I don't understand why we have merged this so early; in my opinion this should not be the new default yet because it will break most of the existing code.

It'd be nice if such a breaking change can be introduced ONLY with MAJOR versions. My understanding is that the next release we're expecting is v0.26: #3056

@wookayin
Copy link
Contributor

wookayin commented Sep 4, 2022

Through gym v0.25, it was NEVER warned to users that env.reset() will start returning a Tuple instead of a single observation. There were some warnings about env.step() API changes (see new_step_api), but we had no compatiblity flag like for env.reset().

It would be REALLY surprising if v0.26 suddenly changes the behavior and there is no way back. This doesn't make much sense, and I strongly suggest the new defaults should be a part of v1.0, and hopefully after some deprecation period.

@wookayin
Copy link
Contributor

wookayin commented Sep 7, 2022

This has been modified to a full compatibility wrapper in #3066 (EnvCompatibility)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants