Removing return_info argument to env.reset() and deprecated env.seed() function (reset now always returns info) #2962

Merged · 19 commits · Aug 23, 2022

Commits
c004fe2
removed return_info, made info dict mandatory in reset
balisujohn Jul 10, 2022
f4541c3
tenatively removed deprecated seed api for environments
balisujohn Jul 10, 2022
312bbce
added more info type checks to wrapper tests
balisujohn Jul 13, 2022
0ae71ce
Merge branch 'master' of github.com:openai/gym into dev-return-info-s…
balisujohn Jul 13, 2022
affa349
formatting/style compliance
balisujohn Jul 13, 2022
beddc39
addressed some comments
balisujohn Aug 4, 2022
7ee12a1
polish to address review
balisujohn Aug 9, 2022
7b4c6e5
Merge branch 'master' of github.com:openai/gym into dev-return-info-s…
balisujohn Aug 9, 2022
a153196
fixed tests after merge, and added a test of the return_info deprecat…
balisujohn Aug 9, 2022
1449c51
some organization of env_checker tests, reverted a probably merge error
balisujohn Aug 12, 2022
dda5f3f
added deprecation check for seed function in env
balisujohn Aug 21, 2022
0626136
updated docstring
balisujohn Aug 21, 2022
ce9b15d
Merge branch 'master' of github.com:openai/gym into dev-return-info-s…
balisujohn Aug 21, 2022
57d3a9d
removed debug prints, tweaked test_check_seed_deprecation
balisujohn Aug 21, 2022
daea9be
changed return_info deprecation check from assertion to warning
balisujohn Aug 21, 2022
bd42805
fixes to vector envs, now should be correctly structured
balisujohn Aug 21, 2022
49bf8ab
added some explanation and typehints for mockup depcreated return inf…
balisujohn Aug 21, 2022
6e03a48
re-removed seed function from vector envs
balisujohn Aug 21, 2022
4c09aec
added explanation to _reset_return_info_type and changed the return s…
balisujohn Aug 22, 2022
4 changes: 2 additions & 2 deletions README.md
@@ -23,14 +23,14 @@ The Gym API models environments as simple Python `env` classes. Creating e
```python
import gym
env = gym.make("CartPole-v1")
observation, info = env.reset(seed=42, return_info=True)
observation, info = env.reset(seed=42)

for _ in range(1000):
action = env.action_space.sample()
observation, reward, done, info = env.step(action)

if done:
observation, info = env.reset(return_info=True)
observation, info = env.reset()
env.close()
```
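The change this diff makes to the README example can be sketched with a small stand-in environment rather than gym itself (`ToyEnv` is an illustrative name, not gym code): seeding now goes through `reset(seed=...)` instead of the removed `env.seed()`, and `reset` always returns an `(observation, info)` pair with no `return_info` flag.

```python
import random
from typing import Optional, Tuple


class ToyEnv:
    """Stand-in following the new reset contract (illustrative, not gym)."""

    def __init__(self):
        self._rng = random.Random()

    def reset(self, *, seed: Optional[int] = None,
              options: Optional[dict] = None) -> Tuple[float, dict]:
        # Seeding happens through reset(seed=...); env.seed() is gone.
        if seed is not None:
            self._rng = random.Random(seed)
        obs = self._rng.random()
        # reset now always returns (observation, info) -- no return_info.
        return obs, {}


env = ToyEnv()
obs1, info = env.reset(seed=42)   # always a 2-tuple under the new API
obs2, _ = env.reset(seed=42)      # same seed, same initial observation
assert obs1 == obs2
assert isinstance(info, dict)
```

Under the old API, `env.reset(seed=42)` returned only the observation unless `return_info=True` was passed; code that unpacked a 2-tuple unconditionally is exactly what this PR makes valid everywhere.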

53 changes: 7 additions & 46 deletions gym/core.py
@@ -88,11 +88,10 @@ class Env(Generic[ObsType, ActType], metaclass=decorator):
The main API methods that users of this class need to know are:

- :meth:`step` - Takes a step in the environment using an action returning the next observation, reward,
if the environment terminated and more information.
- :meth:`reset` - Resets the environment to an initial state, returning the initial observation.
if the environment terminated and observation information.
- :meth:`reset` - Resets the environment to an initial state, returning the initial observation and observation information.
- :meth:`render` - Renders the environment observation with modes depending on the output
- :meth:`close` - Closes the environment, important for rendering where pygame is imported
- :meth:`seed` - Seeds the environment's random number generator, :deprecated: in favor of `Env.reset(seed=seed)`.

And set the following attributes:

@@ -171,9 +170,8 @@ def reset(
self,
*,
seed: Optional[int] = None,
return_info: bool = False,
options: Optional[dict] = None,
) -> Union[ObsType, Tuple[ObsType, dict]]:
) -> Tuple[ObsType, dict]:
"""Resets the environment to an initial state and returns the initial observation.

This method can reset the environment's random number generator(s) if ``seed`` is an integer or
@@ -190,17 +188,14 @@ def reset(
If you pass an integer, the PRNG will be reset even if it already exists.
Usually, you want to pass an integer *right after the environment has been initialized and then never again*.
Please refer to the minimal example above to see this paradigm in action.
return_info (bool): If true, return additional information along with initial observation.
This info should be analogous to the info returned in :meth:`step`
options (optional dict): Additional information to specify how the environment is reset (optional,
depending on the specific environment)


Returns:
observation (object): Observation of the initial state. This will be an element of :attr:`observation_space`
(typically a numpy array) and is analogous to the observation returned by :meth:`step`.
info (optional dictionary): This will *only* be returned if ``return_info=True`` is passed.
It contains auxiliary information complementing ``observation``. This dictionary should be analogous to
info (dictionary): This dictionary contains auxiliary information complementing ``observation``. It should be analogous to
the ``info`` returned by :meth:`step`.
"""
# Initialize the RNG if the seed is manually passed
@@ -246,33 +241,6 @@ def close(self):
"""
pass

def seed(self, seed=None):
""":deprecated: function that sets the seed for the environment's random number generator(s).

Use `env.reset(seed=seed)` as the new API for setting the seed of the environment.

Note:
Some environments use multiple pseudorandom number generators.
We want to capture all such seeds used in order to ensure that
there aren't accidental correlations between multiple generators.

Args:
seed(Optional int): The seed value for the random number generator

Returns:
seeds (List[int]): Returns the list of seeds used in this environment's random
number generators. The first value in the list should be the
"main" seed, or the value which a reproducer should pass to
'seed'. Often, the main seed equals the provided 'seed', but
this won't be true `if seed=None`, for example.
"""
deprecation(
"Function `env.seed(seed)` is marked as deprecated and will be removed in the future. "
"Please use `env.reset(seed=seed)` instead."
)
self._np_random, seed = seeding.np_random(seed)
return [seed]

@property
def unwrapped(self) -> "Env":
"""Returns the base non-wrapped environment.
@@ -423,7 +391,7 @@ def step(

return step_api_compatibility(self.env.step(action), self.new_step_api)

def reset(self, **kwargs) -> Union[ObsType, Tuple[ObsType, dict]]:
def reset(self, **kwargs) -> Tuple[ObsType, dict]:
"""Resets the environment with kwargs."""
return self.env.reset(**kwargs)

@@ -437,10 +405,6 @@ def close(self):
"""Closes the environment."""
return self.env.close()

def seed(self, seed=None):
"""Seeds the environment."""
return self.env.seed(seed)

def __str__(self):
"""Returns the wrapper name and the unwrapped environment string."""
return f"<{type(self).__name__}{self.env}>"
@@ -485,11 +449,8 @@ def observation(self, obs):

def reset(self, **kwargs):
Contributor: could we change this to the actual reset parameters?

Contributor Author (@balisujohn, Aug 4, 2022): I'm not necessarily against this, but some of our wrappers use this strategy to allow the wrapper to be agnostic to changes in the function definitions, such as https://github.com/openai/gym/blob/master/gym/wrappers/order_enforcing.py. Is this suggestion part of a broader goal of moving towards explicit argument type hinting for wrappers?

Contributor: That is a good point. Let's not do it, because it allows the wrappers to stay partially backward compatible.
"""Resets the environment, returning a modified observation using :meth:`self.observation`."""
if kwargs.get("return_info", False):
obs, info = self.env.reset(**kwargs)
return self.observation(obs), info
else:
return self.observation(self.env.reset(**kwargs))
obs, info = self.env.reset(**kwargs)
return self.observation(obs), info

def step(self, action):
"""Returns a modified observation using :meth:`self.observation` after calling :meth:`env.step`."""
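The simplified `ObservationWrapper.reset` above (unconditional `obs, info = self.env.reset(**kwargs)` unpacking) can be exercised with a self-contained sketch. `ToyEnv` and `ScaleObservation` are illustrative names invented here, not gym classes; the point is only the forwarding-and-unpack pattern.

```python
from typing import Optional, Tuple


class ToyEnv:
    """Minimal base env honoring the new (obs, info) reset contract."""

    def reset(self, *, seed: Optional[int] = None,
              options: Optional[dict] = None) -> Tuple[int, dict]:
        return 10, {"prob": 1.0}


class ScaleObservation:
    """Mimics ObservationWrapper: forward **kwargs, transform the obs only."""

    def __init__(self, env):
        self.env = env

    def observation(self, obs):
        return obs / 2

    def reset(self, **kwargs):
        # No return_info branch needed: reset always yields (obs, info),
        # so the wrapper can unpack unconditionally and stay signature-agnostic.
        obs, info = self.env.reset(**kwargs)
        return self.observation(obs), info


wrapped = ScaleObservation(ToyEnv())
obs, info = wrapped.reset(seed=7)
print(obs, info)  # 5.0 {'prob': 1.0}
```

Taking `**kwargs` rather than spelling out `seed`/`options` is the design choice debated in the review thread above: it keeps wrappers partially backward compatible if `reset`'s signature changes again.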
6 changes: 1 addition & 5 deletions gym/envs/box2d/bipedal_walker.py
@@ -428,7 +428,6 @@ def reset(
self,
*,
seed: Optional[int] = None,
return_info: bool = False,
options: Optional[dict] = None,
):
super().reset(seed=seed)
@@ -514,10 +513,7 @@ def ReportFixture(self, fixture, point, normal, fraction):

self.lidar = [LidarCallback() for _ in range(10)]
self.renderer.reset()
if not return_info:
return self.step(np.array([0, 0, 0, 0]))[0]
else:
return self.step(np.array([0, 0, 0, 0]))[0], {}
return self.step(np.array([0, 0, 0, 0]))[0], {}

def step(self, action: np.ndarray):
assert self.hull is not None
6 changes: 1 addition & 5 deletions gym/envs/box2d/car_racing.py
@@ -475,7 +475,6 @@ def reset(
self,
*,
seed: Optional[int] = None,
return_info: bool = False,
options: Optional[dict] = None,
):
super().reset(seed=seed)
@@ -507,10 +506,7 @@ def reset(
self.car = Car(self.world, *self.track[0][1:4])

self.renderer.reset()
if not return_info:
return self.step(None)[0]
else:
return self.step(None)[0], {}
return self.step(None)[0], {}

def step(self, action: Union[np.ndarray, int]):
assert self.car is not None
8 changes: 2 additions & 6 deletions gym/envs/box2d/lunar_lander.py
@@ -297,7 +297,6 @@ def reset(
self,
*,
seed: Optional[int] = None,
return_info: bool = False,
options: Optional[dict] = None,
):
super().reset(seed=seed)
@@ -405,10 +404,7 @@ def reset(
self.drawlist = [self.lander] + self.legs

self.renderer.reset()
if not return_info:
return self.step(np.array([0, 0]) if self.continuous else 0)[0]
else:
return self.step(np.array([0, 0]) if self.continuous else 0)[0], {}
return self.step(np.array([0, 0]) if self.continuous else 0)[0], {}

def _create_particle(self, mass, x, y, ttl):
p = self.world.CreateDynamicBody(
@@ -769,7 +765,7 @@ def demo_heuristic_lander(env, seed=None, render=False):

total_reward = 0
steps = 0
s = env.reset(seed=seed)
s, info = env.reset(seed=seed)
while True:
a = heuristic(env, s)
s, r, terminated, truncated, info = step_api_compatibility(env.step(a), True)
13 changes: 2 additions & 11 deletions gym/envs/classic_control/acrobot.py
@@ -180,13 +180,7 @@ def __init__(self, render_mode: Optional[str] = None):
self.action_space = spaces.Discrete(3)
self.state = None

def reset(
self,
*,
seed: Optional[int] = None,
return_info: bool = False,
options: Optional[dict] = None
):
def reset(self, *, seed: Optional[int] = None, options: Optional[dict] = None):
super().reset(seed=seed)
# Note that if you use custom reset bounds, it may lead to out-of-bound
# state/observations.
@@ -199,10 +193,7 @@ def reset(

self.renderer.reset()
self.renderer.render_step()
if not return_info:
return self._get_ob()
else:
return self._get_ob(), {}
return self._get_ob(), {}

def step(self, a):
s = self.state
6 changes: 1 addition & 5 deletions gym/envs/classic_control/cartpole.py
@@ -192,7 +192,6 @@ def reset(
self,
*,
seed: Optional[int] = None,
return_info: bool = False,
options: Optional[dict] = None,
):
super().reset(seed=seed)
@@ -205,10 +204,7 @@ def reset(
self.steps_beyond_terminated = None
self.renderer.reset()
self.renderer.render_step()
if not return_info:
return np.array(self.state, dtype=np.float32)
else:
return np.array(self.state, dtype=np.float32), {}
return np.array(self.state, dtype=np.float32), {}

def render(self, mode="human"):
if self.render_mode is not None:
13 changes: 2 additions & 11 deletions gym/envs/classic_control/continuous_mountain_car.py
@@ -174,24 +174,15 @@ def step(self, action: np.ndarray):
self.renderer.render_step()
return self.state, reward, terminated, False, {}

def reset(
self,
*,
seed: Optional[int] = None,
return_info: bool = False,
options: Optional[dict] = None
):
def reset(self, *, seed: Optional[int] = None, options: Optional[dict] = None):
super().reset(seed=seed)
# Note that if you use custom reset bounds, it may lead to out-of-bound
# state/observations.
low, high = utils.maybe_parse_reset_bounds(options, -0.6, -0.4)
self.state = np.array([self.np_random.uniform(low=low, high=high), 0])
self.renderer.reset()
self.renderer.render_step()
if not return_info:
return np.array(self.state, dtype=np.float32)
else:
return np.array(self.state, dtype=np.float32), {}
return np.array(self.state, dtype=np.float32), {}

def _height(self, xs):
return np.sin(3 * xs) * 0.45 + 0.55
6 changes: 1 addition & 5 deletions gym/envs/classic_control/mountain_car.py
@@ -152,7 +152,6 @@ def reset(
self,
*,
seed: Optional[int] = None,
return_info: bool = False,
options: Optional[dict] = None,
):
super().reset(seed=seed)
@@ -162,10 +161,7 @@ def reset(
self.state = np.array([self.np_random.uniform(low=low, high=high), 0])
self.renderer.reset()
self.renderer.render_step()
if not return_info:
return np.array(self.state, dtype=np.float32)
else:
return np.array(self.state, dtype=np.float32), {}
return np.array(self.state, dtype=np.float32), {}

def _height(self, xs):
return np.sin(3 * xs) * 0.45 + 0.55
13 changes: 2 additions & 11 deletions gym/envs/classic_control/pendulum.py
@@ -138,13 +138,7 @@ def step(self, u):
self.renderer.render_step()
return self._get_obs(), -costs, False, False, {}

def reset(
self,
*,
seed: Optional[int] = None,
return_info: bool = False,
options: Optional[dict] = None
):
def reset(self, *, seed: Optional[int] = None, options: Optional[dict] = None):
super().reset(seed=seed)
if options is None:
high = np.array([DEFAULT_X, DEFAULT_Y])
@@ -162,10 +156,7 @@ def reset(

self.renderer.reset()
self.renderer.render_step()
if not return_info:
return self._get_obs()
else:
return self._get_obs(), {}
return self._get_obs(), {}

def _get_obs(self):
theta, thetadot = self.state
6 changes: 1 addition & 5 deletions gym/envs/mujoco/mujoco_env.py
@@ -142,7 +142,6 @@ def reset(
self,
*,
seed: Optional[int] = None,
return_info: bool = False,
options: Optional[dict] = None,
):
super().reset(seed=seed)
@@ -152,10 +151,7 @@ def reset(
ob = self.reset_model()
self.renderer.reset()
self.renderer.render_step()
if not return_info:
return ob
else:
return ob, {}
return ob, {}

def set_state(self, qpos, qvel):
"""
6 changes: 1 addition & 5 deletions gym/envs/toy_text/blackjack.py
@@ -167,7 +167,6 @@ def _get_obs(self):
def reset(
self,
seed: Optional[int] = None,
return_info: bool = False,
options: Optional[dict] = None,
):
super().reset(seed=seed)
@@ -189,10 +188,7 @@ def reset(
self.renderer.reset()
self.renderer.render_step()

if not return_info:
return self._get_obs()
else:
return self._get_obs(), {}
return self._get_obs(), {}

def render(self, mode="human"):
if self.render_mode is not None:
14 changes: 3 additions & 11 deletions gym/envs/toy_text/cliffwalking.py
@@ -149,22 +149,14 @@ def step(self, a):
self.renderer.render_step()
return (int(s), r, t, False, {"prob": p})

def reset(
self,
*,
seed: Optional[int] = None,
return_info: bool = False,
options: Optional[dict] = None
):
def reset(self, *, seed: Optional[int] = None, options: Optional[dict] = None):
super().reset(seed=seed)
self.s = categorical_sample(self.initial_state_distrib, self.np_random)
self.lastaction = None
self.renderer.reset()
self.renderer.render_step()
if not return_info:
return int(self.s)
else:
return int(self.s), {"prob": 1}

return int(self.s), {"prob": 1}

def render(self, mode="human"):
if self.render_mode is not None:
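Across every environment file in this diff, the edit follows the same shape: drop `return_info` from the signature and collapse the `if not return_info` branch into a single `return obs, info`. A self-contained sketch of what an environment looks like after this pattern (`ToyGridEnv` is a hypothetical stand-in modeled loosely on cliffwalking; it does not subclass `gym.Env`):

```python
import random
from typing import Optional


class ToyGridEnv:
    """Illustrative env following the post-PR reset pattern."""

    def __init__(self, n_states: int = 48):
        self.n_states = n_states
        self._rng = random.Random()
        self.s = 0

    def reset(self, *, seed: Optional[int] = None,
              options: Optional[dict] = None):
        # Keyword-only seed/options, as in the updated signatures above.
        if seed is not None:
            self._rng = random.Random(seed)
        self.s = self._rng.randrange(self.n_states)
        # Single unconditional return of (obs, info), mirroring
        # cliffwalking's `return int(self.s), {"prob": 1}`.
        return int(self.s), {"prob": 1}

    def step(self, action: int):
        self.s = (self.s + action) % self.n_states
        # Info dict stays consistent between reset and step.
        return int(self.s), 0.0, False, False, {"prob": 1}


env = ToyGridEnv()
obs, info = env.reset(seed=3)
assert info == {"prob": 1}
```

Environments whose old code returned a bare observation when `return_info` was false (the majority of the hunks above) only needed the `else` branch kept; no behavioral change occurs beyond the extra info dict.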