[Bug Report] TypeError: reset() got an unexpected keyword argument 'return_info' #889

elliottower · 2023-02-28T03:31:02Z

Describe the bug

I'm working on getting the RLlib_pistonball.py example script to work now that ray 2.3.0 supports gymnasium and I've gotten a simple example working, and fixed a few issues (ss.frame_stack called before ss.normalize_obs, swapping the order fixes it, rollout_fragment_length needs to be 128 rather than 512, as there are 4 rollout workers so it needs to be split in 4).

However, when I try to run the example with tune I get this error, because RLlib's ParallelPettingZooEnv return() method calls the environment's reset function with both seed and return_info, which is unrecognized by pettingzoo. Looking on the gymnasium documentation, the return_info parameter was removed and info is always expected to be returned:

https://gymnasium.farama.org/api/env/#gymnasium.Env.reset Changed in version v0.25: The return_info parameter was removed and now info is expected to be returned.

The error line is in this file, line 903: https://github.com/ray-project/ray/blob/master/rllib/env/multi_agent_env.py#L903

ray.exceptions.RayTaskError(TypeError): ray::RolloutWorker.apply() (pid=8597, ip=127.0.0.1, repr=<ray.rllib.evaluation.rollout_worker._modify_class..Class object at 0x7ff691631c40>)
File "/Users/elliottower/anaconda3/envs/PettingZoo/lib/python3.8/site-packages/ray/rllib/utils/actor_manager.py", line 183, in apply
raise e
File "/Users/elliottower/anaconda3/envs/PettingZoo/lib/python3.8/site-packages/ray/rllib/utils/actor_manager.py", line 174, in apply
return func(self, *args, **kwargs)
File "/Users/elliottower/anaconda3/envs/PettingZoo/lib/python3.8/site-packages/ray/rllib/execution/rollout_ops.py", line 86, in
lambda w: w.sample(), local_worker=False, healthy_only=True
File "/Users/elliottower/anaconda3/envs/PettingZoo/lib/python3.8/site-packages/ray/rllib/evaluation/rollout_worker.py", line 914, in sample
batches = [self.input_reader.next()]
File "/Users/elliottower/anaconda3/envs/PettingZoo/lib/python3.8/site-packages/ray/rllib/evaluation/sampler.py", line 92, in next
batches = [self.get_data()]
File "/Users/elliottower/anaconda3/envs/PettingZoo/lib/python3.8/site-packages/ray/rllib/evaluation/sampler.py", line 277, in get_data
item = next(self._env_runner)
File "/Users/elliottower/anaconda3/envs/PettingZoo/lib/python3.8/site-packages/ray/rllib/evaluation/env_runner_v2.py", line 323, in run
outputs = self.step()
File "/Users/elliottower/anaconda3/envs/PettingZoo/lib/python3.8/site-packages/ray/rllib/evaluation/env_runner_v2.py", line 342, in step
) = self._base_env.poll()
File "/Users/elliottower/anaconda3/envs/PettingZoo/lib/python3.8/site-packages/ray/rllib/env/multi_agent_env.py", line 633, in poll
) = env_state.poll()
File "/Users/elliottower/anaconda3/envs/PettingZoo/lib/python3.8/site-packages/ray/rllib/env/multi_agent_env.py", line 825, in poll
self.reset()
File "/Users/elliottower/anaconda3/envs/PettingZoo/lib/python3.8/site-packages/ray/rllib/env/multi_agent_env.py", line 909, in reset
raise e
File "/Users/elliottower/anaconda3/envs/PettingZoo/lib/python3.8/site-packages/ray/rllib/env/multi_agent_env.py", line 903, in reset
obs_and_infos = self.env.reset(seed=seed, options=options)
File "/Users/elliottower/anaconda3/envs/PettingZoo/lib/python3.8/site-packages/ray/rllib/env/wrappers/pettingzoo_env.py", line 203, in reset
obs, info = self.par_env.reset(seed=seed, return_info=True, options=options)
TypeError: reset() got an unexpected keyword argument 'return_info'

However, when I changed the line 203 in https://github.com/ray-project/ray/blob/master/rllib/env/wrappers/pettingzoo_env.py#L203 to omit return_info, I got this error:

File "/Users/elliottower/anaconda3/envs/PettingZoo/lib/python3.8/site-packages/ray/rllib/env/wrappers/pettingzoo_env.py", line 203, in reset obs, info = self.par_env.reset(seed=seed, options=options) ValueError: too many values to unpack (expected 2) at time: 1.67755e+09

Looking at the PettingZoo documentation, the parallel API reset() function just returns an observation, rather than an observation and info: https://pettingzoo.farama.org/api/parallel/

The return value of self.par_env.reset(seed=seed, options=options) is a dict indexed by agent id: {'piston_0': ..., 'piston_1', ...},

I was able to fully fix the issue by changing ParallelPettingZooEnv's reset() method to be defined as follows:

def reset(self, *, seed: Optional[int] = None, options: Optional[dict] = None):
        obs = self.par_env.reset(seed=seed, options=options)
        return obs, {}

Example output from the script with this change:
Number of trials: 1/1 (1 RUNNING) +-------------------------------+----------+----------------+--------+------------------+------+----------+----------------------+----------------------+--------------------+ | Trial name | status | loc | iter | total time (s) | ts | reward | episode_reward_max | episode_reward_min | episode_len_mean | |-------------------------------+----------+----------------+--------+------------------+------+----------+----------------------+----------------------+--------------------| | PPO_pistonball_v6_a1f2d_00000 | RUNNING | 127.0.0.1:9930 | 4 | 934.378 | 2048 | -25.3935 | 453.081 | -262.085 | 125 | +-------------------------------+----------+----------------+--------+------------------+------+----------+----------------------+----------------------+--------------------+

However, the reason I believe this is a PettingZoo issue is that certain parallel envs like https://github.com/Farama-Foundation/PettingZoo/blob/master/pettingzoo/atari/base_atari_env.py do return infos as well as observations.

I'm not sure if the best way to fix this, as the problem as I understand it is that I believe the ss wrappers and pettingzoo wrappers' reset functions don't accept the return_info parameter, so when RLlib calls reset() on the environment it gets an error. In conversions.py I can see that all of the parallel wrappers' reset functions do accept the return_info argument.

Maybe because gymnasium has deprecated the return_info argument it should be removed from PettingZoo as well to make things simpler? It would require then submitting a PR to ray to change the ParallelPettingZooEnv function's reset function, but I'm happy to do that. However, I'm not sure how this change would affect supersuit, I could look into that and change the things locally and see if it cuases issues, but I think it's probably best for someone who has better understanding of PettingZoo/SuperSuit/Gymnasium to sign off on those changes.

Code example

Basic working example not using ray (to show that the env works on its own, the pre-processing steps with ss aren't the problem afaik):


import supersuit as ss

from pettingzoo.butterfly import pistonball_v6

def env_creator(args):
    env = pistonball_v6.parallel_env(
        n_pistons=20,
        time_penalty=-0.1,
        continuous=True,
        random_drop=True,
        random_rotate=True,
        ball_mass=0.75,
        ball_friction=0.3,
        ball_elasticity=1.5,
        max_cycles=125,
        render_mode="human"
    )
    env = ss.color_reduction_v0(env, mode="B")
    env = ss.dtype_v0(env, "float32")
    env = ss.resize_v1(env, x_size=84, y_size=84)
    env = ss.normalize_obs_v0(env, env_min=0, env_max=1)
    env = ss.frame_stack_v1(env, 3)
    return env

if __name__ == "__main__":
    env = env_creator({})
    env.reset()

    while env.agents:
        actions = {agent: env.action_space(agent).sample() for agent in env.agents}
        observations, rewards, terminations, truncations, infos = env.step(actions)

rllib_pistonball.py:

"""Uses Ray's RLLib to train agents to play Pistonball.

Author: Rohan (https://github.com/Rohan138)
"""

import os

import ray
import supersuit as ss
from ray import tune
from ray.rllib.algorithms.ppo import PPOConfig
from ray.rllib.env.wrappers.pettingzoo_env import ParallelPettingZooEnv
from ray.rllib.models import ModelCatalog
from ray.rllib.models.torch.torch_modelv2 import TorchModelV2
from ray.tune.registry import register_env
from torch import nn

from pettingzoo.butterfly import pistonball_v6

# raise NotImplementedError(
#     "There are currently bugs in this tutorial, we will fix them soon."
# )


class CNNModelV2(TorchModelV2, nn.Module):
    def __init__(self, obs_space, act_space, num_outputs, *args, **kwargs):
        TorchModelV2.__init__(self, obs_space, act_space, num_outputs, *args, **kwargs)
        nn.Module.__init__(self)
        self.model = nn.Sequential(
            nn.Conv2d(3, 32, [8, 8], stride=(4, 4)),
            nn.ReLU(),
            nn.Conv2d(32, 64, [4, 4], stride=(2, 2)),
            nn.ReLU(),
            nn.Conv2d(64, 64, [3, 3], stride=(1, 1)),
            nn.ReLU(),
            nn.Flatten(),
            (nn.Linear(3136, 512)),
            nn.ReLU(),
        )
        self.policy_fn = nn.Linear(512, num_outputs)
        self.value_fn = nn.Linear(512, 1)

    def forward(self, input_dict, state, seq_lens):
        model_out = self.model(input_dict["obs"].permute(0, 3, 1, 2))
        self._value_out = self.value_fn(model_out)
        return self.policy_fn(model_out), state

    def value_function(self):
        return self._value_out.flatten()


def env_creator(args):
    env = pistonball_v6.parallel_env(
        n_pistons=20,
        time_penalty=-0.1,
        continuous=True,
        random_drop=True,
        random_rotate=True,
        ball_mass=0.75,
        ball_friction=0.3,
        ball_elasticity=1.5,
        max_cycles=125,
    )
    env = ss.color_reduction_v0(env, mode="B")
    env = ss.dtype_v0(env, "float32")
    env = ss.resize_v1(env, x_size=84, y_size=84)
    env = ss.normalize_obs_v0(env, env_min=0, env_max=1)
    env = ss.frame_stack_v1(env, 3)
    return env


if __name__ == "__main__":
    ray.init(local_mode=True)

    env_name = "pistonball_v6"

    register_env(env_name, lambda config: ParallelPettingZooEnv(env_creator(config)))
    ModelCatalog.register_custom_model("CNNModelV2", CNNModelV2)

    config = (
        PPOConfig()
        .rollouts(num_rollout_workers=4, rollout_fragment_length=128)
        .training(
            train_batch_size=512,
            lr=2e-5,
            gamma=0.99,
            lambda_=0.9,
            use_gae=True,
            clip_param=0.4,
            grad_clip=None,
            entropy_coeff=0.1,
            vf_loss_coeff=0.25,
            sgd_minibatch_size=64,
            num_sgd_iter=10,
        )
        .environment(env=env_name, clip_actions=True)
        .debugging(log_level="ERROR")
        .framework(framework="torch")
        .resources(num_gpus=int(os.environ.get("RLLIB_NUM_GPUS", "0")))
    )

    tune.run(
        "PPO",
        name="PPO",
        stop={"timesteps_total": 5000000},
        checkpoint_freq=10,
        local_dir="~/ray_results/" + env_name,
        config=config.to_dict(),
    )



### System info

gym==0.23.1
Gymnasium==0.26.3
numpy==1.23.5
PettingZoo==1.22.3
Pillow==9.4.0
pygame==2.1.2
ray==2.3.0
SuperSuit==3.7.1
tianshou==0.4.11
torch==1.13.1

### Additional context

_No response_

### Checklist

- [X] I have checked that there is no similar [issue](https://github.com/Farama-Foundation/PettingZoo/issues) in the repo

The text was updated successfully, but these errors were encountered:

pseudo-rnd-thoughts · 2023-03-01T13:34:32Z

I think that PettingZoo is currently quite inconsistent with the reset function definition and that some environment have and some do not have the parameter. @WillDudley or @jjshoots Any ideas?

jjshoots · 2023-03-01T20:59:55Z

@pseudo-rnd-thoughts I've made PRs in both PZ and SS removing the return_info, just needs a review and merge and release.

elliottower added the bug Something isn't working label Feb 28, 2023

elliottower changed the title ~~[Bug Report] Bug title~~ [Bug Report] TypeError: reset() got an unexpected keyword argument 'return_info' Feb 28, 2023

elliottower mentioned this issue Feb 28, 2023

[RLlib] ParallelPettingZooEnv TypeError: reset() got an unexpected keyword argument 'return_info' ray-project/ray#32889

Closed

This was referenced Feb 28, 2023

Remove return info #890

Merged

remove return info Farama-Foundation/SuperSuit#205

Merged

elliottower mentioned this issue Feb 28, 2023

[RLlib] Can't load trained model with PettingZoo leduc_holdem_v4 ray-project/ray#32910

Open

av3006 closed this as completed in #890 Mar 4, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Bug Report] TypeError: reset() got an unexpected keyword argument 'return_info' #889

[Bug Report] TypeError: reset() got an unexpected keyword argument 'return_info' #889

elliottower commented Feb 28, 2023 •

edited

Loading

pseudo-rnd-thoughts commented Mar 1, 2023

jjshoots commented Mar 1, 2023

[Bug Report] TypeError: reset() got an unexpected keyword argument 'return_info' #889

[Bug Report] TypeError: reset() got an unexpected keyword argument 'return_info' #889

Comments

elliottower commented Feb 28, 2023 • edited Loading

Describe the bug

Code example

pseudo-rnd-thoughts commented Mar 1, 2023

jjshoots commented Mar 1, 2023

elliottower commented Feb 28, 2023 •

edited

Loading