[Bug Report] TypeError: reset() got an unexpected keyword argument 'return_info' #889

Closed
elliottower opened this issue Feb 28, 2023 · 2 comments · Fixed by #890
Labels
bug Something isn't working


@elliottower
Member

elliottower commented Feb 28, 2023

### Describe the bug

I'm working on getting the RLlib_pistonball.py example script to work now that Ray 2.3.0 supports Gymnasium. I've gotten a simple example working and fixed a few issues along the way: ss.frame_stack was called before ss.normalize_obs (swapping the order fixes it), and rollout_fragment_length needs to be 128 rather than 512, because the batch has to be split across the 4 rollout workers.
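The fragment length arithmetic can be sketched as follows. This is only an illustration, assuming each rollout worker runs a single environment; the variable names mirror the PPOConfig arguments used in the script, and nothing here calls RLlib.

```python
# Sketch: split the training batch evenly across rollout workers.
# Assumption: one environment per worker, so each worker contributes
# exactly one fragment to every training batch.
train_batch_size = 512
num_rollout_workers = 4

rollout_fragment_length, remainder = divmod(train_batch_size, num_rollout_workers)
assert remainder == 0, "train_batch_size should divide evenly across workers"
print(rollout_fragment_length)  # 128
```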

However, when I try to run the example with tune I get the error below, because the reset() method of RLlib's ParallelPettingZooEnv calls the environment's reset function with both seed and return_info, and PettingZoo does not recognize return_info. According to the Gymnasium documentation, the return_info parameter was removed and info is always expected to be returned:

https://gymnasium.farama.org/api/env/#gymnasium.Env.reset Changed in version v0.25: The return_info parameter was removed and now info is expected to be returned.

The error line is in this file, line 903: https://github.com/ray-project/ray/blob/master/rllib/env/multi_agent_env.py#L903

ray.exceptions.RayTaskError(TypeError): ray::RolloutWorker.apply() (pid=8597, ip=127.0.0.1, repr=<ray.rllib.evaluation.rollout_worker._modify_class.<locals>.Class object at 0x7ff691631c40>)
  File "/Users/elliottower/anaconda3/envs/PettingZoo/lib/python3.8/site-packages/ray/rllib/utils/actor_manager.py", line 183, in apply
    raise e
  File "/Users/elliottower/anaconda3/envs/PettingZoo/lib/python3.8/site-packages/ray/rllib/utils/actor_manager.py", line 174, in apply
    return func(self, *args, **kwargs)
  File "/Users/elliottower/anaconda3/envs/PettingZoo/lib/python3.8/site-packages/ray/rllib/execution/rollout_ops.py", line 86, in <lambda>
    lambda w: w.sample(), local_worker=False, healthy_only=True
  File "/Users/elliottower/anaconda3/envs/PettingZoo/lib/python3.8/site-packages/ray/rllib/evaluation/rollout_worker.py", line 914, in sample
    batches = [self.input_reader.next()]
  File "/Users/elliottower/anaconda3/envs/PettingZoo/lib/python3.8/site-packages/ray/rllib/evaluation/sampler.py", line 92, in next
    batches = [self.get_data()]
  File "/Users/elliottower/anaconda3/envs/PettingZoo/lib/python3.8/site-packages/ray/rllib/evaluation/sampler.py", line 277, in get_data
    item = next(self._env_runner)
  File "/Users/elliottower/anaconda3/envs/PettingZoo/lib/python3.8/site-packages/ray/rllib/evaluation/env_runner_v2.py", line 323, in run
    outputs = self.step()
  File "/Users/elliottower/anaconda3/envs/PettingZoo/lib/python3.8/site-packages/ray/rllib/evaluation/env_runner_v2.py", line 342, in step
    ) = self._base_env.poll()
  File "/Users/elliottower/anaconda3/envs/PettingZoo/lib/python3.8/site-packages/ray/rllib/env/multi_agent_env.py", line 633, in poll
    ) = env_state.poll()
  File "/Users/elliottower/anaconda3/envs/PettingZoo/lib/python3.8/site-packages/ray/rllib/env/multi_agent_env.py", line 825, in poll
    self.reset()
  File "/Users/elliottower/anaconda3/envs/PettingZoo/lib/python3.8/site-packages/ray/rllib/env/multi_agent_env.py", line 909, in reset
    raise e
  File "/Users/elliottower/anaconda3/envs/PettingZoo/lib/python3.8/site-packages/ray/rllib/env/multi_agent_env.py", line 903, in reset
    obs_and_infos = self.env.reset(seed=seed, options=options)
  File "/Users/elliottower/anaconda3/envs/PettingZoo/lib/python3.8/site-packages/ray/rllib/env/wrappers/pettingzoo_env.py", line 203, in reset
    obs, info = self.par_env.reset(seed=seed, return_info=True, options=options)
TypeError: reset() got an unexpected keyword argument 'return_info'
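The failure at the bottom of the traceback can be reproduced without Ray or PettingZoo. The sketch below uses a hypothetical stand-in class, `StubParallelEnv`, whose reset() signature lacks return_info; it is not the real PettingZoo environment, only an illustration of why the keyword argument is rejected.

```python
from typing import Optional


class StubParallelEnv:
    """Hypothetical stand-in for a parallel env whose reset() has no return_info."""

    def reset(self, seed: Optional[int] = None, options: Optional[dict] = None):
        # Returns only observations, keyed by agent id.
        return {"piston_0": 0.0, "piston_1": 0.0}


env = StubParallelEnv()
try:
    # Same call shape as line 203 of pettingzoo_env.py in the traceback.
    env.reset(seed=0, return_info=True, options=None)
except TypeError as err:
    message = str(err)
    print(message)
```

The exact prefix of the message varies between Python versions, but it always names the unexpected keyword argument.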

When I changed line 203 of https://github.com/ray-project/ray/blob/master/rllib/env/wrappers/pettingzoo_env.py#L203 to omit return_info, I got a different error:

  File "/Users/elliottower/anaconda3/envs/PettingZoo/lib/python3.8/site-packages/ray/rllib/env/wrappers/pettingzoo_env.py", line 203, in reset
    obs, info = self.par_env.reset(seed=seed, options=options)
ValueError: too many values to unpack (expected 2)
at time: 1.67755e+09

Looking at the PettingZoo documentation, the parallel API reset() function just returns an observation, rather than an observation and info: https://pettingzoo.farama.org/api/parallel/

The return value of self.par_env.reset(seed=seed, options=options) is a dict indexed by agent id, {'piston_0': ..., 'piston_1': ..., ...}, so unpacking it into obs, info iterates over its 20 keys and fails.
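The ValueError follows directly from how Python unpacks a dict. A minimal sketch, using a made-up observations dict with the same 20 agent ids (the None values are placeholders, not real observations):

```python
# Hypothetical observations dict keyed by agent id, shaped like the
# return value of the parallel reset() described above.
obs_by_agent = {f"piston_{i}": None for i in range(20)}

try:
    # Tuple unpacking iterates over the dict's 20 keys, not over (obs, info).
    obs, info = obs_by_agent
except ValueError as err:
    unpack_error = str(err)
    print(unpack_error)
```

Note that with exactly two agents the same line would not raise at all; it would silently bind the two agent ids to obs and info, which is arguably worse.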

I was able to fully fix the issue by changing ParallelPettingZooEnv's reset() method to be defined as follows:

def reset(self, *, seed: Optional[int] = None, options: Optional[dict] = None):
        obs = self.par_env.reset(seed=seed, options=options)
        return obs, {}
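Since some parallel envs return only observations and others return observations and infos, a more tolerant variant could normalize both shapes. The helper below, `normalize_reset_output`, is a hypothetical name and not part of RLlib or PettingZoo; it is a sketch that assumes observations are always a dict keyed by agent id, so a 2-tuple can only mean (obs, infos).

```python
from typing import Any, Dict, Tuple


def normalize_reset_output(result: Any) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Return (obs, infos) whether reset() gave obs alone or an (obs, infos) tuple."""
    if isinstance(result, tuple) and len(result) == 2:
        obs, infos = result
        return obs, infos
    # Obs-only return: supply an empty infos dict, as in the fix above.
    return result, {}


# Obs-only style return.
print(normalize_reset_output({"piston_0": 0.0}))
# (obs, infos) style return.
print(normalize_reset_output(({"piston_0": 0.0}, {"piston_0": {}})))
```

Inside a wrapper's reset() this would be used as `return normalize_reset_output(self.par_env.reset(seed=seed, options=options))`.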

Example output from the script with this change:
Number of trials: 1/1 (1 RUNNING)
+-------------------------------+----------+----------------+--------+------------------+------+----------+----------------------+----------------------+--------------------+
| Trial name                    | status   | loc            |   iter |   total time (s) |   ts |   reward |   episode_reward_max |   episode_reward_min |   episode_len_mean |
|-------------------------------+----------+----------------+--------+------------------+------+----------+----------------------+----------------------+--------------------|
| PPO_pistonball_v6_a1f2d_00000 | RUNNING  | 127.0.0.1:9930 |      4 |          934.378 | 2048 | -25.3935 |              453.081 |             -262.085 |                125 |
+-------------------------------+----------+----------------+--------+------------------+------+----------+----------------------+----------------------+--------------------+

However, the reason I believe this is a PettingZoo issue is that certain parallel envs like https://github.com/Farama-Foundation/PettingZoo/blob/master/pettingzoo/atari/base_atari_env.py do return infos as well as observations.

I'm not sure what the best way to fix this is. As I understand it, the reset functions of the ss wrappers and the PettingZoo wrappers don't accept the return_info parameter, so RLlib gets an error when it calls reset() on the environment. In conversions.py, however, I can see that all of the parallel wrappers' reset functions do accept the return_info argument.
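One way to check which reset functions accept return_info is to inspect their signatures rather than call them. The sketch below uses the standard library's inspect module; `accepts_return_info` and the two stub classes are hypothetical names for illustration, not PettingZoo or SuperSuit code.

```python
import inspect


def accepts_return_info(env) -> bool:
    """True if env.reset declares return_info or takes arbitrary **kwargs."""
    params = inspect.signature(env.reset).parameters
    return "return_info" in params or any(
        p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()
    )


class WithReturnInfo:
    # Stub mimicking a wrapper that still accepts return_info.
    def reset(self, seed=None, return_info=False, options=None):
        return {}


class WithoutReturnInfo:
    # Stub mimicking a wrapper that follows the newer signature.
    def reset(self, seed=None, options=None):
        return {}


print(accepts_return_info(WithReturnInfo()))     # True
print(accepts_return_info(WithoutReturnInfo()))  # False
```

Applied to each layer of a wrapped environment, a check like this would show where in the stack the argument stops being accepted.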

Maybe, because Gymnasium has deprecated the return_info argument, it should be removed from PettingZoo as well to make things simpler? That would then require submitting a PR to Ray to change ParallelPettingZooEnv's reset function, which I'm happy to do. However, I'm not sure how this change would affect SuperSuit. I could look into that, make the changes locally, and see if it causes issues, but I think it's probably best for someone with a better understanding of PettingZoo/SuperSuit/Gymnasium to sign off on those changes.

### Code example

Basic working example not using ray (to show that the env works on its own, the pre-processing steps with ss aren't the problem afaik):


import supersuit as ss

from pettingzoo.butterfly import pistonball_v6

def env_creator(args):
    env = pistonball_v6.parallel_env(
        n_pistons=20,
        time_penalty=-0.1,
        continuous=True,
        random_drop=True,
        random_rotate=True,
        ball_mass=0.75,
        ball_friction=0.3,
        ball_elasticity=1.5,
        max_cycles=125,
        render_mode="human"
    )
    env = ss.color_reduction_v0(env, mode="B")
    env = ss.dtype_v0(env, "float32")
    env = ss.resize_v1(env, x_size=84, y_size=84)
    env = ss.normalize_obs_v0(env, env_min=0, env_max=1)
    env = ss.frame_stack_v1(env, 3)
    return env

if __name__ == "__main__":
    env = env_creator({})
    env.reset()

    while env.agents:
        actions = {agent: env.action_space(agent).sample() for agent in env.agents}
        observations, rewards, terminations, truncations, infos = env.step(actions)

rllib_pistonball.py:

"""Uses Ray's RLLib to train agents to play Pistonball.

Author: Rohan (https://github.com/Rohan138)
"""

import os

import ray
import supersuit as ss
from ray import tune
from ray.rllib.algorithms.ppo import PPOConfig
from ray.rllib.env.wrappers.pettingzoo_env import ParallelPettingZooEnv
from ray.rllib.models import ModelCatalog
from ray.rllib.models.torch.torch_modelv2 import TorchModelV2
from ray.tune.registry import register_env
from torch import nn

from pettingzoo.butterfly import pistonball_v6

# raise NotImplementedError(
#     "There are currently bugs in this tutorial, we will fix them soon."
# )


class CNNModelV2(TorchModelV2, nn.Module):
    def __init__(self, obs_space, act_space, num_outputs, *args, **kwargs):
        TorchModelV2.__init__(self, obs_space, act_space, num_outputs, *args, **kwargs)
        nn.Module.__init__(self)
        self.model = nn.Sequential(
            nn.Conv2d(3, 32, [8, 8], stride=(4, 4)),
            nn.ReLU(),
            nn.Conv2d(32, 64, [4, 4], stride=(2, 2)),
            nn.ReLU(),
            nn.Conv2d(64, 64, [3, 3], stride=(1, 1)),
            nn.ReLU(),
            nn.Flatten(),
            nn.Linear(3136, 512),
            nn.ReLU(),
        )
        self.policy_fn = nn.Linear(512, num_outputs)
        self.value_fn = nn.Linear(512, 1)

    def forward(self, input_dict, state, seq_lens):
        model_out = self.model(input_dict["obs"].permute(0, 3, 1, 2))
        self._value_out = self.value_fn(model_out)
        return self.policy_fn(model_out), state

    def value_function(self):
        return self._value_out.flatten()


def env_creator(args):
    env = pistonball_v6.parallel_env(
        n_pistons=20,
        time_penalty=-0.1,
        continuous=True,
        random_drop=True,
        random_rotate=True,
        ball_mass=0.75,
        ball_friction=0.3,
        ball_elasticity=1.5,
        max_cycles=125,
    )
    env = ss.color_reduction_v0(env, mode="B")
    env = ss.dtype_v0(env, "float32")
    env = ss.resize_v1(env, x_size=84, y_size=84)
    env = ss.normalize_obs_v0(env, env_min=0, env_max=1)
    env = ss.frame_stack_v1(env, 3)
    return env


if __name__ == "__main__":
    ray.init(local_mode=True)

    env_name = "pistonball_v6"

    register_env(env_name, lambda config: ParallelPettingZooEnv(env_creator(config)))
    ModelCatalog.register_custom_model("CNNModelV2", CNNModelV2)

    config = (
        PPOConfig()
        .rollouts(num_rollout_workers=4, rollout_fragment_length=128)
        .training(
            train_batch_size=512,
            lr=2e-5,
            gamma=0.99,
            lambda_=0.9,
            use_gae=True,
            clip_param=0.4,
            grad_clip=None,
            entropy_coeff=0.1,
            vf_loss_coeff=0.25,
            sgd_minibatch_size=64,
            num_sgd_iter=10,
        )
        .environment(env=env_name, clip_actions=True)
        .debugging(log_level="ERROR")
        .framework(framework="torch")
        .resources(num_gpus=int(os.environ.get("RLLIB_NUM_GPUS", "0")))
    )

    tune.run(
        "PPO",
        name="PPO",
        stop={"timesteps_total": 5000000},
        checkpoint_freq=10,
        local_dir="~/ray_results/" + env_name,
        config=config.to_dict(),
    )


### System info

gym==0.23.1
Gymnasium==0.26.3
numpy==1.23.5
PettingZoo==1.22.3
Pillow==9.4.0
pygame==2.1.2
ray==2.3.0
SuperSuit==3.7.1
tianshou==0.4.11
torch==1.13.1

### Additional context

_No response_

### Checklist

- [X] I have checked that there is no similar [issue](https://github.com/Farama-Foundation/PettingZoo/issues) in the repo
@elliottower elliottower added the bug Something isn't working label Feb 28, 2023
@elliottower elliottower changed the title [Bug Report] Bug title [Bug Report] TypeError: reset() got an unexpected keyword argument 'return_info' Feb 28, 2023
@pseudo-rnd-thoughts
Member

I think that PettingZoo is currently quite inconsistent in how the reset function is defined: some environments have the parameter and some do not. @WillDudley or @jjshoots, any ideas?

@jjshoots
Member

jjshoots commented Mar 1, 2023

@pseudo-rnd-thoughts I've made PRs in both PZ and SS removing return_info; they just need a review, merge, and release.
