
[RLlib] ParallelPettingZooEnv TypeError: reset() got an unexpected keyword argument 'return_info' #32889

Closed
elliottower opened this issue Feb 28, 2023 · 15 comments
Labels: bug (Something that is supposed to be working; but isn't), P0 (Issues that should be fixed in short order), rllib (RLlib related issues), rllib-multi-agent (An RLlib multi-agent related problem)

Comments

@elliottower

What happened + What you expected to happen

I opened an issue with PettingZoo, which I believe is the root cause of the problem, but it may also be related to RLlib. I'm posting it here in case someone who has worked on the ParallelPettingZooEnv class can help diagnose the problem: Farama-Foundation/PettingZoo#889
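
For context, the error seems to be a reset() signature mismatch: something in the call chain still passes the legacy Gym-style return_info keyword, while the (SuperSuit-wrapped) env's reset() no longer accepts it. A rough illustration of a defensive call (the safe_reset helper below is purely an illustrative assumption on my part, not RLlib's or PettingZoo's actual code):

from pettingzoo.butterfly import pistonball_v6

def safe_reset(env, seed=None, options=None):
    # Illustrative helper: try the legacy call first, then fall back.
    try:
        # Old Gym-style call; this is what raises the TypeError on newer
        # PettingZoo/SuperSuit versions.
        return env.reset(seed=seed, return_info=True, options=options)
    except TypeError:
        # Newer reset() signatures only accept `seed` and `options`.
        return env.reset(seed=seed, options=options)

if __name__ == "__main__":
    env = pistonball_v6.parallel_env(n_pistons=20, max_cycles=125)
    out = safe_reset(env, seed=42)
    print(type(out))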

Versions / Dependencies

gym==0.23.1
Gymnasium==0.26.3
numpy==1.23.5
PettingZoo==1.22.3
Pillow==9.4.0
pygame==2.1.2
ray==2.3.0
SuperSuit==3.7.1
tianshou==0.4.11
torch==1.13.1

Reproduction script

Basic working example without Ray (to show that the env works on its own; as far as I know, the SuperSuit pre-processing steps aren't the problem):

import supersuit as ss

from pettingzoo.butterfly import pistonball_v6

def env_creator(args):
    env = pistonball_v6.parallel_env(
        n_pistons=20,
        time_penalty=-0.1,
        continuous=True,
        random_drop=True,
        random_rotate=True,
        ball_mass=0.75,
        ball_friction=0.3,
        ball_elasticity=1.5,
        max_cycles=125,
        render_mode="human"
    )
    env = ss.color_reduction_v0(env, mode="B")
    env = ss.dtype_v0(env, "float32")
    env = ss.resize_v1(env, x_size=84, y_size=84)
    env = ss.normalize_obs_v0(env, env_min=0, env_max=1)
    env = ss.frame_stack_v1(env, 3)
    return env

if __name__ == "__main__":
    env = env_creator({})
    env.reset()

    while env.agents:
        actions = {agent: env.action_space(agent).sample() for agent in env.agents}
        observations, rewards, terminations, truncations, infos = env.step(actions)

rllib_pistonball.py:

"""Uses Ray's RLLib to train agents to play Pistonball.

Author: Rohan (https://github.com/Rohan138)
"""

import os

import ray
import supersuit as ss
from ray import tune
from ray.rllib.algorithms.ppo import PPOConfig
from ray.rllib.env.wrappers.pettingzoo_env import ParallelPettingZooEnv
from ray.rllib.models import ModelCatalog
from ray.rllib.models.torch.torch_modelv2 import TorchModelV2
from ray.tune.registry import register_env
from torch import nn

from pettingzoo.butterfly import pistonball_v6

# raise NotImplementedError(
#     "There are currently bugs in this tutorial, we will fix them soon."
# )


class CNNModelV2(TorchModelV2, nn.Module):
    def __init__(self, obs_space, act_space, num_outputs, *args, **kwargs):
        TorchModelV2.__init__(self, obs_space, act_space, num_outputs, *args, **kwargs)
        nn.Module.__init__(self)
        self.model = nn.Sequential(
            nn.Conv2d(3, 32, [8, 8], stride=(4, 4)),
            nn.ReLU(),
            nn.Conv2d(32, 64, [4, 4], stride=(2, 2)),
            nn.ReLU(),
            nn.Conv2d(64, 64, [3, 3], stride=(1, 1)),
            nn.ReLU(),
            nn.Flatten(),
            nn.Linear(3136, 512),
            nn.ReLU(),
        )
        self.policy_fn = nn.Linear(512, num_outputs)
        self.value_fn = nn.Linear(512, 1)

    def forward(self, input_dict, state, seq_lens):
        model_out = self.model(input_dict["obs"].permute(0, 3, 1, 2))
        self._value_out = self.value_fn(model_out)
        return self.policy_fn(model_out), state

    def value_function(self):
        return self._value_out.flatten()


def env_creator(args):
    env = pistonball_v6.parallel_env(
        n_pistons=20,
        time_penalty=-0.1,
        continuous=True,
        random_drop=True,
        random_rotate=True,
        ball_mass=0.75,
        ball_friction=0.3,
        ball_elasticity=1.5,
        max_cycles=125,
    )
    env = ss.color_reduction_v0(env, mode="B")
    env = ss.dtype_v0(env, "float32")
    env = ss.resize_v1(env, x_size=84, y_size=84)
    env = ss.normalize_obs_v0(env, env_min=0, env_max=1)
    env = ss.frame_stack_v1(env, 3)
    return env


if __name__ == "__main__":
    ray.init(local_mode=True)

    env_name = "pistonball_v6"

    register_env(env_name, lambda config: ParallelPettingZooEnv(env_creator(config)))
    ModelCatalog.register_custom_model("CNNModelV2", CNNModelV2)

    config = (
        PPOConfig()
        .rollouts(num_rollout_workers=4, rollout_fragment_length=128)
        .training(
            train_batch_size=512,
            lr=2e-5,
            gamma=0.99,
            lambda_=0.9,
            use_gae=True,
            clip_param=0.4,
            grad_clip=None,
            entropy_coeff=0.1,
            vf_loss_coeff=0.25,
            sgd_minibatch_size=64,
            num_sgd_iter=10,
        )
        .environment(env=env_name, clip_actions=True)
        .debugging(log_level="ERROR")
        .framework(framework="torch")
        .resources(num_gpus=int(os.environ.get("RLLIB_NUM_GPUS", "0")))
    )

    tune.run(
        "PPO",
        name="PPO",
        stop={"timesteps_total": 5000000},
        checkpoint_freq=10,
        local_dir="~/ray_results/" + env_name,
        config=config.to_dict(),
    )

Issue Severity

High: It blocks me from completing my task.

@elliottower added the bug and triage labels on Feb 28, 2023
@hora-anyscale added the rllib, rllib-multi-agent, and P0 labels and removed the triage label on Mar 3, 2023
@Rohan138
Contributor

@elliottower I'm going to go ahead and close this issue, since we're waiting for the next PettingZoo release to merge in #33470. Thank you for your contribution!

@elliottower
Author

@elliottower I'm going to go ahead and close this issue, since we're waiting for the next PettingZoo release to merge in #33470. Thank you for your contribution!

Yep, sorry for the delay on that; we're working hard to get it out, but it's taking longer than expected.

@george-skal

Hi @elliottower,
I have the same error with the latest versions of PettingZoo and SuperSuit here.
Do you have any idea if there is a workaround, or if it is going to be fixed?
Thanks, George

@elliottower
Author

Yes, I have a PR fixing it here and am working on getting it merged ASAP: #34696

@elliottower
Author

@Rohan138 this should probably be re-opened

@adrienJeg

Hello, I am having the same issue. Do you know when it will be patched? I am trying to get started in the field of MARL and wanted to run the RLlib tutorial for PettingZoo, but unfortunately I'm stuck...
Best regards,
Adrien

@elliottower
Author

Hello, I am having the same issue. Do you know when it will be patched? I am trying to get started in the field of MARL and wanted to run the RLlib tutorial for PettingZoo, but unfortunately I'm stuck...
Best regards,
Adrien

Hi, we are waiting on another PR to be merged which fixes Gymnasium support. In the meantime, I think you can install from my PR branch via pip install "ray[rllib] @ git+https://blah.git"
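
Another possible stopgap, if installing from a branch isn't an option, is a small local shim around the parallel env before handing it to ParallelPettingZooEnv. This is only a rough sketch under assumptions (the class name is made up, it isn't guaranteed to work with every RLlib version, and depending on your PettingZoo version you may also need to adapt what reset() returns):

from ray.rllib.env.wrappers.pettingzoo_env import ParallelPettingZooEnv
from ray.tune.registry import register_env


class ReturnInfoCompatWrapper:
    # Hypothetical shim: tolerate the legacy return_info keyword in reset()
    # and delegate everything else to the wrapped PettingZoo parallel env.

    def __init__(self, env):
        self.env = env

    def reset(self, seed=None, return_info=None, options=None):
        # Ignore return_info; newer PettingZoo reset() only takes seed/options.
        return self.env.reset(seed=seed, options=options)

    def __getattr__(self, name):
        # Forward step(), agents, action_space(), etc. to the underlying env.
        return getattr(self.env, name)


# Usage with the env_creator from the reproduction script above:
# register_env(
#     "pistonball_v6",
#     lambda config: ParallelPettingZooEnv(ReturnInfoCompatWrapper(env_creator(config))),
# )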

@sven1977
Contributor

Hey everyone, I'm taking a stab at this via this PR, originally started by @Rohan138.
#35698

@elliottower
Author

Hey everyone, I'm taking a stab at this via this PR, originally started by @Rohan138.
#35698

If you want to incorporate the PettingZoo changes as well, that would be awesome (it's a super simple code fix, in this PR: #34696). FYI, we are releasing 1.23.2 soon, which fixes some issues in the chess environment and some other things, and I know Gymnasium is releasing 0.28.2 soon as well.

@sven1977
Contributor

This is being addressed by this (currently in-review) PR:
#39459

It will be merged very soon and will be part of Ray 2.8.

@sven1977
Contributor

With the above PR, this minimal example is confirmed working. Once the PR is merged, we can then close this issue:

import ray
import supersuit as ss
from ray import tune
from ray.rllib.algorithms.ppo import PPOConfig
from ray.rllib.env.wrappers.pettingzoo_env import ParallelPettingZooEnv
from ray.rllib.models import ModelCatalog
from ray.rllib.models.torch.torch_modelv2 import TorchModelV2
from ray.tune.registry import register_env
from torch import nn

from pettingzoo.butterfly import pistonball_v6


class CNNModelV2(TorchModelV2, nn.Module):
    def __init__(self, obs_space, act_space, num_outputs, *args, **kwargs):
        TorchModelV2.__init__(self, obs_space, act_space, num_outputs, *args, **kwargs)
        nn.Module.__init__(self)
        self.model = nn.Sequential(
            nn.Conv2d(3, 32, [8, 8], stride=(4, 4)),
            nn.ReLU(),
            nn.Conv2d(32, 64, [4, 4], stride=(2, 2)),
            nn.ReLU(),
            nn.Conv2d(64, 64, [3, 3], stride=(1, 1)),
            nn.ReLU(),
            nn.Flatten(),
            nn.Linear(3136, 512),
            nn.ReLU(),
        )
        self.policy_fn = nn.Linear(512, num_outputs)
        self.value_fn = nn.Linear(512, 1)

    def forward(self, input_dict, state, seq_lens):
        model_out = self.model(input_dict["obs"].permute(0, 3, 1, 2))
        self._value_out = self.value_fn(model_out)
        return self.policy_fn(model_out), state

    def value_function(self):
        return self._value_out.flatten()


def env_creator(args):
    env = pistonball_v6.parallel_env(
        n_pistons=20,
        time_penalty=-0.1,
        continuous=True,
        random_drop=True,
        random_rotate=True,
        ball_mass=0.75,
        ball_friction=0.3,
        ball_elasticity=1.5,
        max_cycles=125,
    )
    env = ss.color_reduction_v0(env, mode="B")
    env = ss.dtype_v0(env, "float32")
    env = ss.resize_v1(env, x_size=84, y_size=84)
    env = ss.normalize_obs_v0(env, env_min=0, env_max=1)
    env = ss.frame_stack_v1(env, 3)
    return env


if __name__ == "__main__":
    ray.init()

    env_name = "pistonball_v6"

    register_env(env_name, lambda config: ParallelPettingZooEnv(env_creator(config)))
    ModelCatalog.register_custom_model("CNNModelV2", CNNModelV2)

    config = (
        PPOConfig()
        .training(
            train_batch_size=512,
            sgd_minibatch_size=256,
            num_sgd_iter=2,
        )
        .environment(env=env_name, clip_actions=True)
    )

    tune.run(
        "PPO",
        config=config,
    )

@elliottower
Author

Awesome, thanks. Could you close this when it's been merged? I'll be on the lookout for the Ray release; it's exciting to see the work in that PR allowing for more flexibility with action spaces and such as well.

@sven1977
Contributor

Sorry, this is still not done. Pushing for the reviewers to give their approval...

@sven1977
Contributor

PR got merged into master. Closing this issue as well.
Related issue: #39453

@elliottower
Author

Cheers, thanks so much for all the help with this @sven1977
