### Describe the bug

I'm working on getting the RLlib_pistonball.py example script to work now that ray 2.3.0 supports gymnasium. I've gotten a simple example working and fixed a few issues along the way (`ss.frame_stack` was called before `ss.normalize_obs`; swapping the order fixes it, and `rollout_fragment_length` needs to be 128 rather than 512, since there are 4 rollout workers and the batch is split between them).

However, when I try to run the example with `tune`, I get the error below, because RLlib's ParallelPettingZooEnv `reset()` method calls the environment's reset function with both `seed` and `return_info`, and `return_info` is unrecognized by PettingZoo. Looking at the gymnasium documentation (https://gymnasium.farama.org/api/env/#gymnasium.Env.reset), the `return_info` parameter was removed and info is now always expected to be returned:

> Changed in version v0.25: The return_info parameter was removed and now info is expected to be returned.
```
ray.exceptions.RayTaskError(TypeError): ray::RolloutWorker.apply() (pid=8597, ip=127.0.0.1, repr=<ray.rllib.evaluation.rollout_worker._modify_class..Class object at 0x7ff691631c40>)
  File "/Users/elliottower/anaconda3/envs/PettingZoo/lib/python3.8/site-packages/ray/rllib/utils/actor_manager.py", line 183, in apply
    raise e
  File "/Users/elliottower/anaconda3/envs/PettingZoo/lib/python3.8/site-packages/ray/rllib/utils/actor_manager.py", line 174, in apply
    return func(self, *args, **kwargs)
  File "/Users/elliottower/anaconda3/envs/PettingZoo/lib/python3.8/site-packages/ray/rllib/execution/rollout_ops.py", line 86, in
    lambda w: w.sample(), local_worker=False, healthy_only=True
  File "/Users/elliottower/anaconda3/envs/PettingZoo/lib/python3.8/site-packages/ray/rllib/evaluation/rollout_worker.py", line 914, in sample
    batches = [self.input_reader.next()]
  File "/Users/elliottower/anaconda3/envs/PettingZoo/lib/python3.8/site-packages/ray/rllib/evaluation/sampler.py", line 92, in next
    batches = [self.get_data()]
  File "/Users/elliottower/anaconda3/envs/PettingZoo/lib/python3.8/site-packages/ray/rllib/evaluation/sampler.py", line 277, in get_data
    item = next(self._env_runner)
  File "/Users/elliottower/anaconda3/envs/PettingZoo/lib/python3.8/site-packages/ray/rllib/evaluation/env_runner_v2.py", line 323, in run
    outputs = self.step()
  File "/Users/elliottower/anaconda3/envs/PettingZoo/lib/python3.8/site-packages/ray/rllib/evaluation/env_runner_v2.py", line 342, in step
    ) = self._base_env.poll()
  File "/Users/elliottower/anaconda3/envs/PettingZoo/lib/python3.8/site-packages/ray/rllib/env/multi_agent_env.py", line 633, in poll
    ) = env_state.poll()
  File "/Users/elliottower/anaconda3/envs/PettingZoo/lib/python3.8/site-packages/ray/rllib/env/multi_agent_env.py", line 825, in poll
    self.reset()
  File "/Users/elliottower/anaconda3/envs/PettingZoo/lib/python3.8/site-packages/ray/rllib/env/multi_agent_env.py", line 909, in reset
    raise e
  File "/Users/elliottower/anaconda3/envs/PettingZoo/lib/python3.8/site-packages/ray/rllib/env/multi_agent_env.py", line 903, in reset
    obs_and_infos = self.env.reset(seed=seed, options=options)
  File "/Users/elliottower/anaconda3/envs/PettingZoo/lib/python3.8/site-packages/ray/rllib/env/wrappers/pettingzoo_env.py", line 203, in reset
    obs, info = self.par_env.reset(seed=seed, return_info=True, options=options)
TypeError: reset() got an unexpected keyword argument 'return_info'
```
File "/Users/elliottower/anaconda3/envs/PettingZoo/lib/python3.8/site-packages/ray/rllib/env/wrappers/pettingzoo_env.py", line 203, in reset obs, info = self.par_env.reset(seed=seed, options=options) ValueError: too many values to unpack (expected 2) at time: 1.67755e+09
Looking at the PettingZoo documentation, the parallel API reset() function just returns an observation, rather than an observation and info: https://pettingzoo.farama.org/api/parallel/
The return value of `self.par_env.reset(seed=seed, options=options)` is a dict indexed by agent id: `{'piston_0': ..., 'piston_1': ..., ...}`.
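That is where the `ValueError` above comes from: assigning a single dict to two names iterates over its keys, so with 20 agents Python sees 20 values where it expected 2. A minimal illustration (not code from ray or PettingZoo):

```python
# Hypothetical stand-in for the per-agent observation dict returned by reset()
reset_result = {f"piston_{i}": None for i in range(20)}

try:
    obs, info = reset_result  # unpacking a dict iterates over its keys
except ValueError as err:
    print(err)  # too many values to unpack (expected 2)
```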
I was able to fully fix the issue by changing ParallelPettingZooEnv's reset() method to be defined as follows:
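Roughly, the change looks like this (a sketch of the idea rather than the verbatim diff; it assumes the caller in multi_agent_env.py expects an `(obs, infos)` tuple, as the traceback's `obs_and_infos = self.env.reset(...)` line suggests):

```python
# Sketch only, not the verbatim fix: drop return_info and synthesize empty
# per-agent infos, since this PettingZoo version's parallel reset() returns
# only the observation dict.
def reset(self, seed=None, options=None):
    obs = self.par_env.reset(seed=seed, options=options)
    infos = {agent: {} for agent in obs}
    return obs, infos
```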
Example output from the script with this change:

```
Number of trials: 1/1 (1 RUNNING)
+-------------------------------+----------+----------------+--------+------------------+------+----------+----------------------+----------------------+--------------------+
| Trial name                    | status   | loc            |   iter |   total time (s) |   ts |   reward |   episode_reward_max |   episode_reward_min |   episode_len_mean |
|-------------------------------+----------+----------------+--------+------------------+------+----------+----------------------+----------------------+--------------------|
| PPO_pistonball_v6_a1f2d_00000 | RUNNING  | 127.0.0.1:9930 |      4 |          934.378 | 2048 | -25.3935 |              453.081 |             -262.085 |                125 |
+-------------------------------+----------+----------------+--------+------------------+------+----------+----------------------+----------------------+--------------------+
```
However, the reason I believe this is a PettingZoo issue is that certain parallel envs, like https://github.com/Farama-Foundation/PettingZoo/blob/master/pettingzoo/atari/base_atari_env.py, do return infos as well as observations.

I'm not sure what the best way to fix this is. As I understand it, the problem is that the SuperSuit wrappers' and PettingZoo wrappers' reset functions don't accept the `return_info` parameter, so when RLlib calls reset() on the environment it gets an error. In conversions.py I can see that all of the parallel wrappers' reset functions do accept the `return_info` argument.
Maybe, since gymnasium has deprecated the `return_info` argument, it should be removed from PettingZoo as well to make things simpler? That would then require submitting a PR to ray to change ParallelPettingZooEnv's reset function, but I'm happy to do that. However, I'm not sure how this change would affect SuperSuit; I could look into that, change things locally, and see if it causes issues, but I think it's probably best for someone with a better understanding of PettingZoo/SuperSuit/Gymnasium to sign off on those changes.
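For concreteness, the kind of gymnasium-style parallel `reset()` signature I have in mind would look roughly like this; this is only a sketch for discussion, not the current PettingZoo API or an agreed design:

```python
# Hypothetical gymnasium-style parallel reset (for discussion only): no
# return_info flag, and infos are always returned alongside observations.
class ExampleParallelEnv:
    possible_agents = ["piston_0", "piston_1"]

    def reset(self, seed=None, options=None):
        self.agents = list(self.possible_agents)
        observations = {agent: 0.0 for agent in self.agents}  # placeholder obs
        infos = {agent: {} for agent in self.agents}
        return observations, infos
```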
### Code example

Basic working example not using ray (to show that the env works on its own; the pre-processing steps with ss aren't the problem afaik):

```python
import supersuit as ss
from pettingzoo.butterfly import pistonball_v6


def env_creator(args):
    env = pistonball_v6.parallel_env(
        n_pistons=20,
        time_penalty=-0.1,
        continuous=True,
        random_drop=True,
        random_rotate=True,
        ball_mass=0.75,
        ball_friction=0.3,
        ball_elasticity=1.5,
        max_cycles=125,
        render_mode="human",
    )
    env = ss.color_reduction_v0(env, mode="B")
    env = ss.dtype_v0(env, "float32")
    env = ss.resize_v1(env, x_size=84, y_size=84)
    env = ss.normalize_obs_v0(env, env_min=0, env_max=1)
    env = ss.frame_stack_v1(env, 3)
    return env


if __name__ == "__main__":
    env = env_creator({})
    env.reset()
    while env.agents:
        actions = {agent: env.action_space(agent).sample() for agent in env.agents}
        observations, rewards, terminations, truncations, infos = env.step(actions)
```
rllib_pistonball.py:

```python
"""Uses Ray's RLLib to train agents to play Pistonball.

Author: Rohan (https://github.com/Rohan138)
"""

import os

import ray
import supersuit as ss
from ray import tune
from ray.rllib.algorithms.ppo import PPOConfig
from ray.rllib.env.wrappers.pettingzoo_env import ParallelPettingZooEnv
from ray.rllib.models import ModelCatalog
from ray.rllib.models.torch.torch_modelv2 import TorchModelV2
from ray.tune.registry import register_env
from torch import nn

from pettingzoo.butterfly import pistonball_v6

# raise NotImplementedError(
#     "There are currently bugs in this tutorial, we will fix them soon."
# )


class CNNModelV2(TorchModelV2, nn.Module):
    def __init__(self, obs_space, act_space, num_outputs, *args, **kwargs):
        TorchModelV2.__init__(self, obs_space, act_space, num_outputs, *args, **kwargs)
        nn.Module.__init__(self)
        self.model = nn.Sequential(
            nn.Conv2d(3, 32, [8, 8], stride=(4, 4)),
            nn.ReLU(),
            nn.Conv2d(32, 64, [4, 4], stride=(2, 2)),
            nn.ReLU(),
            nn.Conv2d(64, 64, [3, 3], stride=(1, 1)),
            nn.ReLU(),
            nn.Flatten(),
            nn.Linear(3136, 512),
            nn.ReLU(),
        )
        self.policy_fn = nn.Linear(512, num_outputs)
        self.value_fn = nn.Linear(512, 1)

    def forward(self, input_dict, state, seq_lens):
        model_out = self.model(input_dict["obs"].permute(0, 3, 1, 2))
        self._value_out = self.value_fn(model_out)
        return self.policy_fn(model_out), state

    def value_function(self):
        return self._value_out.flatten()


def env_creator(args):
    env = pistonball_v6.parallel_env(
        n_pistons=20,
        time_penalty=-0.1,
        continuous=True,
        random_drop=True,
        random_rotate=True,
        ball_mass=0.75,
        ball_friction=0.3,
        ball_elasticity=1.5,
        max_cycles=125,
    )
    env = ss.color_reduction_v0(env, mode="B")
    env = ss.dtype_v0(env, "float32")
    env = ss.resize_v1(env, x_size=84, y_size=84)
    env = ss.normalize_obs_v0(env, env_min=0, env_max=1)
    env = ss.frame_stack_v1(env, 3)
    return env


if __name__ == "__main__":
    ray.init(local_mode=True)

    env_name = "pistonball_v6"

    register_env(env_name, lambda config: ParallelPettingZooEnv(env_creator(config)))
    ModelCatalog.register_custom_model("CNNModelV2", CNNModelV2)

    config = (
        PPOConfig()
        .rollouts(num_rollout_workers=4, rollout_fragment_length=128)
        .training(
            train_batch_size=512,
            lr=2e-5,
            gamma=0.99,
            lambda_=0.9,
            use_gae=True,
            clip_param=0.4,
            grad_clip=None,
            entropy_coeff=0.1,
            vf_loss_coeff=0.25,
            sgd_minibatch_size=64,
            num_sgd_iter=10,
        )
        .environment(env=env_name, clip_actions=True)
        .debugging(log_level="ERROR")
        .framework(framework="torch")
        .resources(num_gpus=int(os.environ.get("RLLIB_NUM_GPUS", "0")))
    )

    tune.run(
        "PPO",
        name="PPO",
        stop={"timesteps_total": 5000000},
        checkpoint_freq=10,
        local_dir="~/ray_results/" + env_name,
        config=config.to_dict(),
    )
```
### System info
gym==0.23.1
Gymnasium==0.26.3
numpy==1.23.5
PettingZoo==1.22.3
Pillow==9.4.0
pygame==2.1.2
ray==2.3.0
SuperSuit==3.7.1
tianshou==0.4.11
torch==1.13.1
### Additional context
_No response_
### Checklist
- [X] I have checked that there is no similar [issue](https://github.com/Farama-Foundation/PettingZoo/issues) in the repo
I think that PettingZoo is currently quite inconsistent with the reset function definition, and that some environments have the parameter while some do not. @WillDudley or @jjshoots Any ideas?