
[RLlib] Remove return_info from reset() in pettingzoo_env.py. #33470

Closed
wants to merge 959 commits into from


elliottower

Why are these changes needed?

The return_info parameter of reset() was removed from the PettingZoo API a few weeks ago: Farama-Foundation/PettingZoo#890

render() has also not taken a render_mode argument since October 2022: Farama-Foundation/PettingZoo@a74a933

The motivation is that the current PettingZoo version cannot run with RLlib unless these files are modified locally to remove return_info (I have tested with both parallel and AEC envs; see Farama-Foundation/PettingZoo#899).
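A minimal sketch of what the change amounts to, assuming a PettingZoo-style parallel env whose reset() accepts only seed and options and returns (observations, infos). DummyParallelEnv and ParallelEnvSketch are hypothetical stand-ins, not RLlib's actual ParallelPettingZooEnv:

```python
from typing import Any, Dict, Optional, Tuple


class DummyParallelEnv:
    """Hypothetical stand-in for a PettingZoo ParallelEnv after return_info was removed.

    Assumption: reset() takes only seed/options and returns (observations, infos).
    """

    possible_agents = ["agent_0", "agent_1"]

    def reset(self, seed: Optional[int] = None, options: Optional[dict] = None):
        observations = {agent: 0 for agent in self.possible_agents}
        infos = {agent: {} for agent in self.possible_agents}
        return observations, infos


class ParallelEnvSketch:
    """Hypothetical wrapper whose reset() no longer forwards return_info."""

    def __init__(self, env: Any):
        self.env = env

    def reset(
        self, *, seed: Optional[int] = None, options: Optional[dict] = None
    ) -> Tuple[Dict[str, Any], Dict[str, dict]]:
        # Old call: self.env.reset(seed=seed, return_info=True, options=options)
        # That now raises TypeError, so only seed and options are passed through.
        obs, infos = self.env.reset(seed=seed, options=options)
        return obs, infos
```

Calling DummyParallelEnv().reset(return_info=True) raises the same kind of TypeError ("unexpected keyword argument 'return_info'") reported in the linked issue.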

Related issue number

Closes #32889

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@elliottower
Author

elliottower commented Mar 20, 2023

The DCO check failed because my commits were not signed off, but I have git config --global commit.gpgsign true set and GitHub says all of my commits are verified. I followed the rebase instructions from the error message. Now I see some other errors, but I can't figure out what the problem could be. Can someone help figure these out?

My only thought is that maybe the internal tests use an older version of gymnasium/PettingZoo that still has return_info. Edit: I found this file for RLlib, which pins version 1.22.1, whereas 1.22.3 is the most widely used and 1.22.4 is the current release as of today.
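If the tests could run against either an older PettingZoo (such as 1.22.1, whose reset() still accepts return_info) or a newer one, one option is to inspect the signature before calling. This is a hypothetical sketch, not what this PR does (the PR drops the argument and bumps the pin); OldEnv, NewEnv, and reset_compat are illustrative names, and the return shapes of both APIs are assumptions:

```python
import inspect
from typing import Any, Dict, Optional, Tuple


class OldEnv:
    """Hypothetical env mimicking the older API: return_info toggles the infos dict."""

    def reset(self, seed=None, return_info=False, options=None):
        obs = {"agent_0": 0}
        return (obs, {"agent_0": {}}) if return_info else obs


class NewEnv:
    """Hypothetical env mimicking the newer API: no return_info, always (obs, infos)."""

    def reset(self, seed=None, options=None):
        return {"agent_0": 0}, {"agent_0": {}}


def reset_compat(
    env: Any, seed: Optional[int] = None, options: Optional[dict] = None
) -> Tuple[Dict, Dict]:
    # Only pass return_info if this env's reset() still declares that parameter.
    params = inspect.signature(env.reset).parameters
    if "return_info" in params:
        return env.reset(seed=seed, return_info=True, options=options)
    return env.reset(seed=seed, options=options)
```

Pinning a single version, as the requirements change in this PR does, avoids this branching entirely; a shim like this only matters if both versions have to be supported at once.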

Edit 2: I just tried git commit -S and still got the DCO error; the sign-off flag is a lowercase -s (my bad), so I need to rebase again.
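For context: the DCO check looks for a Signed-off-by trailer in each commit message, which git commit -s adds. git commit -S instead creates a GPG signature, which is what GitHub's "Verified" badge reflects but which the DCO check does not look at. The sketch below roughly approximates the trailer check; the regex is an assumption about the trailer format (Signed-off-by: Name <email>), not the DCO bot's actual implementation:

```python
import re

# Assumed trailer format: "Signed-off-by: Full Name <user@example.com>" on its own line.
SIGNOFF_RE = re.compile(r"^Signed-off-by: .+ <[^<>@\s]+@[^<>\s]+>$", re.MULTILINE)


def has_dco_signoff(commit_message: str) -> bool:
    """Return True if the message contains at least one sign-off trailer line."""
    return SIGNOFF_RE.search(commit_message) is not None
```

For commits that already exist, git rebase --signoff can add the trailer while rewriting them, after which the branch has to be force-pushed.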

@elliottower
Author

@ArturNiederfahrenhorst any chance you could help look this over? I would like to use RLlib with the most recent PettingZoo release, but this makes it impossible without modifying it locally.

Contributor

@sven1977 sven1977 left a comment


Thanks for the fix @elliottower !

Member

@avnishn avnishn left a comment


This looks good to me. Can you pull ray/master into your PR so that we can see the latest tests run?

@@ -154,7 +154,7 @@ def close(self):
         self.env.close()

     def render(self):
-        return self.env.render(self.render_mode)
+        return self.env.render()
Member


In the newest PettingZoo version, are render modes no longer configurable?

If so, the whole PR pretty much looks good to me.

Contributor


Nowadays you need to specify the render mode in your gym.make call and list the available modes in Env.metadata["render_modes"] = ["human", "rgb_array", ...]:
https://gymnasium.farama.org/api/env/#gymnasium.Env.render

Author


PettingZoo doesn't use gym.make, but yes, the render mode has to be specified when the environment is initialized rather than in the render() call.
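A minimal sketch of that convention, using a hypothetical ToyEnv (not a real PettingZoo or Gymnasium class): the allowed modes are listed in the class-level metadata["render_modes"], the chosen mode is validated once in __init__, and render() takes no arguments:

```python
from typing import List, Optional


class ToyEnv:
    """Hypothetical env following the Gymnasium/PettingZoo render_mode convention."""

    metadata = {"render_modes": ["human", "rgb_array"]}

    def __init__(self, render_mode: Optional[str] = None):
        # The render mode is fixed at construction time and checked against metadata.
        if render_mode is not None and render_mode not in self.metadata["render_modes"]:
            raise ValueError(f"Unsupported render_mode: {render_mode!r}")
        self.render_mode = render_mode

    def render(self) -> Optional[List[List[int]]]:
        # No render_mode argument here; behavior depends on self.render_mode.
        if self.render_mode == "rgb_array":
            return [[0, 0, 0]]  # placeholder "frame"
        if self.render_mode == "human":
            print("rendering to screen")
        return None
```

Under this convention, a wrapper's render() can simply forward to self.env.render(), which is what the diff in this PR does.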

@avnishn
Member

avnishn commented Mar 27, 2023

Oh, and please run the linter on your changes: ./ci/lint/format.sh

@@ -15,7 +15,7 @@ kaggle_environments==1.7.11
 #mlagents==0.28.0
 mlagents_envs==0.28.0
 # For tests on PettingZoo's multi-agent envs.
-pettingzoo==1.22.1; python_version >= '3.7'
+pettingzoo==1.22.4; python_version >= '3.7'
Contributor


I think we also need to update the gymnasium version to make these work?
What about SuperSuit? Let's upgrade it here too; otherwise, this could lead to further problems.

Member


We should actually be able to bump the gym version with no problems here. Can you please do so, @elliottower? I'd be happy to assist you with the upgrade if any issues come up.

Author


Okay, I'll change it to the most recent versions of all of them. Gymnasium released 0.28 late last week and 0.28.1 a day or two later with hotfixes. The update made a slight change to some spaces, such as the Sequence space, which caused an error in the PettingZoo CI (previous spaces.Sequence() calls need to specify spaces.Sequence(stack=True) to get the same behavior as before). I've fixed that issue, and the PettingZoo CI tests are now passing. I searched Ray's repo and found no uses of spaces.Sequence, so I think that should be fine, but I can run the pytests and look for other potential issues with gymnasium 0.28.1.

We are finishing up a PettingZoo release to support gymnasium 0.28.1, which we hope to get out by the end of this week. In the meantime, I can run the tests against the master branch of PettingZoo to make sure they pass and start debugging any errors, and once the full release is out I can update the requirements to list the new PettingZoo version.

Contributor

@sven1977 sven1977 left a comment


Awesome PR. Still some minor fixes to do before we can merge this. See my comments.
Thanks again! :)

@elliottower
Author

Could you help me diagnose the failing tests? CI reports 100 successful and 11 failing.

I'm running the PettingZoo pytests locally to see whether I can find any issue, but I can't see anything wrong, and I'm not sure why my change would affect checks like ci-build-pr/java.

@elliottower
Author

Also, I'm having issues installing the repository locally. I ran pip install -e . in the /python/ directory; it seemed to work at first, but then failed with this error:

 Analyzing: 2 targets (49 packages loaded, 1520 targets configured)
    ERROR: /private/var/tmp/_bazel_elliottower/fd84858c11d357b636f50f21ff877190/external/bazel_tools/platforms/BUILD:89:6: in alias rule @bazel_tools//platforms:windows: Constraints from @bazel_tools//platforms have been removed. Please use constraints from @platforms repository embedded in Bazel, or preferably declare dependency on https://github.com/bazelbuild/platforms. See https://github.com/bazelbuild/bazel/issues/8622 for details.
    ERROR: /private/var/tmp/_bazel_elliottower/fd84858c11d357b636f50f21ff877190/external/bazel_tools/platforms/BUILD:89:6: Analysis of target '@bazel_tools//platforms:windows' failed
    ERROR: /Users/elliottower/Documents/GitHub/ray/cpp/BUILD.bazel:6:10: While resolving toolchains for target //cpp:libray_api.so: invalid registered toolchain '@bazel_skylib//toolchains/unittest:cmd_toolchain':
    ERROR: Analysis of target '//cpp:ray_cpp_pkg' failed; build aborted:
    INFO: Elapsed time: 211.165s
    INFO: 0 processes.
    FAILED: Build did NOT complete successfully (55 packages loaded, 1741 targets configured)
    /Users/elliottower/anaconda3/envs/ray/lib/python3.8/site-packages/setuptools/command/easy_install.py:144: EasyInstallDeprecationWarning: easy_install command is deprecated. Use build and pip and other standards-based tools.
      warnings.warn(
    /Users/elliottower/anaconda3/envs/ray/lib/python3.8/site-packages/setuptools/command/install.py:34: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
      warnings.warn(
    Traceback (most recent call last):
      File "<string>", line 2, in <module>
      File "<pip-setuptools-caller>", line 34, in <module>
      File "/Users/elliottower/Documents/GitHub/ray/python/setup.py", line 768, in <module>
        setuptools.setup(
      File "/Users/elliottower/anaconda3/envs/ray/lib/python3.8/site-packages/setuptools/__init__.py", line 87, in setup
        return distutils.core.setup(**attrs)
      File "/Users/elliottower/anaconda3/envs/ray/lib/python3.8/site-packages/setuptools/_distutils/core.py", line 185, in setup
        return run_commands(dist)
      File "/Users/elliottower/anaconda3/envs/ray/lib/python3.8/site-packages/setuptools/_distutils/core.py", line 201, in run_commands
        dist.run_commands()
      File "/Users/elliottower/anaconda3/envs/ray/lib/python3.8/site-packages/setuptools/_distutils/dist.py", line 969, in run_commands
        self.run_command(cmd)
      File "/Users/elliottower/anaconda3/envs/ray/lib/python3.8/site-packages/setuptools/dist.py", line 1208, in run_command
        super().run_command(command)
      File "/Users/elliottower/anaconda3/envs/ray/lib/python3.8/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
        cmd_obj.run()
      File "/Users/elliottower/anaconda3/envs/ray/lib/python3.8/site-packages/setuptools/command/develop.py", line 34, in run
        self.install_for_development()
      File "/Users/elliottower/anaconda3/envs/ray/lib/python3.8/site-packages/setuptools/command/develop.py", line 114, in install_for_development
        self.run_command('build_ext')
      File "/Users/elliottower/anaconda3/envs/ray/lib/python3.8/site-packages/setuptools/_distutils/cmd.py", line 318, in run_command
        self.distribution.run_command(command)
      File "/Users/elliottower/anaconda3/envs/ray/lib/python3.8/site-packages/setuptools/dist.py", line 1208, in run_command
        super().run_command(command)
      File "/Users/elliottower/anaconda3/envs/ray/lib/python3.8/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
        cmd_obj.run()
      File "/Users/elliottower/Documents/GitHub/ray/python/setup.py", line 756, in run
        return pip_run(self)
      File "/Users/elliottower/Documents/GitHub/ray/python/setup.py", line 657, in pip_run
        build(True, BUILD_JAVA, True)
      File "/Users/elliottower/Documents/GitHub/ray/python/setup.py", line 605, in build
        return bazel_invoke(
      File "/Users/elliottower/Documents/GitHub/ray/python/setup.py", line 372, in bazel_invoke
        result = invoker([cmd] + cmdline, *args, **kwargs)
      File "/Users/elliottower/anaconda3/envs/ray/lib/python3.8/subprocess.py", line 364, in check_call
        raise CalledProcessError(retcode, cmd)
    subprocess.CalledProcessError: Command '['bazel', 'build', '--verbose_failures', '--', '//:ray_pkg', '//cpp:ray_cpp_pkg']' returned non-zero exit status 1.
    [end of output]

@elliottower
Author

@sven1977 @avnishn any ideas about how to fix the failing tests? I re-ran them and the same ones are failing, but as far as I can see they don't appear to be related to my changes.

@sven1977
Contributor

I think the only two tests that are of relevance are those that are failing with this error here:

Traceback (most recent call last):

ImportError: cannot import name 'BaseParallelWraper' from 'pettingzoo.utils.wrappers' (/opt/miniconda/lib/python3.7/site-packages/pettingzoo/utils/wrappers/__init__.py)

It does seem like a SuperSuit/PettingZoo version mismatch issue?

@elliottower
Author

Ah yes, this was a typo (BaseParallelWraper) that was fixed in the most recent PettingZoo update; thanks for pointing it out. I'll work on getting those issues fixed. I'm going to wait to finalize this until we publish the next major release, though.
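Until every dependency agrees on one spelling, a tolerant lookup is one way to work with both the misspelled and the corrected name. This sketch assumes the corrected name is BaseParallelWrapper; first_attr is a hypothetical helper, and a fake module stands in for pettingzoo.utils.wrappers so the example runs without PettingZoo installed:

```python
import types
from typing import Any


def first_attr(module: types.ModuleType, *names: str) -> Any:
    """Return the first attribute of `module` that exists among `names`."""
    for name in names:
        if hasattr(module, name):
            return getattr(module, name)
    raise ImportError(f"none of {names} found in {module.__name__}")


# Fake stand-in for pettingzoo.utils.wrappers that only has the old, misspelled name.
fake_wrappers = types.ModuleType("fake_wrappers")
fake_wrappers.BaseParallelWraper = type("BaseParallelWraper", (), {})

# Prefer the corrected spelling and fall back to the old one.
BaseParallelWrapper = first_attr(fake_wrappers, "BaseParallelWrapper", "BaseParallelWraper")
```

With PettingZoo installed, the real module object would be passed instead of fake_wrappers; in practice, pinning matching PettingZoo and SuperSuit versions, as discussed in this thread, avoids needing the fallback at all.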

@sven1977 sven1977 changed the title Remove return_info from reset() in pettingzoo_env.py [RLlib] Remove return_info from reset() in pettingzoo_env.py. Mar 30, 2023
jianoaix and others added 4 commits April 22, 2023 18:34
* Revert "[Datasets] Revert "Enable streaming executor by default (ray-project#32493)" (ray-project#33485)"

This reverts commit 5c79954.

* Fix tensorarray to numpy conversion

Signed-off-by: elliottower <[email protected]>
…g executor (ray-project#34120)

* Revert "[Datasets] Revert "Enable streaming executor by default (ray-project#32493)" (ray-project#33485)"

This reverts commit 5c79954.

* Fix test failure caused by lack of ordering in default streaming executor

Signed-off-by: elliottower <[email protected]>
cluster_tune_scale_up_down
long_running_horovod_tune_test

Signed-off-by: Kai Fricke <[email protected]>
Signed-off-by: elliottower <[email protected]>
Even though we have a perf regression on the v2 stack, at least the tests can run. Currently, starting 65 nodes has a very low success rate on the v1 stack.

Signed-off-by: elliottower <[email protected]>
@elliottower
Author

I'm going to open a new PR because this one seems to have gotten messed up by a re-signing issue.

Development

Successfully merging this pull request may close these issues.

[RLlib] ParallelPettingZooEnv TypeError: reset() got an unexpected keyword argument 'return_info'