
[RLlib] Example script: Simple league-based self-play w/ the open spiel markov soccer env. #17077

Conversation

sven1977
Contributor

@sven1977 sven1977 commented Jul 14, 2021

Example script: Simple league-based self-play w/ the OpenSpiel a) Markov Soccer and b) Connect-4 envs.

The script shows how to set up a simple league consisting of 3 types of policies, similar to DeepMind's StarCraft II setup:

  • main policies (the ones we would like to use for inference in the end)
  • main exploiters (always playing against main and its past versions)
  • league exploiters (always playing against any other member of the league, including other league exploiters)
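The matchmaking rules above can be sketched in plain Python. This is a minimal illustration and not the example script's code. The roster, the policy IDs (`main_v1`, `main_exploiter_0`, ...) and `pick_opponent` are all hypothetical. The PR does not say who "main" plays against, so the sketch assumes main is matched against the whole league. In RLlib, logic like this would typically live in the multi-agent `policy_mapping_fn`.

```python
import random

# Hypothetical league roster, keyed by role. Policy IDs are illustrative only.
LEAGUE = {
    "main": ["main"],                       # the policy used for inference in the end
    "main_past": ["main_v1"],               # frozen snapshots of "main"
    "main_exploiter": ["main_exploiter_0"],
    "league_exploiter": ["league_exploiter_0", "league_exploiter_1"],
}


def pick_opponent(learner_role: str, rng: random.Random) -> str:
    """Sample an opponent policy ID for a learner of the given role."""
    if learner_role == "main_exploiter":
        # Main exploiters only ever face "main" and its past versions.
        pool = LEAGUE["main"] + LEAGUE["main_past"]
    elif learner_role in ("main", "league_exploiter"):
        # League exploiters face any member, including other league exploiters.
        # Assumption (not stated in the PR): "main" also samples from the whole league.
        pool = [pid for ids in LEAGUE.values() for pid in ids]
    else:
        raise ValueError(f"unknown role: {learner_role}")
    return rng.choice(pool)


if __name__ == "__main__":
    rng = random.Random(0)
    for role in ("main", "main_exploiter", "league_exploiter"):
        print(role, "vs", pick_opponent(role, rng))
```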

Starting with a single "main" policy and two randomly initialized league- and main-exploiters, the league is augmented further during training. This is achieved by measuring each policy's win-rate and cloning a policy once it has reached a certain win-rate. The new clone is either frozen or keeps being trained (randomly chosen between these two options).
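The win-rate-triggered cloning can be sketched as follows. This is a minimal, framework-free sketch under stated assumptions, not the script's implementation. The threshold (`WIN_RATE_THRESHOLD = 0.85`), the minimum game count, the `_vN` naming scheme and the `LeagueTracker` class are all hypothetical. The sketch only tracks policy IDs. A real setup would also have to copy the original policy's weights into the clone, for example via RLlib's `Policy.get_weights()` / `set_weights()`.

```python
import random
from collections import defaultdict

# Hypothetical values; the PR does not state the actual threshold.
WIN_RATE_THRESHOLD = 0.85
MIN_GAMES = 10


class LeagueTracker:
    """Tracks per-policy win-rates and clones policies that pass a threshold."""

    def __init__(self, trainable, rng=None):
        self.trainable = set(trainable)   # policies that are still being trained
        self.frozen = set()               # snapshots that no longer learn
        self.results = defaultdict(list)  # policy_id -> 1 (win) / 0 (loss or draw)
        self.num_clones = defaultdict(int)
        self.rng = rng or random.Random()

    def record(self, policy_id, won):
        self.results[policy_id].append(1 if won else 0)

    def win_rate(self, policy_id):
        games = self.results[policy_id]
        return sum(games) / len(games) if games else 0.0

    def maybe_clone(self, policy_id):
        """Clone `policy_id` if it reached the threshold; return the new ID or None."""
        if policy_id not in self.trainable:
            return None  # frozen snapshots are never cloned
        if len(self.results[policy_id]) < MIN_GAMES:
            return None  # not enough games for a meaningful win-rate
        if self.win_rate(policy_id) < WIN_RATE_THRESHOLD:
            return None
        self.num_clones[policy_id] += 1
        new_id = f"{policy_id}_v{self.num_clones[policy_id]}"
        # Randomly choose whether the clone is frozen or keeps being trained.
        if self.rng.random() < 0.5:
            self.frozen.add(new_id)
        else:
            self.trainable.add(new_id)
        # Start a fresh win-rate window so the original is not cloned every step.
        self.results[policy_id].clear()
        return new_id
```

In an RLlib training loop, a check like `maybe_clone` would naturally run once per training iteration (for example from an `on_train_result` callback). Resetting the win-rate window after each clone keeps a strong policy from flooding the league with near-identical copies.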

In the end, we'll have a main policy that plays robustly against different opponent strategies and is protected against catastrophic forgetting.

Why are these changes needed?

Related issue number

Checks

  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

…cy_map_as_lru_cache

# Conflicts:
#	rllib/policy/tf_policy.py
…cy_map_as_lru_cache

# Conflicts:
#	rllib/evaluation/rollout_worker.py
@sven1977 sven1977 added the tests-ok The tagger certifies test failures are unrelated and assumes personal liability. label Jul 20, 2021
@sven1977 sven1977 removed the tests-ok The tagger certifies test failures are unrelated and assumes personal liability. label Jul 20, 2021
@sven1977 sven1977 merged commit 7bc4376 into ray-project:master Jul 22, 2021
@sven1977 sven1977 deleted the policy_map_as_lru_cache_league_based_example branch June 2, 2023 20:15