[RLlib] Example script: Simple league-based self-play w/ the open spiel markov soccer env. #17077

sven1977 · 2021-07-14T15:26:40Z

Example script: simple league-based self-play w/ the OpenSpiel a) Markov soccer and b) Connect-4 envs.

The script shows how to set up a simple league consisting of 3 types of policies, similar to DeepMind's StarCraft II setup:

- main policies (the ones we would like to use for inference in the end)
- main exploiters (always playing against main and its past versions)
- league exploiters (always playing against any other member of the league, including other league exploiters)
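
The matchmaking rules in the list above can be sketched roughly as follows. This is a minimal, library-free illustration and not the code from the example script: the function name `eligible_opponents` and the policy ID naming scheme (`main`, `main_0`, `main_exploiter_0`, `league_exploiter_0`) are hypothetical, and the rule used for `main` itself (any other league member) is an assumption, since the description only states the rules for the two exploiter types.

```python
def eligible_opponents(policy_id, league):
    """Return the league members that `policy_id` may be matched against.

    `league` is a list of policy IDs. The naming scheme is hypothetical:
    "main" and "main_<n>" are main and its past versions,
    "main_exploiter_<n>" and "league_exploiter_<n>" are the exploiters.
    """
    def is_main(pid):
        return pid == "main" or (
            pid.startswith("main_") and not pid.startswith("main_exploiter")
        )

    if policy_id.startswith("main_exploiter"):
        # Main exploiters only play against main and its past versions.
        return [p for p in league if is_main(p)]
    if policy_id.startswith("league_exploiter"):
        # League exploiters play against any other league member,
        # including other league exploiters.
        return [p for p in league if p != policy_id]
    # ASSUMPTION: main plays against any other league member.
    return [p for p in league if p != policy_id]
```

An actual setup would plug a rule like this into whatever mechanism assigns policies to the two players of each episode.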

Starting with a single "main" and two randomly initialized league- & main-exploiters, the league is further augmented during training. This is achieved by measuring each policy's win-rate and then cloning a policy once it has reached a certain win-rate. The new clone is either frozen or keeps being trained (chosen randomly between these two options).
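
A rough sketch of the league-growing step described above, in plain Python. This is illustrative only and not the script's implementation: `maybe_grow_league`, the clone naming, and the `0.85` threshold are hypothetical, the 50/50 split between frozen and trainable clones is an assumption (the description only says the choice is random), and so is the decision to clone trainable policies only.

```python
import random

# ASSUMPTION: the description only says "a certain win-rate".
WIN_RATE_THRESHOLD = 0.85


def maybe_grow_league(league, win_rates, threshold=WIN_RATE_THRESHOLD, rng=random):
    """Clone every trainable policy whose win-rate reached `threshold`.

    league:    dict mapping policy ID -> True if the policy is still trained,
               False if it is frozen. Modified in place.
    win_rates: dict mapping policy ID -> measured win-rate in [0, 1].
    Returns the list of newly added policy IDs.
    """
    new_members = {}
    for policy_id, trainable in league.items():
        # ASSUMPTION: frozen policies are never cloned again.
        if not trainable:
            continue
        if win_rates.get(policy_id, 0.0) >= threshold:
            clone_id = f"{policy_id}_clone_{len(league) + len(new_members)}"
            # Randomly decide whether the clone is frozen or keeps training.
            new_members[clone_id] = rng.random() < 0.5
    league.update(new_members)
    return list(new_members)
```

In a real training loop, the weights of the source policy would also have to be copied into the clone; that part depends on the framework and is left out here.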

In the end, we'll have a main policy that plays robustly against different opponent strategies and is resistant to catastrophic forgetting.

Why are these changes needed?

Related issue number

Checks

- I've run scripts/format.sh to lint the changes in this PR.
- I've included any doc changes needed for https://docs.ray.io/en/master/.
- I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
  - Unit tests
  - Release tests
  - This PR is not tested :(

sven1977 added 28 commits July 8, 2021 17:31

wip.

b8c516f

Merge branch 'master' of https://github.com/ray-project/ray into poli…

3bd9a85

…cy_map_as_lru_cache # Conflicts: # rllib/policy/tf_policy.py

t statusMerge branch 'master' of https://github.com/ray-project/ray i…

42b3bd9

…nto policy_map_as_lru_cache

wip.

fb488de

wip.

abdc1a1

merge.

adebf34

merge.

a7ea933

LINT.

6521bf1

merge

c646d26

Merge branch 'master' of https://github.com/ray-project/ray into poli…

afe1af0

…cy_map_as_lru_cache

merge

1b4f7dd

wip.

a7e3e27

wip.

4de0721

wip.

669f86e

LINT.

1a5a01a

Merge branch 'master' of https://github.com/ray-project/ray into poli…

6ae7c4a

…cy_map_as_lru_cache # Conflicts: # rllib/evaluation/rollout_worker.py

wip.

05d596d

wip.

d681320

Merge branch 'master' of https://github.com/ray-project/ray into poli…

faa12df

…cy_map_as_lru_cache

fixes

c51708f

fixes and LINT.

17674f3

Merge branch 'policy_map_as_lru_cache' into policy_map_as_lru_cache_l…

ffa2dbd

…eague_based_example

Merge branch 'master' of https://github.com/ray-project/ray into poli…

69e4614

…cy_map_as_lru_cache_league_based_example

wip.

00fc904

Merge branch 'master' of https://github.com/ray-project/ray into poli…

b51702e

…cy_map_as_lru_cache_league_based_example

wip

d99dd30

wip

5602274

Merge branch 'master' of https://github.com/ray-project/ray into poli…

0df83d2

…cy_map_as_lru_cache_league_based_example

sven1977 requested a review from michaelzhiluo July 20, 2021 15:02

sven1977 assigned michaelzhiluo Jul 20, 2021

sven1977 added the tests-ok The tagger certifies test failures are unrelated and assumes personal liability. label Jul 20, 2021

fix

2870d35

sven1977 removed the tests-ok The tagger certifies test failures are unrelated and assumes personal liability. label Jul 20, 2021

sven1977 added 2 commits July 20, 2021 15:47

merge

5a4ef46

LINT.

aae0a34

michaelzhiluo approved these changes Jul 21, 2021

View reviewed changes

sven1977 added 5 commits July 21, 2021 15:04

wip

974bd44

wip

71b2c28

LINT.

a092ad7

Merge branch 'master' of https://github.com/ray-project/ray into poli…

040a242

…cy_map_as_lru_cache_league_based_example

fix

ccb8af8

sven1977 merged commit 7bc4376 into ray-project:master Jul 22, 2021

elliottower mentioned this pull request Feb 13, 2023

[RLlib] Added self-play example with pettingzo #32492

Closed

7 tasks

elliottower mentioned this pull request Mar 20, 2023

[RLlib] Add connect four self-play example with pettingzo #33481

Closed

7 tasks

sven1977 deleted the policy_map_as_lru_cache_league_based_example branch June 2, 2023 20:15
