[RLlib] IMPALA on new API stack (w/ EnvRunner- and ConnectorV2 APIs). #42085

sven1977 · 2023-12-22T10:46:39Z

IMPALA on new API stack (w/ EnvRunner- and ConnectorV2 APIs).

Why are these changes needed?

Related issue number

Checks

- I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
- I've run scripts/format.sh to lint the changes in this PR.
- I've included any doc changes needed for https://docs.ray.io/en/master/.
  - I've added any new APIs to the API Reference. For example, if I added a
    method in Tune, I've added it in doc/source/tune/api/ under the
    corresponding .rst file.
- I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
  - Unit tests
  - Release tests
  - This PR is not tested :(

Signed-off-by: sven1977 <[email protected]>

-- no more context objects
-- calls take rl_module
-- ctors take obs- and action spaces (and maybe env)
-- ctors do NOT take RLModule anymore
-- env-to-module and learner connectors get constructed before(!) RLModule
-- module-to-env connector gets constructed after(!) RLModule
- StatelessCartPole still learning as well as before (see previous PRs)

Signed-off-by: sven1977 <[email protected]>
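The connector changes listed in this commit message can be illustrated with a minimal, library-free sketch. It is not RLlib's ConnectorV2 implementation: `Space`, `SketchConnector`, `SketchModule`, and `build_pipeline` are hypothetical names invented for illustration. The sketch only mirrors the stated rules: constructors receive the observation and action spaces (and optionally the env) but no RLModule, each call receives `rl_module` instead of a context object, and the construction order is env-to-module and learner connectors first, then the RLModule, then the module-to-env connector.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Tuple


@dataclass
class Space:
    """Hypothetical stand-in for an observation/action space (shape only)."""
    shape: Tuple[int, ...]


class SketchModule:
    """Hypothetical stand-in for an RLModule."""

    def __init__(self, observation_space: Space, action_space: Space):
        self.observation_space = observation_space
        self.action_space = action_space


class SketchConnector:
    """Hypothetical connector: built from spaces, never from a module."""

    def __init__(self, observation_space: Space, action_space: Space,
                 env: Optional[Any] = None):
        # The ctor takes spaces (and maybe the env), but no RLModule.
        self.observation_space = observation_space
        self.action_space = action_space
        self.env = env

    def __call__(self, *, rl_module: SketchModule,
                 data: Dict[str, Any]) -> Dict[str, Any]:
        # The module arrives per call; there is no context object.
        out = dict(data)
        out["seen_by"] = type(rl_module).__name__
        return out


def build_pipeline(obs_space: Space, act_space: Space, env: Optional[Any] = None):
    """Build all pieces in the order described in the commit message."""
    order: List[str] = []

    env_to_module = SketchConnector(obs_space, act_space, env)
    order.append("env_to_module")

    learner_connector = SketchConnector(obs_space, act_space)
    order.append("learner")

    # The module is created only after the two connectors above exist.
    rl_module = SketchModule(obs_space, act_space)
    order.append("rl_module")

    # The module-to-env connector is created after the module.
    module_to_env = SketchConnector(obs_space, act_space, env)
    order.append("module_to_env")

    return env_to_module, learner_connector, rl_module, module_to_env, order


if __name__ == "__main__":
    e2m, lc, module, m2e, order = build_pipeline(Space((4,)), Space((2,)))
    print(order)
    print(e2m(rl_module=module, data={"obs": [0.0, 0.0, 0.0, 0.0]}))
```

Because no connector stores a reference to the module, the same connector instance can be called with whichever module is current at call time.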

…ner and ConnectorV2.

New stack Atari
- Pong deterministic + no frameskip + reduced action space
- w/ EnvRunner and ConnectorV2s
- 8 GPUs (4000/256 batch/minibatch per Learner)
- SGD iters 10
- model: "vf_share_layers": True, "conv_filters": [[16, 4, 2], [32, 4, 2], [64, 4, 2], [128, 4, 2]], "conv_activation": "relu", "post_fcnet_hiddens": [256]
- 59 rollout workers
- 1 env per worker
- other training settings: lambda_=0.95, kl_coeff=0.5, clip_param=0.1, vf_clip_param=10.0, entropy_coeff=0.01, grad_clip=100.0, grad_clip_by="global_norm"

-----------------------------------
w/ ONLY 1 GPU (to compare to old stack)
LR: 0.0005
num_rollout_workers: 95 (from 59)
num_sgd_iter: 10 (back to original)
actual RLlib Atari wrappers (grayscale, frameskip, episodic life were missing!)

PRETTY DECENT!

Trial status: 1 RUNNING
Current time: 2023-12-13 07:08:15. Total running time: 16min 1s
Logical resource usage: 96.0/96 CPUs, 1.0/8 GPUs (0.0/1.0 accelerator_type:V100)

| Trial name | status | iter | total time (s) | ts | reward | episode_reward_max | episode_reward_min | episode_len_mean | episodes_this_iter |
|---|---|---|---|---|---|---|---|---|---|
| PPO_env_33a1e_00000 | RUNNING | 239 | 885.251 | 956000 | 19.39 | 21 | 12 | 1762 | 3 |

-----------------------------------
w/ 8 GPUs again
LR: 0.002
num_rollout_workers: 95 (from 59)
num_sgd_iter: 10 (back to original)
actual RLlib Atari wrappers (grayscale, frameskip, episodic life were missing!)

Trial status: 1 RUNNING
Current time: 2023-12-13 07:16:35. Total running time: 7min 30s
Logical resource usage: 96.0/96 CPUs, 8.0/8 GPUs (0.0/1.0 accelerator_type:V100)

| Trial name | status | iter | total time (s) | ts | reward | episode_reward_max | episode_reward_min | episode_len_mean | episodes_this_iter |
|---|---|---|---|---|---|---|---|---|---|
| PPO_env_8dc32_00000 | RUNNING | 46 | 369.115 | 1472000 | 19.8 | 21 | 15 | 1738.74 | 25 |

----------------------------------
Signed-off-by: sven1977 <[email protected]>
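To make the 1-GPU and 8-GPU runs reported above easier to compare, the short calculation below derives timesteps per iteration and sampling throughput from the two status rows. It is only a sketch: it assumes that the `ts` column is the cumulative number of timesteps and that `total time (s)` is the wall-clock training time for that trial, neither of which the commit message states explicitly. The dictionary keys and function names are made up for this example.

```python
# Values copied from the two Tune status rows in the commit message.
# Assumption: "ts" is cumulative timesteps, "total time (s)" is wall-clock time.
runs = {
    "1 GPU": {"iters": 239, "total_time_s": 885.251, "ts": 956_000},
    "8 GPUs": {"iters": 46, "total_time_s": 369.115, "ts": 1_472_000},
}


def ts_per_iter(run: dict) -> float:
    """Average number of timesteps consumed per training iteration."""
    return run["ts"] / run["iters"]


def throughput(run: dict) -> float:
    """Timesteps per second of total running time."""
    return run["ts"] / run["total_time_s"]


if __name__ == "__main__":
    for name, run in runs.items():
        print(f"{name}: {ts_per_iter(run):.0f} ts/iter, "
              f"{throughput(run):.1f} ts/s")

    speedup = throughput(runs["8 GPUs"]) / throughput(runs["1 GPU"])
    print(f"8-GPU vs 1-GPU throughput ratio: {speedup:.2f}")
```

Under these assumptions the 1-GPU run uses 4000 timesteps per iteration and the 8-GPU run 32000, which matches 8 Learners at the stated batch size of 4000 each. The throughput ratio comes out well below 8, although both runs use the same 95 rollout workers, so sampling rather than learning may be the limiting factor. That last point is a guess and not something the PR reports.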

sven1977 added 30 commits December 11, 2023 13:45

wip

e34c1ff

Signed-off-by: sven1977 <[email protected]>

wip

ccdd4e3

Signed-off-by: sven1977 <[email protected]>

merge

d91576c

Signed-off-by: sven1977 <[email protected]>

multi-GPU torch DDP fix

72620f9

Signed-off-by: sven1977 <[email protected]>

wip

d3a40ea

Signed-off-by: sven1977 <[email protected]>

wip

d5e2150

Signed-off-by: sven1977 <[email protected]>

wip

cea44b9

Signed-off-by: sven1977 <[email protected]>

wip

45dede9

Signed-off-by: sven1977 <[email protected]>

Merge branch 'master' into replace_learner_hps_with_algo_config

3c335aa

Merge branch 'master' of https://github.com/ray-project/ray into replace_learner_hps_with_algo_config

0452070

wip

0213870

Signed-off-by: sven1977 <[email protected]>

Merge remote-tracking branch 'origin/replace_learner_hps_with_algo_config' into replace_learner_hps_with_algo_config

728bdec

Merge branch 'master' of https://github.com/ray-project/ray into env_runner_support_connectors

a15bd29

wip

5509640

Signed-off-by: sven1977 <[email protected]>

Merge branch 'master' of https://github.com/ray-project/ray into replace_learner_hps_with_algo_config

31a6e5c

wip

765f252

Signed-off-by: sven1977 <[email protected]>

wip

4da7b8c

Signed-off-by: sven1977 <[email protected]>

LINT

6b7978a

Signed-off-by: sven1977 <[email protected]>

Merge branch 'master' of https://github.com/ray-project/ray into env_runner_support_connectors

0a5380e

wip

8fc5056

Signed-off-by: sven1977 <[email protected]>

Merge branch 'master' into replace_learner_hps_with_algo_config

d88fea8

wip

9e5ce8f

Signed-off-by: sven1977 <[email protected]>

Merge remote-tracking branch 'origin/replace_learner_hps_with_algo_config' into replace_learner_hps_with_algo_config

7e081ef

LINT

f942698

Signed-off-by: sven1977 <[email protected]>

Merge branch 'master' of https://github.com/ray-project/ray into replace_learner_hps_with_algo_config

9bfbef6

wip

c2317cc

Signed-off-by: sven1977 <[email protected]>

Merge branch 'master' of https://github.com/ray-project/ray into replace_learner_hps_with_algo_config

5b9556d

fix

0286340

Signed-off-by: sven1977 <[email protected]>

sven1977 added 23 commits June 14, 2024 10:38

wip

31aaa8b

Signed-off-by: sven1977 <[email protected]>

wip

293b417

Signed-off-by: sven1977 <[email protected]>

wip

3fd3b52

Signed-off-by: sven1977 <[email protected]>

Merge branch 'master' of https://github.com/ray-project/ray into appo_on_new_api_stack_w_env_runner_and_connectorv2

03cdbc5

wip

6eb2547

Signed-off-by: sven1977 <[email protected]>

wip

3863383

Signed-off-by: sven1977 <[email protected]>

wip

097ea43

Signed-off-by: sven1977 <[email protected]>

wip

44224ce

Signed-off-by: sven1977 <[email protected]>

wip

1a3e054

Signed-off-by: sven1977 <[email protected]>

wip

6a417ff

Signed-off-by: sven1977 <[email protected]>

Merge branch 'master' of https://github.com/ray-project/ray into appo_on_new_api_stack_w_env_runner_and_connectorv2

f666e3a

wip

1cb2a32

Signed-off-by: sven1977 <[email protected]>

wip

67476ee

Signed-off-by: sven1977 <[email protected]>

wip

73ed1f9

Signed-off-by: sven1977 <[email protected]>

wip

5efd05e

Signed-off-by: sven1977 <[email protected]>

wip

36eaaf9

Signed-off-by: sven1977 <[email protected]>

LINT

1b2a24b

Signed-off-by: sven1977 <[email protected]>

wip

20007eb

Signed-off-by: sven1977 <[email protected]>

wip

91e16f9

Signed-off-by: sven1977 <[email protected]>

wip

b799260

Signed-off-by: sven1977 <[email protected]>

wip

954d38e

Signed-off-by: sven1977 <[email protected]>

Merge branch 'master' of https://github.com/ray-project/ray into appo_on_new_api_stack_w_env_runner_and_connectorv2

499e4ea

wip

fb48956

Signed-off-by: sven1977 <[email protected]>

sven1977 enabled auto-merge (squash) June 19, 2024 07:50

wip

1eb7205

Signed-off-by: sven1977 <[email protected]>

github-actions bot disabled auto-merge June 19, 2024 07:56

sven1977 enabled auto-merge (squash) June 19, 2024 08:07

sven1977 merged commit 231a013 into ray-project:master Jun 19, 2024
7 checks passed

sven1977 deleted the appo_on_new_api_stack_w_env_runner_and_connectorv2 branch June 19, 2024 09:58
