[RLlib] IMPALA on new API stack (w/ EnvRunner- and ConnectorV2 APIs). #42085

@sven1977 sven1977 commented Dec 22, 2023

IMPALA on new API stack (w/ EnvRunner- and ConnectorV2 APIs).

Why are these changes needed?

Related issue number

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

Signed-off-by: sven1977 <[email protected]>
-- no more context objects
-- connector calls take the rl_module
-- constructors take observation and action spaces (and maybe the env)
-- constructors do NOT take the RLModule anymore
-- env-to-module and learner connectors get constructed before(!) the RLModule
-- module-to-env connector gets constructed after(!) the RLModule
- StatelessCartPole is still learning as well as before (see previous PRs)
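
The construction order described above can be sketched roughly as follows. This is a minimal illustration only: `Space`, `SketchConnector`, `SketchModule`, and `build_pipeline` are hypothetical names invented for this example and are not the RLlib ConnectorV2 or RLModule classes. It only shows the shape of the change: constructors receive spaces (and maybe the env), calls receive the module, and the module-to-env connector is built last.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional, Tuple


@dataclass
class Space:
    """Hypothetical stand-in for an observation or action space."""
    name: str
    shape: Tuple[int, ...]


class SketchConnector:
    """Hypothetical connector piece (NOT the RLlib ConnectorV2 class)."""

    def __init__(self, observation_space: Space, action_space: Space,
                 env: Optional[Any] = None):
        # The constructor sees only the spaces (and maybe the env),
        # never the module and never a context object.
        self.observation_space = observation_space
        self.action_space = action_space
        self.env = env

    def __call__(self, *, rl_module: Any, data: Dict[str, Any]) -> Dict[str, Any]:
        # The module is passed in per call instead.
        out = dict(data)
        out["module_type"] = type(rl_module).__name__
        return out


class SketchModule:
    """Hypothetical stand-in for an RLModule."""

    def __init__(self, observation_space: Space, action_space: Space):
        self.observation_space = observation_space
        self.action_space = action_space


def build_pipeline(obs_space: Space, act_space: Space,
                   env: Optional[Any] = None) -> Dict[str, Any]:
    """Build the pieces in the order listed in the commit message."""
    order = []
    env_to_module = SketchConnector(obs_space, act_space, env)
    order.append("env_to_module")
    learner_connector = SketchConnector(obs_space, act_space)
    order.append("learner_connector")
    # The module is constructed only after the two connectors above.
    rl_module = SketchModule(obs_space, act_space)
    order.append("rl_module")
    # The module-to-env connector is constructed after the module.
    module_to_env = SketchConnector(obs_space, act_space, env)
    order.append("module_to_env")
    return {
        "order": order,
        "env_to_module": env_to_module,
        "learner_connector": learner_connector,
        "rl_module": rl_module,
        "module_to_env": module_to_env,
    }


pipeline = build_pipeline(Space("obs", (4,)), Space("act", (2,)))
result = pipeline["env_to_module"](rl_module=pipeline["rl_module"],
                                   data={"obs": [0.0, 0.0, 0.0, 0.0]})
```

Passing the module at call time rather than at construction time is what allows the env-to-module and learner connectors to exist before any module has been built.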

…ner and ConnectorV2.

New stack Atari
- Pong deterministic + no frameskip + reduced action space
- w/ EnvRunner and ConnectorV2s
- 8 GPUs (4000/256 batch/minibatch per Learner)
- SGD iters 10
- model: "vf_share_layers": True, "conv_filters": [[16, 4, 2], [32, 4, 2], [64, 4, 2], [128, 4, 2]], "conv_activation": "relu", "post_fcnet_hiddens": [256]
- 59 rollout workers
- 1 env per worker
- other training settings: lambda_=0.95, kl_coeff=0.5, clip_param=0.1, vf_clip_param=10.0, entropy_coeff=0.01, grad_clip=100.0, grad_clip_by="global_norm"
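
For a rough sense of what the batch settings above imply, the sketch below counts minibatch updates per Learner. The dictionary keys are hypothetical labels for this example, not RLlib config fields, and it assumes that a final partial minibatch counts as one update; how the Learner actually handles the remainder is not stated here.

```python
import math

# Values taken from the settings listed above; key names are illustrative only.
settings = {
    "num_learners": 8,          # one Learner per GPU
    "batch_per_learner": 4000,
    "minibatch_size": 256,
    "num_sgd_iter": 10,
}


def updates_per_learner(batch: int, minibatch: int, sgd_iters: int):
    """Return (minibatches per pass, total minibatch updates) for one Learner.

    Assumption: a trailing partial minibatch is counted as one update.
    """
    per_pass = math.ceil(batch / minibatch)
    return per_pass, per_pass * sgd_iters


per_pass, total_updates = updates_per_learner(
    settings["batch_per_learner"],
    settings["minibatch_size"],
    settings["num_sgd_iter"],
)
samples_across_learners = settings["num_learners"] * settings["batch_per_learner"]
print(per_pass, total_updates, samples_across_learners)
```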

-----------------------------------
w/ ONLY 1 GPU (to compare to old stack)
LR: 0.0005
num_rollout_workers: 95 (from 59)
num_sgd_iter: 10 (back to original)
actual RLlib Atari wrappers (grayscale, frameskip, episodic life were missing!)

PRETTY DECENT!

Trial status: 1 RUNNING
Current time: 2023-12-13 07:08:15. Total running time: 16min 1s
Logical resource usage: 96.0/96 CPUs, 1.0/8 GPUs (0.0/1.0 accelerator_type:V100)
╭──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name            status       iter     total time (s)       ts     reward     episode_reward_max     episode_reward_min     episode_len_mean     episodes_this_iter │
├──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ PPO_env_33a1e_00000   RUNNING       239            885.251   956000      19.39                     21                     12                 1762                      3 │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
-----------------------------------

w/ 8 GPUs again
LR: 0.002
num_rollout_workers: 95 (from 59)
num_sgd_iter: 10 (back to original)
actual RLlib Atari wrappers (grayscale, frameskip, episodic life were missing!)

Trial status: 1 RUNNING
Current time: 2023-12-13 07:16:35. Total running time: 7min 30s
Logical resource usage: 96.0/96 CPUs, 8.0/8 GPUs (0.0/1.0 accelerator_type:V100)
╭───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name            status       iter     total time (s)        ts     reward     episode_reward_max     episode_reward_min     episode_len_mean     episodes_this_iter │
├───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ PPO_env_8dc32_00000   RUNNING        46            369.115   1472000       19.8                     21                     15              1738.74                     25 │
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
----------------------------------
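
To compare the two runs above on sampling throughput, the "ts" column can be divided by the "total time (s)" column. The snippet below is plain arithmetic on the numbers in the two tables; the function and variable names are made up for this example. Note that the learning rate also differs between the runs (0.0005 vs 0.002), so this only compares timesteps per second, not learning efficiency.

```python
def throughput(timesteps: int, total_time_s: float) -> float:
    """Environment timesteps processed per second of total trial time."""
    return timesteps / total_time_s


# Numbers copied from the two trial tables above.
one_gpu = throughput(956_000, 885.251)      # PPO_env_33a1e_00000, 1 GPU
eight_gpu = throughput(1_472_000, 369.115)  # PPO_env_8dc32_00000, 8 GPUs
speedup = eight_gpu / one_gpu

print(f"1 GPU:  {one_gpu:.0f} ts/s")
print(f"8 GPUs: {eight_gpu:.0f} ts/s")
print(f"ratio:  {speedup:.2f}x")
```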

…nfig' into replace_learner_hps_with_algo_config
…nfig' into replace_learner_hps_with_algo_config
…_on_new_api_stack_w_env_runner_and_connectorv2
…_on_new_api_stack_w_env_runner_and_connectorv2
…_on_new_api_stack_w_env_runner_and_connectorv2
@sven1977 sven1977 enabled auto-merge (squash) June 19, 2024 07:50
@github-actions github-actions bot disabled auto-merge June 19, 2024 07:56
@sven1977 sven1977 enabled auto-merge (squash) June 19, 2024 08:07
@sven1977 sven1977 merged commit 231a013 into ray-project:master Jun 19, 2024
7 checks passed
@sven1977 sven1977 deleted the appo_on_new_api_stack_w_env_runner_and_connectorv2 branch June 19, 2024 09:58
Labels
do-not-merge (Do not merge this PR!)
go (add ONLY when ready to merge, run all tests)
rllib (RLlib related issues)
rllib-connectorv2 (Connector related issues)
rllib-newstack
2 participants