
[rllib] Policy learner_stats get dropped when multi_gpu_learner_thread.py is used (in GPU and multi-GPU use cases). #18116

Closed
1 of 2 tasks
Bam4d opened this issue Aug 26, 2021 · 4 comments
Assignees: sven1977
Labels: bug (Something that is supposed to be working; but isn't), P2 (Important issue, but not time-critical), rllib (RLlib related issues)

Comments

@Bam4d (Contributor) commented Aug 26, 2021

What is the problem?

When multiple GPUs are used, learner stats are gathered in the learn_on_loaded_batch method and nested under a "tower_X" key:
https://github.com/ray-project/ray/blob/master/rllib/policy/torch_policy.py#L645

for example:

{'tower_0': {'learner_stats': {'cur_lr': 0.000495184, 'policy_loss': -40.517921447753906, 'entropy': 1.8380180597305298, 'entropy_coeff': 0.0005, 'var_gnorm': 17.741676330566406, 'vf_loss': 0.7066917419433594, 'vf_explained_var': array([0.5089742], dtype=float32), 'mean_rhos': 1.0025060176849365, 'std_rhos': 0.39850014448165894}}}

This 'tower_0' key is not taken into account when get_learner_stats() is used:

self.stats = {DEFAULT_POLICY_ID: get_learner_stats(fetches)}
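To illustrate the mismatch, a minimal sketch (the helper below is a simplified stand-in that assumes get_learner_stats() only looks up LEARNER_STATS_KEY at the top level of the fetch dict; it is not the actual RLlib source):

# Simplified stand-in for get_learner_stats(): assumes it only looks at the
# top level of the fetch dict (illustration only, not the actual RLlib code).
LEARNER_STATS_KEY = "learner_stats"

def get_learner_stats_sketch(fetches):
    return fetches.get(LEARNER_STATS_KEY, {})

# Multi-GPU fetches: stats are nested one level deeper, under "tower_0".
multi_gpu_fetches = {"tower_0": {LEARNER_STATS_KEY: {"policy_loss": -40.5}}}
print(get_learner_stats_sketch(multi_gpu_fetches))      # {}  -> stats dropped

# Single CPU/GPU fetches: stats sit at the top level and are found.
single_device_fetches = {LEARNER_STATS_KEY: {"policy_loss": -40.5}}
print(get_learner_stats_sketch(single_device_fetches))  # {'policy_loss': -40.5}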

This causes the policy learner_stats to get dropped when GPUs are used; the CPU-only path does not drop these stats.

This is different from the single-GPU/CPU case, where the learn_on_loaded_batch function returns:

{'learner_stats': {'cur_lr': 0.000495184, 'policy_loss': -40.517921447753906, 'entropy': 1.8380180597305298, 'entropy_coeff': 0.0005, 'var_gnorm': 17.741676330566406, 'vf_loss': 0.7066917419433594, 'vf_explained_var': array([0.5089742], dtype=float32), 'mean_rhos': 1.0025060176849365, 'std_rhos': 0.39850014448165894}}

(note the lack of a tower_X key)

Similar code can then extract the policy metrics, which works:

self.stats = get_learner_stats(fetches)

Ray version and other system information (Python version, TensorFlow version, OS):

Ray version: latest dev (2.0.0)
Python: 3.8
OS: macOS + Linux
Framework: torch + tensorflow

Reproduction (REQUIRED)

Run anything with GPU learners (specifically, in my case I'm using IMPALA).
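A minimal repro sketch (not my exact script; assumes the ray.rllib.agents.impala trainer API and at least one available GPU):

import ray
from ray.rllib.agents.impala import ImpalaTrainer

ray.init()
trainer = ImpalaTrainer(
    env="CartPole-v0",
    config={
        "framework": "torch",
        "num_gpus": 1,      # GPU learner -> goes through multi_gpu_learner_thread.py
        "num_workers": 1,
    },
)
result = trainer.train()
# With the bug, the per-policy learner_stats are missing from the learner info.
print(result["info"]["learner"])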

If the code snippet cannot be run by itself, the issue will be closed with "needs-repro-script".

  • I have verified my script runs in a clean environment and reproduces the issue.
  • I have verified the issue also occurs with the latest wheels.

@sven1977

Bam4d added the bug and triage labels on Aug 26, 2021
@mvindiola1 (Contributor) commented Sep 4, 2021

@Bam4d

Does flipping the keys fix this?

batch_fetches[LEARNER_STATS_KEY] = {}
for i, batch in enumerate(device_batches):
    batch_fetches[LEARNER_STATS_KEY][f"tower_{i}"] = self.extra_grad_info(batch)
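The fetches would then presumably look something like this (a sketch of the resulting structure, values abbreviated, not tested):

batch_fetches = {
    "learner_stats": {
        "tower_0": {"cur_lr": 0.000495184, "policy_loss": -40.52},
        "tower_1": {"cur_lr": 0.000495184, "policy_loss": -40.49},
    }
}

get_learner_stats() would then at least find the learner_stats key, though consumers would still see the extra per-tower level unless it gets reduced somewhere.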

sven1977 self-assigned this on Sep 24, 2021
sven1977 added the P2 and rllib labels and removed the triage label on Sep 24, 2021
@sven1977 (Contributor) commented

Thanks for raising this issue. Great catch! We should add a check of the stats' structure to all agent "compilation" tests.

To solve this: I think we should rather fix the get_learner_stats() function to handle the multi-GPU case and then use that function in all execution ops. We can use the existing all-tower-reduce code in MultiGPUTrainOneStep and move that into get_learner_stats, then use get_learner_stats consistently everywhere.
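A rough sketch of what such a tower-aware get_learner_stats() could look like (just to illustrate the idea; the mean-reduce and the key handling here are assumptions, not the final implementation):

import numpy as np

LEARNER_STATS_KEY = "learner_stats"

def get_learner_stats(fetches):
    # Single CPU/GPU case: stats already sit at the top level.
    if LEARNER_STATS_KEY in fetches:
        return fetches[LEARNER_STATS_KEY]
    # Multi-GPU case: collect the per-tower stats and mean-reduce them.
    tower_stats = [
        v[LEARNER_STATS_KEY]
        for k, v in fetches.items()
        if k.startswith("tower_") and LEARNER_STATS_KEY in v
    ]
    if not tower_stats:
        return {}
    return {
        key: np.mean([stats[key] for stats in tower_stats], axis=0)
        for key in tower_stats[0]
    }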

@sven1977 (Contributor) commented

@Bam4d @mvindiola1 ^

@sven1977 (Contributor) commented Oct 5, 2021

Closing this issue. Please feel free to re-open it should there still be problems.
The above PR makes sure that all results dicts returned by Trainer.train() have the same structure (a test for this was added), regardless of the particular setup: multi-GPU, tf/torch, multi-agent, num_sgd_iters > 1.
