[RLlib] [Bug] APPO with kl_loss learner_stats bug #21334

vakker · 2022-01-02T22:12:02Z

Search before asking

I searched the issues and found no similar issues.

Ray Component

RLlib

What happened + What you expected to happen

Training APPO with use_kl_loss: True fails with errors.
For example, running the APPO Pendulum tuned example with use_kl_loss: True added to its config:

rllib train -f rllib/tuned_examples/ppo/pendulum-appo.yaml

throws:

Stacktrace

ray.exceptions.RayTaskError(KeyError): ray::APPO.train_buffered() (pid=3768613, ip=192.168.1.74, repr=APPO)
  File "/home/vince/.pyenv/versions/3.9.7/envs/spg-exp/lib/python3.9/site-packages/ray/tune/trainable.py", line 255, in train_buffered
    result = self.train()                                                                                
  File "/home/vince/.pyenv/versions/3.9.7/envs/spg-exp/lib/python3.9/site-packages/ray/tune/trainable.py", line 314, in train
    result = self.step()                                                                                                                                                                                           
  File "/home/vince/.pyenv/versions/3.9.7/envs/spg-exp/lib/python3.9/site-packages/ray/rllib/agents/trainer.py", line 885, in step
    raise e                                                                                                                                                                                                        
  File "/home/vince/.pyenv/versions/3.9.7/envs/spg-exp/lib/python3.9/site-packages/ray/rllib/agents/trainer.py", line 867, in step
    result = self.step_attempt()                                                                                                                                                                                   
  File "/home/vince/.pyenv/versions/3.9.7/envs/spg-exp/lib/python3.9/site-packages/ray/rllib/agents/trainer.py", line 920, in step_attempt
    step_results = next(self.train_exec_impl)                                                                                                                                                                      
  File "/home/vince/.pyenv/versions/3.9.7/envs/spg-exp/lib/python3.9/site-packages/ray/util/iter.py", line 756, in __next__
    return next(self.built_iterator)                                                                     
  File "/home/vince/.pyenv/versions/3.9.7/envs/spg-exp/lib/python3.9/site-packages/ray/util/iter.py", line 783, in apply_foreach
    for item in it:
  File "/home/vince/.pyenv/versions/3.9.7/envs/spg-exp/lib/python3.9/site-packages/ray/util/iter.py", line 783, in apply_foreach
    for item in it:
  File "/home/vince/.pyenv/versions/3.9.7/envs/spg-exp/lib/python3.9/site-packages/ray/util/iter.py", line 843, in apply_filter
    for item in it:
  File "/home/vince/.pyenv/versions/3.9.7/envs/spg-exp/lib/python3.9/site-packages/ray/util/iter.py", line 843, in apply_filter
    for item in it:
  File "/home/vince/.pyenv/versions/3.9.7/envs/spg-exp/lib/python3.9/site-packages/ray/util/iter.py", line 791, in apply_foreach
    result = fn(item)
  File "/home/vince/.pyenv/versions/3.9.7/envs/spg-exp/lib/python3.9/site-packages/ray/rllib/agents/ppo/appo.py", line 107, in __call__
    self.update_kl(fetches)
  File "/home/vince/.pyenv/versions/3.9.7/envs/spg-exp/lib/python3.9/site-packages/ray/rllib/agents/ppo/ppo.py", line 214, in __call__
    self.workers.local_worker().foreach_trainable_policy(update)
  File "/home/vince/.pyenv/versions/3.9.7/envs/spg-exp/lib/python3.9/site-packages/ray/rllib/evaluation/rollout_worker.py", line 1256, in foreach_trainable_policy
    return [
  File "/home/vince/.pyenv/versions/3.9.7/envs/spg-exp/lib/python3.9/site-packages/ray/rllib/evaluation/rollout_worker.py", line 1257, in <listcomp>
    func(policy, pid, **kwargs)
  File "/home/vince/.pyenv/versions/3.9.7/envs/spg-exp/lib/python3.9/site-packages/ray/rllib/agents/ppo/ppo.py", line 205, in update
    kl = fetches[pi_id][LEARNER_STATS_KEY].get("kl") 
KeyError: 'learner_stats'
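
The failing lookup in the last frame can be illustrated in isolation. This is only a sketch in plain Python, not RLlib code: the layout of `fetches` below is an assumption made up for illustration (a policy entry that exists but has no "learner_stats" sub-dict), and `get_kl_tolerant` is a hypothetical guarded variant, not the fix that was merged.

```python
# Key name taken from the KeyError in the traceback above.
LEARNER_STATS_KEY = "learner_stats"


def get_kl_strict(fetches, pi_id):
    # Same chained lookup as the line shown in the traceback (ppo.py, line 205).
    return fetches[pi_id][LEARNER_STATS_KEY].get("kl")


def get_kl_tolerant(fetches, pi_id):
    # Hypothetical guarded variant: returns None instead of raising
    # when the policy entry or its learner stats are missing.
    return fetches.get(pi_id, {}).get(LEARNER_STATS_KEY, {}).get("kl")


# Assumed layout, for illustration only: the policy entry exists,
# but it has no "learner_stats" key.
fetches = {"default_policy": {"kl": 0.01}}

try:
    get_kl_strict(fetches, "default_policy")
    strict_error = None
except KeyError as err:
    strict_error = repr(err)

tolerant_result = get_kl_tolerant(fetches, "default_policy")
print(strict_error, tolerant_result)
```

Whether silently skipping the KL update is acceptable depends on why the stats are missing, so the guarded lookup is a diagnostic aid rather than a recommendation.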

My own code produces the stack trace below. I tried to build a minimal reproduction, but ran into the error above instead, so I stopped (I can dig into this further if needed). My code uses Tune to run the experiments, which may explain the difference. I added print statements to rllib/utils/metrics/learner_info.py to see what triggers the crash; their output appears at the top of the trace.
I think this is related to the issue above, but I'm not sure.

Stacktrace

(APPO pid=3782547) #########                                                                                                                                                                                       
(APPO pid=3782547) ('learner_stats', 'KL_Coeff')                                                                                                                                                                   
(APPO pid=3782547) #########                                                                                                                                                                                       
(APPO pid=3782547) ('learner_stats', 'cur_lr')                                                                                                                                                                     
(APPO pid=3782547) #########                                                                                                                                                                                       
(APPO pid=3782547) ('learner_stats', 'entropy')                                                                                                                                                                    
(APPO pid=3782547) #########                                                                                                                                                                                       
(APPO pid=3782547) ('learner_stats', 'entropy_coeff')                                                                                                                                                              
(APPO pid=3782547) #########                                                                                                                                                                                       
(APPO pid=3782547) ('learner_stats', 'kl', 0)                                                                                                                                                                      
(APPO pid=3782547) Exception in thread Thread-1:                                                                                                                                                                   
(APPO pid=3782547) Traceback (most recent call last):                                                                                                                                                              
(APPO pid=3782547)   File "/home/vince/.pyenv/versions/3.9.7/lib/python3.9/threading.py", line 973, in _bootstrap_inner
(APPO pid=3782547)     self.run()
(APPO pid=3782547)   File "/home/vince/.pyenv/versions/spg-exp/lib/python3.9/site-packages/ray/rllib/execution/learner_thread.py", line 69, in run
(APPO pid=3782547)     self.step()
(APPO pid=3782547)   File "/home/vince/.pyenv/versions/spg-exp/lib/python3.9/site-packages/ray/rllib/execution/multi_gpu_learner_thread.py", line 162, in step
(APPO pid=3782547)     learner_info_builder.add_learn_on_batch_results(
(APPO pid=3782547)   File "/home/vince/.pyenv/versions/spg-exp/lib/python3.9/site-packages/ray/rllib/utils/metrics/learner_info.py", line 44, in add_learn_on_batch_results
(APPO pid=3782547)     tree.map_structure_with_path(
(APPO pid=3782547)   File "/home/vince/.pyenv/versions/spg-exp/lib/python3.9/site-packages/tree/__init__.py", line 549, in map_structure_with_path
(APPO pid=3782547)     return map_structure_with_path_up_to(structures[0], func, *structures,
(APPO pid=3782547)   File "/home/vince/.pyenv/versions/spg-exp/lib/python3.9/site-packages/tree/__init__.py", line 852, in map_structure_with_path_up_to
(APPO pid=3782547)     [func(*args) for args in zip(flat_path_list, *flat_value_lists)])
(APPO pid=3782547)   File "/home/vince/.pyenv/versions/spg-exp/lib/python3.9/site-packages/tree/__init__.py", line 852, in <listcomp>
(APPO pid=3782547)     [func(*args) for args in zip(flat_path_list, *flat_value_lists)])
(APPO pid=3782547)   File "/home/vince/.pyenv/versions/spg-exp/lib/python3.9/site-packages/ray/rllib/utils/metrics/learner_info.py", line 45, in <lambda>
(APPO pid=3782547)     lambda p, *s: all_tower_reduce(p, *s),
(APPO pid=3782547)   File "/home/vince/.pyenv/versions/spg-exp/lib/python3.9/site-packages/ray/rllib/utils/metrics/learner_info.py", line 80, in all_tower_reduce
(APPO pid=3782547)     if path[-1].startswith("min_"):
(APPO pid=3782547) AttributeError: 'int' object has no attribute 'startswith'
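
The last printed path, ('learner_stats', 'kl', 0), ends in an integer, which is what `path[-1].startswith("min_")` trips over. A plain-Python sketch of how such a path can arise follows. It assumes the "kl" stat is a one-element list rather than a scalar (a guess based on the printed path, not confirmed), uses `flatten_paths` as a simplified stand-in for the path traversal done by the tree library, and `is_min_stat_guarded` is a hypothetical workaround, not the merged fix.

```python
def flatten_paths(structure, prefix=()):
    # Simplified stand-in for a nested-structure traversal:
    # dict keys and sequence indices both become path elements.
    if isinstance(structure, dict):
        for key in sorted(structure):
            yield from flatten_paths(structure[key], prefix + (key,))
    elif isinstance(structure, (list, tuple)):
        for index, item in enumerate(structure):
            yield from flatten_paths(item, prefix + (index,))
    else:
        yield prefix


def is_min_stat_strict(path):
    # Same check as the line shown in the traceback (learner_info.py, line 80).
    return path[-1].startswith("min_")


def is_min_stat_guarded(path):
    # Hypothetical guard: only string path elements can carry a "min_" prefix.
    last = path[-1]
    return isinstance(last, str) and last.startswith("min_")


# Assumed stats layout, for illustration only: "kl" is a list, not a scalar.
stats = {"learner_stats": {"entropy": 1.2, "kl": [0.01]}}
paths = list(flatten_paths(stats))

strict_error = None
for path in paths:
    try:
        is_min_stat_strict(path)
    except AttributeError as err:
        strict_error = str(err)

guarded = [is_min_stat_guarded(path) for path in paths]
print(paths, strict_error, guarded)
```

If the assumption holds, the alternative to guarding the check is to make sure the stat is reported as a scalar before it reaches the reduction step.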

Versions / Dependencies

Python: 3.9.7
Ray: 1.9.0

Reproduction script

See above.

Anything else

No response

Are you willing to submit a PR?

Yes I am willing to submit a PR!


vakker added the bug and triage labels Jan 2, 2022

clarkzinzow changed the title ~~[Bug] APPO with kl_loss learner_stats bug~~ [RLlib] [Bug] APPO with kl_loss learner_stats bug Jan 4, 2022

clarkzinzow added the rllib label Jan 4, 2022

gjoliver removed the triage label Jan 25, 2022

gjoliver self-assigned this Jan 25, 2022

gjoliver mentioned this issue Jan 25, 2022

[RLlib] Issue 21334: Fix APPO when kl_loss is enabled. #21855

Merged


sven1977 closed this as completed in #21855 Jan 27, 2022
