Search before asking
I searched the issues and found no similar issues.
Ray Component
RLlib
What happened + What you expected to happen
Running APPO training with use_kl_loss: True fails with the errors described below.
Running a modified version of the APPO Pendulum tuned example (with use_kl_loss: True added) throws:
ray.exceptions.RayTaskError(KeyError): ray::APPO.train_buffered() (pid=3768613, ip=192.168.1.74, repr=APPO)
File "/home/vince/.pyenv/versions/3.9.7/envs/spg-exp/lib/python3.9/site-packages/ray/tune/trainable.py", line 255, in train_buffered
result = self.train()
File "/home/vince/.pyenv/versions/3.9.7/envs/spg-exp/lib/python3.9/site-packages/ray/tune/trainable.py", line 314, in train
result = self.step()
File "/home/vince/.pyenv/versions/3.9.7/envs/spg-exp/lib/python3.9/site-packages/ray/rllib/agents/trainer.py", line 885, in step
raise e
File "/home/vince/.pyenv/versions/3.9.7/envs/spg-exp/lib/python3.9/site-packages/ray/rllib/agents/trainer.py", line 867, in step
result = self.step_attempt()
File "/home/vince/.pyenv/versions/3.9.7/envs/spg-exp/lib/python3.9/site-packages/ray/rllib/agents/trainer.py", line 920, in step_attempt
step_results = next(self.train_exec_impl)
File "/home/vince/.pyenv/versions/3.9.7/envs/spg-exp/lib/python3.9/site-packages/ray/util/iter.py", line 756, in __next__
return next(self.built_iterator)
File "/home/vince/.pyenv/versions/3.9.7/envs/spg-exp/lib/python3.9/site-packages/ray/util/iter.py", line 783, in apply_foreach
for item in it:
File "/home/vince/.pyenv/versions/3.9.7/envs/spg-exp/lib/python3.9/site-packages/ray/util/iter.py", line 783, in apply_foreach
for item in it:
File "/home/vince/.pyenv/versions/3.9.7/envs/spg-exp/lib/python3.9/site-packages/ray/util/iter.py", line 843, in apply_filter
for item in it:
File "/home/vince/.pyenv/versions/3.9.7/envs/spg-exp/lib/python3.9/site-packages/ray/util/iter.py", line 843, in apply_filter
for item in it:
File "/home/vince/.pyenv/versions/3.9.7/envs/spg-exp/lib/python3.9/site-packages/ray/util/iter.py", line 791, in apply_foreach
result = fn(item)
File "/home/vince/.pyenv/versions/3.9.7/envs/spg-exp/lib/python3.9/site-packages/ray/rllib/agents/ppo/appo.py", line 107, in __call__
self.update_kl(fetches)
File "/home/vince/.pyenv/versions/3.9.7/envs/spg-exp/lib/python3.9/site-packages/ray/rllib/agents/ppo/ppo.py", line 214, in __call__
self.workers.local_worker().foreach_trainable_policy(update)
File "/home/vince/.pyenv/versions/3.9.7/envs/spg-exp/lib/python3.9/site-packages/ray/rllib/evaluation/rollout_worker.py", line 1256, in foreach_trainable_policy
return [
File "/home/vince/.pyenv/versions/3.9.7/envs/spg-exp/lib/python3.9/site-packages/ray/rllib/evaluation/rollout_worker.py", line 1257, in <listcomp>
func(policy, pid, **kwargs)
File "/home/vince/.pyenv/versions/3.9.7/envs/spg-exp/lib/python3.9/site-packages/ray/rllib/agents/ppo/ppo.py", line 205, in update
kl = fetches[pi_id][LEARNER_STATS_KEY].get("kl")
KeyError: 'learner_stats'
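The failing line in ppo.py indexes fetches[pi_id] with LEARNER_STATS_KEY directly, so a per-policy result without a 'learner_stats' entry raises the KeyError. The following is a minimal sketch of that failure mode using plain dicts. The fetches contents and the get_kl helper are hypothetical stand-ins for illustration, not RLlib code and not a proposed fix:

```python
LEARNER_STATS_KEY = "learner_stats"  # same string as in the KeyError above


def get_kl(fetches, pi_id):
    """Return the 'kl' stat for a policy, or None if it is not present."""
    policy_fetches = fetches.get(pi_id, {})
    return policy_fetches.get(LEARNER_STATS_KEY, {}).get("kl")


# Hypothetical fetches: one policy with learner stats, one without.
fetches = {
    "default_policy": {LEARNER_STATS_KEY: {"kl": 0.02}},
    "other_policy": {},
}

try:
    # Same access pattern as the line in the traceback.
    fetches["other_policy"][LEARNER_STATS_KEY].get("kl")
except KeyError as err:
    print("direct indexing raises KeyError:", err)

print(get_kl(fetches, "default_policy"))
print(get_kl(fetches, "other_policy"))
```

A lookup like this only hides the symptom. Whether the stats are missing or nested under a different key in the APPO results still needs to be checked.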
My own code produces the stack trace below. I tried to make a minimal reproduction, but it hit the KeyError shown above instead, so I stopped (I can dig into this further if needed). My code uses Tune to run the experiments, which may explain the difference. I added print statements to rllib/utils/metrics/learner_info.py to see which stats path triggers the crash; they produce the ######### and tuple lines in the output.
I think this is related to the error above, but I'm not sure.
Stacktrace
(APPO pid=3782547) #########
(APPO pid=3782547) ('learner_stats', 'KL_Coeff')
(APPO pid=3782547) #########
(APPO pid=3782547) ('learner_stats', 'cur_lr')
(APPO pid=3782547) #########
(APPO pid=3782547) ('learner_stats', 'entropy')
(APPO pid=3782547) #########
(APPO pid=3782547) ('learner_stats', 'entropy_coeff')
(APPO pid=3782547) #########
(APPO pid=3782547) ('learner_stats', 'kl', 0)
(APPO pid=3782547) Exception in thread Thread-1:
(APPO pid=3782547) Traceback (most recent call last):
(APPO pid=3782547) File "/home/vince/.pyenv/versions/3.9.7/lib/python3.9/threading.py", line 973, in _bootstrap_inner
(APPO pid=3782547) self.run()
(APPO pid=3782547) File "/home/vince/.pyenv/versions/spg-exp/lib/python3.9/site-packages/ray/rllib/execution/learner_thread.py", line 69, in run
(APPO pid=3782547) self.step()
(APPO pid=3782547) File "/home/vince/.pyenv/versions/spg-exp/lib/python3.9/site-packages/ray/rllib/execution/multi_gpu_learner_thread.py", line 162, in step
(APPO pid=3782547) learner_info_builder.add_learn_on_batch_results(
(APPO pid=3782547) File "/home/vince/.pyenv/versions/spg-exp/lib/python3.9/site-packages/ray/rllib/utils/metrics/learner_info.py", line 44, in add_learn_on_batch_results
(APPO pid=3782547) tree.map_structure_with_path(
(APPO pid=3782547) File "/home/vince/.pyenv/versions/spg-exp/lib/python3.9/site-packages/tree/__init__.py", line 549, in map_structure_with_path
(APPO pid=3782547) return map_structure_with_path_up_to(structures[0], func, *structures,
(APPO pid=3782547) File "/home/vince/.pyenv/versions/spg-exp/lib/python3.9/site-packages/tree/__init__.py", line 852, in map_structure_with_path_up_to
(APPO pid=3782547) [func(*args) for args in zip(flat_path_list, *flat_value_lists)])
(APPO pid=3782547) File "/home/vince/.pyenv/versions/spg-exp/lib/python3.9/site-packages/tree/__init__.py", line 852, in <listcomp>
(APPO pid=3782547) [func(*args) for args in zip(flat_path_list, *flat_value_lists)])
(APPO pid=3782547) File "/home/vince/.pyenv/versions/spg-exp/lib/python3.9/site-packages/ray/rllib/utils/metrics/learner_info.py", line 45, in <lambda>
(APPO pid=3782547) lambda p, *s: all_tower_reduce(p, *s),
(APPO pid=3782547) File "/home/vince/.pyenv/versions/spg-exp/lib/python3.9/site-packages/ray/rllib/utils/metrics/learner_info.py", line 80, in all_tower_reduce
(APPO pid=3782547) if path[-1].startswith("min_"):
(APPO pid=3782547) AttributeError: 'int' object has no attribute 'startswith'
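The last printed path, ('learner_stats', 'kl', 0), ends in an integer, which suggests the 'kl' stat is a list or tuple rather than a scalar, so path[-1].startswith("min_") is called on an int. The following is a minimal pure-Python sketch of that situation. iter_paths and reduce_kind are hypothetical helpers, not the dm-tree or RLlib implementations, and the "mean" default is an assumption:

```python
def iter_paths(structure, prefix=()):
    """Yield (path, leaf) pairs; dict keys and sequence indices form the path."""
    if isinstance(structure, dict):
        for key in sorted(structure):
            yield from iter_paths(structure[key], prefix + (key,))
    elif isinstance(structure, (list, tuple)):
        for index, item in enumerate(structure):
            yield from iter_paths(item, prefix + (index,))
    else:
        yield prefix, structure


def reduce_kind(path):
    """Pick a reduction from the last path element, tolerating int indices."""
    last = path[-1]
    if isinstance(last, str) and last.startswith("min_"):
        return "min"
    return "mean"  # assumed default for this sketch


# Hypothetical stats in which 'kl' is a list rather than a scalar.
stats = {"learner_stats": {"entropy": 1.3, "kl": [0.01], "min_q": -2.0}}

for path, value in iter_paths(stats):
    # An unguarded path[-1].startswith("min_") would raise AttributeError
    # on the ('learner_stats', 'kl', 0) path.
    print(path, reduce_kind(path))
```

The isinstance guard avoids the crash in this sketch, but it does not explain why 'kl' is a sequence here.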
Versions / Dependencies
Python: 3.9.7
Ray: 1.9.0
Reproduction script
See above.
Anything else
No response
Are you willing to submit a PR?
Yes I am willing to submit a PR!
vakker added the bug and triage labels on Jan 2, 2022.
clarkzinzow changed the title from "[Bug] APPO with kl_loss learner_stats bug" to "[RLlib] [Bug] APPO with kl_loss learner_stats bug" on Jan 4, 2022.