[RLlib] Issue 21334: Fix APPO when kl_loss is enabled. #21855
Conversation
The bug is in the learner_info construction in our LearnerThread. It only shows up for APPO because APPO, IMPALA, and APEX are the only agents that use the async LearnerThread, and APPO is the only one of them that updates the KL loss.
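For context, here is a minimal, hypothetical sketch (not RLlib's actual code) of the mismatch: the KL-coefficient update expects the learner stats to be nested under a policy ID key, while the async LearnerThread used to report a flat learner_stats dict, so the lookup only breaks for async-learner agents that also update the KL coefficient (APPO with use_kl_loss=True).

    # Hypothetical illustration of the key mismatch; names are for demonstration only.
    DEFAULT_POLICY_ID = "default_policy"

    flat_info = {"learner_stats": {"kl": 0.02}}                         # what the LearnerThread produced
    nested_info = {DEFAULT_POLICY_ID: {"learner_stats": {"kl": 0.02}}}  # what the KL update expects

    def read_kl(learner_info, policy_id=DEFAULT_POLICY_ID):
        # The KL-coefficient update effectively does this lookup.
        return learner_info[policy_id]["learner_stats"]["kl"]

    print(read_kl(nested_info))  # 0.02
    # read_kl(flat_info) raises KeyError -> the APPO + use_kl_loss failure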
rllib/agents/ppo/tests/test_appo.py (Outdated)
@@ -47,6 +47,25 @@ def test_appo_compilation(self):
        check_compute_single_action(trainer)
        trainer.stop()

    def test_appo_compilation_use_kl_loss(self):
        """Test whether an APPOTrainer can be built with both frameworks."""
Nit: Fix the comment?
oops done.
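For reference, a rough sketch of what the added test might look like, assuming RLlib's APPOTrainer, DEFAULT_CONFIG, and the framework_iterator / check_compute_single_action test helpers already used in this file; the exact config keys and body in the PR may differ.

    # Rough sketch only; not the exact test added in this PR.
    import unittest
    import ray
    from ray.rllib.agents.ppo import appo
    from ray.rllib.utils.test_utils import (
        check_compute_single_action,
        framework_iterator,
    )

    class TestAPPOKLLoss(unittest.TestCase):
        @classmethod
        def setUpClass(cls):
            ray.init()

        @classmethod
        def tearDownClass(cls):
            ray.shutdown()

        def test_appo_compilation_use_kl_loss(self):
            """Test whether an APPOTrainer can be built with the KL loss enabled."""
            config = appo.DEFAULT_CONFIG.copy()
            config["num_workers"] = 1
            config["use_kl_loss"] = True  # assumed config key for enabling the KL loss
            num_iterations = 2

            for _ in framework_iterator(config, with_eager_tracing=True):
                trainer = appo.APPOTrainer(config=config, env="CartPole-v0")
                for _ in range(num_iterations):
                    print(trainer.train())
                check_compute_single_action(trainer)
                trainer.stop()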
rllib/agents/ppo/tests/test_appo.py (Outdated)
        num_iterations = 2

        for _ in framework_iterator(config, with_eager_tracing=True):
            print("w/ v-trace")
Not necessary here, no?
right, I got rid of it.
rllib/agents/ppo/tests/test_appo.py (Outdated)
        for _ in framework_iterator(config, with_eager_tracing=True):
            print("w/ v-trace")
            _config = config.copy()
            _config["vtrace"] = True
same
done
        if released:
            self.idle_tower_stacks.put(buffer_idx)

-       self.outqueue.put((get_num_samples_loaded_into_buffer, learner_stats))
+       self.outqueue.put((get_num_samples_loaded_into_buffer,
Nice!
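A hypothetical, self-contained sketch (not the actual multi-GPU learner thread code) of the idea behind this change: average each policy's per-tower stats and nest them under the policy ID before putting them on the outqueue, instead of forwarding a single flat learner_stats dict.

    # Hypothetical helper; variable and key names are illustrative only.
    import numpy as np

    def build_learner_info(per_tower_results):
        # per_tower_results: {policy_id: [tower_0_stats, tower_1_stats, ...]}
        learner_info = {}
        for policy_id, tower_stats in per_tower_results.items():
            learner_info[policy_id] = {
                "learner_stats": {
                    key: float(np.mean([float(stats[key]) for stats in tower_stats]))
                    for key in tower_stats[0]
                }
            }
        return learner_info

    # Two GPU towers reporting a KL stat for the default policy:
    info = build_learner_info({"default_policy": [{"kl": 0.018}, {"kl": 0.022}]})
    print(info)  # {"default_policy": {"learner_stats": {"kl": 0.02}}} (approximately)
    # The learner thread would then do something along the lines of:
    # self.outqueue.put((num_samples_loaded, info))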
Very cool! Thanks for the fix.
Is there an issue related to this PR? Could you change the title to: [RLlib] Issue xyz: ...
Sorry, saw the issue # now.
Why are these changes needed?
Fix the APPO agent when kl_loss is enabled.
The value is now saved under per-policy-ID keys in the learner info, and we also need to torch_mean() the stats for the torch policy.
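As a sketch of the consumer side (assumed names and wiring, not necessarily the exact RLlib code), the KL update then has to index into the per-policy-ID keys and reduce any per-tower torch stats to a plain float before calling update_kl():

    # Illustrative only; assumes learner_info is shaped as {policy_id: {"learner_stats": {...}}}.
    def update_kl_from_learner_info(workers, learner_info):
        def update(policy, policy_id):
            stats = learner_info.get(policy_id, {}).get("learner_stats", {})
            if "kl" in stats:
                kl = stats["kl"]
                # If the stat is still a per-tower list, average it first.
                if isinstance(kl, (list, tuple)):
                    kl = sum(float(k) for k in kl) / len(kl)
                # A torch policy may report a tensor; reduce it to a float.
                elif hasattr(kl, "item"):
                    kl = kl.item()
                policy.update_kl(kl)

        # foreach_trainable_policy / update_kl exist on RLlib workers and
        # PPO-family policies, but this wiring is a sketch, not the PR's code.
        workers.local_worker().foreach_trainable_policy(update)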
Related issue number
Closes #21334
Checks
- I've run scripts/format.sh to lint the changes in this PR.