State available in SampleBatch and ReplayBuffer #43

Edilmo · 2020-10-01T16:23:47Z

Why are these changes needed?

Currently, for recurrent/recursive models, the state is available only for policy evaluation during training. It is not included in the SampleBatch, so it is not accessible at the execution plan level, which in turn means it is not present in the replay buffer. As a result, apex-like algorithms cannot use memory models in RLlib right now.

This PR is the very first step towards supporting memory for this kind of algorithm.
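To illustrate the idea, here is a minimal sketch of a replay buffer that stores the recurrent state alongside each transition, so that a memory model can be evaluated on replayed data. It is not the code in this PR and does not use RLlib's API: the class `StateAwareReplayBuffer` and the column names `state_in_0` and `state_out_0` are assumptions made up for this example.

```python
import random
from collections import deque
from typing import Any, Deque, Dict, List

# Hypothetical column names for this sketch; they are not taken from the PR.
STATE_IN = "state_in_0"
STATE_OUT = "state_out_0"


class StateAwareReplayBuffer:
    """Toy FIFO replay buffer that keeps the recurrent state with each transition."""

    def __init__(self, capacity: int) -> None:
        # A bounded deque evicts the oldest transition once capacity is reached.
        self._storage: Deque[Dict[str, Any]] = deque(maxlen=capacity)

    def add(self, transition: Dict[str, Any]) -> None:
        # Refuse transitions that lack the recurrent state, since a memory
        # model could not be evaluated on them at replay time.
        for key in (STATE_IN, STATE_OUT):
            if key not in transition:
                raise KeyError(f"transition is missing {key!r}")
        self._storage.append(dict(transition))

    def sample(self, batch_size: int, rng: random.Random) -> Dict[str, List[Any]]:
        # Sample uniformly with replacement and return a column-oriented batch.
        rows = [rng.choice(self._storage) for _ in range(batch_size)]
        return {key: [row[key] for row in rows] for key in rows[0]}

    def __len__(self) -> int:
        return len(self._storage)


# Usage: roll a fake two-unit recurrent state forward for five steps.
buffer = StateAwareReplayBuffer(capacity=3)
state = [0.0, 0.0]
for t in range(5):
    next_state = [s + 1.0 for s in state]
    buffer.add({
        "obs": t,
        "action": 0,
        "reward": 1.0,
        STATE_IN: state,
        STATE_OUT: next_state,
    })
    state = next_state

batch = buffer.sample(2, random.Random(0))
```

Each sampled row carries the state the model saw when it acted (`state_in_0`) and the state it produced (`state_out_0`), which is the information the description above says is missing from the replay buffer today.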

Related issue number

None

Checks

I've run scripts/format.sh to lint the changes in this PR.
I've included any doc changes needed for https://docs.ray.io/en/latest/.
I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failure rates at https://ray-travis-tracker.herokuapp.com/.
Testing Strategy
- Unit tests
- Release tests
- This PR is not tested (please justify below)

Edilmo requested review from Random-Word and RuofanKong October 5, 2020 18:43

Edilmo added 3 commits October 5, 2020 20:43

Making states available in SampleBatch and ReplayBuffer

726ba06

RNN support in DQN

a08fc78

RNN support in DDPG and SAC

5d19589

Edilmo force-pushed the edpalenc/replay-state branch from 962ad27 to 5d19589 Compare October 6, 2020 03:45

Edilmo added 3 commits October 6, 2020 01:18

Fixing missing changes

a09e815

Fixing DDPG and SAC

8ca1913

Fixing eager mode

e39c326
