Hindsight Experience Replay #6

prabhatnagarajan · 2020-06-25T17:40:58Z

Hindsight Experience Replay with bit-flipping example: https://arxiv.org/abs/1707.01495

peasant98 · 2020-08-30T17:35:02Z

Hi,

What's the current status of this?

prabhatnagarajan · 2020-08-30T17:42:28Z

I'm currently working on it (on and off) on the following branch of my personal fork: https://github.com/prabhatnagarajan/pfrl/tree/her. I plan to apply HER to the bit-flip environment from the paper that introduced it. I'm fairly confident the Hindsight Experience Replay implementation is correct, since we've used a variant of it successfully in other projects. However, performance on the bit-flip environment is currently poor, and I still need to investigate why.
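For readers unfamiliar with the task, here is a minimal sketch of a bit-flip environment in the spirit of the one described in the HER paper. It is not the code from the branch above; the class name `BitFlipEnv`, its method signatures, and the step limit equal to the number of bits are assumptions made for illustration.

```python
import random


class BitFlipEnv:
    """Hypothetical toy task: flip one bit per step until state equals goal."""

    def __init__(self, n_bits=8, seed=None):
        self.n_bits = n_bits
        self.rng = random.Random(seed)

    def reset(self):
        # Sample a random start state and a random goal of the same length.
        self.state = [self.rng.randint(0, 1) for _ in range(self.n_bits)]
        self.goal = [self.rng.randint(0, 1) for _ in range(self.n_bits)]
        self.steps = 0
        return list(self.state), list(self.goal)

    def step(self, action):
        # Action i flips bit i of the state.
        self.state[action] ^= 1
        self.steps += 1
        success = self.state == self.goal
        # Sparse reward: 0 when the goal is reached, -1 otherwise.
        reward = 0.0 if success else -1.0
        # Assumed episode limit: at most n_bits steps.
        done = success or self.steps >= self.n_bits
        return list(self.state), reward, done
```

Because the reward is -1 on almost every step, a plain replay buffer rarely sees a success for larger `n_bits`, which is the situation HER is meant to address.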

peasant98 · 2020-08-30T17:55:59Z

Ah cool, thanks for the update.

abagaria · 2021-08-26T19:33:35Z

HER requires that we make updates to the agent's policy and Q-function at the end of the episode. But PFRL assumes that each agent.act(s) is followed by an agent.observe(s', r) (as evidenced by its use of batch_last_action to keep track of actions). How are you going to deal with that?

prabhatnagarajan · 2021-08-27T20:22:46Z

Note that the HindsightReplayBuffer extends the EpisodicReplayBuffer. If you look at the data structures in the EpisodicReplayBuffer, you can see that it maintains a current_episode, which is appended to the larger replay buffer only when the episode is stopped. This ensures that updates never use incomplete episodes.
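The idea can be sketched in a few lines of standalone Python. This is an illustration only, not the HindsightReplayBuffer from the branch: the class `TinyEpisodicBuffer`, its method names, the `relabel_prob` parameter, and the choice of relabeling with the final achieved state are all assumptions.

```python
import random


class TinyEpisodicBuffer:
    """Hypothetical episodic buffer with 'final state' hindsight relabeling."""

    def __init__(self):
        self.episodes = []         # completed episodes only
        self.current_episode = []  # transitions of the episode in progress

    def append(self, state, action, next_state, goal):
        self.current_episode.append(
            dict(state=state, action=action, next_state=next_state, goal=goal)
        )

    def stop_current_episode(self):
        # Only now does the episode become visible to sampling.
        if self.current_episode:
            self.episodes.append(self.current_episode)
            self.current_episode = []

    def sample(self, rng, relabel_prob=0.8):
        # Samples are drawn from completed episodes, never current_episode.
        episode = rng.choice(self.episodes)
        t = dict(rng.choice(episode))  # copy so stored data is unchanged
        if rng.random() < relabel_prob:
            # Hindsight: pretend the state actually reached was the goal.
            t["goal"] = episode[-1]["next_state"]
        # Recompute the sparse reward against the (possibly new) goal.
        t["reward"] = 0.0 if t["next_state"] == t["goal"] else -1.0
        return t
```

Relabeling needs the end of the episode to be known, which is why the sketch keeps in-progress transitions separate until `stop_current_episode` is called.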

As for the use of batch_last_action, I'm not entirely sure what you're asking. If you look at this function, we do use batch_last_action, but only to add the transition to the replay buffer, not to perform updates. At the end of the function we call self.replay_updater.update_if_necessary(self.t), which performs a gradient update but does not use batch_last_action.
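A simplified, single-environment sketch of that flow is below. It is not PFRL's code: the class `SketchAgent`, its attributes, the placeholder policy, and the `update_interval` parameter are hypothetical, and PFRL's real agents are batched and differ in their signatures.

```python
class SketchAgent:
    """Hypothetical agent showing how act/observe can coexist with episodic updates."""

    def __init__(self, update_interval=2):
        self.current_episode = []     # in-progress transitions
        self.completed_episodes = []  # what updates may sample from
        self.last_obs = None
        self.last_action = None
        self.t = 0
        self.update_interval = update_interval
        self.n_updates = 0

    def act(self, obs):
        action = 0  # placeholder policy, for illustration only
        # Remember what was done so observe() can build the transition.
        self.last_obs, self.last_action = obs, action
        return action

    def observe(self, obs, reward, done):
        self.t += 1
        # The last action is used only to store the transition.
        self.current_episode.append(
            (self.last_obs, self.last_action, reward, obs)
        )
        if done:
            self.completed_episodes.append(self.current_episode)
            self.current_episode = []
        self.update_if_necessary()

    def update_if_necessary(self):
        # Updates draw on completed episodes, not on last_action.
        if self.completed_episodes and self.t % self.update_interval == 0:
            self.n_updates += 1  # stand-in for a gradient step
```

In this sketch every `act` is still followed by an `observe`, yet no update can touch a transition until its episode has finished.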

Does this answer your question? If not, feel free to clarify and I'll do my best to answer.

prabhatnagarajan self-assigned this Jun 25, 2020

prabhatnagarajan linked a pull request Oct 28, 2020 that will close this issue

Hindsight Experience Replay Buffer #84

Open
