Hindsight Experience Replay #6

Open
prabhatnagarajan opened this issue Jun 25, 2020 · 5 comments · May be fixed by #84
@prabhatnagarajan
Contributor

Hindsight Experience Replay with bit-flipping example: https://arxiv.org/abs/1707.01495
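
For reference, the core relabeling idea from the paper in rough pseudocode (a minimal sketch of the "final" goal strategy; names here are illustrative, not a PFRL API):

```python
import numpy as np


def relabel_episode_with_final_goal(episode, compute_reward):
    """episode: list of dicts with keys obs, action, reward, next_obs, goal.

    Returns the original transitions plus copies relabeled with the goal
    actually achieved at the end of the episode ("final" strategy)."""
    final_achieved = episode[-1]["next_obs"]
    relabeled = []
    for t in episode:
        new_t = dict(t)
        new_t["goal"] = final_achieved
        new_t["reward"] = compute_reward(t["next_obs"], final_achieved)
        relabeled.append(new_t)
    return episode + relabeled


# Sparse reward used in the bit-flip task: 0 when the goal is reached, else -1.
def bitflip_reward(achieved, goal):
    return 0.0 if np.array_equal(achieved, goal) else -1.0
```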

@prabhatnagarajan prabhatnagarajan self-assigned this Jun 25, 2020
@peasant98

Hi,

What's the current status of this?

@prabhatnagarajan
Contributor Author

I'm currently working on it (on-and-off) on the following branch of my personal fork: https://github.com/prabhatnagarajan/pfrl/tree/her. I'm planning to apply HER to the bit-flip environment from the original HER paper. I'm fairly confident the Hindsight Experience Replay implementation itself is sound, as we've used a variant of it successfully in other projects. However, my current performance on the bit-flip environment is poor and requires investigation.
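
For context, the bit-flip environment is roughly the following (a sketch along the lines of the paper, not the exact version on the branch above):

```python
# n-bit flipping task: flip one bit per step, reward -1 until the state
# equals the goal. Illustrative only.
import numpy as np


class BitFlipEnv:
    def __init__(self, n_bits=10, seed=0):
        self.n_bits = n_bits
        self.rng = np.random.RandomState(seed)

    def reset(self):
        self.state = self.rng.randint(2, size=self.n_bits)
        self.goal = self.rng.randint(2, size=self.n_bits)
        return np.concatenate([self.state, self.goal]).astype(np.float32)

    def step(self, action):
        # Flip the chosen bit.
        self.state[action] = 1 - self.state[action]
        done = np.array_equal(self.state, self.goal)
        reward = 0.0 if done else -1.0
        obs = np.concatenate([self.state, self.goal]).astype(np.float32)
        return obs, reward, done, {}
```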

@peasant98

Ah cool, thanks for the update.

@prabhatnagarajan prabhatnagarajan linked a pull request Oct 28, 2020 that will close this issue
@abagaria

HER requires that we make updates to the agent's policy and Q-function at the end of the episode. But PFRL assumes that an agent.act(s) is followed by an agent.observe(s', r) (as evidenced by its use of batched_last_action to keep track of actions). How are you going to deal with that?
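
For concreteness, the interaction pattern I mean is roughly the following (a simplified single-environment loop; the real pfrl.experiments training loops also handle evaluation, step limits, etc.):

```python
# `agent` is any PFRL agent; `env` is a Gym-style environment.
def run_steps(agent, env, n_steps):
    obs = env.reset()
    for _ in range(n_steps):
        action = agent.act(obs)              # agent caches obs/action internally
        obs, reward, done, info = env.step(action)
        reset = info.get("needs_reset", False)
        # observe() stores the transition and may trigger a gradient update.
        agent.observe(obs, reward, done, reset)
        if done or reset:
            obs = env.reset()
```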

@prabhatnagarajan
Contributor Author

Note that the HindsightReplayBuffer extends the EpisodicReplayBuffer. If you look at the data structures within the EpisodicReplayBuffer, you can see that it maintains a current_episode, which is only appended to the larger replay buffer when an episode is stopped. This ensures that when we perform updates, we're not using incomplete episodes.
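
In pseudocode, the pattern is roughly this (a toy sketch; the actual EpisodicReplayBuffer has more machinery):

```python
import random
from collections import deque


class ToyEpisodicBuffer:
    def __init__(self, capacity_episodes=1000):
        self.episodic_memory = deque(maxlen=capacity_episodes)
        self.current_episode = []

    def append(self, transition):
        # Transitions accumulate here until the episode is explicitly stopped.
        self.current_episode.append(transition)

    def stop_current_episode(self):
        # Only now does the episode become visible to sampling/updates.
        if self.current_episode:
            self.episodic_memory.append(self.current_episode)
            self.current_episode = []

    def sample_episodes(self, n_episodes):
        idxs = random.sample(range(len(self.episodic_memory)), n_episodes)
        return [self.episodic_memory[i] for i in idxs]
```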

About the use of batch_last_action, I'm not entirely sure what you're asking. If you look at this function, we're using batch_last_action, yes, but it's being added to the replay buffer, not being used directly for updates. At the end of the function we call self.replay_updater.update_if_necessary(self.t), which will perform a gradient update, but it will not use batch_last_action.
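
Roughly, the flow is the following (illustrative pseudocode, not PFRL's actual code; the cached last observation/action are only consumed to build the transition that goes into the buffer):

```python
def observe_and_maybe_update(replay_buffer, replay_updater, t,
                             last_obs, last_action, obs, reward, done, reset):
    if last_obs is not None:
        replay_buffer.append(
            state=last_obs,
            action=last_action,
            reward=reward,
            next_state=obs,
            is_state_terminal=done,
        )
    if done or reset:
        # Marks the episode complete so HER relabeling sees a full episode.
        replay_buffer.stop_current_episode()
    # Performs a gradient step when the schedule says so; the batch it trains
    # on is sampled from the replay buffer, not taken from last_action.
    replay_updater.update_if_necessary(t)
```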

Does this answer your question? If not, feel free to clarify and I'll do my best to answer.
