Hi @pokaxpoka,
The function that relabels the data in the buffer with an updated reward function is `relabel_with_predictor`. It uses `self.idx` to compute `total_iter`. Once the replay buffer is filled to capacity, `self.idx` wraps around and starts again from 0 (the buffer is cyclic). We would still want to relabel all the samples in the buffer with the updated reward function, but the current code (line 72) does not allow this, because `total_iter` is derived from `self.idx` alone.
Maybe something like this would work:
    import math

    def relabel_with_predictor(self, predictor):
        batch_size = 200
        if self.full:  # if the buffer is full, every slot holds a valid sample
            total_iter = math.ceil(self.capacity/batch_size)  # line added
        else:
            # buffer not yet full: only the first self.idx slots are valid
            total_iter = int(self.idx/batch_size)
            if self.idx > batch_size*total_iter:
                total_iter += 1
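To make the intended batching concrete, here is a minimal standalone sketch. It is not the repository code: `relabel_slices` is a hypothetical helper, and its arguments `idx`, `capacity`, and `full` stand in for `self.idx`, `self.capacity`, and `self.full`. It yields the index ranges that a relabeling loop would need to visit so that every valid sample is covered, both before and after the buffer wraps around.

```python
import math


def relabel_slices(idx, capacity, full, batch_size=200):
    """Yield (start, end) index pairs covering every valid sample.

    Hypothetical helper for illustration only; `idx`, `capacity`, and
    `full` mirror the buffer attributes mentioned in this issue.
    """
    # After the buffer has wrapped, all `capacity` slots are valid;
    # before that, only the first `idx` slots are.
    num_valid = capacity if full else idx
    total_iter = math.ceil(num_valid / batch_size)
    for i in range(total_iter):
        start = i * batch_size
        # Clamp the last batch to the number of valid samples.
        end = min(start + batch_size, num_valid)
        yield start, end
```

If the rest of `relabel_with_predictor` also clamps its last batch with `self.idx` (the loop is not shown above, so this is an assumption), that bound would presumably need the same change, which is why the sketch clamps against the valid-sample count instead.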