What version of OmniSafe are you using?
0.1.1
System information
3.8.16 (default, Mar 2 2023, 03:21:46)
[GCC 11.2.0] linux
0.1.1
Problem description
In `onpolicy_adapter.py`, the end of each episode is handled like this:
```python
obs = next_obs
epoch_end = step >= steps_per_epoch - 1
for idx, (done, time_out) in enumerate(zip(terminated, truncated)):
    if epoch_end or done or time_out:
        if (epoch_end or time_out) and not done:
            if epoch_end:
                logger.log(
                    f'Warning: trajectory cut off when rollout by epoch at {self._ep_len[idx]} steps.'
                )
            # obs is next_obs; after an auto-reset, obs[idx] may already be the next episode's first observation
            _, last_value_r, last_value_c, _ = agent.step(obs[idx])
            last_value_r = last_value_r.unsqueeze(0)
            last_value_c = last_value_c.unsqueeze(0)
        elif done:
            # True termination: no bootstrap value
            last_value_r = torch.zeros(1)
            last_value_c = torch.zeros(1)

        if done or time_out:
            self._log_metrics(logger, idx)
            self._reset_log(idx)

            self._ep_ret[idx] = 0.0
            self._ep_cost[idx] = 0.0
            self._ep_len[idx] = 0.0

        buffer.finish_path(last_value_r, last_value_c, idx)
```
However, in Safety-Gymnasium an environment resets automatically as soon as its episode ends, and the last state of the finished episode is only carried in `info`. So, for example, when `time_out == True`, `epoch_end == False` and `done == False`, I think the value of the last state is actually computed from the first observation of the next episode.
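To make the concern concrete, here is a minimal, framework-free sketch of how the bootstrap observation could be chosen per environment. It assumes the convention used by Gymnasium's vector environments (before 1.0), where the pre-reset observation is exposed as `info["final_observation"][idx]`, with `None` for environments that did not finish; I have not confirmed that Safety-Gymnasium passes it under exactly this key. `value_fn`, `pick_bootstrap_obs` and `last_value` are hypothetical helpers, not OmniSafe APIs, and plain floats stand in for observations and tensors.

```python
from typing import Any, Dict, Sequence


def value_fn(obs: float) -> float:
    # Hypothetical stand-in for the critic V(s); a real agent would call its value network.
    return 10.0 * obs


def pick_bootstrap_obs(
    next_obs: Sequence[float], info: Dict[str, Any], idx: int, time_out: bool
) -> float:
    """Return the observation whose value should close a cut-off trajectory.

    Assumes a Gymnasium-style auto-reset: once an episode ends, next_obs[idx]
    is already the reset observation of the next episode, and the pre-reset
    observation is stored in info["final_observation"][idx].
    """
    final_obs = info.get("final_observation")
    if time_out and final_obs is not None and final_obs[idx] is not None:
        return final_obs[idx]
    # Epoch cut-off without an env reset: next_obs[idx] still belongs to this episode.
    return next_obs[idx]


def last_value(
    next_obs: Sequence[float],
    info: Dict[str, Any],
    idx: int,
    done: bool,
    time_out: bool,
) -> float:
    """Bootstrap value for env `idx` (a sketch, not OmniSafe's implementation)."""
    if done:
        # True termination: nothing to bootstrap from.
        return 0.0
    return value_fn(pick_bootstrap_obs(next_obs, info, idx, time_out))


# Env 0 hit its time limit at observation 0.75 and was auto-reset to 0.0;
# env 1 is still mid-episode at 0.5 when the epoch ends.
next_obs = [0.0, 0.5]
info = {"final_observation": [0.75, None]}

print(value_fn(next_obs[0]))  # bootstrapping from obs[idx] -> 0.0
print(last_value(next_obs, info, idx=0, done=False, time_out=True))  # -> 7.5
print(last_value(next_obs, info, idx=1, done=False, time_out=False))  # -> 5.0
print(last_value(next_obs, info, idx=0, done=True, time_out=False))  # -> 0.0
```

In this toy setup, bootstrapping from `obs[idx]` would close env 0's truncated trajectory with V(0.0) = 0.0 from the reset observation instead of V(0.75) = 7.5 from the state where it was actually cut off, while the pure `epoch_end` case (env 1) is unaffected because no reset has happened yet.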
Reproducible example code
The Python snippets:
Command lines:
Extra dependencies:
Steps to reproduce:
Traceback
No response
Expected behavior
No response
Additional context
No response