
[BUG] A bug in on-policy adapter with autoreset mechanism #161

Closed
3 tasks done
Dtrc2207 opened this issue Mar 19, 2023 · 1 comment
Labels
bug Something isn't working


@Dtrc2207
Contributor

Required prerequisites

What version of OmniSafe are you using?

0.1.1

System information

Python 3.8.16 (default, Mar 2 2023, 03:21:46) [GCC 11.2.0] on Linux
OmniSafe 0.1.1

Problem description

In onpolicy_adapter.py, the ends of episodes are handled as follows:

            obs = next_obs
            epoch_end = step >= steps_per_epoch - 1
            for idx, (done, time_out) in enumerate(zip(terminated, truncated)):
                if epoch_end or done or time_out:
                    if (epoch_end or time_out) and not done:
                        if epoch_end:
                            logger.log(
                                f'Warning: trajectory cut off when rollout by epoch at {self._ep_len[idx]} steps.'
                            )
                        # Bug: obs was already overwritten with next_obs above,
                        # which after autoreset belongs to the *next* episode.
                        _, last_value_r, last_value_c, _ = agent.step(obs[idx])
                        last_value_r = last_value_r.unsqueeze(0)
                        last_value_c = last_value_c.unsqueeze(0)
                    elif done:
                        last_value_r = torch.zeros(1)
                        last_value_c = torch.zeros(1)

                    if done or time_out:
                        self._log_metrics(logger, idx)
                        self._reset_log(idx)

                        self._ep_ret[idx] = 0.0
                        self._ep_cost[idx] = 0.0
                        self._ep_len[idx] = 0.0

                    buffer.finish_path(last_value_r, last_value_c, idx)

However, in safety-gymnasium, when an episode ends the environment auto-resets immediately and carries the true last observation in `info`. So when `time_out == True`, `epoch_end == False`, and `done == False`, the value of the last state is computed from the first observation of the next episode rather than from the terminal observation of the episode being finished.
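The point above can be sketched as follows. This is a minimal, hypothetical illustration (not OmniSafe's actual fix): it assumes the Gymnasium-style autoreset convention where the terminal observation of a truncated episode is carried in `info["final_observation"]`, and the helper name `last_obs_for_bootstrap` is invented for the example.

```python
def last_obs_for_bootstrap(next_obs, info, idx, truncated):
    """Pick the observation to evaluate the critic on when an episode ends.

    If env `idx` was truncated and has already auto-reset, `next_obs[idx]`
    is the first observation of the NEW episode; the true terminal
    observation of the old episode is carried in `info["final_observation"]`
    (Gymnasium autoreset convention, assumed here for illustration).
    """
    final_obs = info.get("final_observation")
    if truncated[idx] and final_obs is not None and final_obs[idx] is not None:
        return final_obs[idx]
    return next_obs[idx]


# Toy demonstration: env 1 was truncated and auto-reset.
next_obs = [[0.0, 0.0], [9.9, 9.9]]              # [9.9, 9.9] is post-reset
info = {"final_observation": [None, [5.0, 5.0]]}  # true terminal obs of env 1
truncated = [False, True]

assert last_obs_for_bootstrap(next_obs, info, 0, truncated) == [0.0, 0.0]
assert last_obs_for_bootstrap(next_obs, info, 1, truncated) == [5.0, 5.0]
```

Bootstrapping from `[5.0, 5.0]` instead of `[9.9, 9.9]` is what the report asks for: the value target of the truncated episode must come from its own terminal state.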

Reproducible example code

The Python snippets:

Command lines:

Extra dependencies:


Steps to reproduce:

Traceback

No response

Expected behavior

No response

Additional context

No response

@Dtrc2207 Dtrc2207 added the bug Something isn't working label Mar 19, 2023
@friedmainfunction
Collaborator

Thanks for opening this issue; fixed in #162 and #164.
