
[BUG] A bug in on-policy adapter with autoreset mechanism #161

Closed
3 tasks done
Dtrc2207 opened this issue Mar 19, 2023 · 1 comment
Labels
bug Something isn't working


@Dtrc2207
Contributor

Required prerequisites

What version of OmniSafe are you using?

0.1.1

System information

Python 3.8.16 (default, Mar 2 2023, 03:21:46) [GCC 11.2.0] on Linux
OmniSafe 0.1.1

Problem description

In onpolicy_adapter.py, the ends of episodes are handled as follows:

            obs = next_obs
            epoch_end = step >= steps_per_epoch - 1
            for idx, (done, time_out) in enumerate(zip(terminated, truncated)):
                if epoch_end or done or time_out:
                    if (epoch_end or time_out) and not done:
                        if epoch_end:
                            logger.log(
                                f'Warning: trajectory cut off when rollout by epoch at {self._ep_len[idx]} steps.'
                            )
                        # Bug: obs was already overwritten with next_obs above,
                        # which after autoreset belongs to the *next* episode.
                        _, last_value_r, last_value_c, _ = agent.step(obs[idx])
                        last_value_r = last_value_r.unsqueeze(0)
                        last_value_c = last_value_c.unsqueeze(0)
                    elif done:
                        last_value_r = torch.zeros(1)
                        last_value_c = torch.zeros(1)

                    if done or time_out:
                        self._log_metrics(logger, idx)
                        self._reset_log(idx)

                        self._ep_ret[idx] = 0.0
                        self._ep_cost[idx] = 0.0
                        self._ep_len[idx] = 0.0

                    buffer.finish_path(last_value_r, last_value_c, idx)

However, in safety-gymnasium, when an episode ends the environment auto-resets immediately and carries the true last observation in `info`. So when `time_out == True`, `epoch_end == False`, and `done == False`, the value of the last state is computed from the first observation of the next episode rather than from the terminal observation of the episode being finished.
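The point above can be sketched as follows. This is a minimal, hypothetical illustration (not OmniSafe's actual fix): it assumes the Gymnasium-style autoreset convention where the terminal observation of a truncated episode is carried in `info["final_observation"]`, and the helper name `last_obs_for_bootstrap` is invented for the example.

```python
def last_obs_for_bootstrap(next_obs, info, idx, truncated):
    """Pick the observation to evaluate the critic on when an episode ends.

    If env `idx` was truncated and has already auto-reset, `next_obs[idx]`
    is the first observation of the NEW episode; the true terminal
    observation of the old episode is carried in `info["final_observation"]`
    (Gymnasium autoreset convention, assumed here for illustration).
    """
    final_obs = info.get("final_observation")
    if truncated[idx] and final_obs is not None and final_obs[idx] is not None:
        return final_obs[idx]
    return next_obs[idx]


# Toy demonstration: env 1 was truncated and auto-reset.
next_obs = [[0.0, 0.0], [9.9, 9.9]]              # [9.9, 9.9] is post-reset
info = {"final_observation": [None, [5.0, 5.0]]}  # true terminal obs of env 1
truncated = [False, True]

assert last_obs_for_bootstrap(next_obs, info, 0, truncated) == [0.0, 0.0]
assert last_obs_for_bootstrap(next_obs, info, 1, truncated) == [5.0, 5.0]
```

Bootstrapping from `[5.0, 5.0]` instead of `[9.9, 9.9]` is what the report asks for: the value target of the truncated episode must come from its own terminal state.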

Reproducible example code

The Python snippets:

Command lines:

Extra dependencies:


Steps to reproduce:

Traceback

No response

Expected behavior

No response

Additional context

No response

@Dtrc2207 Dtrc2207 added the bug Something isn't working label Mar 19, 2023
@friedmainfunction
Collaborator

Thanks for opening this issue; fixed in #162 and #164.
