[Bug; RLlib]: Error in SampleBatch.get_single_step_input_dict() #20216

Closed
2 tasks done
sven1977 opened this issue Nov 10, 2021 · 0 comments · Fixed by #20217
Labels: bug (Something that is supposed to be working; but isn't), P2 (Important issue, but not time-critical), rllib (RLlib related issues)

Comments

@sven1977
Contributor

Search before asking

  • I searched the issues and found no similar issues.

Ray Component

RLlib

What happened + What you expected to happen

SampleBatch.get_single_step_input_dict returns wrong results with a complex trajectory view setup, such as the ones used for attention nets.
If a view column is built from a range of timesteps of an underlying data column (e.g. state_in[-1] == state_out[-10:-1]), the method returns an incorrect state_in_0.

The following repro test case should pass:

from gym.spaces import Box, Discrete
import numpy as np

from ray.rllib.policy.sample_batch import SampleBatch
from ray.rllib.policy.view_requirement import ViewRequirement
from ray.rllib.utils.test_utils import check

space = Box(-1.0, 1.0, ())

# With batch-repeat-value > 1: state_in_0 is only built every n
# timesteps.
view_reqs = {
    "state_in_0": ViewRequirement(
        data_col="state_out_0",
        shift="-5:-1",
        space=space,
        batch_repeat_value=5,
    ),
    "state_out_0": ViewRequirement(
        space=space, used_for_compute_actions=False),
}

# Trajectory with a single timestep (ts=0); we want the input dict for computing ts=1.
batch = SampleBatch({
    "state_in_0": np.array([
        [0, 0, 0, 0, 0],  # ts=0
    ]),
    "state_out_0": np.array([1]),
})
input_dict = batch.get_single_step_input_dict(
    view_requirements=view_reqs, index="last")
check(
    input_dict,
    {
        "state_in_0": [[0, 0, 0, 0, 1]],  # ts=1
        "seq_lens": [1],
    })
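
The expected value in the check above can be reproduced without RLlib. The following is a minimal NumPy sketch, not RLlib's implementation: the helper name `window_for_next_step` is hypothetical, and it assumes that the shift range "-5:-1" is inclusive on both ends and that timesteps before the start of the trajectory are zero-padded, which is what the five-element expected output in the repro suggests.

```python
import numpy as np


def window_for_next_step(state_out, start=-5, end=-1):
    """Hypothetical helper: build the state_in window for the next timestep.

    For a trajectory with t recorded state_out values (ts=0..t-1), return
    state_out[t+start .. t+end] (inclusive), zero-padded on the left for
    timesteps that precede the start of the trajectory.
    """
    state_out = np.asarray(state_out)
    t = len(state_out)
    width = end - start + 1  # assumed inclusive range, e.g. -5:-1 -> 5 values
    # Prepend `width` zeros so that negative timesteps map to valid indices.
    padded = np.concatenate([np.zeros(width, dtype=state_out.dtype), state_out])
    lo = t + start + width
    hi = t + end + width + 1
    return padded[lo:hi]


# One recorded timestep (ts=0) with state_out_0 == 1, as in the repro.
print(window_for_next_step([1]))  # [0 0 0 0 1]
```

Under these assumptions the window for ts=1 ends with the state_out value from ts=0, matching the `state_in_0` value that the repro expects from `get_single_step_input_dict`.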

Versions / Dependencies

ray=master
py=3.8
OS=macOS

Reproduction script

see above

Anything else

No response

Are you willing to submit a PR?

  • Yes I am willing to submit a PR!
@sven1977 sven1977 added bug Something that is supposed to be working; but isn't triage Needs triage (eg: priority, bug/not-bug, and owning component) labels Nov 10, 2021
@sven1977 sven1977 self-assigned this Nov 10, 2021
@sven1977 sven1977 added rllib RLlib related issues P2 Important issue, but not time-critical and removed triage Needs triage (eg: priority, bug/not-bug, and owning component) labels Nov 10, 2021