I searched the issues and found no similar issues.
Ray Component
RLlib
What happened + What you expected to happen
SampleBatch.get_single_step_input_dict returns wrong results for complex trajectory view setups, such as those used by attention nets.
If a view column is built from a range of timesteps of an underlying data column (e.g. state_in[-1] == state_out[-10:-1]), the method returns an incorrect state_in_0.
The following repro test case should pass:
from gym.spaces import Box, Discrete
import numpy as np

from ray.rllib.policy.sample_batch import SampleBatch
from ray.rllib.policy.view_requirement import ViewRequirement
from ray.rllib.utils.test_utils import check

space = Box(-1.0, 1.0, ())

# With batch-repeat-value > 1: state_in_0 is only built every n
# timesteps.
view_reqs = {
    "state_in_0": ViewRequirement(
        data_col="state_out_0",
        shift="-5:-1",
        space=space,
        batch_repeat_value=5,
    ),
    "state_out_0": ViewRequirement(
        space=space, used_for_compute_actions=False),
}

# Trajectory of 1 ts (0) (we would like to compute the 1st).
batch = SampleBatch({
    "state_in_0": np.array([
        [0, 0, 0, 0, 0],  # ts=0
    ]),
    "state_out_0": np.array([1]),
})

input_dict = batch.get_single_step_input_dict(
    view_requirements=view_reqs, index="last")

check(
    input_dict,
    {
        "state_in_0": [[0, 0, 0, 0, 1]],  # ts=1
        "seq_lens": [1],
    })
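For reference, the expected state_in_0 value in the test above can be derived by hand with plain NumPy. The sketch below is not RLlib code: expected_state_in is a hypothetical helper, and it assumes that shift="-5:-1" means an inclusive window relative to the timestep being computed, with zeros filling positions before ts=0. Both assumptions are inferred from the expected output in the test, not from the RLlib implementation.

```python
import numpy as np


def expected_state_in(state_out, t, shift_from=-5, shift_to=-1):
    """Hypothetical helper (not part of RLlib).

    Returns the window of `state_out` covering timesteps
    [t + shift_from, t + shift_to], both ends inclusive.
    Timesteps outside the recorded trajectory are filled with 0
    (assumed zero-padding).
    """
    state_out = np.asarray(state_out)
    window = []
    for ts in range(t + shift_from, t + shift_to + 1):
        in_range = 0 <= ts < len(state_out)
        window.append(state_out[ts] if in_range else 0)
    return np.array(window)


# One recorded timestep (ts=0) with state_out_0 == 1; compute the input for ts=1.
print(expected_state_in([1], t=1))  # [0 0 0 0 1]
```

Under these assumptions, the single recorded state_out_0 value lands in the last slot of the window and the four earlier slots stay zero, which matches the [[0, 0, 0, 0, 1]] the test expects.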
Versions / Dependencies
ray=master
py=3.8
OS=MacOS
Reproduction script
see above
Anything else
No response
Are you willing to submit a PR?
Yes I am willing to submit a PR!
sven1977 added the bug and triage labels on Nov 10, 2021.
sven1977 added the rllib and P2 labels and removed the triage label on Nov 10, 2021.