-
Notifications
You must be signed in to change notification settings - Fork 5.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[RLlib; Offline RL] Support writing and reading composite spaces samples. #47046
[RLlib; Offline RL] Support writing and reading composite spaces samples. #47046
Conversation
…ace samples into JSONable data types, if composite, to write them more easily to and read them more easily from disk. Furthermore, added related functionalities to Offline recording and reading classes, basically 'OfflineData', 'OfflinePreLearner', and 'OfflineEnvRunner'. Signed-off-by: simonsays1980 <[email protected]>
…I learning tests. Signed-off-by: simonsays1980 <[email protected]>
], | ||
infos=[ | ||
{}, | ||
batch[schema[Columns.INFOS]][i] | ||
if schema[Columns.INFOS] in batch | ||
else {}, | ||
], | ||
actions=[batch[schema[Columns.ACTIONS]][i]], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unrelated question on the above:
- Why is the first info always
{}
? - I think
else {},
would yield a tuple, correct? So the resulting final list would be:[{}, ({},)]
. Maybe I'm wrong, but can you check this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great question, because there is only a single INFO column in all the offline data. As we need two in the Episode
we need to fill in a default and this is filled in at timestep zero.
YOur second point is valid - this could lead to a tuple, even though the comma should be part of the list ... I check this. Thanks1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, yeah, makes sense. We don't have a NEXT_INFOS
.
Hmm, I wonder whether this could be a general problem in some strange use cases :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was also thinking about it, but could not came of with cases that need an info
at ts=0.
@@ -77,6 +77,7 @@ def sample( | |||
fn_constructor_kwargs={ | |||
"config": self.config, | |||
"learner": self.learner_handles[0], | |||
"spaces": self.spaces["__env__"], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Can we merge the other (__env__
constant) PR into this one and then push again. This way, we won't forget to change this here.
observations=[ | ||
unpack_if_needed(obs), | ||
unpack_if_needed(batch[schema[Columns.NEXT_OBS]][i]), | ||
convert(unpack_if_needed(obs), observation_space) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if observation_space
is None here? Could that happen? Or should we make the arg non-optional?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could happen and is in fact okay. As it uses from_jsonable_if_needed
the conversion simply does not take place, if the input space is None
. We need them only, if we want to convert from JSONable data types to a composite space type.
actions=[ | ||
convert( | ||
unpack_if_needed(batch[schema[Columns.ACTIONS]][i]), | ||
action_space, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same for action_space
. What if it's None (not provided by caller)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See my comment above ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM in general! Just some nits and comments to fix before we can merge. Thanks @simonsays1980 !!
Signed-off-by: simonsays1980 <[email protected]>
Signed-off-by: Sven Mika <[email protected]>
Why are these changes needed?
Writing and reading composite space samples does not work out-of-the-box (e.g. storing them in
parquet
files) because errors like the following might occur:This PR proposes some functionality to enable full compatibility with composite spaces. It does so by
gymnasium
sto/from_jsonable
)Related issue number
Checks
git commit -s
) in this PR.scripts/format.sh
to lint the changes in this PR.method in Tune, I've added it in
doc/source/tune/api/
under thecorresponding
.rst
file.