[RLlib] Cleanup examples folder #14: Add example script for policy (RLModule) inference on new API stack. #45831
Conversation
LGTM
    policy_id="default_policy",  # <- default value
)
# Compute an action using a B=1 observation "batch".
input_dict = {Columns.OBS: torch.from_numpy(obs).unsqueeze(0)}
Should we use the module's input specs here to infer the keys? Here it's simple, but in other scenarios it could help users figure out what to feed in.
I'm not 100% sure. The input specs don't tell us much about what exact data is required for the keys in the specs. I'm honestly thinking about removing the specs altogether at some point (long-term).

For example: If my RLModule - right now - says: I need `obs` and `prev_rewards`, then I still don't know, for example, how many of the previous rewards are required. This detailed information - crucial for building the batch - is not something my model would tell me; I will have to provide a proper ConnectorV2 logic along with it.
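To make the point concrete, here is a minimal sketch of the batch construction under discussion. It assumes `rl_module` is an already restored RLModule and `obs` is a single observation from the env (both are placeholder names, not part of this PR's diff); the greedy-argmax post-processing is also an assumption about the module's output format:

```python
import torch

from ray.rllib.core.columns import Columns

# `rl_module` and `obs` are assumed to exist already (placeholders).
# The specs may tell us that `Columns.OBS` is a required key, but not that its
# value has to be a tensor with a leading batch dimension of size B=1:
input_dict = {Columns.OBS: torch.from_numpy(obs).unsqueeze(0)}

# Forward pass in inference mode (deterministic, no exploration).
module_out = rl_module.forward_inference(input_dict)

# Many default modules return action-distribution inputs (logits) rather than
# ready-to-use actions, so pick the greedy action from the logits here.
action = int(torch.argmax(module_out[Columns.ACTION_DIST_INPUTS], dim=-1)[0])
```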
# Send the computed action `a` to the env.
obs, reward, done, truncated, _ = env.step(a)
episode_reward += reward
obs, reward, terminated, truncated, _ = env.step(action)
This is a beautiful example showing how simply this runs now. Let's think about some ways to also simplify it for modules that use connectors.
Awesome! Yeah, I agree. Doing these examples feels very fast and easy; I don't have to do much debugging at all to make them run right from the get-go. There is another PR that does something very similar, but with a connector (that handles the LSTM states).
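For reference, a rough sketch of the overall inference loop along the lines of this example (not the PR's actual script): `rl_module` is assumed to be an RLModule already restored from a checkpoint, the CartPole env is a placeholder, and the greedy-argmax step assumes the module returns action logits:

```python
import gymnasium as gym
import torch

from ray.rllib.core.columns import Columns

env = gym.make("CartPole-v1")  # placeholder env

num_episodes = 0
episode_return = 0.0
obs, _ = env.reset()

while num_episodes < 10:
    # Compute an action using a B=1 observation "batch".
    input_dict = {Columns.OBS: torch.from_numpy(obs).unsqueeze(0)}
    module_out = rl_module.forward_inference(input_dict)
    action = int(torch.argmax(module_out[Columns.ACTION_DIST_INPUTS], dim=-1)[0])

    # Send the computed action to the env.
    obs, reward, terminated, truncated, _ = env.step(action)
    episode_return += reward

    # Episode is done -> print its return and reset for the next one.
    if terminated or truncated:
        print(f"Episode return: {episode_return}")
        num_episodes += 1
        episode_return = 0.0
        obs, _ = env.reset()
```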
Add example script for policy (RLModule) inference on new API stack.
Why are these changes needed?
Related issue number
Checks
- I've signed off every commit (via `git commit -s`) in this PR.
- I've run `scripts/format.sh` to lint the changes in this PR.
- If I added a new method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file.