-
Notifications
You must be signed in to change notification settings - Fork 5.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[RLlib] Update bandit_envs_recommender_system #22421
[RLlib] Update bandit_envs_recommender_system #22421
Conversation
|
||
|
||
class ParametricItemRecoEnv(gym.Env): | ||
"""A recommendation environment which generates items with visible features |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did we remove this? I feel like we should explain, why it's called parametric. Or re-name it, as any of our recomm envs is somewhat parametric, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh good catch. yeah, I should have copied over the description.
done.
def render(self, mode="human"): | ||
raise NotImplementedError | ||
|
||
# Because ParametricRecSys follows RecSim's API, we have to wrap it before |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we do this here?
Maybe just move this into the block below and register it under "ParametricRecSysEnvForBandits" or something like this.
Also, in the example script that actually uses this env, we can register it to show, how this is done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really like the idea of registering the env in example script. good demo.
thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, thanks for the cleanup @gjoliver ! :) Left only some minor nits.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done. thanks for the review.
|
||
|
||
class ParametricItemRecoEnv(gym.Env): | ||
"""A recommendation environment which generates items with visible features |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh good catch. yeah, I should have copied over the description.
done.
def render(self, mode="human"): | ||
raise NotImplementedError | ||
|
||
# Because ParametricRecSys follows RecSim's API, we have to wrap it before |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really like the idea of registering the env in example script. good demo.
thanks.
Hey @gjoliver , could you re-merge? Lots of tests failing, I'm thinking b/c we are too far behind master. |
dac3892
to
d0a1bdf
Compare
( | ||
"item", | ||
gym.spaces.Box( | ||
low=-np.ones((num_items, embedding_dim)), | ||
high=np.ones((num_items, embedding_dim)), | ||
low=-1.0, high=1.0, shape=(num_items, embedding_dim) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the fixes @gjoliver . Great PR! :)
Why are these changes needed?
Update bandit_envs_recommender_system with Sven's in-house implementa…, which supports multiple users and slate sizes.
Make sure example tune_lin_ucb_train_recommendation.py nails this environment.
Related issue number
Checks
scripts/format.sh
to lint the changes in this PR.