[RLlib] Bandit documentation enhancements. #22427
Conversation
Thanks for the update. Just a minor suggestion.
  Contextual Bandits
  ~~~~~~~~~~~~~~~~~~

  The Multi-armed bandit (MAB) problem provides a simplified RL setting that
- involves learning to act under one situation only, i.e. the state is fixed.
+ involves learning to act under one situation only, i.e. the observation/state is fixed.
  Contextual bandit is extension of the MAB problem, where at each
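To illustrate the distinction the doc text draws (MAB: one fixed situation; contextual bandit: an observation/context varies per round), here is a minimal, self-contained sketch, not RLlib code: a contextual epsilon-greedy agent that keeps a separate running value estimate per (context, arm) pair. All names (`ContextualEpsilonGreedy`, the toy environment) are illustrative assumptions, not part of the PR.

```python
import random

# Illustrative sketch (not RLlib code): a contextual epsilon-greedy bandit.
# In a plain MAB the context is fixed; here each round presents a context,
# and the agent learns value estimates per (context, arm) pair.


class ContextualEpsilonGreedy:
    def __init__(self, n_arms, epsilon=0.1):
        self.n_arms = n_arms
        self.epsilon = epsilon
        self.counts = {}  # (context, arm) -> number of pulls
        self.values = {}  # (context, arm) -> running mean reward

    def select_arm(self, context):
        # Explore uniformly with probability epsilon, else exploit
        # the best-known arm for this context.
        if random.random() < self.epsilon:
            return random.randrange(self.n_arms)
        estimates = [self.values.get((context, a), 0.0) for a in range(self.n_arms)]
        return max(range(self.n_arms), key=lambda a: estimates[a])

    def update(self, context, arm, reward):
        # Incremental running-mean update of the (context, arm) estimate.
        key = (context, arm)
        n = self.counts.get(key, 0) + 1
        self.counts[key] = n
        old = self.values.get(key, 0.0)
        self.values[key] = old + (reward - old) / n


# Toy environment (assumed for the demo): in context 0 arm 1 pays off,
# in context 1 arm 0 does -- so the best arm depends on the context.
random.seed(0)
bandit = ContextualEpsilonGreedy(n_arms=2)
for _ in range(2000):
    ctx = random.randrange(2)
    arm = bandit.select_arm(ctx)
    reward = 1.0 if arm == (1 - ctx) else 0.0
    bandit.update(ctx, arm, reward)
```

After training, the learned value estimates differ per context, which is exactly what a non-contextual MAB (a single shared estimate per arm) cannot represent.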
From this paper (http://rob.schapire.net/papers/www10.pdf): MAB is a special case of contextual bandit where the context (user) and the arms are both fixed.
Maybe we can say "i.e., the context and arms are both fixed."
Perfect, thanks for the hint and the review, @gjoliver! Will add this before merging.
Bandit documentation enhancements.
Why are these changes needed?
Related issue number
Checks
I've run scripts/format.sh to lint the changes in this PR.