Code to Accompany the Book "Bandit Algorithms for Website Optimization"

This repo contains implementations, in several languages, of standard algorithms for solving the multi-armed bandit problem, including:

epsilon-Greedy
Softmax (Boltzmann)
UCB1
UCB2
Hedge
Exp3

It also contains code that provides a testing framework for bandit algorithms based around simple Monte Carlo simulations.

Languages

This codebase is split up by language. In most languages, there are parallel implementations of the core algorithms and infrastructure for testing the algorithms:

Python
Julia
Ruby

In R, there is a body of code for visualizing the results of simulations and analyzing those results. The R code would benefit from some refactoring to make it DRYer.

If you're interested in seeing how some of these algorithms would be implemented in JavaScript, you should try out Mark Reid's code: http://mark.reid.name/code/bandits/

Getting Started

To try out this code, you can go into the Python or Julia directories and then run the demo script.

In Python, that looks like:

python demo.py

In Julia, that looks like:

julia demo.jl

You should step through that code line-by-line to understand what the functions are doing. The book provides more in-depth explanations of how the algorithms work.

The Ruby code was contributed by Kashif Rasul. If you're interested in translating the code into another language, please submit a pull request. I will merge any new implementations as soon as I can.

Adding New Algorithms: API Expectations

As described in the book, a Bandit algorithm should implement two methods:

select_arm(): A method that returns the index of the Arm that the Bandit object selects on the current play. No arguments are required.
update(): A method that updates the internal state of the Bandit object in response to its most recently selected arm's reward. The index of the chosen arm and the amount of reward received must be passed as arguments.

As described in the book, an Arm simulator should implement:

draw(): A method that returns a single instance of reward from the arm that was pulled. No arguments are required.
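
As a rough illustration of the Arm interface above, here is a minimal sketch of an arm that pays out 1.0 with a fixed probability and 0.0 otherwise. The class name BernoulliArm and the constructor parameter p are assumptions made for this example, not necessarily what the repo's code uses.

```python
import random


class BernoulliArm:
    """Hypothetical arm that pays 1.0 with probability p, else 0.0."""

    def __init__(self, p):
        # p is an illustrative parameter: the chance of a reward on each pull.
        self.p = p

    def draw(self):
        # Return a single reward sample; no arguments are required.
        return 1.0 if random.random() < self.p else 0.0
```

An arm built as BernoulliArm(0.1) would then return a reward on roughly one pull in ten.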

In addition, the Bandit algorithms are designed to implement one additional method used in simulations:

initialize(): A method that returns nothing. Instead, this method resets all of the data-driven variables in a Bandit object. For most objects, this resets the counts and values fields to their initial states. No arguments are required.
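
To make the three methods concrete, here is a minimal epsilon-Greedy sketch that follows the interface as described in this README. The constructor signature, the choice to store n_arms on the object, and the payout probabilities in the demo loop are all assumptions made for illustration; the implementations in the language directories may differ in their details.

```python
import random


class EpsilonGreedy:
    """Illustrative epsilon-Greedy bandit; constructor arguments are assumptions."""

    def __init__(self, epsilon, n_arms):
        self.epsilon = epsilon
        self.n_arms = n_arms
        self.initialize()

    def initialize(self):
        # Reset all data-driven state to its starting point; returns nothing.
        self.counts = [0] * self.n_arms
        self.values = [0.0] * self.n_arms

    def select_arm(self):
        # Explore uniformly at random with probability epsilon, otherwise
        # exploit the arm with the highest estimated value.
        if random.random() < self.epsilon:
            return random.randrange(self.n_arms)
        return self.values.index(max(self.values))

    def update(self, chosen_arm, reward):
        # Keep a running mean of the rewards observed for the chosen arm.
        self.counts[chosen_arm] += 1
        n = self.counts[chosen_arm]
        self.values[chosen_arm] += (reward - self.values[chosen_arm]) / n


# A tiny simulation loop with made-up payout probabilities.
algo = EpsilonGreedy(epsilon=0.1, n_arms=2)
payout_probabilities = [0.1, 0.9]
for _ in range(1000):
    arm = algo.select_arm()
    reward = 1.0 if random.random() < payout_probabilities[arm] else 0.0
    algo.update(arm, reward)
```

Calling initialize() between simulation runs lets the same object be reused without carrying over what it learned in the previous run.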

Beyond the testing framework described in the book, I am currently providing an additional system built around the concept of an Environment. Environment objects encapsulate not only a set of Arms, but also a mechanism for having those Arms change over time. This allows you to simulate complex scenarios that aren't well described by a constant set of arms.

If you would like to implement your own Environment, you will need to provide a very simple interface. The Environment interface requires you to implement two methods:

arms(): A method that returns the array of arms that exist at time T. You must pass T as an argument.
n_arms(): A method that returns the number of arms that the environment will return with each call to arms(). While the arms may change over time, the number of arms should not. No arguments are required.
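
One possible shape for such an Environment is sketched below: it serves one set of arms before a chosen time step and a different set afterwards. The class names SwitchingEnvironment and ConstantArm, and the switch_time parameter, are hypothetical and exist only for this example.

```python
class ConstantArm:
    """Hypothetical arm that always returns the same reward."""

    def __init__(self, reward):
        self.reward = reward

    def draw(self):
        return self.reward


class SwitchingEnvironment:
    """Hypothetical environment that swaps arm sets at switch_time."""

    def __init__(self, arms_before, arms_after, switch_time):
        # The number of arms should stay fixed even though the arms change.
        if len(arms_before) != len(arms_after):
            raise ValueError("the number of arms must not change over time")
        self.arms_before = arms_before
        self.arms_after = arms_after
        self.switch_time = switch_time

    def arms(self, t):
        # Return the array of arms that exist at time t.
        return self.arms_before if t < self.switch_time else self.arms_after

    def n_arms(self):
        return len(self.arms_before)


env = SwitchingEnvironment(
    arms_before=[ConstantArm(0.0), ConstantArm(1.0)],
    arms_after=[ConstantArm(1.0), ConstantArm(0.0)],
    switch_time=100,
)
```

In this sketch the best arm is index 1 for the first 100 time steps and index 0 from then on, which is the kind of change a constant set of arms cannot express.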

