[tune] Implement BOHB #5382

Merged · 34 commits · Aug 13, 2019

Commits
9571fab
tabular benchmarks
lisadunlap Aug 2, 2019
b3333a1
added bohb and examples
lisadunlap Aug 6, 2019
0b418b5
added bohb and example
lisadunlap Aug 6, 2019
5ee0ee1
fixed PR changes
lisadunlap Aug 9, 2019
d6d715b
added reformatted bohb and docs
lisadunlap Aug 9, 2019
d69f880
Delete __init__.py
richardliaw Aug 9, 2019
00edd95
removed BOHB class
lisadunlap Aug 9, 2019
9679b82
Merge branch 'bohb' of github.com:lisadunlap/ray into bohb
lisadunlap Aug 9, 2019
1eb9674
fixed docs
lisadunlap Aug 9, 2019
fe2e962
fixed PR comments and added class descriptions
lisadunlap Aug 9, 2019
c109e13
fixed import error
lisadunlap Aug 9, 2019
bc3a951
formatting
lisadunlap Aug 9, 2019
f405d93
reformatting
lisadunlap Aug 9, 2019
7e5cabf
Update doc/source/tune-searchalg.rst
lisadunlap Aug 9, 2019
c19f5ce
added imports to dockerfile
lisadunlap Aug 9, 2019
fa2159d
yet another import error
lisadunlap Aug 10, 2019
06c6dda
fix
lisadunlap Aug 11, 2019
922af5f
format
lisadunlap Aug 11, 2019
be0b110
increase redundancy between HB and BOHB
richardliaw Aug 11, 2019
cc05726
increase redundancy between HB and BOHB
richardliaw Aug 11, 2019
9beb26f
Merge branch 'master' into bohb
richardliaw Aug 11, 2019
b0d81ac
Merge branch 'bohb' of github.com:lisadunlap/ray into bohb
richardliaw Aug 11, 2019
e34973b
formatting and better example
richardliaw Aug 12, 2019
09daf85
docs
richardliaw Aug 12, 2019
b080549
better tests
richardliaw Aug 12, 2019
27f941a
suggest
richardliaw Aug 12, 2019
3d42a8b
dogstrings
richardliaw Aug 12, 2019
ce0c1c7
ok reverse
richardliaw Aug 12, 2019
bb47d7d
test
richardliaw Aug 12, 2019
1c6e598
fixexample
richardliaw Aug 12, 2019
77b1d61
dockerfile?
richardliaw Aug 12, 2019
44e8dc6
fix
richardliaw Aug 12, 2019
5fec52a
fix
richardliaw Aug 12, 2019
bbdaa01
bohb
richardliaw Aug 13, 2019
5 changes: 5 additions & 0 deletions ci/jenkins_tests/run_tune_tests.sh
@@ -109,3 +109,8 @@ $SUPPRESS_OUTPUT docker run --rm --shm-size=${SHM_SIZE} --memory=${MEMORY_SIZE}
$SUPPRESS_OUTPUT docker run --rm --shm-size=${SHM_SIZE} --memory=${MEMORY_SIZE} $DOCKER_SHA \
    python /ray/python/ray/tune/examples/skopt_example.py \
    --smoke-test

# uncomment once statsmodels is updated.
# $SUPPRESS_OUTPUT docker run --rm --shm-size=${SHM_SIZE} --memory=${MEMORY_SIZE} $DOCKER_SHA \
#     python /ray/python/ray/tune/examples/bohb_example.py \
#     --smoke-test
17 changes: 16 additions & 1 deletion doc/source/tune-schedulers.rst
@@ -131,13 +131,28 @@ On the other hand, holding ``R`` constant at ``R = 300`` and varying ``eta`` als

The implementation takes the same configuration as the example given in the paper and exposes ``max_t``, which is not a parameter in the paper.

-2. The example in the `post <https://people.eecs.berkeley.edu/~kjamieson/hyperband.html>`_ to calculate ``n_0`` is actually a little different than the algorithm given in the paper. In this implementation, we implement ``n_0`` according to the paper (which is `n` in the below example):
+2. The example in the `post <https://homes.cs.washington.edu/~jamieson/hyperband.html>`_ to calculate ``n_0`` is actually a little different than the algorithm given in the paper. In this implementation, we implement ``n_0`` according to the paper (which is `n` in the below example):

.. image:: images/hyperband_allocation.png


3. There are also implementation-specific details, like how trials are placed into brackets, which are not covered in the paper. This implementation places trials within brackets according to smaller bracket first - meaning that with a low number of trials, there will be less early stopping. The sketch below makes the bracket sizing concrete.
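
To make the sizing in points 2 and 3 concrete, here is a minimal sketch of the bracket computation from the Hyperband paper (illustrative only; ``R``, ``eta``, ``n0``, and ``r0`` follow the paper's notation and are not Tune internals):

.. code-block:: python

    import math

    R, eta = 81, 3                 # max resource per trial, downsampling rate
    s_max = int(math.log(R, eta))  # index of the largest bracket
    B = (s_max + 1) * R            # budget allocated to each bracket

    for s in range(s_max, -1, -1):
        # Initial trial count (n_0) and initial resource (r_0) for
        # bracket s, per Algorithm 1 of the Hyperband paper.
        n0 = int(math.ceil(B / R / (s + 1) * eta**s))
        r0 = R * eta**(-s)
        print("s={}: n0={}, r0={}".format(s, n0, r0))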

HyperBand (BOHB)
----------------

.. tip:: This implementation is still experimental. Please report issues on https://github.com/ray-project/ray/issues/. Thanks!

This class is a variant of HyperBand that enables the BOHB algorithm. It stays true to the original HyperBand implementation and does not implement pipelining or straggler mitigation.

This is to be used in conjunction with the Tune BOHB search algorithm. See `TuneBOHB <tune-searchalg.html#BOHB>`_ for package requirements, examples, and details.

An example of its use can be found in `bohb_example.py <https://github.com/ray-project/ray/blob/master/python/ray/tune/examples/bohb_example.py>`_.
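
For orientation, a minimal pairing of the scheduler and the search algorithm might look like the following (a sketch adapted from the example script; ``MyTrainableClass`` and ``config_space`` stand in for your own trainable and ConfigSpace definition):

.. code-block:: python

    from ray import tune
    from ray.tune.schedulers import HyperBandForBOHB
    from ray.tune.suggest.bohb import TuneBOHB

    scheduler = HyperBandForBOHB(
        time_attr="training_iteration", max_t=100,
        metric="episode_reward_mean", mode="min")
    search = TuneBOHB(config_space, max_concurrent=4,
                      metric="episode_reward_mean", mode="min")
    tune.run(MyTrainableClass, scheduler=scheduler, search_alg=search)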

.. autoclass:: ray.tune.schedulers.HyperBandForBOHB
    :noindex:


Median Stopping Rule
--------------------

48 changes: 48 additions & 0 deletions doc/source/tune-searchalg.rst
@@ -18,6 +18,7 @@ Currently, Tune offers the following search algorithms (and library integrations
- `Nevergrad <tune-searchalg.html#nevergrad-search>`__
- `Scikit-Optimize <tune-searchalg.html#scikit-optimize-search>`__
- `Ax <tune-searchalg.html#ax-search>`__
- `BOHB <tune-searchalg.html#bohb>`__


Variant Generation (Grid Search/Random Search)
@@ -181,6 +182,53 @@ An example of this can be found in `ax_example.py <https://github.com/ray-projec
    :show-inheritance:
    :noindex:

BOHB
----

.. tip:: This implementation is still experimental. Please report issues on https://github.com/ray-project/ray/issues/. Thanks!

``BOHB`` (Bayesian Optimization HyperBand) is a SearchAlgorithm that is backed by `HpBandSter <https://github.com/automl/HpBandSter>`__ to perform sequential model-based hyperparameter optimization in conjunction with HyperBand. Note that this class does not extend ``ray.tune.suggest.BasicVariantGenerator``, so you will not be able to use Tune's default variant generation/search space declaration when using BOHB.

Importantly, BOHB is intended to be paired with a specific scheduler class: `HyperBandForBOHB <tune-schedulers.html#hyperband-bohb>`__.

This algorithm requires using the `ConfigSpace search space specification <https://automl.github.io/HpBandSter/build/html/quickstart.html#searchspace>`_. In order to use this search algorithm, you will need to install ``HpBandSter`` and ``ConfigSpace``:

.. code-block:: bash

    $ pip install hpbandster ConfigSpace


You can use ``TuneBOHB`` in conjunction with ``HyperBandForBOHB`` as follows:

.. code-block:: python

    # BOHB uses ConfigSpace for its hyperparameter search space
    import ConfigSpace as CS

    from ray import tune
    from ray.tune.schedulers import HyperBandForBOHB
    from ray.tune.suggest.bohb import TuneBOHB

    config_space = CS.ConfigurationSpace()
    config_space.add_hyperparameter(
        CS.UniformFloatHyperparameter("height", lower=10, upper=100))
    config_space.add_hyperparameter(
        CS.UniformFloatHyperparameter("width", lower=0, upper=100))

    experiment_metrics = dict(metric="episode_reward_mean", mode="min")
    bohb_hyperband = HyperBandForBOHB(
        time_attr="training_iteration", max_t=100, **experiment_metrics)
    bohb_search = TuneBOHB(
        config_space, max_concurrent=4, **experiment_metrics)

    # MyTrainableClass is your own Trainable (see bohb_example.py).
    tune.run(MyTrainableClass,
             name="bohb_test",
             scheduler=bohb_hyperband,
             search_alg=bohb_search,
             num_samples=5)

Take a look at `an example here <https://github.com/ray-project/ray/blob/master/python/ray/tune/examples/bohb_example.py>`_. See the `BOHB paper <https://arxiv.org/abs/1807.01774>`_ for more details.

.. autoclass:: ray.tune.suggest.bohb.TuneBOHB
    :show-inheritance:
    :noindex:

Contributing a New Algorithm
----------------------------

4 changes: 1 addition & 3 deletions docker/examples/Dockerfile
@@ -11,8 +11,6 @@ RUN pip install gym[atari] opencv-python-headless tensorflow lz4 keras pytest-ti
RUN pip install -U h5py # Mutes FutureWarnings
RUN pip install --upgrade bayesian-optimization
RUN pip install --upgrade git+git://github.com/hyperopt/hyperopt.git
-RUN pip install --upgrade sigopt
-RUN pip install --upgrade nevergrad
-RUN pip install --upgrade scikit-optimize
+RUN pip install --upgrade sigopt nevergrad scikit-optimize hpbandster ConfigSpace
RUN pip install -U pytest-remotedata>=0.3.1
RUN conda install pytorch-cpu torchvision-cpu -c pytorch
4 changes: 1 addition & 3 deletions docker/tune_test/Dockerfile
@@ -13,9 +13,7 @@ RUN conda remove -y --force wrapt
RUN pip install gym[atari]==0.10.11 opencv-python-headless tensorflow lz4 keras pytest-timeout smart_open
RUN pip install --upgrade bayesian-optimization
RUN pip install --upgrade git+git://github.com/hyperopt/hyperopt.git
-RUN pip install --upgrade sigopt
-RUN pip install --upgrade nevergrad
-RUN pip install --upgrade scikit-optimize
+RUN pip install --upgrade sigopt nevergrad scikit-optimize hpbandster ConfigSpace
RUN pip install -U pytest-remotedata>=0.3.1
RUN conda install pytorch-cpu torchvision-cpu -c pytorch

82 changes: 82 additions & 0 deletions python/ray/tune/examples/bohb_example.py
@@ -0,0 +1,82 @@
#!/usr/bin/env python

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import argparse
import json
import os

import numpy as np

import ray
from ray.tune import Trainable, run
from ray.tune.schedulers.hb_bohb import HyperBandForBOHB
from ray.tune.suggest.bohb import TuneBOHB

parser = argparse.ArgumentParser()
parser.add_argument(
    "--smoke-test", action="store_true", help="Finish quickly for testing")
parser.add_argument(
    "--ray-redis-address",
    help="Address of Ray cluster for seamless distributed execution.")
args, _ = parser.parse_known_args()


class MyTrainableClass(Trainable):
    """Example agent whose learning curve is a random sigmoid.

    The dummy hyperparameters "width" and "height" determine the slope and
    maximum reward value reached.
    """

    def _setup(self, config):
        self.timestep = 0

    def _train(self):
        self.timestep += 1
        v = np.tanh(float(self.timestep) / self.config.get("width", 1))
        v *= self.config.get("height", 1)

        # Here we use `episode_reward_mean`, but you can also report other
        # objectives such as loss or accuracy.
        return {"episode_reward_mean": v}

    def _save(self, checkpoint_dir):
        path = os.path.join(checkpoint_dir, "checkpoint")
        with open(path, "w") as f:
            f.write(json.dumps({"timestep": self.timestep}))
        return path

    def _restore(self, checkpoint_path):
        with open(checkpoint_path) as f:
            self.timestep = json.loads(f.read())["timestep"]


if __name__ == "__main__":
    import ConfigSpace as CS
    ray.init(redis_address=args.ray_redis_address)

    # BOHB uses ConfigSpace for its hyperparameter search space
    config_space = CS.ConfigurationSpace()
    config_space.add_hyperparameter(
        CS.UniformFloatHyperparameter("height", lower=10, upper=100))
    config_space.add_hyperparameter(
        CS.UniformFloatHyperparameter("width", lower=0, upper=100))

    experiment_metrics = dict(metric="episode_reward_mean", mode="min")
    bohb_hyperband = HyperBandForBOHB(
        time_attr="training_iteration",
        max_t=100,
        reduction_factor=4,
        **experiment_metrics)
    bohb_search = TuneBOHB(
        config_space, max_concurrent=4, **experiment_metrics)

    run(MyTrainableClass,
        name="bohb_test",
        scheduler=bohb_hyperband,
        search_alg=bohb_search,
        num_samples=10,
        stop={"training_iteration": 10 if args.smoke_test else 100})
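
To try the example locally (assuming Ray is installed and the optional BOHB dependencies are present), the invocation mirrors the CI command above, run from a Ray checkout:

    $ pip install hpbandster ConfigSpace
    $ python python/ray/tune/examples/bohb_example.py --smoke-test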
3 changes: 2 additions & 1 deletion python/ray/tune/schedulers/__init__.py
@@ -4,6 +4,7 @@

from ray.tune.schedulers.trial_scheduler import TrialScheduler, FIFOScheduler
from ray.tune.schedulers.hyperband import HyperBandScheduler
+from ray.tune.schedulers.hb_bohb import HyperBandForBOHB
from ray.tune.schedulers.async_hyperband import (AsyncHyperBandScheduler,
                                                 ASHAScheduler)
from ray.tune.schedulers.median_stopping_rule import MedianStoppingRule
@@ -12,5 +13,5 @@
__all__ = [
    "TrialScheduler", "HyperBandScheduler", "AsyncHyperBandScheduler",
    "ASHAScheduler", "MedianStoppingRule", "FIFOScheduler",
-    "PopulationBasedTraining"
+    "PopulationBasedTraining", "HyperBandForBOHB"
]
128 changes: 128 additions & 0 deletions python/ray/tune/schedulers/hb_bohb.py
@@ -0,0 +1,128 @@
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import logging

from ray.tune.schedulers.trial_scheduler import TrialScheduler
from ray.tune.schedulers.hyperband import HyperBandScheduler, Bracket
from ray.tune.trial import Trial

logger = logging.getLogger(__name__)


class HyperBandForBOHB(HyperBandScheduler):
    """Extends HyperBand early stopping algorithm for BOHB.

    This implementation removes the ``HyperBandScheduler`` pipelining. This
    class introduces two key changes:

    1. Trials are now placed so that the bracket with the largest size is
       filled first.

    2. Trials will be paused even if the bracket is not filled. This allows
       BOHB to insert new trials into the training.

    See ray.tune.schedulers.HyperBandScheduler for the parameter docstring.
    """

    def on_trial_add(self, trial_runner, trial):
        """Adds a new trial.

        If the current bracket is not filled, add the trial to it. Else, if
        the current band is not filled, create a new bracket and add the
        trial to that bracket. Otherwise, create a new band iteration and a
        new bracket, and add the trial there.
        """

        cur_bracket = self._state["bracket"]
        cur_band = self._hyperbands[self._state["band_idx"]]
        if cur_bracket is None or cur_bracket.filled():
            retry = True
            while retry:
                # If the current iteration is filled, create a new iteration.
                if self._cur_band_filled():
                    cur_band = []
                    self._hyperbands.append(cur_band)
                    self._state["band_idx"] += 1

                # MAIN CHANGE HERE - largest bracket first!
                # len(cur_band) will always be less than s_max_1,
                # or else the band would be filled.
                s = self._s_max_1 - len(cur_band) - 1
                assert s >= 0, "Current band is filled!"
                if self._get_r0(s) == 0:
                    logger.debug("BOHB: Bracket too small - Retrying...")
                    cur_bracket = None
                else:
                    retry = False
                    cur_bracket = Bracket(self._time_attr, self._get_n0(s),
                                          self._get_r0(s), self._max_t_attr,
                                          self._eta, s)
                cur_band.append(cur_bracket)
                self._state["bracket"] = cur_bracket

        self._state["bracket"].add_trial(trial)
        self._trial_info[trial] = cur_bracket, self._state["band_idx"]

    def on_trial_result(self, trial_runner, trial, result):
        """If the bracket is finished, all trials will be stopped.

        If a given trial finishes and the bracket iteration is not done,
        the trial will be paused and its resources will be given up.

        This scheduler will not start trials but will stop trials.
        The currently running trial will not be handled here,
        as the trial runner will be given control to handle it.
        """

        result["hyperband_info"] = {}
        bracket, _ = self._trial_info[trial]
        bracket.update_trial_stats(trial, result)

        if bracket.continue_trial(trial):
            return TrialScheduler.CONTINUE

        result["hyperband_info"]["budget"] = bracket._cumul_r

        # MAIN CHANGE HERE! Instead of waiting for the bracket to fill up
        # and all of its trials to pause, pause this trial (and notify the
        # search algorithm) unless every other live trial in the bracket
        # is already paused.
        statuses = [(t, t.status) for t in bracket._live_trials]
        if not bracket.filled() or any(status != Trial.PAUSED
                                       for t, status in statuses
                                       if t is not trial):
            trial_runner._search_alg.on_pause(trial.trial_id)
            return TrialScheduler.PAUSE
        action = self._process_bracket(trial_runner, bracket)
        return action

    def _unpause_trial(self, trial_runner, trial):
        trial_runner.trial_executor.unpause_trial(trial)
        trial_runner._search_alg.on_unpause(trial.trial_id)

    def choose_trial_to_run(self, trial_runner):
        """Fair scheduling within an iteration by completion percentage.

        The trial list is not used, since all trials are tracked as
        scheduler state. If the current iteration is occupied (i.e., there
        are no trials to run), then look into the next iteration.
        """

        for hyperband in self._hyperbands:
            # Band will have None entries if no resources
            # are to be allocated to that bracket.
            scrubbed = [b for b in hyperband if b is not None]
            for bracket in scrubbed:
                for trial in bracket.current_trials():
                    if (trial.status == Trial.PENDING
                            and trial_runner.has_resources(trial.resources)):
                        return trial
        # MAIN CHANGE HERE! If nothing is running, process a bracket that
        # has paused trials so that the trial runner can unpause them.
        if not any(t.status == Trial.RUNNING
                   for t in trial_runner.get_trials()):
            for hyperband in self._hyperbands:
                for bracket in hyperband:
                    if bracket and any(trial.status == Trial.PAUSED
                                       for trial in bracket.current_trials()):
                        # This will change the trial state and let the
                        # trial runner retry.
                        self._process_bracket(trial_runner, bracket)
        # MAIN CHANGE HERE!
        return None