[RLlib] Allow for evaluation to run by timesteps (alternative to episodes) and add auto-setting to make sure train doesn't ever have to wait for eval (e.g. long episodes) to finish. #20757
@@ -729,14 +729,28 @@ Customized Evaluation During Training

RLlib will report online training rewards, however in some cases you may want to compute
rewards with different settings (e.g., with exploration turned off, or on a specific set
of environment configurations). You can evaluate policies during training by setting
the ``evaluation_interval`` config, and optionally also ``evaluation_num_episodes``,
``evaluation_config``, ``evaluation_num_workers``, and ``custom_eval_function``
(see `trainer.py <https://github.com/ray-project/ray/blob/master/rllib/agents/trainer.py>`__ for further documentation).

By default, exploration is left as-is within ``evaluation_config``.
However, you can switch off any exploration behavior for the evaluation workers
via:
of environment configurations). You can activate policy evaluation during training by setting
``evaluation_interval`` to an int value (> 0) indicating the number of training iterations
(calls to ``Trainer.train()``) between "evaluation steps".
One such "evaluation step" runs over ``evaluation_duration`` episodes or timesteps, depending
on the ``evaluation_duration_unit`` setting, which can be either "episodes" (default) or "timesteps".

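For illustration, here is a minimal config sketch of these settings (a hedged example; the
trainer class, environment, and concrete values are placeholders, not part of this diff):

.. code-block:: python

    from ray.rllib.agents.ppo import PPOTrainer

    trainer = PPOTrainer(config={
        "env": "CartPole-v0",  # placeholder environment
        # Run one evaluation step after every 2 `Trainer.train()` calls.
        "evaluation_interval": 2,
        # Each evaluation step collects 20 episodes ...
        "evaluation_duration": 20,
        # ... because the unit is set to "episodes" (the default; could also be "timesteps").
        "evaluation_duration_unit": "episodes",
    })

    results = trainer.train()  # every 2nd call also triggers an evaluation step
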
Normally, the evaluation step is run after the respective train step. For example, for
``evaluation_interval=2``, the sequence of steps is: ``train, train, eval, train, train, eval, ...``.
For ``evaluation_interval=1``, the sequence is: ``train, eval, train, eval, ...``.
Before each evaluation step, weights from the main model are synchronized to all evaluation workers.
However, it is possible to run evaluation in parallel to training via the ``evaluation_parallel_to_training=True``
config flag. In this case, both steps (train and eval) are run at the same time via threading.

Review comment: I forgot, these eval workers can be remote as well right?
Reply: Yes, eval workers are a completely separate WorkerSet (separate from the "normal" WorkerSet used to collect training data).

This can speed up the evaluation process significantly, but leads to a 1-iteration delay between reported
training results and evaluation results (the evaluation results are "behind" as they use
slightly outdated model weights).

When in ``evaluation_parallel_to_training=True`` mode, the special setting ``evaluation_duration=auto``
can be used, which causes the evaluation step to take roughly as long as the concurrent train step.

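As a rough sketch of how these two settings might be combined (hedged example; only the keys
shown are relevant, everything else is assumed to keep its default value):

.. code-block:: python

    config = {
        # Run the evaluation step in parallel (via threading) to the train step.
        "evaluation_parallel_to_training": True,
        "evaluation_interval": 1,
        # Instead of a fixed number of episodes/timesteps, evaluate for roughly
        # as long as the concurrently running train step takes.
        "evaluation_duration": "auto",
    }
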
The config key ``evaluation_config`` allows you to override any config keys only for
the evaluation workers. For example, to switch off exploration in the evaluation steps,
do:

.. code-block:: python

@@ -752,6 +766,16 @@ via:
policy, even if this is a stochastic one. Setting "explore=False" above
will result in the evaluation workers not using this stochastic policy.

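The override mentioned in the note above might look like the following (a sketch only; the
actual snippet at this point in the docs sits between the two hunks and is not shown here):

.. code-block:: python

    config = {
        "evaluation_config": {
            # Evaluation workers act without exploration, while the training
            # workers keep their normal (stochastic) exploration behavior.
            "explore": False,
        },
    }
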
Parallelism for the evaluation step is determined via the ``evaluation_num_workers``
setting. Set this to higher values if you want the desired evaluation episodes or timesteps to
run with as much parallelism as possible. For example, with ``evaluation_duration=10`` (``evaluation_duration_unit=episodes``)
and ``evaluation_num_workers=10``, each evaluation worker only has to run a single episode in each evaluation step.

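A short sketch of the episodes-per-worker math described above (the values are just examples):

.. code-block:: python

    config = {
        # 10 episodes per evaluation step ...
        "evaluation_duration": 10,
        "evaluation_duration_unit": "episodes",
        # ... spread over 10 evaluation workers -> 1 episode per worker per step.
        "evaluation_num_workers": 10,
    }
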
In case you would like to completely customize the evaluation step, set ``custom_eval_function`` in your
config to a callable taking the Trainer object and a WorkerSet object (the evaluation WorkerSet)
and returning a metrics dict. See `trainer.py <https://github.com/ray-project/ray/blob/master/rllib/agents/trainer.py>`__
for further documentation.

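A rough sketch of such a callable, loosely following the linked example (the metric helpers
and the custom metric key used here are assumptions, not prescribed by the API):

.. code-block:: python

    import ray
    from ray.rllib.evaluation.metrics import collect_episodes, summarize_episodes

    def my_eval_fn(trainer, eval_workers):
        # Let each remote evaluation worker collect some rollouts.
        for _ in range(2):
            ray.get([w.sample.remote() for w in eval_workers.remote_workers()])
        # Gather the finished episodes from all evaluation workers ...
        episodes, _ = collect_episodes(
            remote_workers=eval_workers.remote_workers(), timeout_seconds=99999)
        # ... and turn them into the usual metrics dict.
        metrics = summarize_episodes(episodes)
        metrics["my_custom_metric"] = 1.0  # add any custom keys you like
        return metrics

    config = {"custom_eval_function": my_eval_fn}
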
There is an end-to-end example of how to set up custom online evaluation in `custom_eval.py <https://github.com/ray-project/ray/blob/master/rllib/examples/custom_eval.py>`__. Note that if you only want to evaluate your policy at the end of training, you can set ``evaluation_interval: N``, where ``N`` is the number of training iterations before stopping.

Below are some examples of how the custom evaluation metrics are reported nested under the ``evaluation`` key of normal training results:

Review comment: minor. does training call basically mean iterations here? just say 'indicating the number of iterations before an "evaluation step" is run'?
Reply: +1
Reply: clarified