[RLlib] Allow for evaluation to run by timesteps (alternative to episodes) and add auto-setting to make sure train doesn't ever have to wait for eval (e.g. long episodes) to finish. #20757

Merged

Conversation

@sven1977 (Contributor) commented Nov 29, 2021

This PR introduces two new evaluation control config keys:

  • evaluation_duration_unit ("episodes" or "timesteps"): Evaluation can now also be measured in timesteps; counting in episodes is no longer a hard requirement.
  • evaluation_duration: Replaces evaluation_num_episodes (now soft-deprecated). Either an int giving the number of timesteps or episodes to run per evaluation step, or "auto" to run evaluation for exactly as long as the parallel training step takes. "auto" is only supported with evaluation_parallel_to_training=True. Combining evaluation_duration_unit=timesteps with evaluation_duration=auto gives RLlib the most accurate control over evaluation duration, since episodes can run unexpectedly long and would otherwise force the training loop to wait for evaluation to finish.
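
For illustration, a minimal config sketch using the new keys (a hedged example; the environment, algorithm, and worker counts are arbitrary choices, not defaults):

from ray.rllib.agents.ppo import PPOTrainer  # any algorithm works; PPO is just an example

config = {
    "env": "CartPole-v0",
    # Evaluate every training iteration ...
    "evaluation_interval": 1,
    # ... in parallel to the training step (required for "auto" below), ...
    "evaluation_parallel_to_training": True,
    # ... counting evaluation progress in timesteps ...
    "evaluation_duration_unit": "timesteps",
    # ... and running evaluation for as long as the parallel train step takes.
    "evaluation_duration": "auto",
    "evaluation_num_workers": 2,
}

trainer = PPOTrainer(config=config)
results = trainer.train()  # evaluation metrics show up under results["evaluation"]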

Why are these changes needed?

Related issue number

Checks

  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@gjoliver (Member) left a comment

a few questions. thanks.

However, you can switch off any exploration behavior for the evaluation workers
via:
of environment configurations). You can activate evaluating policies during training by setting
the ``evaluation_interval`` to an int value (> 0) indicating every how many training calls
@gjoliver (Member):

minor. does training call basically mean iterations here? just say 'indicating the number of iterations before an "evaluation step" is run'?

@sven1977 (Contributor, Author):

+1
clarified

For ``evaluation_interval=1``, the sequence is: ``train, eval, train, eval, ...``.
Before each evaluation step, weights from the main model are synchronized to all evaluation workers.
However, it is possible to run evaluation parallel to training via the ``evaluation_parallel_to_training=True``
config flag. In this case, both steps (train and eval) are run at the same time via threading.
@gjoliver (Member):

I forgot, these eval workers can be remote as well right?

@sven1977 (Contributor, Author):

Yes, eval workers are a completely separate WorkerSet (separate from the "normal" WorkerSet used to collect training data).
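
(As an illustration, a hedged sketch of how the two worker sets are exposed on a Trainer; attribute and method names are taken from RLlib's Trainer/WorkerSet API of that era and may differ between versions:)

# Both are ray.rllib.evaluation.worker_set.WorkerSet instances.
training_workers = trainer.workers            # used to collect training samples
eval_workers = trainer.evaluation_workers     # only built if evaluation is configured

# Each WorkerSet holds one local worker plus any number of remote Ray-actor workers,
# which is why evaluation rollouts can run remotely and in parallel to training.
num_remote_eval_workers = len(eval_workers.remote_workers())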

# Set `batch_mode=truncate_episodes` and set
# `rollout_fragment_length` such that desired steps are divided
# equally amongst workers or - in auto duration mode - set it
# to a reasonable small number (10).
@gjoliver (Member):

actually any chance you can expand this comment a little bit, so we understand why a reasonable small number works for auto mode ... thanks :)

@sven1977 (Contributor, Author):

done
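
(For reference, a hedged sketch of the fragment-length logic the expanded comment describes; the function name and exact rounding are illustrative, not the PR's code:)

import math

def eval_rollout_fragment_length(evaluation_duration, num_eval_workers):
    # "auto" duration: keep fragments small (e.g. 10 timesteps), so the eval loop
    # can check frequently whether the parallel train step has finished and stop
    # right after it does.
    if evaluation_duration == "auto":
        return 10
    # Fixed duration: split the requested timesteps evenly across the eval workers,
    # so one round of sample() calls covers the whole duration.
    return int(math.ceil(evaluation_duration / num_eval_workers))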

@@ -914,19 +952,32 @@ def step_attempt(self) -> ResultDict:
with concurrent.futures.ThreadPoolExecutor() as executor:
train_future = executor.submit(
@gjoliver (Member):

I notice this is still not fully parallelized, like our eval() call will overlap with at most 1 train_exec_impl(), right?
I imagine the logic may be cleaner if we always have train_exec_impl() run in the main thread, and simply throw self.evaluate() into a ThreadPoolExecutor() if we want evaluation_parallel_to_training.

Something like:

while True:
    evaluate_this_iter = ...
    if evaluate_this_iter:
        if evaluation_parallel_to_training:
            self.thread_executor.submit(self.evaluate)
        else:
            self.evaluate()
    self.train_exec_impl()

Something like this. Just an idea.
This is definitely out of scope for this PR, but would love to see what you think or what I missed.

@sven1977 (Contributor, Author):

I think this is how it used to work, but the problem is that the eval step needs to check whether the train step (which is running in a thread) is done, not the other way around. So the logic is:

  • start train step in a thread
  • do eval (10 timesteps as per rollout_frag_len setting)
  • check if train is done
    -- no? -> do another 10 steps
    -- yes? -> return eval results immediately, such that the next call to train doesn't have to be blocked by a still running eval step

Note that we probably should not use the phrase "complete decoupling" as this is misleading: We still need some form of coupling, since we have to sync weights from the local worker to all evaluation workers before each train+eval step. It's merely the duration of the eval step that we more closely align with the train duration in this PR, such that it feels like neither step has to wait for the other anymore.

I'll add some example settings to the docs as well and change the title of this PR to clarify this. I hope this explains the logic.
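
(A hedged sketch of that control flow; trainer.train_step() and trainer.sample_eval_chunk() are illustrative names, not the actual step_attempt() internals:)

import concurrent.futures

def parallel_train_and_eval(trainer):
    with concurrent.futures.ThreadPoolExecutor() as executor:
        # 1) Kick off the training step in a background thread.
        train_future = executor.submit(trainer.train_step)

        eval_results = []
        while True:
            # 2) Run a small evaluation chunk (e.g. 10 timesteps per eval worker,
            #    as per the rollout_fragment_length setting).
            eval_results.append(trainer.sample_eval_chunk())
            # 3) After each chunk, check whether training has finished ...
            if train_future.done():
                # ... and if so, stop immediately so the next train call
                # is never blocked by a still-running evaluation.
                break

        return train_future.result(), eval_results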

@sven1977 (Contributor, Author):

done

@gjoliver (Member):

again totally out of scope for this PR, just discussion at this point. I am curious:

the assumption here seems to be that the eval step is much faster than the train step. so we let eval workers check whether train is done. kind of feel like that's not necessarily true actually. also why do another 10 steps ... won't that make different eval steps run for different numbers of steps, and introduce variance to the eval result?

I think evaluation_parallel_to_training=False is a simple case: eval simply blocks train, nothing controversial there.
but when evaluation_parallel_to_training=True, we gotta make sure eval runs in a separate thread so the main training thread never gets blocked?
something like:

while True:
    train()

    if eval_is_currently_running:
        # do nothing.
        continue
    else:
        if is_there_new_eval_result:
            # report freshly available eval results.
            eval_result = ...
        else:
            eval_result = None

        if should_we_kick_off_a_new_eval:
            sync_weights
            kick_off_a_new_eval_in_thread_pool.

again, not nitpicking, just saying: it feels like train in the main thread and eval on the side is an easier mental model for users.

It's also more likely to be CORRECT... 😏


# Run at least one `evaluate()`, even if the training
# is very fast.
def duration_fn(remaining_duration):
@gjoliver (Member):

should we define this duration_fn outside of all these if clauses, higher up in this function, so it doesn't have to live in such a nested place?
you can partial bind the train_future instance for example.

else:
# Count by episodes. -> Run n more
# (n=num eval workers).
elif unit == "episodes":
return self.config["evaluation_num_workers"]
@gjoliver (Member):

actually I don't quite get the logic here anymore. duration_fn gets called over and over to see whether the evaluation should stop, right? why do we always return self.config["evaluation_num_workers"] here, which is a positive number?
as written, eval only stops as soon as remaining_duration > 0 and train_future is done?

@sven1977 (Contributor, Author):

Correct, it gets called over and over to determine whether there is some eval duration left (>0) to run.

For evaluation_duration_unit=episodes (and auto duration!):

  • Run n more episodes: Each worker will, on a call to its sample(), run exactly one episode (as configured via the worker's batch_mode=complete_episodes and rollout_fragment_length=1 settings).
  • Then check again whether training is done and we can stop.

For evaluation_duration_unit=timesteps (and auto duration!):

  • Run [num-workers * fragment-len * num_envs_per_worker] more timesteps: note that this is the minimum that one sample() on all workers will do.

In the non-auto case:
The result of the duration_fn is used to determine how many more sample() calls (and on which eval workers) we need to make. This is important for reaching the exact desired number of timesteps/episodes with n workers.
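
(A hedged sketch of a duration_fn for the "auto" case described above; illustrative only, not the exact code from this PR:)

def make_auto_duration_fn(unit, num_eval_workers, eval_config, train_future):
    """Builds a duration_fn that keeps evaluation going until training finishes."""

    def auto_duration_fn(num_units_done):
        # `num_units_done` is ignored in auto mode; it only matters for the
        # fixed-duration case.
        # Training step is done -> request 0 more units, which stops evaluation.
        if train_future.done():
            return 0
        # Otherwise request one more "round" of sampling:
        if unit == "episodes":
            # One episode per eval worker (batch_mode=complete_episodes).
            return num_eval_workers
        # Timesteps: the minimum one sample() round across all workers produces.
        return (
            num_eval_workers
            * eval_config["rollout_fragment_length"]
            * eval_config["num_envs_per_worker"]
        )

    return auto_duration_fn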

round_ = 0
while True:
episodes_left_to_do = episodes_left_fn(num_episodes_done)
if episodes_left_to_do <= 0:
units_left_to_do = duration_fn(num_units_done)
@gjoliver (Member):

duration_fn takes a parameter remaining_duration, but we pass num_units_done here.
I feel like one of them is wrong ...

@sven1977 (Contributor, Author):

Great catch! Just affected the naming of the function arg. Fixed! :)

sven1977 changed the title [RLlib] Allow for complete decoupling of evaluation and training via more finegrained eval config options. [RLlib] Allow for evaluation to run by timesteps (alternative to episodes) and add auto-setting to make sure train doesn't ever have to wait for eval (e.g. long episodes) to finish. Dec 3, 2021
@sven1977 (Contributor, Author) commented Dec 3, 2021

Hey @gjoliver , I addressed all your questions and concerns. Could you take another look?
Thanks! :)

@gjoliver (Member) left a comment

otherwise, this looks cool. thanks.

@gjoliver (Member) commented Dec 3, 2021

there are test failures though.

sven1977 merged commit 60b2219 into ray-project:master Dec 4, 2021
@rfali (Contributor) commented Jan 4, 2022

Great feature @sven1977! Should this be expected in the ray 1.9.2 release?

@gjoliver (Member) commented Jan 4, 2022 via email
