[Feature][rllib/tune] Deprecate RLlib's rollout/evaluate in favor of tune.run(training=False)
#18758
Comments
Hey @mehes-kth, any reason why this needs to be done in Tune? Why shouldn't we just have an RLlib config flag that disables the training loop? Maybe with a check that evaluation - in this case - would have to be properly set up (evaluation_interval != None, etc.). This is hacky, but a quick workaround for now would be to set in your config:
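(The config snippet that originally followed was lost in this copy; based on the description above and the rest of the thread, it presumably looked roughly like the sketch below. The exact keys and values are assumptions, not a verified recipe.)

```python
# Rough reconstruction of the suggested workaround (original snippet lost);
# key names follow the RLlib Trainer config of that era, values are illustrative.
config = {
    # make sure evaluation actually runs
    "evaluation_interval": 1,
    "evaluation_num_episodes": 10,
    "evaluation_config": {"explore": False},
    # effectively turn the training step into a no-op
    "timesteps_per_iteration": 0,
    "train_batch_size": 0,
    "simple_optimizer": True,  # avoid the multi-GPU optimizer path
}
```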
You tell me! 😏 That just seemed more natural, since in RLlib I can already separate […]. BTW, having played around more with […]
BTW, I think I may have confused where the flag was going to be. If I understand correctly, my suggestion to pass […]. Which brings me to another, probably quite naïve, question regarding these settings in your proposed hack:
Do @ericl @richardliaw […]? Here the inline documentation explicitly talks about learning (should be corrected?): Line 75 in 944309c
Similarly, at the very definition, training and SGD are mentioned: Lines 108 to 111 in 944309c
Is this correct? Is @sven1977 wrong? Is […]? Or was it just me jumping to the wrong conclusion that training would be completely avoided using @sven1977's "workaround"? That's apparently not the case...
If the training=False flag is an RLlib flag, how does Tune take it into consideration?
In a similar fashion it appears to respect […].
Indeed! AND, the current […]
Most definitely, but doing so by re-using what already exists and works in Tune sounds most efficient.
Given that […]
Glad to see we agree on most things :) Let me explain my position: Tune is a distributed runner (10% shell) wrapping around a bunch of search algorithms (90% meat). To be clear, this is all about re-purposing Tune. I think adding a single flag to disable training altogether sounds like a very reasonable thing. Should we open a feature request for an actually useful eval.py? I don't think it's hard/slow to do at all. We may just need to copy&paste tune.run code here.
I'd certainly want reproducibility and seed control, but AFAIK […]. BUT, you certainly have a point: there may, in fact, be a limit to how far this can be stretched.
Right.
See... "copy&paste […]"
By "parameter set", I meant the case where I created a large set of starting conditions for my Env, and I want to evaluate multiple policies against this same set of benchmark envs. I think we will copy the trial runner to start. The whole point I am trying to make is that, over time, these things will diverge enough to make the effort worth it.
I finally found some time to test your proposed quick workaround.
This is not only hacky, but it doesn't actually work, either...
Ray Tune as a distributed runner does not have a concept of training/evaluation. It is a blackbox optimizer that does not impose any structure on the trainable apart from requiring a point of entry and feedback. This becomes even more apparent when considering function trainables, which only invoke a single function that takes care of running the full training loop, including potential evaluation. Thus, a […]. If this is a common use case for RLlib trainables, it should be reflected in an RLlib configuration flag which disables training. By the way, this would presumably be the way it would be implemented anyway - […]
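(For illustration only, not part of the original comment: a minimal function trainable. Tune just invokes the function and collects the metrics it reports; whether those metrics come from training or from evaluation is invisible to Tune.)

```python
import random

from ray import tune


def evaluate_only(config):
    # Stand-in for a real rollout loop: Tune neither knows nor cares that this
    # is "evaluation" rather than "training"; it only sees reported metrics.
    rewards = [random.random() for _ in range(config["episodes"])]
    tune.report(episode_reward_mean=sum(rewards) / len(rewards))


tune.run(evaluate_only, config={"episodes": 10})
```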
In my mind, the two clauses here are in direct contradiction with each other. Evaluation does NOT need an optimizer, blackbox or otherwise, but it could make very good use of a distributed runner... The "hack" outlined in the original issue formulation seems to accomplish this fairly well.
Yes, as I said, Ray Tune does not care about the blackbox it is running, as long as it has an entry point and reports some kind of metric (even that could be omitted). If this is an evaluation function, that's fine. This is, however, a request specific to certain kinds of trainables - those that do have an evaluation function. We should not generalize from this use case to other blackbox functions. Introducing a config flag in RLlib seems like a reasonable way to enable your use case without hacks.
Right. I'm not even sure I'd call them […]. BUT, I could easily see how that may fall outside the scope of Tune per se.
I'm fine with that, too. In fact, my proposed code sample does appear to be […]
Agreed, the naming is not optimal here - we'll leave it as is for historic reasons for now, but might reconsider in the future. Awesome, an RLlib config parameter it is, then. Would you like to try to contribute this yourself? Otherwise we can put this on our backlog and slot it in sometime (cc @avnishn - maybe something you could look into?)
I could try, but I'm afraid that I'd end up with something that institutionalizes the "hack" I found, […]
@andras-kth, thank you for referencing my post on the Ray Discourse (which ironically still has no replies, but a recent RLlib feature should resolve most of my questions). I thought I'd answer (to the best of my knowledge) a few questions you had raised in this issue (since I still consider myself an RLlib beginner).
From this issue, it reads […]. On further digging, I discovered that […]
Interesting... That PR is not something I find useful, but I'm glad it helps you. 😺
Thanks!
Hi, I'm a bot from the Ray team :) To help human contributors focus on more relevant issues, I will automatically add the stale label to issues that have had no activity for more than 4 months. If there is no further activity in the next 14 days, the issue will be closed!
You can always ask for help on our discussion forum or Ray's public Slack channel.
Hi again! This issue will be closed because there has been no further activity in the 14 days since the last message. Please feel free to reopen it or open a new issue if you'd still like it to be addressed. Again, you can always ask for help on our discussion forum or Ray's public Slack channel. Thanks again for opening the issue!
Search before asking
Description
Currently, there's a major gap, or feature disparity, between running standalone evaluations with RLlib's `rollout` command vs. doing the same using the various `evaluation_*` options in `tune.run`. While the intention of the latter is to allow evaluation during training, with a careful selection of parameters training can be turned off, resulting in a much more complete and flexible approach to evaluation than the somewhat barebones implementation in `rollout.py` (or, more recently, `evaluate.py`).

This feature request targets the introduction of an explicit boolean flag, advantageously called `training`, to allow disabling training without having to fiddle with other parameters, and to switch RLlib's evaluation to use `tune.run`.

If implemented, the following hackish way to disable training:
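(The original snippet here was lost in this copy; the sketch below is an assumption of what the hack looks like when wrapped in a `tune.run` call, reusing the workaround keys quoted earlier in the thread. The trainer name, env, and checkpoint path are placeholders.)

```python
from ray import tune

tune.run(
    "PPO",                              # placeholder trainer
    stop={"training_iteration": 1},
    restore="/path/to/checkpoint",      # placeholder checkpoint path
    config={
        "env": "CartPole-v0",           # placeholder env
        # turn the training step into a no-op
        "timesteps_per_iteration": 0,
        "train_batch_size": 0,
        "simple_optimizer": True,
        # the part we actually care about
        "evaluation_interval": 1,
        "evaluation_num_episodes": 10,
        "evaluation_config": {"explore": False},
    },
)
```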
could be replaced by a single parameter:
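(The proposed call would presumably look something like the following; `training=False` is the flag being requested here and does not exist in Tune/RLlib as of this issue.)

```python
# Proposed usage - not an existing API, just what this feature request asks for.
tune.run(
    "PPO",
    restore="/path/to/checkpoint",      # placeholder
    config={"env": "CartPole-v0", "evaluation_interval": 1},
    training=False,                     # the new flag: skip training, only evaluate
)
```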
which would, hopefully, be much more robust, and would NOT rely on apparently deprecated features (`simple_optimizer=True` triggers a deprecation warning, but without it the default multi-GPU optimizer breaks on the batch size being 0; which shouldn't matter in this case, since no training is needed).

Finally, having one proper implementation to perform evaluation instead of two would also reduce user confusion about which one should be used when and why (cf. e.g. https://discuss.ray.io/t/recommended-way-to-evaluate-training-results/2502).
Use case
For a fair comparison of different trained policies, evaluating them under identical conditions is a common scenario.
Instantiating agents from checkpoints and running `rollout` in a loop will result in […], predictably with the associated suboptimal performance implications; AND, that's when the implementation happens to work. Some policies/trainers appear to rely on the `episodes` parameter to their `compute_actions` method having been properly initialized, which AFAICT doesn't happen when `rollout` is used, but works fine with `tune.run` (cf. #13177).

Additionally, spanning parameter ranges can be expressed much more compactly as a `tune.grid_search`, as opposed to nested loops, where parallelism is usually lost because the inner-loop `rollout` "stages" run sequentially.
Related issues
As mentioned above:
Are you willing to submit a PR?