[RLlib] Fix ope speed #28834
Changes from all commits: d68327a, c9f82c7, 601e92b, 676babd, 7f0983a, 5e4d9e0, e4e53f6, 34cd602, ddf2910, 240e2be, 33401da, e340b25, 80bb48b, b1db2ec, b576b83, 7d59e37, 0c3d09d
@@ -180,6 +180,7 @@ def __init__(self, algo_class=None):
         self.evaluation_parallel_to_training = False
         self.evaluation_config = {}
         self.off_policy_estimation_methods = {}
+        self.ope_split_batch_by_episode = True
         self.evaluation_num_workers = 0
         self.custom_evaluation_function = None
         self.always_attach_evaluation_results = False
@@ -827,6 +828,7 @@ def evaluation(
             Union["AlgorithmConfig", PartialAlgorithmConfigDict]
         ] = None,
         off_policy_estimation_methods: Optional[Dict] = None,
+        ope_split_batch_by_episode: Optional[bool] = None,
         evaluation_num_workers: Optional[int] = None,
         custom_evaluation_function: Optional[Callable] = None,
         always_attach_evaluation_results: Optional[bool] = None,
@@ -880,6 +882,11 @@ def evaluation(
                 You can also add additional config arguments to be passed to the
                 OffPolicyEstimator in the dict, e.g.
                 {"qreg_dr": {"type": DoublyRobust, "q_model_type": "qreg", "k": 5}}
+            ope_split_batch_by_episode: Whether to use SampleBatch.split_by_episode()
+                to split the input batch into episodes before estimating the OPE
+                metrics. For bandits, set this to False to speed up OPE evaluation:
+                splitting by episode is unnecessary there because each record is
+                already a single timestep. The default is True.
             evaluation_num_workers: Number of parallel workers to use for evaluation.
                 Note that this is set to zero by default, which means evaluation will
                 be run in the algorithm process (only if evaluation_interval is not
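For illustration, a minimal sketch of how the new option might be set. The `ope_split_batch_by_episode` argument is the one introduced in this diff; the rest uses the existing `evaluation()` API, and the estimator choice below is just an example, not part of this PR:

```python
from ray.rllib.algorithms.algorithm_config import AlgorithmConfig
from ray.rllib.offline.estimators import ImportanceSampling

# Sketch only: `ope_split_batch_by_episode` is the argument added in this PR.
config = AlgorithmConfig().evaluation(
    off_policy_estimation_methods={"is": {"type": ImportanceSampling}},
    # For bandit data, each record is already a complete one-step episode,
    # so skipping the per-episode split avoids unnecessary work during OPE.
    ope_split_batch_by_episode=False,
)
```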
@@ -925,10 +932,12 @@ def evaluation(
             self.evaluation_num_workers = evaluation_num_workers
         if custom_evaluation_function is not None:
             self.custom_evaluation_function = custom_evaluation_function
-        if always_attach_evaluation_results:
+        if always_attach_evaluation_results is not None:
Review comment: I noticed that these were buggy, i.e. previously, if always_attach_evaluation_results was set to False (while the default was True), this call would not have overridden it.
Reply: You have to explicitly check that these variables are not None, otherwise False would also not get assigned.
Reply: 👍
             self.always_attach_evaluation_results = always_attach_evaluation_results
-        if enable_async_evaluation:
+        if enable_async_evaluation is not None:
             self.enable_async_evaluation = enable_async_evaluation
+        if ope_split_batch_by_episode is not None:
+            self.ope_split_batch_by_episode = ope_split_batch_by_episode

         return self
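The bug the review thread points out, reduced to a hypothetical stand-alone setter (the class below is illustrative only, not RLlib code):

```python
class Config:
    """Hypothetical config illustrating the truthiness-vs-None bug."""

    def __init__(self):
        # Default is True, mirroring always_attach_evaluation_results.
        self.always_attach_evaluation_results = True

    def evaluation(self, always_attach_evaluation_results=None):
        # Buggy version: `if always_attach_evaluation_results:` silently drops
        # an explicit False, so the True default could never be turned off.
        # Checking `is not None` only skips assignment when no value was passed.
        if always_attach_evaluation_results is not None:
            self.always_attach_evaluation_results = always_attach_evaluation_results
        return self


cfg = Config().evaluation(always_attach_evaluation_results=False)
assert cfg.always_attach_evaluation_results is False  # honored with the `is not None` check
```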
@@ -9,6 +9,8 @@
 from ray.rllib.offline.output_writer import OutputWriter, NoopOutput
 from ray.rllib.offline.resource import get_offline_io_resource_bundles
 from ray.rllib.offline.shuffled_input import ShuffledInput
+from ray.rllib.offline.feature_importance import FeatureImportance
Review comment: Moving feature importance out of its previous place because it doesn't really fit the definition of off-policy evaluation in the literature. It is now a subclass of OfflineEvaluator, which OffPolicyEstimator is also a subclass of.
Reply: This should be a separate PR.
Reply: If this is not done here, it will break the feature_importance code.

 __all__ = [
     "IOContext",
@@ -24,4 +26,5 @@
     "DatasetWriter",
     "get_dataset_and_shards",
     "get_offline_io_resource_bundles",
+    "FeatureImportance",
 ]
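With the added `__all__` entry, the estimator can be imported from the package root (a sketch assuming the module layout implied by this diff):

```python
# Assumes the re-export added in this diff; the class itself lives in
# ray/rllib/offline/feature_importance.py.
from ray.rllib.offline import FeatureImportance
```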
Review comment: We shouldn't concatenate and then split by episode here. In the case of bandits we don't even need to split_by_episode(), since each row is already one episode. In the case of RL, each batch already ends at an episode boundary, so we gain nothing by concatenating all batches together.
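To make the distinction concrete, a small sketch with `SampleBatch.split_by_episode()` (the batch contents below are made up for illustration):

```python
from ray.rllib.policy.sample_batch import SampleBatch

# RL-style data: two episodes identified by eps_id; per-episode OPE metrics
# require splitting on the episode boundaries.
rl_batch = SampleBatch({"eps_id": [0, 0, 1], "rewards": [1.0, 0.5, 2.0]})
episodes = rl_batch.split_by_episode()  # -> [2-row batch, 1-row batch]

# Bandit-style data: every row is already a complete one-step episode, so the
# split adds overhead without changing the result; iterating rows is enough.
bandit_batch = SampleBatch({"rewards": [1.0, 2.0]})
per_record = [bandit_batch.slice(i, i + 1) for i in range(len(bandit_batch))]
```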