Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RLlib] Eps greedy ope #28837

Merged
merged 25 commits into from
Sep 29, 2022
Merged

Conversation

kouroshHakha
Copy link
Contributor

Why are these changes needed?

feature requested :)

Related issue number

Checks

  • I've signed off every commit(by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

… OPE and feature importance

2. introduced estimate_multi_step vs. estimate_single_step

Signed-off-by: Kourosh Hakhamaneshi <[email protected]>
Signed-off-by: Kourosh Hakhamaneshi <[email protected]>
Signed-off-by: Kourosh Hakhamaneshi <[email protected]>
Signed-off-by: Kourosh Hakhamaneshi <[email protected]>
Signed-off-by: Kourosh Hakhamaneshi <[email protected]>
Signed-off-by: Kourosh Hakhamaneshi <[email protected]>
Copy link
Member

@gjoliver gjoliver left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice!

new_prob = np.exp(convert_to_numpy(log_likelihoods))

if self.epsilon_greedy > 0.0:
if not hasattr(self.policy.action_space, "n"):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issubclass(self.policy.action_space, (Discrete, MultiDiscrete))?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't want this to depend on those classes. The minimum abstraction needs is that it should have an n attribute. This is more flexible for down the line.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, is this a public spec though?
a general principle is not to rely on internal implementation details. for example, MultiBinary also has member variable named n ...
also, I kind of feel like we need to check for policy.action_space. original_space first, something like that, in case people are doing action space flattening?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see your point on MultiBinary. So this only works for simple discrete action space. Should we just assume the spaces would be gym.Space?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, not ideal, but it seems to be everywhere already, so probably fine:
https://github.com/ray-project/ray/blob/master/rllib/algorithms/appo/appo_tf_policy.py#L167

@@ -91,6 +192,21 @@ def check_action_prob_in_batch(self, batch: SampleBatchType) -> None:
"`off_policy_estimation_methods: {}` to disable estimation."
)

def _compute_action_probs(self, batch: SampleBatch):
log_likelihoods = compute_log_likelihoods_from_input_dict(self.policy, batch)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

put these 2 lines in an else clause so we don't spend time on them if epsilon greedy is specified?

Copy link
Contributor Author

@kouroshHakha kouroshHakha Sep 29, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These two should run regardless of eps? you modify new_prob in case of epsilon greedy?

Signed-off-by: Kourosh Hakhamaneshi <[email protected]>
Signed-off-by: Kourosh Hakhamaneshi <[email protected]>
Signed-off-by: Kourosh Hakhamaneshi <[email protected]>
Signed-off-by: Kourosh Hakhamaneshi <[email protected]>
Signed-off-by: Kourosh Hakhamaneshi <[email protected]>
Signed-off-by: Kourosh Hakhamaneshi <[email protected]>
Signed-off-by: Kourosh Hakhamaneshi <[email protected]>
Signed-off-by: Kourosh Hakhamaneshi <[email protected]>
Signed-off-by: Kourosh Hakhamaneshi <[email protected]>
Signed-off-by: Kourosh Hakhamaneshi <[email protected]>
Signed-off-by: Kourosh Hakhamaneshi <[email protected]>
Signed-off-by: Kourosh Hakhamaneshi <[email protected]>
Signed-off-by: Kourosh Hakhamaneshi <[email protected]>
Signed-off-by: Kourosh Hakhamaneshi <[email protected]>
Signed-off-by: Kourosh Hakhamaneshi <[email protected]>
Copy link
Member

@gjoliver gjoliver left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good. tests look good too.

@gjoliver gjoliver merged commit c1e0d39 into ray-project:master Sep 29, 2022
WeichenXu123 pushed a commit to WeichenXu123/ray that referenced this pull request Dec 19, 2022
* 1. Introduced new abstraction: OfflineEvaluator that is the parent of OPE and feature importance
2. introduced estimate_multi_step vs. estimate_single_step

Signed-off-by: Kourosh Hakhamaneshi <[email protected]>

* algorithm ope evaluation is now able to skip split_by_episode

Signed-off-by: Kourosh Hakhamaneshi <[email protected]>

* lint

Signed-off-by: Kourosh Hakhamaneshi <[email protected]>

* lint

Signed-off-by: Kourosh Hakhamaneshi <[email protected]>

* fixed some unittests

Signed-off-by: Kourosh Hakhamaneshi <[email protected]>

* added eps greedy exploration to ope methods

Signed-off-by: Kourosh Hakhamaneshi <[email protected]>

* wip

Signed-off-by: Kourosh Hakhamaneshi <[email protected]>

* lint

Signed-off-by: Kourosh Hakhamaneshi <[email protected]>

* wip

Signed-off-by: Kourosh Hakhamaneshi <[email protected]>

* wip

Signed-off-by: Kourosh Hakhamaneshi <[email protected]>

* fixed dm and dr variance issues

Signed-off-by: Kourosh Hakhamaneshi <[email protected]>

* lint

Signed-off-by: Kourosh Hakhamaneshi <[email protected]>

* cleaned up the inheritance

Signed-off-by: Kourosh Hakhamaneshi <[email protected]>

* lint

Signed-off-by: Kourosh Hakhamaneshi <[email protected]>

* lint

Signed-off-by: Kourosh Hakhamaneshi <[email protected]>

* fixed test

Signed-off-by: Kourosh Hakhamaneshi <[email protected]>

* nit

Signed-off-by: Kourosh Hakhamaneshi <[email protected]>

* fixed nits

Signed-off-by: Kourosh Hakhamaneshi <[email protected]>

* fixed the typos

Signed-off-by: Kourosh Hakhamaneshi <[email protected]>

* nit

Signed-off-by: Kourosh Hakhamaneshi <[email protected]>

* wip

Signed-off-by: Kourosh Hakhamaneshi <[email protected]>

* wip

Signed-off-by: Kourosh Hakhamaneshi <[email protected]>

Signed-off-by: Kourosh Hakhamaneshi <[email protected]>
Signed-off-by: Weichen Xu <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants