CQL rllib 1.7.2 backport #170

Open
wants to merge 29 commits into
base: releases/1.3.0


dmlyubim

Why are these changes needed?

Related issue number

Checks

  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@dmlyubim dmlyubim requested a review from a team as a code owner January 11, 2023 01:07
@dmlyubim dmlyubim requested review from RuofanKong and removed request for a team January 11, 2023 01:07
@dmlyubim dmlyubim changed the title from "Dmlyubim/cql 1.7.2 port" to "CQL rllib 1.7.2 backport" Jan 11, 2023
@dmlyubim dmlyubim requested a review from a team as a code owner January 12, 2023 21:53
dmlyubim and others added 6 commits January 12, 2023 17:25
The test passes for me on the command line but fails in the pipeline, where it
fails to locate the JSON data file.
* set recursive mod 777 on /home/vsts/work/_temp/_bazel_vsts directory prior to build

* use $TEST_TMPDIR env variable instead of literal directory name
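
For the test-side half of this change, a minimal sketch of reading the scratch directory from the environment rather than hardcoding it might look like the following. Bazel sets `TEST_TMPDIR` for the tests it runs; the helper names `resolve_test_tmpdir` and `data_file` are hypothetical and do not come from this PR.

```python
import os
import tempfile
from pathlib import Path


def resolve_test_tmpdir() -> Path:
    """Return the scratch directory for a test run.

    Bazel exports TEST_TMPDIR to the tests it runs. Outside Bazel the
    variable is usually unset, so fall back to the system temp directory.
    """
    return Path(os.environ.get("TEST_TMPDIR") or tempfile.gettempdir())


def data_file(name: str) -> Path:
    # Hypothetical helper: path to a data file under the scratch directory.
    return resolve_test_tmpdir() / name
```

The fallback keeps the same test runnable from a plain command line, where `TEST_TMPDIR` is typically not defined.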
dmlyubim and others added 3 commits January 30, 2023 16:10
* set recursive mod 777 on /home/vsts/work/_temp/_bazel_vsts directory prior to build

* use $TEST_TMPDIR env variable instead of literal directory name

* explicitly set MACOSX_DEPLOYMENT_TARGET env variable

* removed minor version of Python; renamed steps to reflect correct Python version

* get latest pip version to test macOS wheels

* updated hash

* undid changes to info.yml

* unbounded setuptools

* undid change

* Fix macOS version if bdist_wheel generates incorrect macOS version tag for wheel

* undid changes

* undid changes

* undid changes

* force reinstall tune and upstream requirements

* updated CI hash

* updated dependencies

* updated requirements

* updated requirements

* updated requirements

* updated requirements

* updated requirements

* updated ci folder hash

* updated requirements

* updated requirements

* updates CI hash

* updated requirements

* updated requirements

* updated requirements

* updated requirements

* undid requirement changes

* updated ci folder hash

* updated requirements

* updated requirements

* updated requirements

* updated requirements

* updated requirements

* updated requirements

* updated requirements

* updated dependencies

* updated requirements

* updated dependencies

* apt update

* fixed GCC download, set Ubuntu 20.04 as default OS for pipeline

* updated requirements

* updated requirements

* fixed setup.py

* updated ci hash

* fixed setup.py

* fixed setup.py

* fixed setup.py

* updated requirements

* fixed setup.py

* force reinstall of torch and torchvision

* updated ci hash

* fixed rllib requirements

* updated requirements

* updated requirements

* updated requirements

* updated requirements

* updated requirements

* updated requirements

* updated dependencies

* updated dependencies

* updated requirements

* updated requirements

* updated requirements

* explicitly set locale in MacOS to fix test_signal
CQL_SAC = (cql.CQLSACTrainer, cql.CQLSAC_DEFAULT_CONFIG)
CQL_APEX_SAC = (cql.CQLApexSACTrainer, cql.CQLAPEXSAC_DEFAULT_CONFIG)
CQL_DQN = (cql.CQLDQNTrainer, cql.CQLDQN_DEFAULT_CONFIG)
CQL = (cql.CQLTrainer, cql.CQL_DEFAULT_CONFIG)

RLlib documentation mentions that CQL does not support discrete actions. Are we supporting discrete actions?

Author

I don't think so. This is backported code. I am not sure exactly how RLlib handles that restriction, but we have the ability to restrict it elsewhere in the outer code. I would not deviate from the original RLlib code unless it is clearly incorrect; staying close to upstream makes further backport merges easier.
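
One way the restriction "in the outer code" could look is a guard that runs before a CQL trainer is built. This is only a sketch: `check_cql_action_space` is a hypothetical name, and the `Box` and `Discrete` classes below are small stand-ins for `gym.spaces.Box` and `gym.spaces.Discrete` so the example runs without gym installed.

```python
from dataclasses import dataclass
from typing import Tuple


# Stand-ins for gym.spaces.Box and gym.spaces.Discrete (assumption: real
# code would check the gym classes instead of these).
@dataclass
class Box:
    shape: Tuple[int, ...]
    dtype: str = "float32"


@dataclass
class Discrete:
    n: int


def check_cql_action_space(action_space) -> None:
    """Hypothetical guard: reject anything but a continuous Box."""
    if not isinstance(action_space, Box) or not action_space.dtype.startswith("float"):
        raise ValueError(
            "CQL is assumed to need a continuous Box action space, "
            f"got {action_space!r}")
```

Keeping the check outside the backported module leaves the RLlib files identical to upstream, which is the property the reply above wants to preserve.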

action_dist_class = _get_dist_class(policy, policy.config,
                                    policy.action_space)
action_dist_class = _get_dist_class(
    # policy,

It would be good to clean this up to avoid confusion later.

[cat.deterministic_sample() for cat in self.cats], axis=1)
if isinstance(self.action_space, gym.spaces.Box):

Given that this is a categorical distribution and will be used for discrete actions, is this statement valid?

Also, it is not clear to me why the extra dimension is required only for Box spaces and not for others.
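
A possible reading of the Box branch, offered as an assumption about the upstream RLlib code being backported rather than a statement of what this PR does: an integer-valued Box is treated as one categorical per entry, so the distribution produces a flat vector of per-entry samples that has to be folded back into the Box shape, whereas a MultiDiscrete sample is already flat. The pure-Python sketch below illustrates that bookkeeping; all function names are hypothetical.

```python
from math import prod
from typing import List, Sequence


def int_box_num_logits(shape: Sequence[int], low: int, high: int) -> int:
    """Logits needed when every Box entry is one categorical over [low, high]."""
    return prod(shape) * (high - low + 1)


def multi_discrete_num_logits(nvec: Sequence[int]) -> int:
    """Logits needed for a MultiDiscrete space: one block per component."""
    return sum(nvec)


def reshape_flat_sample(flat: List[int], shape: Sequence[int]):
    """Fold one flat sample (one value per categorical) into the Box shape."""
    assert len(flat) == prod(shape)
    out = flat
    # Chunk from the innermost dimension outwards.
    for dim in reversed(shape[1:]):
        out = [out[i:i + dim] for i in range(0, len(out), dim)]
    return out
```

Under this reading, a Box of shape (2, 3) with bounds 0 to 4 needs 6 * 5 = 30 logits, and a flat sample of six values is folded into two rows of three. A MultiDiscrete space with `nvec = [3, 5]` needs 8 logits and no reshape.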


@override(ActionDistribution)
def logp(self, actions: TensorType) -> TensorType:
    # If tensor is provided, unstack it into list.
    if isinstance(actions, tf.Tensor):
        if isinstance(self.action_space, gym.spaces.Box):

Same comment as above.


@staticmethod
@override(ActionDistribution)
def required_model_output_shape(
        action_space: gym.Space,
        model_config: ModelConfigDict) -> Union[int, np.ndarray]:
    return np.sum(action_space.nvec)
    # Int Box.
    if isinstance(action_space, gym.spaces.Box):

Same comment as above.

"requires": true,
"packages": {

I am not sure what this is, so I am ignoring it. I suggest getting this reviewed by Ruofan or Kiko.

3 participants