CQL rllib 1.7.2 backport #170
base: releases/1.3.0
…proper train iteration size)
07481dc
to
8598a97
d09381a
to
acc2dde
acc2dde
to
e099a1d
The test passes for me on the command line but fails in the pipeline, where it cannot locate the JSON data file.
* set recursive mod 777 on /home/vsts/work/_temp/_bazel_vsts directory prior to build
* use $TEST_TMPDIR env variable instead of literal directory name
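For reference, the second bullet can be sketched as follows. This is a minimal illustration, not the code from this PR: the helper name `scratch_dir` and the subdirectory name `cql_test_data` are hypothetical. It relies only on the fact that Bazel exports `TEST_TMPDIR` to each test, and falls back to the system temp directory when the test runs outside Bazel.

```python
import os
import tempfile
from pathlib import Path


def scratch_dir() -> Path:
    """Return a writable scratch directory for test artifacts.

    Hypothetical helper: prefers Bazel's TEST_TMPDIR and falls back to the
    system temp directory when the variable is unset or empty.
    """
    root = os.environ.get("TEST_TMPDIR") or tempfile.gettempdir()
    path = Path(root) / "cql_test_data"  # hypothetical subdirectory name
    path.mkdir(parents=True, exist_ok=True)
    return path
```

Reading the location from the environment keeps the test independent of the agent's directory layout, which is the point of dropping the literal path.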
…o dmlyubim/cql-1.7.2-port
15df96a
to
08bc679
08bc679
to
84bf8ae
This reverts commit 84bf8ae.
* set recursive mod 777 on /home/vsts/work/_temp/_bazel_vsts directory prior to build
* use $TEST_TMPDIR env variable instead of literal directory name
* explicitly set MACOSX_DEPLOYMENT_TARGET env variable
* removed minor version of Python; renamed steps to reflect correct Python version
* get latest pip version to test MacOs wheels
* updated hash
* undid changes to info.yml
* unbounded setuptools
* undid change
* Fix MacOs version if bdist_wheel generates incorrect MacOS version tag for wheel
* undid changes
* undid changes
* undid changes
* force reinstall tune and upstream requirements
* updated CI hash
* updated dependencies
* updated requirements
* updated requirements
* updated requirements
* updated requirements
* updated requirements
* updated ci folder hash
* updated requirements
* updated requirements
* updates CI hash
* updated requirements
* updated requirements
* updated requirements
* updated requirements
* undid requirement changes
* updated ci folder hash
* updated requirements
* updated requirements
* updated requirements
* updated requirements
* updated requirements
* updated requirements
* updated requirements
* updated dependencies
* updated requirements
* updated dependencies
* apt update
* fixed GCC download, set Ubuntu 20.04 as default OS for pipeline
* updated requirements
* updated requirements
* fixed setup.py
* updated ci hash
* fixed setup.py
* fixed setup.py
* fixed setup.py
* updated requirements
* fixed setup.py
* force reinstall of torch and torchvision
* updated ci hash
* fixed rllib requirements
* updated requirements
* updated requirements
* updated requirements
* updated requirements
* updated requirements
* updated requirements
* updated dependencies
* updated dependencies
* updated requirements
* updated requirements
* updated requirements
* explicitly set locale in MacOS to fix test_signal
CQL_SAC = (cql.CQLSACTrainer, cql.CQLSAC_DEFAULT_CONFIG)
CQL_APEX_SAC = (cql.CQLApexSACTrainer, cql.CQLAPEXSAC_DEFAULT_CONFIG)
CQL_DQN = (cql.CQLDQNTrainer, cql.CQLDQN_DEFAULT_CONFIG)
CQL = (cql.CQLTrainer, cql.CQL_DEFAULT_CONFIG)
RLlib documentation mentions that CQL does not support discrete actions. Are we supporting discrete actions?
I don't think so. This is backported code. I am not sure exactly how RLlib handles that restriction, but we can enforce it elsewhere, in the outer code. I would not deviate from the original RLlib code unless it is clearly incorrect; staying close to upstream makes further backport merges easier.
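One way the outer code could enforce that restriction is sketched below. Everything here is an assumption for illustration: `require_continuous_actions` and `UnsupportedActionSpaceError` are hypothetical names, not part of RLlib or this repository. To keep the sketch free of a gym import it identifies a continuous `Box` by class name and float dtype; in real code an `isinstance(space, gym.spaces.Box)` check would be the more robust choice.

```python
from typing import Any


class UnsupportedActionSpaceError(ValueError):
    """Raised when a trainer is paired with an action space it cannot handle."""


def require_continuous_actions(action_space: Any, algo_name: str = "CQL") -> None:
    """Reject anything that does not look like a float-valued Box space.

    Hypothetical guard, duck-typed on purpose: it inspects the class name
    and the `dtype` attribute instead of importing gym.
    """
    is_box = type(action_space).__name__ == "Box"
    dtype_name = str(getattr(action_space, "dtype", ""))
    if not (is_box and dtype_name.startswith("float")):
        raise UnsupportedActionSpaceError(
            f"{algo_name} expects a continuous Box action space, "
            f"got {type(action_space).__name__} (dtype={dtype_name or 'n/a'})."
        )
```

Calling such a guard before the trainer is constructed leaves the backported RLlib files untouched, which keeps later merges from upstream simple.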
action_dist_class = _get_dist_class(policy, policy.config,
                                    policy.action_space)
action_dist_class = _get_dist_class(
    # policy,
It would be good to clean this up to avoid confusion later.
    [cat.deterministic_sample() for cat in self.cats], axis=1)
if isinstance(self.action_space, gym.spaces.Box):
Given that this is a categorical distribution and will be used for discrete actions, is this statement valid?
Also, it is not clear to me why the extra dimension is required only for the Box space and not for the others.
@override(ActionDistribution)
def logp(self, actions: TensorType) -> TensorType:
    # If tensor is provided, unstack it into list.
    if isinstance(actions, tf.Tensor):
        if isinstance(self.action_space, gym.spaces.Box):
Same comment as above
@staticmethod
@override(ActionDistribution)
def required_model_output_shape(
        action_space: gym.Space,
        model_config: ModelConfigDict) -> Union[int, np.ndarray]:
    return np.sum(action_space.nvec)
    # Int Box.
    if isinstance(action_space, gym.spaces.Box):
same comment as above.
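A possible reading of the Box branch, offered as an assumption since the diff shows only the `# Int Box.` comment: a `Box` with an integer dtype and uniform bounds can be treated as a grid of independent categoricals, one per element. Under that reading the model needs one logit per element per admissible value, whereas MultiDiscrete sums the per-dimension sizes in `nvec`. The function names below are hypothetical and the sketch uses plain Python rather than the RLlib code.

```python
import math
from typing import Sequence


def int_box_num_logits(shape: Sequence[int], low: int, high: int) -> int:
    """Logit count if every Box element is one categorical over [low, high].

    Assumes all elements share the same inclusive integer bounds.
    """
    if high < low:
        raise ValueError("high must be >= low")
    num_categories = high - low + 1
    return math.prod(shape) * num_categories


def multi_discrete_num_logits(nvec: Sequence[int]) -> int:
    """Logit count for MultiDiscrete: one categorical of size n per entry."""
    return sum(nvec)
```

This may also bear on the question about the extra dimension: a Box can have an arbitrary shape such as (2, 3), so the flat per-categorical samples would have to be reshaped back to that shape, while a MultiDiscrete space is already one-dimensional.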
"requires": true,
"packages": {
I am not sure what this is, so I am ignoring it. I suggest getting it reviewed by Ruofan or Kiko.
Why are these changes needed?
Related issue number
Checks
I've run scripts/format.sh to lint the changes in this PR.