Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inconsistent weight assignment operations in DQNPolicyGraph #4502

Closed
vuoristo opened this issue Mar 28, 2019 · 2 comments · Fixed by #4504
Closed

Inconsistent weight assignment operations in DQNPolicyGraph #4502

vuoristo opened this issue Mar 28, 2019 · 2 comments · Fixed by #4504
Labels
bug Something that is supposed to be working; but isn't

Comments

@vuoristo
Copy link
Contributor

System information

  • OS Platform and Distribution (e.g., macOS 10.14.3):
  • Ray installed from (source or binary): source
  • Ray version: 0.7.0.dev2 ab55a1f
  • Python version: 3.6.8

Describe the problem

DQNPolicyGraph creates tensorflow assign operations by looping through lists of self.q_func_vars and self.target_q_func_vars sorted by variable name on lines 390:393. The default sorting is not consistent between the two lists of variable names and as a result the operations can mix up the assignments. The attached code snippet produces the error message below.

2019-03-28 18:34:31,402 WARNING worker.py:1397 -- WARNING: Not updating worker name since `setproctitle` is not installed. Install this with `pip install setproctitle` (or ray[debug]) to enable monitoring of worker processes.                                                                                                                                                                                                           
2019-03-28 18:34:31.415440: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
WARNING:tensorflow:From /Users/ristovuorio/miniconda3/envs/ray_fiddle/lib/python3.6/site-packages/tensorflow/python/util/decorator_utils.py:127: GraphKeys.VARIABLES (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.                                                                                                                                                                          
Instructions for updating:
Use `tf.GraphKeys.GLOBAL_VARIABLES` instead.
Traceback (most recent call last):
  File "dqn_fail_demonstrator.py", line 37, in <module>
    trainer = DQNAgent(env="CartPole-v0", config=config)
  File "/Users/ristovuorio/projects/ray_doodle/ray/python/ray/rllib/agents/agent.py", line 280, in __init__
    Trainable.__init__(self, config, logger_creator)
  File "/Users/ristovuorio/projects/ray_doodle/ray/python/ray/tune/trainable.py", line 88, in __init__
    self._setup(copy.deepcopy(self.config))
  File "/Users/ristovuorio/projects/ray_doodle/ray/python/ray/rllib/agents/agent.py", line 377, in _setup
    self._init()
  File "/Users/ristovuorio/projects/ray_doodle/ray/python/ray/rllib/agents/dqn/dqn.py", line 207, in _init
    self.env_creator, self._policy_graph)
  File "/Users/ristovuorio/projects/ray_doodle/ray/python/ray/rllib/agents/agent.py", line 510, in make_local_evaluator
    extra_config or {}))
  File "/Users/ristovuorio/projects/ray_doodle/ray/python/ray/rllib/agents/agent.py", line 727, in _make_evaluator
    async_remote_worker_envs=config["async_remote_worker_envs"])
  File "/Users/ristovuorio/projects/ray_doodle/ray/python/ray/rllib/evaluation/policy_evaluator.py", line 296, in __init__
    self._build_policy_map(policy_dict, policy_config)  File "/Users/ristovuorio/projects/ray_doodle/ray/python/ray/rllib/evaluation/policy_evaluator.py", line 692, in _build_policy_map
    policy_map[name] = cls(obs_space, act_space, merged_conf)
  File "/Users/ristovuorio/projects/ray_doodle/ray/python/ray/rllib/agents/dqn/dqn_policy_graph.py", line 394, in __init__
    update_target_expr.append(var_target.assign(var))
  File "/Users/ristovuorio/miniconda3/envs/ray_fiddle/lib/python3.6/site-packages/tensorflow/python/ops/resource_variable_ops.py", line 951, in assign
    self._shape.assert_is_compatible_with(value_tensor.shape)
  File "/Users/ristovuorio/miniconda3/envs/ray_fiddle/lib/python3.6/site-packages/tensorflow/python/framework/tensor_shape.py", line 848, in assert_is_compatible_with
    raise ValueError("Shapes %s and %s are incompatible" % (self, other))
ValueError: Shapes (3,) and (11,) are incompatible

Source code / logs

from tensorflow.keras.layers import Dense

import ray
from ray.rllib.models import ModelCatalog, Model
from ray.rllib.agents.dqn import DQNAgent


class DemoNN(Model):
    def _build_layers_v2(self, input_dict, num_outputs, options):
        x = input_dict["obs"]
        x = Dense(1)(x)
        x = Dense(1)(x)
        x = Dense(3)(x)
        x = Dense(1)(x)
        x = Dense(1)(x)
        x = Dense(1)(x)
        x = Dense(1)(x)
        x = Dense(1)(x)
        x = Dense(1)(x)
        x = Dense(1)(x)
        x = Dense(11)(x)
        x = Dense(2)(x)

        return x, x


ray.init(local_mode=True)
ModelCatalog.register_custom_model("demo_nn", DemoNN)
config = {
    "model": {
        "custom_model": "demo_nn",
    },
    "hiddens": [],
    "num_workers": 0,
}
trainer = DQNAgent(env="CartPole-v0", config=config)
@ericl
Copy link
Contributor

ericl commented Mar 28, 2019

Would removing the sorted() be the right fix here? According to the TF docs for get_collection(), the variables are already returned in the "order they were collected", which is presumably a consistent order.

@ericl ericl added the bug Something that is supposed to be working; but isn't label Mar 28, 2019
@vuoristo
Copy link
Contributor Author

Yup, that seems to fix it.

ericl pushed a commit that referenced this issue Mar 29, 2019
#4504)

* Fixes Inconsistent weight assignment operations in DQNPolicyGraph (#4502)

* Update dqn_policy_graph.py
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something that is supposed to be working; but isn't
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants