Inconsistent weight assignment operations in DQNPolicyGraph #4502

vuoristo · 2019-03-28T22:48:03Z

System information

OS Platform and Distribution (e.g., macOS 10.14.3):
Ray installed from (source or binary): source
Ray version: 0.7.0.dev2 ab55a1f
Python version: 3.6.8

Describe the problem

DQNPolicyGraph creates tensorflow assign operations by looping through lists of self.q_func_vars and self.target_q_func_vars sorted by variable name on lines 390:393. The default sorting is not consistent between the two lists of variable names and as a result the operations can mix up the assignments. The attached code snippet produces the error message below.

2019-03-28 18:34:31,402 WARNING worker.py:1397 -- WARNING: Not updating worker name since `setproctitle` is not installed. Install this with `pip install setproctitle` (or ray[debug]) to enable monitoring of worker processes.                                                                                                                                                                                                           
2019-03-28 18:34:31.415440: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
WARNING:tensorflow:From /Users/ristovuorio/miniconda3/envs/ray_fiddle/lib/python3.6/site-packages/tensorflow/python/util/decorator_utils.py:127: GraphKeys.VARIABLES (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.                                                                                                                                                                          
Instructions for updating:
Use `tf.GraphKeys.GLOBAL_VARIABLES` instead.
Traceback (most recent call last):
  File "dqn_fail_demonstrator.py", line 37, in <module>
    trainer = DQNAgent(env="CartPole-v0", config=config)
  File "/Users/ristovuorio/projects/ray_doodle/ray/python/ray/rllib/agents/agent.py", line 280, in __init__
    Trainable.__init__(self, config, logger_creator)
  File "/Users/ristovuorio/projects/ray_doodle/ray/python/ray/tune/trainable.py", line 88, in __init__
    self._setup(copy.deepcopy(self.config))
  File "/Users/ristovuorio/projects/ray_doodle/ray/python/ray/rllib/agents/agent.py", line 377, in _setup
    self._init()
  File "/Users/ristovuorio/projects/ray_doodle/ray/python/ray/rllib/agents/dqn/dqn.py", line 207, in _init
    self.env_creator, self._policy_graph)
  File "/Users/ristovuorio/projects/ray_doodle/ray/python/ray/rllib/agents/agent.py", line 510, in make_local_evaluator
    extra_config or {}))
  File "/Users/ristovuorio/projects/ray_doodle/ray/python/ray/rllib/agents/agent.py", line 727, in _make_evaluator
    async_remote_worker_envs=config["async_remote_worker_envs"])
  File "/Users/ristovuorio/projects/ray_doodle/ray/python/ray/rllib/evaluation/policy_evaluator.py", line 296, in __init__
    self._build_policy_map(policy_dict, policy_config)  File "/Users/ristovuorio/projects/ray_doodle/ray/python/ray/rllib/evaluation/policy_evaluator.py", line 692, in _build_policy_map
    policy_map[name] = cls(obs_space, act_space, merged_conf)
  File "/Users/ristovuorio/projects/ray_doodle/ray/python/ray/rllib/agents/dqn/dqn_policy_graph.py", line 394, in __init__
    update_target_expr.append(var_target.assign(var))
  File "/Users/ristovuorio/miniconda3/envs/ray_fiddle/lib/python3.6/site-packages/tensorflow/python/ops/resource_variable_ops.py", line 951, in assign
    self._shape.assert_is_compatible_with(value_tensor.shape)
  File "/Users/ristovuorio/miniconda3/envs/ray_fiddle/lib/python3.6/site-packages/tensorflow/python/framework/tensor_shape.py", line 848, in assert_is_compatible_with
    raise ValueError("Shapes %s and %s are incompatible" % (self, other))
ValueError: Shapes (3,) and (11,) are incompatible

Source code / logs

from tensorflow.keras.layers import Dense

import ray
from ray.rllib.models import ModelCatalog, Model
from ray.rllib.agents.dqn import DQNAgent


class DemoNN(Model):
    def _build_layers_v2(self, input_dict, num_outputs, options):
        x = input_dict["obs"]
        x = Dense(1)(x)
        x = Dense(1)(x)
        x = Dense(3)(x)
        x = Dense(1)(x)
        x = Dense(1)(x)
        x = Dense(1)(x)
        x = Dense(1)(x)
        x = Dense(1)(x)
        x = Dense(1)(x)
        x = Dense(1)(x)
        x = Dense(11)(x)
        x = Dense(2)(x)

        return x, x


ray.init(local_mode=True)
ModelCatalog.register_custom_model("demo_nn", DemoNN)
config = {
    "model": {
        "custom_model": "demo_nn",
    },
    "hiddens": [],
    "num_workers": 0,
}
trainer = DQNAgent(env="CartPole-v0", config=config)

The text was updated successfully, but these errors were encountered:

ericl · 2019-03-28T23:30:30Z

Would removing the sorted() be the right fix here? According to the TF docs for get_collection(), the variables are already returned in the "order they were collected", which is presumably a consistent order.

vuoristo · 2019-03-28T23:36:56Z

Yup, that seems to fix it.

…y-project#4502)

#4504) * Fixes Inconsistent weight assignment operations in DQNPolicyGraph (#4502) * Update dqn_policy_graph.py

ericl added the bug Something that is supposed to be working; but isn't label Mar 28, 2019

vuoristo added a commit to vuoristo/ray that referenced this issue Mar 28, 2019

Fixes Inconsistent weight assignment operations in DQNPolicyGraph (ra…

b00b1f4

…y-project#4502)

vuoristo mentioned this issue Mar 28, 2019

Fixes Inconsistent weight assignment operations in DQNPolicyGraph (#4… #4504

Merged

ericl closed this as completed in #4504 Mar 29, 2019

ericl pushed a commit that referenced this issue Mar 29, 2019

Fixes Inconsistent weight assignment operations in DQNPolicyGraph (#4… (

798944f

#4504) * Fixes Inconsistent weight assignment operations in DQNPolicyGraph (#4502) * Update dqn_policy_graph.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Inconsistent weight assignment operations in DQNPolicyGraph #4502

Inconsistent weight assignment operations in DQNPolicyGraph #4502

vuoristo commented Mar 28, 2019

ericl commented Mar 28, 2019 •

edited

Loading

vuoristo commented Mar 28, 2019

Inconsistent weight assignment operations in DQNPolicyGraph #4502

Inconsistent weight assignment operations in DQNPolicyGraph #4502

Comments

vuoristo commented Mar 28, 2019

System information

Describe the problem

Source code / logs

ericl commented Mar 28, 2019 • edited Loading

vuoristo commented Mar 28, 2019

ericl commented Mar 28, 2019 •

edited

Loading