
[rllib] [docs] Absence of learning rate parameter for optimizer in COMMON_CONFIG #4904

Closed · konichuvak (Contributor) opened this issue May 30, 2019 · 1 comment · Fixed by #4910

System information

  • OS Platform and Distribution: Linux Ubuntu 16.04
  • Ray installed from: binary
  • Ray version: 0.8.0.dev0
  • Python version: 3.6

Problem description:

In the "Building Policies in TensorFlow" section of the concepts docs (https://ray.readthedocs.io/en/latest/rllib-concepts.html#building-policies-in-tensorflow), the sample example throws a KeyError because no learning rate is defined for the optimizer.

Source code:

import tensorflow as tf
import ray
from ray import tune
from ray.rllib.agents.trainer_template import build_trainer
from ray.rllib.policy.sample_batch import SampleBatch
from ray.rllib.policy.tf_policy_template import build_tf_policy


# REINFORCE-style policy gradient loss: negative mean log-prob of the
# sampled actions, weighted by the per-timestep rewards in the batch.
def policy_gradient_loss(policy, batch_tensors):
    actions = batch_tensors[SampleBatch.ACTIONS]
    rewards = batch_tensors[SampleBatch.REWARDS]
    return -tf.reduce_mean(policy.action_dist.logp(actions) * rewards)


# <class 'ray.rllib.policy.tf_policy_template.MyTFPolicy'>
MyTFPolicy = build_tf_policy(
    name="MyTFPolicy",
    loss_fn=policy_gradient_loss,
)

# <class 'ray.rllib.agents.trainer_template.MyCustomTrainer'>
MyTrainer = build_trainer(
    name="MyCustomTrainer",
    default_policy=MyTFPolicy,
)

ray.init()
tune.run(
    MyTrainer,
    config={
        "env": "CartPole-v0",
        "num_workers": 2,
    }
)

Full traceback:

Traceback (most recent call last):
  File "/home/ubuntu/ray/python/ray/tune/trial_runner.py", line 446, in _process_trial
    result = self.trial_executor.fetch_result(trial)
  File "/home/ubuntu/ray/python/ray/tune/ray_trial_executor.py", line 316, in fetch_result
    result = ray.get(trial_future[0])
  File "/home/ubuntu/ray/python/ray/worker.py", line 2189, in get
    raise value
ray.exceptions.RayTaskError: ray_MyCustomTrainer:train() (pid=8300, host=...)
  File "/home/ubuntu/ray/python/ray/rllib/agents/trainer.py", line 311, in __init__
    Trainable.__init__(self, config, logger_creator)
  File "/home/ubuntu/ray/python/ray/tune/trainable.py", line 88, in __init__
    self._setup(copy.deepcopy(self.config))
  File "/home/ubuntu/ray/python/ray/rllib/agents/trainer.py", line 424, in _setup
    self._init(self.config, self.env_creator)
  File "/home/ubuntu/ray/python/ray/rllib/agents/trainer_template.py", line 63, in _init
    env_creator, policy)
  File "/home/ubuntu/ray/python/ray/rllib/agents/trainer.py", line 622, in make_local_evaluator
    extra_config or {}))
  File "/home/ubuntu/ray/python/ray/rllib/agents/trainer.py", line 847, in _make_evaluator
    _fake_sampler=config.get("_fake_sampler", False))
  File "/home/ubuntu/ray/python/ray/rllib/evaluation/policy_evaluator.py", line 321, in __init__
    self._build_policy_map(policy_dict, policy_config)
  File "/home/ubuntu/ray/python/ray/rllib/evaluation/policy_evaluator.py", line 727, in _build_policy_map
    policy_map[name] = cls(obs_space, act_space, merged_conf)
  File "/home/ubuntu/ray/python/ray/rllib/policy/tf_policy_template.py", line 109, in __init__
    existing_inputs=existing_inputs)
  File "/home/ubuntu/ray/python/ray/rllib/policy/dynamic_tf_policy.py", line 159, in __init__
    self._initialize_loss()
  File "/home/ubuntu/ray/python/ray/rllib/policy/dynamic_tf_policy.py", line 272, in _initialize_loss
    TFPolicy._initialize_loss(self, loss, loss_inputs)
  File "/home/ubuntu/ray/python/ray/rllib/policy/tf_policy.py", line 154, in _initialize_loss
    self._optimizer = self.optimizer()
  File "/home/ubuntu/ray/python/ray/rllib/policy/tf_policy_template.py", line 129, in optimizer
    return TFPolicy.optimizer(self)
  File "/home/ubuntu/ray/python/ray/rllib/policy/tf_policy.py", line 287, in optimizer
    return tf.train.AdamOptimizer(self.config["lr"])
KeyError: 'lr'
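
The root cause is visible in the last frames of the traceback: TFPolicy.optimizer unconditionally builds tf.train.AdamOptimizer(self.config["lr"]), but the minimal config that build_trainer falls back to in this example never defines the "lr" key. Assuming the Trainer merges the user-supplied config over its defaults (the merged_conf in the traceback suggests it does), a quick per-run workaround is to supply the learning rate at launch; the 0.0004 below is purely illustrative:

tune.run(
    MyTrainer,
    config={
        "env": "CartPole-v0",
        "num_workers": 2,
        "lr": 0.0004,  # illustrative value; any float here avoids the KeyError
    }
)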

Possible solution:

Edit the example in the docs:

from ray.rllib.agents.trainer import COMMON_CONFIG

COMMON_CONFIG['lr'] = 0.01
MyTrainer = build_trainer(
    name="MyCustomTrainer",
    default_policy=MyTFPolicy,
    default_config=COMMON_CONFIG
)
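
Note that this mutates the module-level COMMON_CONFIG dict in place, which would also affect any other trainer built in the same process. A safer variation of the same fix (a sketch using only the imports already shown plus the standard-library copy module) works on a copy instead:

import copy

from ray.rllib.agents.trainer import COMMON_CONFIG

# Deep-copy the shared defaults so they stay untouched for other trainers.
my_config = copy.deepcopy(COMMON_CONFIG)
my_config["lr"] = 0.01

MyTrainer = build_trainer(
    name="MyCustomTrainer",
    default_policy=MyTFPolicy,
    default_config=my_config,
)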

Another fix would be to edit COMMON_CONFIG in ray.rllib.agents.trainer to include the learning rate key.
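
For illustration only (the value the actual fix in #4910 chose may differ), that second option would amount to adding an entry like the following to the COMMON_CONFIG dict in ray/rllib/agents/trainer.py:

COMMON_CONFIG = {
    # ... existing default keys ...
    # Default learning rate read by TFPolicy.optimizer when a policy
    # does not override it; the 0.0001 here is purely illustrative.
    "lr": 0.0001,
}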

ericl (Contributor) commented May 30, 2019

Hm, that's unfortunate. It could be that the change was mislaid in some other PR not yet merged.
