
[rllib] [docs] Absence of learning rate parameter for optimizer in COMMON_CONFIG #4904

Closed · konichuvak (Contributor) opened this issue May 30, 2019 · 1 comment · Fixed by #4910

System information

  • OS Platform and Distribution: Linux Ubuntu 16.04
  • Ray installed from: binary
  • Ray version: 0.8.0.dev0
  • Python version: 3.6

Problem description:

In the "Building Policies in TensorFlow" section of the concepts docs (https://ray.readthedocs.io/en/latest/rllib-concepts.html#building-policies-in-tensorflow), the sample example throws a KeyError because no learning rate is defined for the optimizer.

Source code:

import tensorflow as tf
import ray
from ray import tune
from ray.rllib.agents.trainer_template import build_trainer
from ray.rllib.policy.sample_batch import SampleBatch
from ray.rllib.policy.tf_policy_template import build_tf_policy


# REINFORCE-style policy gradient loss: negative mean log-prob of the
# sampled actions, weighted by the per-timestep rewards in the batch.
def policy_gradient_loss(policy, batch_tensors):
    actions = batch_tensors[SampleBatch.ACTIONS]
    rewards = batch_tensors[SampleBatch.REWARDS]
    return -tf.reduce_mean(policy.action_dist.logp(actions) * rewards)


# <class 'ray.rllib.policy.tf_policy_template.MyTFPolicy'>
MyTFPolicy = build_tf_policy(
    name="MyTFPolicy",
    loss_fn=policy_gradient_loss,
)

# <class 'ray.rllib.agents.trainer_template.MyCustomTrainer'>
MyTrainer = build_trainer(
    name="MyCustomTrainer",
    default_policy=MyTFPolicy,
)

ray.init()
tune.run(
    MyTrainer,
    config={
        "env": "CartPole-v0",
        "num_workers": 2,
    }
)

Full traceback:

Traceback (most recent call last):
  File "/home/ubuntu/ray/python/ray/tune/trial_runner.py", line 446, in _process_trial
    result = self.trial_executor.fetch_result(trial)
  File "/home/ubuntu/ray/python/ray/tune/ray_trial_executor.py", line 316, in fetch_result
    result = ray.get(trial_future[0])
  File "/home/ubuntu/ray/python/ray/worker.py", line 2189, in get
    raise value
ray.exceptions.RayTaskError: ray_MyCustomTrainer:train() (pid=8300, host=...)
  File "/home/ubuntu/ray/python/ray/rllib/agents/trainer.py", line 311, in __init__
    Trainable.__init__(self, config, logger_creator)
  File "/home/ubuntu/ray/python/ray/tune/trainable.py", line 88, in __init__
    self._setup(copy.deepcopy(self.config))
  File "/home/ubuntu/ray/python/ray/rllib/agents/trainer.py", line 424, in _setup
    self._init(self.config, self.env_creator)
  File "/home/ubuntu/ray/python/ray/rllib/agents/trainer_template.py", line 63, in _init
    env_creator, policy)
  File "/home/ubuntu/ray/python/ray/rllib/agents/trainer.py", line 622, in make_local_evaluator
    extra_config or {}))
  File "/home/ubuntu/ray/python/ray/rllib/agents/trainer.py", line 847, in _make_evaluator
    _fake_sampler=config.get("_fake_sampler", False))
  File "/home/ubuntu/ray/python/ray/rllib/evaluation/policy_evaluator.py", line 321, in __init__
    self._build_policy_map(policy_dict, policy_config)
  File "/home/ubuntu/ray/python/ray/rllib/evaluation/policy_evaluator.py", line 727, in _build_policy_map
    policy_map[name] = cls(obs_space, act_space, merged_conf)
  File "/home/ubuntu/ray/python/ray/rllib/policy/tf_policy_template.py", line 109, in __init__
    existing_inputs=existing_inputs)
  File "/home/ubuntu/ray/python/ray/rllib/policy/dynamic_tf_policy.py", line 159, in __init__
    self._initialize_loss()
  File "/home/ubuntu/ray/python/ray/rllib/policy/dynamic_tf_policy.py", line 272, in _initialize_loss
    TFPolicy._initialize_loss(self, loss, loss_inputs)
  File "/home/ubuntu/ray/python/ray/rllib/policy/tf_policy.py", line 154, in _initialize_loss
    self._optimizer = self.optimizer()
  File "/home/ubuntu/ray/python/ray/rllib/policy/tf_policy_template.py", line 129, in optimizer
    return TFPolicy.optimizer(self)
  File "/home/ubuntu/ray/python/ray/rllib/policy/tf_policy.py", line 287, in optimizer
    return tf.train.AdamOptimizer(self.config["lr"])
KeyError: 'lr'
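
The root cause is visible in the last frames of the traceback: TFPolicy.optimizer unconditionally builds tf.train.AdamOptimizer(self.config["lr"]), but the minimal config that build_trainer falls back to in this example never defines the "lr" key. Assuming the Trainer merges the user-supplied config over its defaults (the merged_conf in the traceback suggests it does), a quick per-run workaround is to supply the learning rate at launch; the 0.0004 below is purely illustrative:

tune.run(
    MyTrainer,
    config={
        "env": "CartPole-v0",
        "num_workers": 2,
        "lr": 0.0004,  # illustrative value; any float here avoids the KeyError
    }
)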

Possible solution:

Edit the example in the docs:

from ray.rllib.agents.trainer import COMMON_CONFIG

COMMON_CONFIG['lr'] = 0.01
MyTrainer = build_trainer(
    name="MyCustomTrainer",
    default_policy=MyTFPolicy,
    default_config=COMMON_CONFIG
)
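
Note that this mutates the module-level COMMON_CONFIG dict in place, which would also affect any other trainer built in the same process. A safer variation of the same fix (a sketch using only the imports already shown plus the standard-library copy module) works on a copy instead:

import copy

from ray.rllib.agents.trainer import COMMON_CONFIG

# Deep-copy the shared defaults so they stay untouched for other trainers.
my_config = copy.deepcopy(COMMON_CONFIG)
my_config["lr"] = 0.01

MyTrainer = build_trainer(
    name="MyCustomTrainer",
    default_policy=MyTFPolicy,
    default_config=my_config,
)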

Another fix would be to edit COMMON_CONFIG in ray.rllib.agents.trainer to include the learning rate key.
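
For illustration only (the value the actual fix in #4910 chose may differ), that second option would amount to adding an entry like the following to the COMMON_CONFIG dict in ray/rllib/agents/trainer.py:

COMMON_CONFIG = {
    # ... existing default keys ...
    # Default learning rate read by TFPolicy.optimizer when a policy
    # does not override it; the 0.0001 here is purely illustrative.
    "lr": 0.0001,
}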

ericl (Contributor) commented May 30, 2019

Hm, that's unfortunate. It could be that the change was mislaid in some other PR not yet merged.
