[rllib] Compute actions with AlphaZero algorithm #13177
cc @sven1977

Same issue.

Good day.

@lairning

@mehes-kth
I think I found the problem. What does work (mostly) is to use the much more "correct" and flexible evaluation option:

```python
from ray import rllib, tune
from ray.tune.utils.trainable import TrainableUtil

agent_type = 'contrib/AlphaZero'
checkpoint_dir = ...
config = ...
# evaluation ONLY: avoid the MultiGPU optimizer and set all training-related sizes to 0
config.update(
    simple_optimizer=True,
    num_workers=0,
    train_batch_size=0,
    rollout_fragment_length=0,
    timesteps_per_iteration=0,
    evaluation_interval=1,
    # evaluation_num_workers=...,
    # evaluation_config=dict(explore=False),
    # evaluation_num_episodes=...,
)
agent = rllib.agents.registry.get_trainer_class(agent_type)(config=config)
# may need adjustment depending on checkpoint frequency
checkpoint_path = TrainableUtil.get_checkpoints_paths(checkpoint_dir).chkpt_path[0]
agent.restore(checkpoint_path)
results = tune.run(
    agent,
    config=config,
    ...
)
```
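A side note on the checkpoint lookup in the snippet above: indexing with `[0]` picks the first checkpoint found, which may not be the most recent one. The sketch below (pure Python, with made-up `(iteration, path)` pairs standing in for whatever `get_checkpoints_paths` returns in your Ray version) shows one way to select the latest checkpoint instead:

```python
# Hypothetical sketch: pick the newest checkpoint when several exist.
# The (iteration, path) pairs here are illustrative stand-ins for the
# rows returned by TrainableUtil.get_checkpoints_paths(checkpoint_dir).
checkpoints = [
    (10, "checkpoint_10/checkpoint-10"),
    (50, "checkpoint_50/checkpoint-50"),
    (30, "checkpoint_30/checkpoint-30"),
]

# Sort by training iteration and take the highest one.
latest_iter, latest_path = max(checkpoints, key=lambda c: c[0])
print(latest_path)  # checkpoint_50/checkpoint-50
```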
See #14477 (specifically, #14477 (comment)) for another approach...
Hi,

After training an AlphaZero trainer in an environment, I tried to load it and evaluate it, but when I use the `compute_action` command to compute the action based on the current observation, I get the following error:
```
Traceback (most recent call last):
  File "C:/#######################/rllib/AlphaZero_Trainer.py", line 99, in <module>
    action = alphazero_trainer.compute_action(observation=obs)
  File "C:#######################\ray\rllib\agents\trainer.py", line 830, in compute_action
    timestep=self.global_vars["timestep"])
  File "C:#######################\ray\rllib\policy\policy.py", line 194, in compute_single_action
    timestep=timestep)
  File "C:#######################\ray\rllib\contrib\alpha_zero\core\alpha_zero_policy.py", line 50, in compute_actions
    for i, episode in enumerate(episodes):
TypeError: 'NoneType' object is not iterable
```
I used the same command for PPO, IMPALA and A2C trainers and it worked fine. Am I missing anything?

Thanks in advance!
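For anyone debugging this, the traceback points at the failure mode directly: the AlphaZero policy iterates over its `episodes` argument, and `Trainer.compute_action` leaves that argument as `None`. A minimal stand-alone reproduction (the `compute_actions` function below is a hypothetical stand-in for `alpha_zero_policy.compute_actions`, not the real RLlib code) shows the same `TypeError`:

```python
# Minimal reproduction of the failure mode, independent of Ray.
# Stand-in for alpha_zero_policy.compute_actions: it assumes `episodes`
# is an iterable of episode objects, but compute_action passes None.
def compute_actions(obs_batch, episodes=None):
    actions = []
    for i, episode in enumerate(episodes):  # raises when episodes is None
        actions.append(0)
    return actions

try:
    compute_actions([{"obs": 0}])
except TypeError as e:
    print(e)  # 'NoneType' object is not iterable
```

This is why PPO, IMPALA and A2C are unaffected: their policies do not require per-episode state in `compute_actions`, while AlphaZero's MCTS does.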