[rllib] Compute actions with AlphaZero algorithm #13177
cc @sven1977

Same issue.

Good day.

@lairning

@mehes-kth
I think I found the problem. What does work (mostly) is to use the much more "correct" and flexible evaluation option:

```python
from ray import rllib, tune
from ray.tune.utils.trainable import TrainableUtil

agent_type = 'contrib/AlphaZero'
checkpoint_dir = ...
config = ...
# evaluation ONLY: avoid the MultiGPU optimizer and set all training-related sizes to 0
config.update(
    simple_optimizer=True,
    num_workers=0,
    train_batch_size=0,
    rollout_fragment_length=0,
    timesteps_per_iteration=0,
    evaluation_interval=1,
    # evaluation_num_workers=...,
    # evaluation_config=dict(explore=False),
    # evaluation_num_episodes=...,
)
agent = rllib.agents.registry.get_trainer_class(agent_type)(config=config)
# may need adjustment depending on checkpoint frequency
checkpoint_path = TrainableUtil.get_checkpoints_paths(checkpoint_dir).chkpt_path[0]
agent.restore(checkpoint_path)
results = tune.run(
    agent,
    config=config,
    ...
)
```
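A side note on the checkpoint lookup in the snippet above: indexing with `[0]` picks the first checkpoint found, which may not be the most recent one. The sketch below (pure Python, with made-up `(iteration, path)` pairs standing in for whatever `get_checkpoints_paths` returns in your Ray version) shows one way to select the latest checkpoint instead:

```python
# Hypothetical sketch: pick the newest checkpoint when several exist.
# The (iteration, path) pairs here are illustrative stand-ins for the
# rows returned by TrainableUtil.get_checkpoints_paths(checkpoint_dir).
checkpoints = [
    (10, "checkpoint_10/checkpoint-10"),
    (50, "checkpoint_50/checkpoint-50"),
    (30, "checkpoint_30/checkpoint-30"),
]

# Sort by training iteration and take the highest one.
latest_iter, latest_path = max(checkpoints, key=lambda c: c[0])
print(latest_path)  # checkpoint_50/checkpoint-50
```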
See #14477 (specifically, #14477 (comment)) for another approach...
Hi,

After training an AlphaZero trainer in an environment, I tried to load it and evaluate it, but when I use the `compute_action` command to compute the action based on the current observation, I get the following error:
```
Traceback (most recent call last):
  File "C:/#######################/rllib/AlphaZero_Trainer.py", line 99, in <module>
    action = alphazero_trainer.compute_action(observation=obs)
  File "C:#######################\ray\rllib\agents\trainer.py", line 830, in compute_action
    timestep=self.global_vars["timestep"])
  File "C:#######################\ray\rllib\policy\policy.py", line 194, in compute_single_action
    timestep=timestep)
  File "C:#######################\ray\rllib\contrib\alpha_zero\core\alpha_zero_policy.py", line 50, in compute_actions
    for i, episode in enumerate(episodes):
TypeError: 'NoneType' object is not iterable
```
I used the same command for PPO, IMPALA and A2C trainers and it worked fine. Am I missing anything?

Thanks in advance!
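For anyone debugging this, the traceback points at the failure mode directly: the AlphaZero policy iterates over its `episodes` argument, and `Trainer.compute_action` leaves that argument as `None`. A minimal stand-alone reproduction (the `compute_actions` function below is a hypothetical stand-in for `alpha_zero_policy.compute_actions`, not the real RLlib code) shows the same `TypeError`:

```python
# Minimal reproduction of the failure mode, independent of Ray.
# Stand-in for alpha_zero_policy.compute_actions: it assumes `episodes`
# is an iterable of episode objects, but compute_action passes None.
def compute_actions(obs_batch, episodes=None):
    actions = []
    for i, episode in enumerate(episodes):  # raises when episodes is None
        actions.append(0)
    return actions

try:
    compute_actions([{"obs": 0}])
except TypeError as e:
    print(e)  # 'NoneType' object is not iterable
```

This is why PPO, IMPALA and A2C are unaffected: their policies do not require per-episode state in `compute_actions`, while AlphaZero's MCTS does.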