
[Bug Report] actor's std becomes "nan" during PPO training #33

Open
mitsu3291 opened this issue Jul 11, 2024 · 8 comments
Comments


mitsu3291 commented Jul 11, 2024

I am training a robot with reinforcement learning using rsl_rl and Isaac Lab. It works fine with simple settings, but when I switch to more complex settings (such as domain randomization), the following error occurs after training has progressed for a while, indicating that the actor's standard deviation does not satisfy the condition std >= 0. Has anyone experienced a similar error?
num_envs is 3600

Traceback (most recent call last):
  File "/root/IsaacLab/source/standalone/workflows/rsl_rl/train.py", line 131, in <module>
    main()
  File "/root/IsaacLab/source/standalone/workflows/rsl_rl/train.py", line 123, in main
    runner.learn(num_learning_iterations=agent_cfg.max_iterations, init_at_random_ep_len=True)
  File "/isaac-sim/kit/python/lib/python3.10/site-packages/rsl_rl/runners/on_policy_runner.py", line 153, in learn
    mean_value_loss, mean_surrogate_loss = self.alg.update()
  File "/isaac-sim/kit/python/lib/python3.10/site-packages/rsl_rl/algorithms/ppo.py", line 121, in update
    self.actor_critic.act(obs_batch, masks=masks_batch, hidden_states=hid_states_batch[0])
  File "/isaac-sim/kit/python/lib/python3.10/site-packages/rsl_rl/modules/actor_critic.py", line 105, in act
    return self.distribution.sample()
  File "/isaac-sim/exts/omni.isaac.ml_archive/pip_prebundle/torch/distributions/normal.py", line 74, in sample
    return torch.normal(self.loc.expand(shape), self.scale.expand(shape))  
RuntimeError: normal expects all elements of std >= 0.0

I investigated the value of std (self.scale) and found that the std values for one environment are nan. (The columns correspond to the robot's action dimensions.)

self.scale: tensor([[0.1926, 0.2051, 0.1785, ..., 0.7033, 0.8655, 0.8500],
[0.1926, 0.2051, 0.1785, ..., 0.7033, 0.8655, 0.8500],
[0.1926, 0.2051, 0.1785, ..., 0.7033, 0.8655, 0.8500],
...,
[0.1926, 0.2051, 0.1785, ..., 0.7033, 0.8655, 0.8500],
[0.1926, 0.2051, 0.1785, ..., 0.7033, 0.8655, 0.8500],
[0.1926, 0.2051, 0.1785, ..., 0.7033, 0.8655, 0.8500]],
device='cuda:0')
env_id: 1111, row: tensor([nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan],
       device='cuda:0')
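
A minimal sketch of the check used to find that row (the helper name and where it is called from are illustrative, not part of rsl_rl):

```python
import torch

def find_nan_std_rows(std: torch.Tensor) -> None:
    """Print every environment whose action std contains non-finite values."""
    bad = ~torch.isfinite(std).all(dim=-1)  # True for rows containing nan/inf
    for env_id in torch.nonzero(bad).flatten().tolist():
        print(f"env_id: {env_id}, row: {std[env_id]}")

# e.g. call it on the distribution's scale right before sampling:
# find_nan_std_rows(self.distribution.scale)
```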

@mitsu3291 mitsu3291 changed the title Actor's Standard Deviation std Not Meeting the Condition of Being ≥ 0 During PPO Training [Bug Report] actor's std not meeting the "≥ 0" during PPO training Jul 12, 2024
@mitsu3291 mitsu3291 changed the title [Bug Report] actor's std not meeting the "≥ 0" during PPO training [Bug Report] actor's std becomes "nan" during PPO training Jul 12, 2024
@Lruomeng

Me too.

@shafeef901

Can confirm I've experienced it too. In my case, I had introduced some sparse rewards to my environment. Not sure that's the cause, though.

@felipemohr

Same problem here. When visualizing the training data in TensorBoard, I noticed that Loss/value_function suddenly goes to infinity.

@xliu0105

Same problem

@xliu0105

When facing the std >= 0 error, check the output 'Value Function Loss' to see whether it is inf. If it is, there is a solution you can try. Based on issue ray-project/ray#19291, its fix ray-project/ray#22171, and commit ray-project/ray@ddd1160, the code starting at L159 in the ppo.py file of rsl_rl (version 2.0.2) needs to be modified as follows:
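
A rough sketch of this kind of change, assuming the clipped value-loss block that sits around that line in rsl_rl 2.0.2 and a hypothetical value_loss_clip threshold in the spirit of RLlib's vf_clip_param fix:

```python
# Sketch only, not the exact diff: cap the per-sample squared error so a
# diverging critic cannot push the value loss (and its gradients) to inf/nan.
value_loss_clip = 10.0  # hypothetical threshold, analogous to RLlib's vf_clip_param

if self.use_clipped_value_loss:
    value_clipped = target_values_batch + (value_batch - target_values_batch).clamp(
        -self.clip_param, self.clip_param
    )
    value_losses = (value_batch - returns_batch).pow(2)
    value_losses_clipped = (value_clipped - returns_batch).pow(2)
    value_loss = torch.clamp(
        torch.max(value_losses, value_losses_clipped), max=value_loss_clip
    ).mean()
else:
    value_loss = torch.clamp(
        (returns_batch - value_batch).pow(2), max=value_loss_clip
    ).mean()
```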

> When facing the std >= 0 error, check the output 'Value Function Loss' to see whether it is inf. If it is, there is a solution you can try. […]

Thanks for your answer, I'll try it out.


@AlexanderAbernathy

> When facing the std >= 0 error, check the output 'Value Function Loss' to see whether it is inf. If it is, there is a solution you can try. […]

Note that this kind of method may not work, and it may slow down learning. I tested it with 'iteration: 30000' and 'num_envs: 12000 to 30000' while training my own robot, and the training process still failed randomly between 1,000 and 18,000 iterations. I checked the 'value batch' and 'return batch'; once training failed, both showed very large positive or negative numbers. I ultimately completed the entire training process by modifying the rewards and penalties. Since I'm still new to RL, I don't know exactly what happened. By the way, I also tried modifying the PPO hyperparameters and the network architecture; that didn't work either. I would greatly appreciate it if someone could provide some information on this topic.

There is also a less-than-ideal workaround to keep training going: when the std >= 0 error occurs and the Value Function Loss shows inf, you can first adjust some parameters in the project and then use --resume to load the checkpoint and continue training.

@weifeng-lt

Adding actions = torch.clip(actions, min=-6.28, max=6.28) before env.step(actions) seems to help. It is also better to add a penalty on the actions to prevent the actor from outputting values that are too large.
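
A minimal sketch of both suggestions, assuming actions and env already exist in your rollout loop; the ±6.28 bound comes from the comment above, and the penalty weight is just an illustrative value:

```python
import torch

# 1) clip the policy output before stepping the environment
actions = torch.clip(actions, min=-6.28, max=6.28)
step_result = env.step(actions)  # unpack according to your env wrapper's return signature

# 2) hypothetical L2 penalty on the action magnitude, added to the reward with a
#    small weight so the actor is discouraged from producing very large outputs
def action_l2_penalty(actions: torch.Tensor, weight: float = 0.01) -> torch.Tensor:
    return -weight * torch.sum(torch.square(actions), dim=1)
```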
