Cri-o stuck in "Could not restore" when node load is high when restart cri-o #8673

Open
lance5890 opened this issue Oct 15, 2024 · 0 comments
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@lance5890
Contributor

What happened?

On one node, when the system load is high and we restart CRI-O, it gets stuck in the restore process for a long time. The logs are as follows:

Oct 15 13:32:42 master-lharm-2 crio[3480]: time="2024-10-15 13:32:42.903497855+08:00" level=warning msg="Could not restore sandbox e456015ab35e79331beb0071d58ff312747bee64056d7abd4f746358a7401712: failed to Statfs \"/var/run/netns/f6300a46-4b30-4a3d-8c06-5d2bb5c67905\": no such file or directory"
Oct 15 13:32:43 master-lharm-2 crio[3480]: time="2024-10-15 13:32:43.339161797+08:00" level=warning msg="Deleting all containers under sandbox e456015ab35e79331beb0071d58ff312747bee64056d7abd4f746358a7401712 since it could not be restored"
Oct 15 13:33:15 master-lharm-2 crio[3480]: time="2024-10-15 13:33:15.733919215+08:00" level=warning msg="Could not restore sandbox e85249f52bd82fab8b187e5e6ff0e7f9f5e9244a12523baa971be8ba5d36df00: failed to Statfs \"/var/run/netns/202e8db9-b93d-4576-9e39-5b6589aef158\": no such file or directory"
Oct 15 13:33:16 master-lharm-2 crio[3480]: time="2024-10-15 13:33:16.327566397+08:00" level=warning msg="Deleting all containers under sandbox e85249f52bd82fab8b187e5e6ff0e7f9f5e9244a12523baa971be8ba5d36df00 since it could not be restored"
Oct 15 13:33:53 master-lharm-2 crio[3480]: time="2024-10-15 13:33:53.088736014+08:00" level=warning msg="Could not restore sandbox ea5dffa43cc58888f331c4542f7fa02fd87ce6e8722c018701f34adb3bbf2e4c: failed to Statfs \"/var/run/netns/8cee8b58-faca-4159-bae6-44119e7bfb7c\": no such file or directory"
Oct 15 13:33:53 master-lharm-2 crio[3480]: time="2024-10-15 13:33:53.471422710+08:00" level=warning msg="Deleting all containers under sandbox ea5dffa43cc58888f331c4542f7fa02fd87ce6e8722c018701f34adb3bbf2e4c since it could not be restored"
Oct 15 13:34:16 master-lharm-2 crio[3480]: time="2024-10-15 13:34:16.940943535+08:00" level=warning msg="Could not restore sandbox 92d39c21bd77d349068f1f6f8379267c40e77e4ebd981f5828c1ddbdf2662162: failed to Statfs \"/var/run/netns/fa2aacd2-986b-458c-96ef-2a4e231a00d2\": no such file or directory"
Oct 15 13:34:17 master-lharm-2 crio[3480]: time="2024-10-15 13:34:17.518848482+08:00" level=warning msg="Deleting all containers under sandbox 92d39c21bd77d349068f1f6f8379267c40e77e4ebd981f5828c1ddbdf2662162 since it could not be restored"
Oct 15 13:34:47 master-lharm-2 crio[3480]: time="2024-10-15 13:34:47.982638318+08:00" level=warning msg="Could not restore sandbox b23e853d33f32b930dc718be396ab2a632647979a76dabed6322a1c59fe2104d: failed to Statfs \"/var/run/netns/af5f5567-28a0-43fd-9072-32b7db1697d2\": no such file or directory"
Oct 15 13:34:48 master-lharm-2 crio[3480]: time="2024-10-15 13:34:48.174033605+08:00" level=warning msg="Deleting all containers under sandbox b23e853d33f32b930dc718be396ab2a632647979a76dabed6322a1c59fe2104d since it could not be restored"
Oct 15 13:35:14 master-lharm-2 crio[3480]: time="2024-10-15 13:35:14.167490731+08:00" level=warning msg="Could not restore sandbox a71024afae081939f8ddd2f386240de5fb1827bfab1c20319fbb72fdeeef398d: failed to Statfs \"/var/run/netns/787418dc-88a8-4b4e-a319-166232202cf6\": no such file or directory"
Oct 15 13:35:14 master-lharm-2 crio[3480]: time="2024-10-15 13:35:14.606607516+08:00" level=warning msg="Deleting all containers under sandbox a71024afae081939f8ddd2f386240de5fb1827bfab1c20319fbb72fdeeef398d since it could not be restored"
ls /var/lib/containers/storage/overlay-containers | wc -l
5785
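
The timestamps in the warnings above suggest that each failed sandbox restore takes on the order of half a minute. As a minimal sketch for measuring this from a journal excerpt, the script below parses the first three "Could not restore sandbox" lines quoted above and prints the gap between consecutive failures. It assumes the log lines have exactly the format shown here; the names `RESTORE_RE`, `parse_ts`, `events`, and `gaps` are illustrative and not part of CRI-O.

```python
import re
from datetime import datetime

# First three "Could not restore" warnings, copied from the journal output above.
SAMPLE = r'''
Oct 15 13:32:42 master-lharm-2 crio[3480]: time="2024-10-15 13:32:42.903497855+08:00" level=warning msg="Could not restore sandbox e456015ab35e79331beb0071d58ff312747bee64056d7abd4f746358a7401712: failed to Statfs \"/var/run/netns/f6300a46-4b30-4a3d-8c06-5d2bb5c67905\": no such file or directory"
Oct 15 13:33:15 master-lharm-2 crio[3480]: time="2024-10-15 13:33:15.733919215+08:00" level=warning msg="Could not restore sandbox e85249f52bd82fab8b187e5e6ff0e7f9f5e9244a12523baa971be8ba5d36df00: failed to Statfs \"/var/run/netns/202e8db9-b93d-4576-9e39-5b6589aef158\": no such file or directory"
Oct 15 13:33:53 master-lharm-2 crio[3480]: time="2024-10-15 13:33:53.088736014+08:00" level=warning msg="Could not restore sandbox ea5dffa43cc58888f331c4542f7fa02fd87ce6e8722c018701f34adb3bbf2e4c: failed to Statfs \"/var/run/netns/8cee8b58-faca-4159-bae6-44119e7bfb7c\": no such file or directory"
'''

# Matches the timestamp, sandbox ID, and missing netns path of one warning line.
RESTORE_RE = re.compile(
    r'time="(?P<ts>[^"]+)" level=warning msg="Could not restore sandbox '
    r'(?P<sandbox>[0-9a-f]+): failed to Statfs \\"(?P<netns>[^"\\]+)\\"'
)
TS_RE = re.compile(r'(\d{4}-\d\d-\d\d \d\d:\d\d:\d\d)\.(\d+)([+-]\d\d:\d\d)')


def parse_ts(text):
    """Parse a nanosecond-precision timestamp, truncating to microseconds."""
    base, frac, offset = TS_RE.fullmatch(text).groups()
    return datetime.strptime(f"{base}.{frac[:6]}{offset}", "%Y-%m-%d %H:%M:%S.%f%z")


# One (timestamp, sandbox ID, netns path) tuple per failed restore.
events = [
    (parse_ts(m["ts"]), m["sandbox"], m["netns"])
    for m in RESTORE_RE.finditer(SAMPLE)
]

# Seconds elapsed between consecutive failed restores.
gaps = [
    (later[0] - earlier[0]).total_seconds()
    for earlier, later in zip(events, events[1:])
]

for (ts, sandbox, netns), gap in zip(events[1:], gaps):
    print(f"{sandbox[:12]} netns={netns} restored {gap:.1f}s after previous failure")
```

To run it against a real node, the `SAMPLE` string could be replaced with the CRI-O journal output saved to a file; how that output is collected is left open here.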

What did you expect to happen?

Even when the node is under high system load, CRI-O should not be stuck in the restore process for a long time.

How can we reproduce it (as minimally and precisely as possible)?

Under high system load, create many pods, then restart CRI-O.

Anything else we need to know?

No response

CRI-O and Kubernetes version

$ crio --version
1.25.8

$ kubectl version --output=json
# paste output here

OS version

# On Linux:
$ cat /etc/os-release
# paste output here
$ uname -a
5.15.131-3

Additional environment details (AWS, VirtualBox, physical, etc.)

physical
lance5890 added the kind/bug label on Oct 15, 2024