Worker nodes become unresponsive (no response to any systemctl command or to reboot) #820
rvijayanand asked this question in Q&A
Worker nodes become unresponsive: they do not respond to any systemctl command or to reboot.
New pods are stuck in the "ContainerCreating" state, and the server stops responding to every systemctl command. Running reboot fails with the following error:
[root@worker~]# reboot
Failed to open initctl fifo: No such device or address
Failed to talk to init daemon.
[root@worker~]#
systemctl --version
systemd 248 (v248.5-1.fc34)
+PAM +AUDIT +SELINUX -APPARMOR +IMA +SMACK +SECCOMP +GCRYPT +GNUTLS +OPENSSL +ACL +BLKID +CURL +ELFUTILS +FIDO2 +IDN2 -IDN +IPTC +KMOD +LIBCRYPTSETUP +LIBFDISK +PCRE2 +PWQUALITY +P11KIT +QRENCODE +BZIP2 +LZ4 +XZ +ZLIB +ZSTD +XKBCOMMON +UTMP +SYSVINIT default-hierarchy=unified
[core@worker~]$ sudo systemctl list-units
Failed to list units: Connection timed out
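When systemctl and reboot cannot reach systemd, as in the output above, PID 1 can still be inspected through /proc, which does not go through D-Bus. The sketch below assumes bash, GNU coreutils `timeout` and a readable /proc; `pid1_state` and `probe_systemctl` are hypothetical helper names, not part of OKD or systemd. A state of `D` (uninterruptible sleep) for PID 1 would suggest it is blocked inside the kernel, for example on I/O.

```shell
#!/usr/bin/env bash
# Hypothetical diagnostic helpers for a node where systemctl hangs.

# Print the scheduler state letter of PID 1 (R, S, D, Z, ...) read from
# /proc/1/stat; this does not require systemd to answer any request.
pid1_state() {
  local stat
  stat=$(cat /proc/1/stat) || return 1
  # Field 2 (comm) is wrapped in parentheses and may contain spaces, so
  # drop everything up to the last ") " before taking the next field.
  stat=${stat##*") "}
  printf '%s\n' "${stat%% *}"
}

# Run a read-only systemctl query with a deadline and report whether it
# timed out (GNU timeout exits with 124), could not be found (127), or
# answered at all (any other exit code, e.g. "degraded" is non-zero).
probe_systemctl() {
  local secs=${1:-10} rc=0
  timeout "$secs" systemctl is-system-running >/dev/null 2>&1 || rc=$?
  case $rc in
    124) echo "timeout" ;;
    127) echo "systemctl-not-found" ;;
    *)   echo "answered (exit $rc)" ;;
  esac
}
```

For example, `pid1_state` printing `D` while `probe_systemctl 10` prints `timeout` would be consistent with PID 1 being stuck rather than merely slow; this is a hint for narrowing the problem down, not a diagnosis.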
OKD version: 4.7.0-0.okd-2021-08-07-063045
Installation: bare metal (UPI)
How reproducible
This happens randomly on some of the worker nodes, even though those nodes are not heavily loaded.
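Because the hang hits only some workers at random, recording which nodes stop reporting Ready, and when, may help correlate it with other events. A minimal sketch, assuming the default column layout of `oc get nodes --no-headers` (NAME, STATUS, ROLES, AGE, VERSION); `not_ready_nodes` is a hypothetical helper that reads that output on stdin, so it can be tried on saved output as well as on a live cluster.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the names of nodes whose STATUS column is not
# Ready. Statuses such as "Ready,SchedulingDisabled" still count as Ready.
# Input: the output of `oc get nodes --no-headers` on stdin.
not_ready_nodes() {
  awk '$2 != "Ready" && $2 !~ /^Ready,/ { print $1 }'
}

# Example loop: check once a minute and timestamp every node not Ready.
# while sleep 60; do
#   oc get nodes --no-headers | not_ready_nodes | sed "s/^/$(date -Is) /"
# done
```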
Log bundle
Unable to run must-gather; the gather pod times out before it starts:
oc adm must-gather --dest-dir=/tmp/must-gather
[must-gather ] OUT Using must-gather plugin-in image: quay.io/openshift/okd-content@sha256:87af45b2ed9f3eb8cf01319cf8b68fc700c3c1270ac58b1ec5c94f2527dcc941
[must-gather ] OUT namespace/openshift-must-gather-wjbgf created
[must-gather ] OUT clusterrolebinding.rbac.authorization.k8s.io/must-gather-ldw5d created
[must-gather ] OUT pod for plug-in image quay.io/openshift/okd-content@sha256:87af45b2ed9f3eb8cf01319cf8b68fc700c3c1270ac58b1ec5c94f2527dcc941 created
[must-gather-rw8s6] OUT gather did not start: timed out waiting for the condition
[must-gather ] OUT clusterrolebinding.rbac.authorization.k8s.io/must-gather-ldw5d deleted
[must-gather ] OUT namespace/openshift-must-gather-wjbgf deleted
error: gather did not start for pod must-gather-rw8s6: timed out waiting for the condition