
The liveness probe will fail when the machine’s memory usage is high #10138

Closed
x-coder-L opened this issue May 23, 2024 · 1 comment

x-coder-L commented May 23, 2024

Environmental Info:
K3s Version:

k3s version v1.29.2+k3s1 (86f1021)
go version go1.21.7
Node(s) CPU architecture, OS, and Version:

Linux 5.17.15-1.el8.x86_64 #1 SMP PREEMPT Wed Jun 15 02:07:24 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Cluster Configuration:

1 server
Describe the bug:

When the machine’s memory usage exceeds 85% (even though there is still sufficient memory for k3s to allocate), pods may fail their liveness probes with the error message ‘Get “http:xxx”: context deadline exceeded (Client.Timeout exceeded while awaiting headers)’. However, testing the same service with curl at the same time returns the correct response.
Steps To Reproduce:

  • Installed K3s:
    k3s.io/node-args: ["server","--kubelet-arg","kube-reserved=memory=2Gi","--kubelet-arg","system-reserved=memory=32Gi","--kubelet-arg","sync-frequency=1s","--kube-apiserver-arg","event-ttl=48h0m0s","--flannel-backend","none","--node-name","localhost","--disable-helm-controller"]
    Reproduce:
    When a pod with a QoS class of BestEffort consumes a large amount of memory, pushing the machine’s memory usage above 85% without triggering the k3s eviction conditions or reaching the k3s OOM limit, pods using a liveness probe fail with the error ‘Get “http:xxx”: context deadline exceeded (Client.Timeout exceeded while awaiting headers)’.
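The reproduction above can be sketched as two minimal manifests (hypothetical names; the image, allocation size, and probe values are assumptions, not taken from the report): a BestEffort pod with no requests or limits that steadily allocates memory, next to an ordinary probed workload.

```yaml
# Hypothetical reproduction sketch -- names, image, and sizes are assumptions.
# 1) A BestEffort memory hog: no requests/limits, so nothing caps its usage
#    until node-level eviction thresholds or the OOM killer kick in.
apiVersion: v1
kind: Pod
metadata:
  name: memory-hog
spec:
  containers:
  - name: stress
    image: polinux/stress          # any image that ships the `stress` tool
    command: ["stress", "--vm", "1", "--vm-bytes", "48G", "--vm-hang", "0"]
---
# 2) A probed workload. Once node memory pressure builds, the kubelet's HTTP
#    probe client can time out even though the service still answers curl.
apiVersion: v1
kind: Pod
metadata:
  name: probed-app
spec:
  containers:
  - name: app
    image: nginx
    livenessProbe:
      httpGet:
        path: /
        port: 80
      timeoutSeconds: 1            # the default; short timeouts surface first
      periodSeconds: 10
```

With this in place, `kubectl get events --field-selector reason=Unhealthy -w` should show probe timeouts like the ones quoted below once node memory usage climbs past ~85%.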
Expected behavior:

The error ‘Get “http:xxx”: context deadline exceeded (Client.Timeout exceeded while awaiting headers)’ should not appear when the machine’s memory usage exceeds 85% but the k3s eviction conditions are not triggered and the k3s OOM limit is not reached.
Additional context / logs:

3h25m       Warning   Unhealthy                pod/metrics-server-67c658944b-rt25v                              Readiness probe failed: Get "https://10.42.0.20:10250/readyz": net/http: request canceled (Client.Timeout exceeded while awaiting headers)
3h24m       Warning   Unhealthy                pod/metrics-server-67c658944b-rt25v                              Liveness probe failed: Get "https://10.42.0.20:10250/livez": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
3h17m       Warning   Unhealthy                pod/metrics-server-67c658944b-rt25v                              Liveness probe failed: Get "https://10.42.0.20:10250/livez": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
3h17m       Warning   Unhealthy                pod/metrics-server-67c658944b-rt25v                              Readiness probe failed: Get "https://10.42.0.20:10250/readyz": context deadline exceeded
3h17m       Warning   Unhealthy                pod/metrics-server-67c658944b-rt25v                              Readiness probe failed: Get "https://10.42.0.20:10250/readyz": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
3h17m       Warning   Unhealthy                pod/metrics-server-67c658944b-rt25v                              Readiness probe failed: Get "https://10.42.0.20:10250/readyz": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
3h17m       Warning   Unhealthy                pod/coredns-5f4f9b8989-gxk68                                     Liveness probe failed: Get "http://10.42.0.2:8080/health": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
3h17m       Warning   Unhealthy                pod/coredns-5f4f9b8989-gxk68                                     Readiness probe failed: Get "http://10.42.0.2:8181/ready": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
@brandond
Contributor

brandond commented May 23, 2024

I don't see that this is an issue with k3s itself, or something that we can fix in this project. I'm not sure what we're supposed to do if the node lacks sufficient resources such that the workload becomes unresponsive, or the kubelet is unable to complete the request in a timely manner due to resource contention with other processes.

Do you have swap enabled on your node? Is it perhaps thrashing on swap, making it look like there's more memory available than you actually have?
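A quick way to check the swap question is to read procfs directly (standard Linux interfaces; the exact fields assume a typical `/proc/meminfo` layout):

```shell
# Is swap configured, and how much is free? Zero SwapTotal means no swap.
grep -E 'SwapTotal|SwapFree|MemAvailable' /proc/meminfo

# List active swap devices, if any (empty body means swap is off).
cat /proc/swaps

# Look for kernel memory-pressure / OOM activity around the probe failures.
# (dmesg may require privileges; suppress errors if unavailable.)
dmesg 2>/dev/null | grep -iE 'oom|out of memory' | tail -n 20 || true
```

If `SwapTotal` is nonzero and `SwapFree` is shrinking while probes fail, the node is likely thrashing on swap, which would explain memory appearing "available" while HTTP probes still time out.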
