`delete pods` API call latencies shot up on large cluster tests #51899
I'm trying to find the offending PR(s). The diff across those runs is too large.
My strong feeling is that it may be related to pagination. If I'm right, #51876 should hopefully fix the problem.
But this is mostly a guess and it may also be something else entirely.
@kubernetes/test-infra-maintainers We don't have any logs (except the build log from Jenkins) for run-26 of
Such failures (especially ones right at the end of the run) can hit us really hard with respect to debugging.
@wojtek-t We're also seeing an increase in the patch latencies. So there is probably another regression too?
Until run 8132, kubemark-500 seemed fine:
We had some startup failures in between, and it seems to have shot up starting from run 8141:
Looking at the diff right now.
Anyway - @shyamjvs we need more data about which other metrics have changed (CPU usage, allocations, number of API calls, ...)
After eliminating trivial and unrelated PRs, the following are left:
Can you grab a CPU profile from the master while the test is running? It should help you figure out the cause.
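For anyone reproducing this, a rough sketch of how one might pull such a profile follows. It hits the apiserver's standard `/debug/pprof/profile` endpoint, but the `MASTER_URL` and `SA_TOKEN` environment variables and the `InsecureSkipVerify` shortcut are assumptions about a throwaway test setup, not anything prescribed in this thread:

```go
// Fetch a 30s CPU profile from the kube-apiserver's pprof endpoint.
package main

import (
	"crypto/tls"
	"io"
	"net/http"
	"os"
)

func main() {
	master := os.Getenv("MASTER_URL") // e.g. https://<master-ip> (hypothetical env var)
	req, err := http.NewRequest("GET", master+"/debug/pprof/profile?seconds=30", nil)
	if err != nil {
		panic(err)
	}
	req.Header.Set("Authorization", "Bearer "+os.Getenv("SA_TOKEN")) // hypothetical token env var

	// Test-only shortcut: skip cert verification against a self-signed master.
	client := &http.Client{Transport: &http.Transport{
		TLSClientConfig: &tls.Config{InsecureSkipVerify: true},
	}}
	resp, err := client.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	// Save the profile for later inspection.
	out, err := os.Create("apiserver-cpu.pprof")
	if err != nil {
		panic(err)
	}
	defer out.Close()
	if _, err := io.Copy(out, resp.Body); err != nil {
		panic(err)
	}
}
```

Once saved, `go tool pprof apiserver-cpu.pprof` gives the usual top/flame-graph views to spot where the extra time is going.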
Adding release team: @jdumars @calebamiles @spiffxp
There are some consecutive failed runs (due to test-infra bugs / regressions) leading to missing points or temporary spikes... Ignore the noise.
I ran the tests locally on kubemark-500 against head (commit ffed1d3) and it went through fine with low latency values like before. Re-running to see if it's flaky. I'll continue digging into it tomorrow.
Unless you've turned on all alpha features on kubemark, you shouldn't have seen an increase with pagination (which it sounds like further investigation eliminated). If you did turn on alpha features on kubemark, that's exactly the validation I was looking for with chunking (that no regression occurred), which is good. Chunking would increase average latency but should decrease tail latency in most cases (naive chunking would increase error rates).
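For readers unfamiliar with the term: chunking here refers to the `limit`/`continue` parameters on LIST requests (alpha in the 1.8 timeframe). A minimal sketch of a chunked list follows, written against a recent client-go; the 1.8-era signatures differ slightly, and the page size of 500 and default kubeconfig path are arbitrary choices:

```go
// Minimal sketch of API chunking: list pods in pages using limit/continue.
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	cs := kubernetes.NewForConfigOrDie(cfg)

	opts := metav1.ListOptions{Limit: 500} // ask the apiserver for 500 pods per page
	total := 0
	for {
		page, err := cs.CoreV1().Pods(metav1.NamespaceAll).List(context.TODO(), opts)
		if err != nil {
			panic(err)
		}
		total += len(page.Items)
		if page.Continue == "" { // empty continue token means the last page was reached
			break
		}
		opts.Continue = page.Continue // resume from the server-provided token
	}
	fmt.Println("pods listed:", total)
}
```

Each page is a separate round-trip, which is why average latency per logical LIST can go up while the tail (one giant response) comes down.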
OK... so the reason for kubemark failing is different. As @wojtek-t pointed out, the kubemark master is not sized correctly.
Btw, that also explains why the failure wasn't reproducing locally for me: because I bypassed all the test-infra logic by manually starting the clusters, the master size was calculated correctly.
@smarterclayton Thanks for the explanation. I checked that neither kubemark nor our real cluster tests are using pagination (as
cc @dashpole @vishh @dchen1107
I can post a PR to revert the key portion of #50350.
@dashpole thank you!
I guess we should also wait until the fix is tested (add ~2 more hours for it).
PR posted: #53210
@dashpole and I had an offline discussion, and we plan to revert a portion of #50350. The original PR was introduced because, without that change, the kubelet disk eviction manager might delete more containers than necessary. We should make disk management more intelligent and take proactive action, instead of relying on the periodic GC.
While this is in the 1.9 milestone, there is every expectation that this will be resolved at the earliest responsible moment in a 1.8 patch release. The full decision-making process around this can be viewed at https://youtu.be/r6D5DNel2l8 |
Any updates on this?
We already have the offending PR identified; the PR fixing it is #53233.
Automatic merge from submit-queue (batch tested with PRs 51765, 53053, 52771, 52860, 53284). If you want to cherry-pick this change to another branch, please follow the instructions [here](https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md).

Add audit-logging, feature-gates & few admission plugins to kubemark

To make kubemark match real cluster settings. Also includes a few other settings like request-timeout, etcd-quorum, etc.

Fixes #53021
Related #51899 #44701

cc @kubernetes/sig-scalability-misc @wojtek-t @gmarek @smarterclayton
Automatic merge from submit-queue (batch tested with PRs 53403, 53233). If you want to cherry-pick this change to another branch, please follow the instructions [here](https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md).

Remove containers from deleted pods once containers have exited

Issue #51899

Since container deletion is currently done through periodic garbage collection every 30 seconds, it takes a long time for pods to be deleted, and causes the kubelet to send all delete pod requests at the same time, which has performance issues. This PR makes the kubelet actively remove containers of deleted pods rather than wait for them to be removed in periodic garbage collection.

/release-note-none
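To make the batching effect concrete, here is a toy, self-contained sketch (not kubelet source) of the two cleanup shapes being compared: a periodic sweep that issues the final delete calls in bursts once per interval, versus deleting each pod as soon as its own containers have exited.

```go
// Toy illustration: with a periodic sweep, every pod whose containers exited during
// the last interval gets its final "delete pod" API call in the same burst; with
// event-driven cleanup, each pod's delete goes out as soon as its containers are gone.
package main

import (
	"fmt"
	"time"
)

// issueDeletePodCall stands in for the kubelet's final DELETE against the apiserver.
func issueDeletePodCall(pod string) {
	fmt.Println("DELETE pod", pod, "at", time.Now().Format("15:04:05.000"))
}

// periodicCleanup batches: pods that became ready for cleanup since the last tick
// are all deleted together, once per interval (30s in the pre-#53233 kubelet).
func periodicCleanup(ready <-chan string, interval time.Duration) {
	var pending []string
	tick := time.NewTicker(interval)
	defer tick.Stop()
	for {
		select {
		case p := <-ready:
			pending = append(pending, p)
		case <-tick.C:
			for _, p := range pending { // burst of delete calls at the same instant
				issueDeletePodCall(p)
			}
			pending = nil
		}
	}
}

// eventDrivenCleanup deletes each pod as soon as its containers have exited,
// which is roughly the shape #53233 moves the kubelet towards.
func eventDrivenCleanup(ready <-chan string) {
	for p := range ready {
		issueDeletePodCall(p)
	}
}

func main() {
	ready := make(chan string)
	go eventDrivenCleanup(ready) // swap in periodicCleanup(ready, 30*time.Second) to see the bursts
	for i := 0; i < 3; i++ {
		ready <- fmt.Sprintf("pod-%d", i)
		time.Sleep(200 * time.Millisecond)
	}
}
```

Multiplied across hundreds of nodes all sweeping on the same 30s cadence, the batched shape is what produces the spikes in delete-pod latency seen in the graphs above.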
…3-upstream-release-1.8 Automatic merge from submit-queue.

Automated cherry pick of #53233

Fixes #51899
Cherry pick of #53233 on release-1.8.
#53233: remove containers of deleted pods once all containers have

```release-note
Fixes a performance issue (#51899) identified in large-scale clusters when deleting thousands of pods simultaneously across hundreds of nodes, by actively removing containers of deleted pods, rather than waiting for periodic garbage collection and batching resulting pod API deletion requests.
```
[MILESTONENOTIFIER] Milestone Issue Needs Approval
@dashpole @shyamjvs @kubernetes/sig-api-machinery-bugs @kubernetes/sig-node-bugs @kubernetes/sig-scalability-bugs
Action required: This issue must have the Issue Labels
/close
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions [here](https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md).

Modify traces in deletion handler

Ref kubernetes/kubernetes#51899 (comment)

cc @kubernetes/sig-release-members @jdumars @dims Can we get this into 1.8?

Kubernetes-commit: 5952e932e9a1b593876c1504df5d1cb3fd72299d
Updated issue description with latest findings:
#50350 introduced a change to kubelet pod deletion that results in `delete pod` API calls from kubelets being concentrated immediately after container garbage collection. When performing a deletion of large numbers (thousands) of pods across large numbers (hundreds) of nodes, the resulting concentrated delete calls from the kubelets cause increased latency of `delete pods` API calls above the target threshold:

Seen on https://k8s-gubernator.appspot.com/builds/kubernetes-jenkins/logs/ci-kubernetes-e2e-gce-scale-performance/
Not seen on https://k8s-gubernator.appspot.com/builds/kubernetes-jenkins/logs/ci-kubernetes-kubemark-gce-scale/
Hoisted details from #51899 (comment) into the description:
Graphs over a three-minute window (both latency in ms and distribution of delete calls per second):
The delete calls from the GC controller merely set deletionTimestamp on the pod (which the kubelets observe, then start shutting down the pods). The delete calls from the kubelet actually delete the pod object from etcd.
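A hedged illustration of those two calls, using a recent client-go (signatures differ from the 1.8-era client; the namespace and pod name are placeholders, and this is not the controller or kubelet source):

```go
// Sketch of the two DELETE calls described above.
package main

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	pods := kubernetes.NewForConfigOrDie(cfg).CoreV1().Pods("default")

	// 1) What the GC controller's delete amounts to for a running pod: a graceful
	// delete, so the apiserver only sets metadata.deletionTimestamp; the kubelet
	// observes that via its watch and starts killing the containers.
	if err := pods.Delete(context.TODO(), "example-pod", metav1.DeleteOptions{}); err != nil {
		panic(err)
	}

	// 2) What the kubelet's final call amounts to once the containers are gone:
	// a delete with grace period 0, which actually removes the object from etcd.
	zero := int64(0)
	if err := pods.Delete(context.TODO(), "example-pod", metav1.DeleteOptions{
		GracePeriodSeconds: &zero,
	}); err != nil {
		panic(err)
	}
}
```

It's that second kind of call, issued by many kubelets at once right after their GC sweeps, that shows up as the latency spikes.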
There were multiple spikes throughout the run (corresponding to deletions of 3000-pod replication controllers). Here's a graph of delete calls that took more than one second throughout the whole run:
Seems like a kubelet-induced thundering herd when deleting massive numbers of pods across massive numbers of nodes. Kubemark's GC is stubbed since there are no real containers, so this isn't observed there.
Original issue description follows:
From the 5k-node density test run (no. 26) last Friday:
And from the last healthy run (no. 23) of the test:
This is a huge increase we're seeing:
LIST pods: 2.6s -> 8.1s
LIST nodes: 1s -> 2.3s
PATCH node-status: 56ms -> 1s
...
cc @kubernetes/sig-api-machinery-bugs @kubernetes/sig-scalability-misc @smarterclayton @wojtek-t @gmarek