WIP: Make node ready only after static pods are registered #2078
@haircommander: the contents of this pull request could not be automatically validated. The following commits could not be validated and must be approved by a top-level approver:
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by: haircommander
The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
/payload-job
@haircommander: it appears that you have attempted to use some version of the payload command, but your comment was incorrectly formatted and cannot be acted upon. See the docs for usage info.
/payload 4.18 nightly blocking
@haircommander: trigger 9 job(s) of type blocking for the nightly release of OCP 4.18
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/5c23b740-6c76-11ef-884f-b3f32e2bdbf7-0
Node registration and pod syncs run in separate goroutines, which creates a potential race condition. If a pod sync runs before the kubelet is registered, static pods might not get registered, so the scheduler, unaware of their resource usage, can overcommit the node. The kubelet then rejects pods because of insufficient resources. The initial fix made the node schedulable only after static pod registration, but this added 1 to 1.5 minutes of latency because of the kubelet's resync interval for pods. To remove this latency, static pods are now resynced immediately upon node registration, so the node becomes ready without additional delay.
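The ordering described in this commit message can be sketched as a small Go model. This is only an illustration under stated assumptions, not the kubelet's actual code: the `node` type, its `syncPods`, `register`, and `ready` methods, and the pod names are all hypothetical. It shows that a pod sync before registration registers nothing, and that resyncing inside registration lets the node become ready without waiting for the next periodic resync.

```go
package main

import (
	"fmt"
	"sync"
)

// node is a hypothetical, simplified model of kubelet state; it is not kubelet code.
type node struct {
	mu         sync.Mutex
	registered bool            // node object has been registered
	staticPods []string        // static pods discovered locally
	known      map[string]bool // static pods that have been registered
}

func newNode(staticPods ...string) *node {
	return &node{staticPods: staticPods, known: make(map[string]bool)}
}

// syncPods models one pass of the pod sync loop and returns how many static
// pods it newly registered. Before node registration it can register none.
func (n *node) syncPods() int {
	n.mu.Lock()
	defer n.mu.Unlock()
	if !n.registered {
		return 0
	}
	added := 0
	for _, p := range n.staticPods {
		if !n.known[p] {
			n.known[p] = true
			added++
		}
	}
	return added
}

// register marks the node as registered and resyncs static pods right away,
// rather than waiting for the next periodic resync.
func (n *node) register() int {
	n.mu.Lock()
	n.registered = true
	n.mu.Unlock()
	return n.syncPods()
}

// ready reports whether the node may be marked ready: it must be registered
// and every static pod must have been registered.
func (n *node) ready() bool {
	n.mu.Lock()
	defer n.mu.Unlock()
	if !n.registered {
		return false
	}
	for _, p := range n.staticPods {
		if !n.known[p] {
			return false
		}
	}
	return true
}

func main() {
	n := newNode("etcd", "kube-apiserver") // illustrative pod names
	n.syncPods()                           // the sync loop wins the race: nothing can be registered yet
	fmt.Println("ready before registration:", n.ready())
	n.register()
	fmt.Println("ready after registration:", n.ready())
}
```

In this model the readiness check is what keeps the scheduler from seeing a node whose static pod usage is still unknown, and the resync inside `register` is what avoids the extra wait.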
381a29f to bd429bf
/payload 4.18 nightly blocking
@haircommander: trigger 9 job(s) of type blocking for the nightly release of OCP 4.18
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/8e7bc7a0-819d-11ef-9569-8f6afbeb2dae-0
/payload 4.18 nightly informing
@kannon92: trigger 73 job(s) of type informing for the nightly release of OCP 4.18
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/1bfa5e50-81c8-11ef-96d6-03534b196db9-0
/test e2e-aws-ovn-serial
What type of PR is this?
/kind bug
What this PR does / why we need it:
for testing kubernetes#126870
Which issue(s) this PR fixes:
Fixes #
Special notes for your reviewer:
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: