Make critical jobs Guaranteed Pod QOS for ci-kubernetes-build-* #18620
Hi @ZhiFeng1993. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/cc @kubernetes/ci-signal
@@ -48,9 +48,12 @@ periodics:
securityContext:
privileged: true
resources:
+ limits:
+ cpu: 8
This will likely cause pod scheduling to fail every time, because a few other processes on each node add overhead. I suggest something like 7300m.
(ref: #18420)
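To make the concern above concrete, here is a minimal sketch of the arithmetic, not anything taken from the PR itself. It converts Kubernetes CPU quantities to millicores and checks a request against a node's allocatable CPU. The allocatable figure `7910m` is a hypothetical placeholder, since the thread does not state the real value; the helper names are also illustrative. When only a limit is set, Kubernetes defaults the request to the limit, so `cpu: 8` is what the scheduler would have to place.

```python
def cpu_to_millicores(quantity: str) -> int:
    """Convert a Kubernetes CPU quantity such as "8" or "7300m" to millicores."""
    quantity = quantity.strip()
    if quantity.endswith("m"):
        return int(quantity[:-1])
    # Plain numbers are whole or fractional cores, e.g. "8" or "0.5".
    return round(float(quantity) * 1000)


def fits_on_node(request: str, allocatable: str) -> bool:
    """True if one pod's CPU request fits within the node's allocatable CPU."""
    return cpu_to_millicores(request) <= cpu_to_millicores(allocatable)


# HYPOTHETICAL allocatable CPU for an 8-core node; the real figure
# is not given anywhere in this thread.
ASSUMED_ALLOCATABLE = "7910m"

print(fits_on_node("8", ASSUMED_ALLOCATABLE))      # False
print(fits_on_node("7300m", ASSUMED_ALLOCATABLE))  # True
```

This ignores other pods already running on the node, which would shrink the available CPU further.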
The values are different for this cluster because the build jobs still run in the google.com Prow and cannot easily be moved (due to the release bucket).
@BenTheElder ah gotcha, I wonder if we could add some of the basic stats of that cluster to the umbrella issue
cc @spiffxp
Daniel: the cluster config is public for the k8s-infra cluster (it's in
k8s.io); the trick is figuring out how much CPU is available on a node.
Currently it's something like 100m less than the google.com cluster,
because network policy is enabled and that addon (calico) runs a pod with
a 100m CPU request on each node.
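As a rough illustration of the point above, the CPU left for a job pod is the node's allocatable CPU minus the requests of pods that run on every node. The sketch below uses the 100m calico request mentioned in the email; the allocatable value and the other addon requests are hypothetical placeholders, not figures from either cluster.

```python
from __future__ import annotations


def schedulable_millicores(allocatable_m: int, per_node_requests_m: list[int]) -> int:
    """CPU (in millicores) left on a node after per-node addon pods are placed."""
    return allocatable_m - sum(per_node_requests_m)


ASSUMED_ALLOCATABLE_M = 7910         # HYPOTHETICAL: not stated in the thread
ASSUMED_OTHER_ADDONS_M = [100, 100]  # HYPOTHETICAL: other per-node pods
CALICO_REQUEST_M = 100               # from the email: 100m per node

without_policy = schedulable_millicores(ASSUMED_ALLOCATABLE_M, ASSUMED_OTHER_ADDONS_M)
with_policy = schedulable_millicores(
    ASSUMED_ALLOCATABLE_M, ASSUMED_OTHER_ADDONS_M + [CALICO_REQUEST_M]
)

# Whatever the real baseline is, enabling network policy costs 100m per node.
print(without_policy - with_policy)  # 100
```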
On Mon, Aug 3, 2020 at 11:36 AM Daniel Mangum ***@***.***> wrote:
> In config/jobs/kubernetes/sig-release/kubernetes-builds.yaml:
>
> > @@ -48,9 +48,12 @@ periodics:
> > securityContext:
> > privileged: true
> > resources:
> > + limits:
> > + cpu: 8
>
> @BenTheElder ah gotcha, I wonder if we could add some of the basic
> stats of that cluster to the umbrella issue
/ok-to-test
I'm fine merging this and finding out the hard way whether we've asked for too much.
/approve
/lgtm
Keep an eye out for whether these jobs start to fail. If they fail because of Pod scheduling timeouts, they may be asking for too much CPU.
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: spiffxp, ZhiFeng1993
The full list of commands accepted by this bot can be found here.
The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
FYI @kubernetes/ci-signal @kubernetes/release-engineering
@ZhiFeng1993: Updated the
Add cpu and resource usage as suggested in: #18577
ref: #18577
EDIT by spiffxp: changed "fixes:" to "ref:". We don't want to auto-close the corresponding issue until we've confirmed that the job continues to run healthily.
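For readers unfamiliar with the QoS class named in the title: Kubernetes treats a pod as Guaranteed only when every container has both CPU and memory limits and its requests equal those limits (a missing request defaults to the limit). The following is a simplified sketch of that rule, not the kubelet's implementation. It compares quantities as plain strings, so "8" and "8000m" would count as different, and the memory value in the usage lines is a hypothetical placeholder because this page only shows `cpu: 8`.

```python
def qos_class(containers):
    """Classify a pod from its containers' resources (simplified sketch).

    `containers` is a list of dicts with optional "requests" and "limits"
    dicts, mirroring the `resources` stanza of a container spec.
    """
    any_set = False
    guaranteed = True
    for container in containers:
        requests = dict(container.get("requests", {}))
        limits = container.get("limits", {})
        if requests or limits:
            any_set = True
        # A missing request defaults to the corresponding limit.
        for resource, limit in limits.items():
            requests.setdefault(resource, limit)
        # Guaranteed needs cpu AND memory limits, with requests equal to them.
        for resource in ("cpu", "memory"):
            if resource not in limits or requests.get(resource) != limits[resource]:
                guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"


# "16Gi" is a HYPOTHETICAL memory value used only for illustration.
print(qos_class([{"limits": {"cpu": "8", "memory": "16Gi"}}]))  # Guaranteed
print(qos_class([{"limits": {"cpu": "8"}}]))                    # Burstable
```

The second call shows why a CPU limit alone is not enough: without a memory limit the pod is only Burstable.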