Make critical jobs Guaranteed Pod QOS: ci-kubernetes-build #18577
Comments
/assign
/remove-help
/close I think it's safe to close this one. cc @RobertKielty
@ZhiFeng1993: You can't close an active issue/PR unless you authored it or you are a collaborator. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Looking at testgrids
https://testgrid.k8s.io/sig-release-master-blocking#build-master&graph-metrics=test-duration-minutes&width=20 - looks mostly ok
https://testgrid.k8s.io/sig-release-1.18-blocking#build-1.18&graph-metrics=test-duration-minutes&width=20 - this concerns me

The two peaks on the right are what I think is "good" behavior... the job fails due to timeout for Reasons™, but there is a followup run that doesn't, and the build is available for use.

The two peaks on the left don't have any runs after that are OK. Did incomplete builds get published?

More importantly... is this new behavior? Or has this been happening prior to us setting resource constraints?
The resource limit changes merged 2020-08-03 4:20pm PDT
https://prow.k8s.io/view/gs/kubernetes-jenkins/logs/ci-kubernetes-build-stable1/1289866689743163392 - is an example of a build falling prey to this problem before the change was deployed.
So I don't think this is new behavior
I opened #18808 to address the bad build job behavior
Given that it's pre-existing bad behavior, I'm willing to call this done
/close
Thanks @ZhiFeng1993!
@spiffxp: Closing this issue. In response to this:
What should be cleaned up or changed:
This is part of #18530
The following jobs should be Guaranteed Pod QOS, meaning they should have CPU and memory resource limits, and matching resource requests:
These jobs run on (google.com only) k8s-prow-build, so @spiffxp has provided the following guess:
General steps to follow:
resources:
field with matching entries)
@kubernetes/ci-signal
in the description
/sig testing
/sig release
/area jobs
/area release-eng
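
For anyone checking a job against the requirement above (CPU and memory limits, plus matching requests), here is a rough sketch of that rule in Python. It is not part of Prow or test-infra. `is_guaranteed` and `example_spec` are hypothetical names, the resource values are made up for illustration and are not the figures proposed in this issue, and the check is simplified: it only looks at `containers` and compares quantities as literal strings.

```python
# Sketch: would a pod spec qualify for the Guaranteed QoS class?
# Simplifications (assumptions, not Kubernetes' actual implementation):
#   - only the "containers" list is inspected
#   - quantities are compared as literal strings, so "1" and "1000m"
#     are treated as different here even though Kubernetes sees them as equal


def is_guaranteed(pod_spec):
    """Return True if every container sets CPU and memory limits with matching requests."""
    containers = pod_spec.get("containers", [])
    if not containers:
        return False
    for container in containers:
        resources = container.get("resources") or {}
        limits = resources.get("limits") or {}
        requests = resources.get("requests") or {}
        for name in ("cpu", "memory"):
            limit = limits.get(name)
            if limit is None:
                # A missing limit rules out Guaranteed.
                return False
            # Kubernetes defaults an omitted request to the limit,
            # so only an explicit, different request is a mismatch.
            if requests.get(name, limit) != limit:
                return False
    return True


# Hypothetical values for illustration only.
example_spec = {
    "containers": [
        {
            "name": "build",
            "resources": {
                "limits": {"cpu": "4", "memory": "16Gi"},
                "requests": {"cpu": "4", "memory": "16Gi"},
            },
        }
    ]
}

print(is_guaranteed(example_spec))
```

A job whose container sets only a CPU limit, or whose requests are lower than its limits, would fail this check and would need updating.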