Kubernetes CI Policy: merge-blocking jobs must run in dedicated cluster #18550

Closed
21 of 22 tasks
spiffxp opened this issue Jul 30, 2020 · 13 comments · Fixed by #20416
Labels
area/jobs area/release-eng Issues or PRs related to the Release Engineering subproject sig/release Categorizes an issue or PR as relevant to SIG Release.

Comments

@spiffxp
Member

spiffxp commented Jul 30, 2020

Part of #18551

Why this is necessary:

  • we believe that jobs declaring Guaranteed Pod QoS may not be protected against BestEffort or Burstable Pods hogging all resources on the same node
  • a cluster that only has Guaranteed pods is far more likely to respect resource requirements
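For readers unfamiliar with the QoS classes named above, here is a minimal sketch of how Kubernetes assigns them from a Pod's container resources. It is a simplification and not the kubelet's implementation: it only looks at `cpu` and `memory`, and it compares quantities as plain strings, so `"1"` and `"1000m"` would be treated as different. The function name `qos_class` and the dict layout are assumptions made for illustration.

```python
from typing import Dict, List

# Hypothetical layout: one dict per container, e.g.
# {"requests": {"cpu": "1", "memory": "1Gi"}, "limits": {"cpu": "1", "memory": "1Gi"}}
Resources = Dict[str, Dict[str, str]]


def qos_class(containers: List[Resources]) -> str:
    """Return "Guaranteed", "Burstable", or "BestEffort" for a Pod.

    Simplified rules:
    - BestEffort: no container sets any cpu/memory request or limit.
    - Guaranteed: every container sets cpu and memory limits, and any
      request that is set equals its limit (an unset request defaults
      to the limit).
    - Burstable: everything else.
    """
    any_set = False
    guaranteed = True
    for container in containers:
        requests = container.get("requests", {})
        limits = container.get("limits", {})
        for resource in ("cpu", "memory"):
            req = requests.get(resource)
            lim = limits.get(resource)
            if req is not None or lim is not None:
                any_set = True
            # A missing limit, or a request that differs from the limit,
            # rules out the Guaranteed class.
            if lim is None or (req is not None and req != lim):
                guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"
```

A job whose containers all set requests equal to limits lands in the Guaranteed class; the concern in this issue is that such a job can still be affected when it shares a node with Pods from the other two classes.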

Decisions to make:

  • All of the jobs being migrated to k8s-infra-prow-build will use the same node pool as everything else. Pinning them to a dedicated node pool would require much more boilerplate (or possibly augmenting prow's preset feature). If, after migrating everything, we find we do need a dedicated node pool, then we'll pay the added cost.
  • Jobs that can't be migrated quickly will remain in the default google.com-owned k8s-prow-builds cluster until such time as they can be migrated. They will continue to compete for resources until they have migrated, and we will be reliant on test-infra-oncall to provide any visibility into the behavior of such jobs.

Jobs to migrate:

TODO:

@spiffxp
Member Author

spiffxp commented Aug 1, 2020

/sig testing
/sig release
/wg k8s-infra
/area release-eng
/area jobs

@spiffxp
Member Author

spiffxp commented Aug 13, 2020

I'm trialing migration of three different kinds of jobs to get a feel for whether the rest can be migrated over in a similar fashion, or whether more preparation is needed:

Based on how these go I'll break out other TODO's into help wanted issues

@helenfeng737
Contributor

/cc

@spiffxp
Member Author

spiffxp commented Aug 14, 2020

OK, I've broken out everything I think can be done right now, tagged it as help-wanted, and added it to the CI Policy Improvements - Next Priority column.

The remaining TODO's need some unblocking work

@spiffxp
Member Author

spiffxp commented Aug 21, 2020

We may need to roll some jobs back or adjust some limits. We've started bumping into a quota that we're having difficulty raising: kubernetes/k8s.io#1132 (comment)

@spiffxp
Member Author

spiffxp commented Aug 25, 2020

IP quota has been bumped to a reasonable level. I'm now less concerned about trying to mitigate or undo migration of the jobs that have been migrated thus far.

@spiffxp
Member Author

spiffxp commented Aug 28, 2020

kubernetes/k8s.io#1187 is trying to improve the cluster's I/O capacity. I'm wary of moving over the pull-kubernetes-bazel jobs until this is addressed.

@spiffxp
Member Author

spiffxp commented Sep 10, 2020

kubernetes/k8s.io#1231: we're starting to hit the max nodepool size of 90, so I'd like to increase it to 150.

@LappleApple
Contributor

Heya: +1 if we can get this updated with the checkbox marked for #18854

@LappleApple
Contributor

LappleApple commented Oct 12, 2020

In the write-up, under

TODO: determine gcp project / boskos requirements, provision

This item is now closed:
kubernetes/k8s.io#851 - scalability-project

Is that TODO now complete?

Also wondering if the other items in the TODO section:

TODO: migrate jobs that require special projects
TODO: declare we are finished (or done enough to move on)

need an update?

@spiffxp
Member Author

spiffxp commented Nov 5, 2020

@LappleApple #19073 is the last remaining issue

@spiffxp
Member Author

spiffxp commented Jan 8, 2021

Closed out the last remaining issue #19073 (comment)

All merge-blocking jobs run in k8s-infra-prow-build. Last thing to do is switch job config tests to fail instead of warn to prevent future violations of this policy.
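To illustrate the warn-versus-fail distinction, here is a rough sketch of a policy check with a switchable enforcement mode. It is not the actual test-infra job config test: the job dict layout, the `merge_blocking` field, the `enforce` flag, the `"default"` cluster value, and the job name `example-unmigrated-job` are all hypothetical. Only the cluster name k8s-infra-prow-build and the job name pull-kubernetes-bazel come from this thread.

```python
from typing import Dict, List

DEDICATED_CLUSTER = "k8s-infra-prow-build"


def find_violations(jobs: List[Dict[str, object]]) -> List[str]:
    """Return names of merge-blocking jobs that run outside the dedicated cluster."""
    return [
        str(job["name"])
        for job in jobs
        if job.get("merge_blocking") and job.get("cluster") != DEDICATED_CLUSTER
    ]


def check_policy(jobs: List[Dict[str, object]], enforce: bool) -> List[str]:
    """Warn about violations, or raise AssertionError when enforce is True."""
    violations = find_violations(jobs)
    if violations and enforce:
        raise AssertionError(
            "merge-blocking jobs must run in %s: %s"
            % (DEDICATED_CLUSTER, ", ".join(violations))
        )
    for name in violations:
        print("WARNING: %s does not run in %s" % (name, DEDICATED_CLUSTER))
    return violations


# Hypothetical job list for illustration only.
example_jobs = [
    {"name": "pull-kubernetes-bazel", "cluster": "k8s-infra-prow-build", "merge_blocking": True},
    {"name": "example-unmigrated-job", "cluster": "default", "merge_blocking": True},
    {"name": "example-optional-job", "cluster": "default", "merge_blocking": False},
]
```

With `enforce=False` the check only reports `example-unmigrated-job`; with `enforce=True` the same input raises, which is the behavior change described above.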

@spiffxp
Member Author

spiffxp commented Jan 8, 2021

Opened #20416 to switch job config tests
