
scheduling: Add new pod priority class #369

Merged
merged 1 commit into from
Jan 21, 2021

Conversation


@lilic lilic commented Jun 10, 2020

This proposal sums up the discussion we had on Slack about introducing a new priority class. As outlined in the proposal, the reason we would like to have this is to make sure cluster monitoring pods get scheduled in favour of user workload monitoring pods. With user workload monitoring possibly going GA in 4.6, it would be good to solve this in 4.6.
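As a rough sketch, the proposed class would be an ordinary Kubernetes `PriorityClass` object whose value sits above the default of 0 for user pods but below the reserved `system-cluster-critical` value (2000000000). The name and value below are illustrative only; both were still under discussion in this thread:

```yaml
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: user-critical            # name still under discussion in this thread
value: 1000000000                # assumption: below system-cluster-critical (2000000000)
globalDefault: false
description: "For pods backing user-facing OpenShift features that are not system critical."
```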


lilic commented Jun 10, 2020

cc @bbrowning I mentioned knative here as per the discussion on slack, feel free to leave suggestions for any other use cases you would have or if it was not clear enough, thanks!

workloads; the problem with that is that users can create priority classes that
would schedule their pods in favour of the OpenShift workloads.

[1]: https://docs.openshift.com/container-platform/3.11/admin_guide/scheduling/priority_preemption.html#admin-guide-priority-preemption-priority-class
Contributor Author
Note: while the 4.4 docs do show the logging class (I linked to 3.11 as those docs were truer to what we have), this is not true out of the box on 4.6 at least; I did not verify earlier versions. https://docs.openshift.com/container-platform/4.4/nodes/pods/nodes-pods-priority.html#admin-guide-priority-preemption-priority-class_nodes-pods-priority

I am not sure if we should remove that once this is cleared up?

@bparees bparees left a comment

nits, looks great to me


lilic commented Jun 10, 2020

cc @openshift/openshift-architects please take a look, thanks!


## Design Details

New priority class would be created by the component that creates the two
@derekwaynecarr derekwaynecarr Jun 10, 2020


i think this provides a nice default.

if users required further differentiation among user-critical workloads, we could explore an operator exposing an override for the default. this is only needed if demand was critical enough.

Member

a few additional questions:

  1. do we want to reserve a prefix like openshift- for priority class names that we control?
  2. do we want to restrict the set of namespaces this priority can be used in?

upstream we have support for quota by priority class:
https://kubernetes.io/docs/concepts/policy/resource-quotas/#resource-quota-per-priorityclass

it's possible we could restrict usage of a priority class without explicit quota in order to prevent consumption.
see:
https://kubernetes.io/docs/concepts/policy/resource-quotas/#limit-priority-class-consumption-by-default

it's worth enumerating some pros/cons of the above as part of this design.
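The upstream quota-by-priority mechanism linked above works through a `scopeSelector` on a `ResourceQuota`: pods in the quota's namespace that use the named priority class count against the quota, and in namespaces without such a quota the class can be blocked entirely via the `limitedResources` admission configuration. A minimal sketch, with a hypothetical namespace and class name:

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: user-critical-quota
  namespace: example-namespace   # hypothetical namespace
spec:
  hard:
    pods: "10"                   # illustrative limit
  scopeSelector:
    matchExpressions:
      - operator: In
        scopeName: PriorityClass
        values: ["user-critical"]   # class name still under discussion
```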

Contributor Author

Thanks for the suggestions!

  1. Adding an openshift- prefix to the user-critical class name would make it clear to users that it is reserved for user workloads managed by OpenShift, but it goes against the current pattern, as none of our existing classes has a reserved prefix. Changing that would require updating approximately 170+ instances of the existing class names. Since we would name this one user-critical, prefixing it as openshift-user-critical does make sense, so I am happy to go that route just for this one.
  2. It seems we do not limit the scope of namespaces for the two existing classes, but this should be an easier fix, and I think we should do it for all three (the two existing classes and the newly proposed one). The pro is that users cannot use and abuse these reserved classes. The cons are that we might break things for users who already consume these classes, and potentially for some of our Red Hat components that are not installed in openshift-* namespaces; are we okay with that?

My vote would be to leave the existing classes unchanged, since we do not know what changing them might impact, and apply the above only to the new class. Does that sound good to you?

Will add to the document once we agree on this.

Member

@lilic I am confused by your comment.

Kubernetes reserves the system-* prefix for Kubernetes usage (see: https://github.com/kubernetes/kubernetes/blob/5238e1c80fea02891e1804b346d24faa7c13da07/pkg/apis/scheduling/validation/validation.go#L37)

The question I was asking is if OpenShift should reserve a similar openshift-* prefix for names that are unique to the distribution. We obviously shouldn't change names that are reserved upstream.

Member

I am inclined to reserve the openshift-* name prefix for use by OpenShift.

Contributor

As discussed with Derek out of band, we think the best place would be openshift-apiserver or openshift-controller-manager.

was cluster-config-operator explicitly rejected in that conversation?

Contributor

These are not OpenShift-API related, so neither openshift-apiserver nor openshift-controller-manager makes sense. I also lean toward cluster-config-operator.

Contributor Author

Thanks for the reply! The problem is that not all environments install cluster-config-operator, so it would be a problem if a core component like monitoring relied on this but it was not there in some environments.

Contributor

@lilic if you get stuck on this i suggest putting it in the CMO itself for now and re-homing it when someone else wants to use it.

Contributor Author

@bparees Sounds great, will do, thanks!

introducing a third priority class: `user-critical`. This would be used by any
pods that are important for user-facing OpenShift features but are not deemed
system critical. Examples of such pods include user workload monitoring and a
user's Knative Service.
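A pod would opt into the class through the standard `priorityClassName` field in its spec. A minimal sketch (the pod name and image are placeholders, and the class name was still under discussion in this thread):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: prometheus-user-workload-0      # hypothetical example pod
  namespace: openshift-user-workload-monitoring
spec:
  priorityClassName: user-critical      # class name still under discussion
  containers:
    - name: prometheus
      image: example.com/prometheus:latest   # placeholder image
```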


I'd replace "user's Knative Service" here with "the OpenShift Serverless control and data planes". We don't want OpenShift Serverless control or data planes to be system-*-critical because it's an optional component and not critical to the core functioning of the cluster itself. But, we do want the OpenShift Serverless control and data plane pods to be a higher priority than user workloads because if OpenShift Serverless control or data plane pods get evicted then that degrades the functionality of all Knative workloads in the cluster.

@bbrowning

As a general comment, I'd expect many optional operators will want to take advantage of this new priority. I don't think eviction is something widely tested today, but a number of optional operators (Service Mesh, Pipelines, CodeReady Workspaces, and so on) would end up in a bad place if certain operator pods got evicted before user workloads consuming features of those optional operators.


lilic commented Jun 18, 2020

@bbrowning Agreed, I do believe this is something that is missing, will add this to the proposal notes.

@s-urbaniak

ping @openshift/openshift-architects can we have an lgtm or are there open questions still?

@lilic lilic changed the title scheduling: Add new pod pod priority class scheduling: Add new pod priority class Oct 13, 2020
@openshift-bot

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci-robot openshift-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 20, 2021
@lilic lilic force-pushed the priority-classes branch 3 times, most recently from 3b54fb9 to bcd388d Compare January 21, 2021 14:20

lilic commented Jan 21, 2021

@bparees thanks for the ping on Slack, I updated the PR; this should be ready for review and merge :) 🎉


bparees commented Jan 21, 2021

/approve
/lgtm

thanks for your persistence/patience on this @lilic :)

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Jan 21, 2021
@openshift-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bparees

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 21, 2021
@openshift-merge-robot openshift-merge-robot merged commit 1dd0262 into openshift:master Jan 21, 2021
@lilic lilic deleted the priority-classes branch January 21, 2021 16:12

lilic commented Jan 21, 2021

Thank you Ben! 🎉

9 participants