scheduling: Add new pod priority class #369
Conversation
cc @bbrowning I mentioned Knative here as per the discussion on Slack. Feel free to leave suggestions for any other use cases you would have, or to say if it was not clear enough. Thanks!
workloads; the problem with that is that users can create priority classes that would schedule their pods in favour of the OpenShift workloads.

[1]: https://docs.openshift.com/container-platform/3.11/admin_guide/scheduling/priority_preemption.html#admin-guide-priority-preemption-priority-class
Note: while the 4.4 docs do show the logging class (I linked to 3.11 as it was more true to what we have), this is not true out of the box on 4.6 at least; I did not verify earlier versions. https://docs.openshift.com/container-platform/4.4/nodes/pods/nodes-pods-priority.html#admin-guide-priority-preemption-priority-class_nodes-pods-priority
I am not sure if we should remove that once this is cleared up?
nits, looks great to me
cc @openshift/openshift-architects please take a look, thanks!
## Design Details

New priority class would be created by the component that creates the two existing classes.
i think this provides a nice default. if users require further differentiation among user-critical workloads, we could explore an operator exposing an override for the default; this is only needed if demand is strong enough.
a few additional questions:

- do we want to reserve a prefix like `openshift-` for priority class names that we control?
- do we want to restrict the set of namespaces this priority can be used in?

upstream we have support for quota by priority class: https://kubernetes.io/docs/concepts/policy/resource-quotas/#resource-quota-per-priorityclass

it's possible we could restrict usage of a priority class without explicit quota in order to prevent consumption; see: https://kubernetes.io/docs/concepts/policy/resource-quotas/#limit-priority-class-consumption-by-default

it's worth enumerating some pros/cons of the above as part of this design. (both mechanisms are sketched below.)
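For reference, the per-PriorityClass quota from the first link looks roughly like this. A minimal sketch: the namespace name and the `pods` limit are illustrative, and `user-critical` is the class name proposed in this PR.

```yaml
# Sketch: a quota that only counts pods requesting the proposed class.
# Namespace and limits are illustrative, not part of the proposal.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: user-critical-quota
  namespace: example-namespace
spec:
  hard:
    pods: "10"
  scopeSelector:
    matchExpressions:
      - operator: In
        scopeName: PriorityClass
        values: ["user-critical"]
```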
Thanks for the suggestions!

- Adding a prefix of `openshift-` to the `user-critical` class name would make it clear to users that this is reserved for user workloads that are managed by OpenShift, but it goes against the current pattern, as none of our existing classes have a reserved prefix. Changing that would require changing approximately 170+ instances of the existing class names. It might make sense here, though: since we would name it `user-critical`, the prefixed form is `openshift-user-critical`, so I am happy to go that route just for this one.
- It seems we do not limit the scope of namespaces for the two existing classes, but this should be an easier fix, and I think we should do it for all three (the two existing and the new proposed one; see the sketch after this comment). The pro is that users can't use and abuse this reserved class. The cons are that we might break things for users that already consume this class, and potentially for some of our Red Hat components that are not installed in openshift-* namespaces. Are we okay with doing that?

My vote would be to not change the existing classes but apply the above to the new class, as we do not know what it might impact. Sounds good to you?

Will add to the document once we agree on this.
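One concrete mechanism for the namespace restriction discussed above is the limit-priority-class-consumption admission configuration from the second link in the previous comment: pods requesting a listed priority class are rejected in any namespace that lacks a matching ResourceQuota. A minimal sketch, assuming the un-prefixed `user-critical` name still under discussion at this point:

```yaml
# Sketch: API-server admission configuration per the upstream
# "limit priority class consumption by default" docs. Pods that set
# priorityClassName: user-critical are rejected unless their namespace
# has a ResourceQuota whose scopeSelector covers that class.
apiVersion: apiserver.config.k8s.io/v1
kind: AdmissionConfiguration
plugins:
- name: "ResourceQuota"
  configuration:
    apiVersion: apiserver.config.k8s.io/v1
    kind: ResourceQuotaConfiguration
    limitedResources:
    - resource: pods
      matchScopes:
      - scopeName: PriorityClass
        operator: In
        values: ["user-critical"]
```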
@lilic I am confused by your comment.
Kubernetes reserves the system-* prefix for Kubernetes usage (see: https://github.com/kubernetes/kubernetes/blob/5238e1c80fea02891e1804b346d24faa7c13da07/pkg/apis/scheduling/validation/validation.go#L37)
The question I was asking is if OpenShift should reserve a similar openshift-* prefix for names that are unique to the distribution. We obviously shouldn't change names that are reserved upstream.
I am inclined to reserve the openshift-* name prefix for use by OpenShift.
As discussed with Derek out of band, we think the best place would be openshift-apiserver or openshift-controller-manager.
was cluster-config-operator explicitly rejected in that conversation?
These are not OpenShift-API related, so neither openshift-apiserver nor openshift-controller-manager makes sense. I also lean towards cluster-config-operator.
Thanks for the reply! The problem is that not all environments install cluster-config-operator, so it would be a problem if a core component like monitoring uses this class but it's not there in some environments.
@lilic if you get stuck on this i suggest putting it in the CMO itself for now and re-homing it when someone else wants to use it.
@bparees Sounds great, will do, thanks!
introducing a third priority class: `user-critical`. This would be used by any pods that are important for user-facing OpenShift features but are not deemed system critical. Examples of such pods include user workload monitoring and a user's Knative Service.
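For illustration, the proposed class could be declared roughly as follows. This is only a sketch: the `value` is an assumption, chosen to sit below the built-in `system-cluster-critical` (2000000000) while still outranking unprioritized user pods, and neither the name nor the value is fixed by this PR.

```yaml
# Sketch of the proposed class; name and value are not final.
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: user-critical     # "openshift-user-critical" is also discussed in this thread
value: 1000000000          # assumption: below system-cluster-critical (2000000000)
globalDefault: false
description: "For pods backing user-facing OpenShift features that are not system critical."
```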
I'd replace "user's Knative Service" here with "the OpenShift Serverless control and data planes". We don't want OpenShift Serverless control or data planes to be system-*-critical because it's an optional component and not critical to the core functioning of the cluster itself. But, we do want the OpenShift Serverless control and data plane pods to be a higher priority than user workloads because if OpenShift Serverless control or data plane pods get evicted then that degrades the functionality of all Knative workloads in the cluster.
As a general comment, I'd expect many optional operators will want to take advantage of this new priority. I don't think eviction is something widely tested today, but a number of optional operators (Service Mesh, Pipelines, CodeReady Workspaces, and so on) would end up in a bad place if certain operator pods got evicted before user workloads consuming features of those optional operators. |
@bbrowning Agreed, I do believe this is something that is missing; will add this to the proposal notes.
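For an optional operator opting in, the change amounts to a single field on the pod template. A minimal sketch in which the Deployment name, labels, and image are purely illustrative, and `openshift-user-critical` is the prefixed name agreed earlier in the thread:

```yaml
# Sketch: an optional operator's operand opting into the new class.
# All names and the image are illustrative.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-control-plane
spec:
  replicas: 1
  selector:
    matchLabels:
      app: example-control-plane
  template:
    metadata:
      labels:
        app: example-control-plane
    spec:
      priorityClassName: openshift-user-critical  # the class proposed in this PR
      containers:
      - name: controller
        image: registry.example.com/controller:latest
```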
ping @openshift/openshift-architects can we have an lgtm or are there open questions still?
Issues go stale after 90d of inactivity. Mark the issue as fresh by commenting `/remove-lifecycle stale`. If this issue is safe to close now please do so with `/close`.
/lifecycle stale
@bparees thanks for the ping on Slack, I updated the PR; this should be ready for review and merge :) 🎉
/approve thanks for your persistence/patience on this @lilic :)
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: bparees.
Thank you Ben! 🎉
This proposal sums up the discussion we had on Slack about introducing a new priority class. As outlined in the proposal, the reason we would like to have this is to make sure cluster monitoring pods get scheduled in favour of user workload monitoring pods. With user workload monitoring possibly going GA in 4.6, it would be good to solve this in 4.6.