Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create pod priority class for user workload monitoring #987

Merged
merged 4 commits into from
Nov 24, 2020

Conversation

lilic
Copy link
Contributor

@lilic lilic commented Nov 23, 2020

This introduces a new PriorityClass named openshift-user-critical, and applies that policy to all user workload components. We want this because we have seen user workload pods get scheduled in favor of platform monitoring pods. We always want platform monitoring to have higher priority scheduling as it monitors platform critical workloads, user workload monitoring stack is now second priority class before all rest of user workloads.

See background on this openshift/enhancements#369.

  • I added CHANGELOG entry for this change.
  • No user facing changes, so no entry in CHANGELOG was needed.

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 23, 2020
@lilic
Copy link
Contributor Author

lilic commented Nov 23, 2020

cc @openshift/openshift-team-monitoring PTAL, I tested this on a cluster and the class is applied, not sure we need a test for it? If we were to test any disruption to the workloads, that would be testing the priority class k8s feature. Any other test suggestions, welcome!

@s-urbaniak
Copy link
Contributor

My intuition is also that testing if the priority is applied is out of scope here (i.e. disruption).

What we could test is if the actual priority value of system-cluster-critical is higher than openshift-user-critical. That could be done as an e2e tests which reads out the values in a concrete cluster.

@lilic
Copy link
Contributor Author

lilic commented Nov 23, 2020

@s-urbaniak Sounds good, this test would be there to determine the new priority class is lower priority than the system one, correct?

test/e2e/main_test.go Outdated Show resolved Hide resolved
@lilic
Copy link
Contributor Author

lilic commented Nov 23, 2020

level=error msg=Cluster operator authentication Degraded is True with IngressStateEndpoints_MissingSubsets::OAuthServerConfigObservation_Error::OAuthServiceCheckEndpointAccessibleController_SyncError::OAuthServiceEndpointsCheckEndpointAccessibleController_SyncError::RouterCerts_NoRouterCertSecret: OAuthServiceCheckEndpointAccessibleControllerDegraded: Get "https://172.30.152.80:443/healthz": context deadline exceeded (Client.Timeout exceeded while awaiting headers)

unrelated it seems
/retest

@lilic
Copy link
Contributor Author

lilic commented Nov 23, 2020

/retest

test/e2e/main_test.go Outdated Show resolved Hide resolved
@s-urbaniak
Copy link
Contributor

@lilic the test looks spot on (at least to me) 👍 I have just one logical question.

@lilic
Copy link
Contributor Author

lilic commented Nov 24, 2020

@s-urbaniak PTAL, adjusted as per feedback, thanks!

@simonpasquier
Copy link
Contributor

/lgtm

should we get an official ack from OpenShift architects?

@openshift-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: lilic, simonpasquier

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [lilic,simonpasquier]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Nov 24, 2020
@openshift-merge-robot openshift-merge-robot merged commit 1509a34 into openshift:master Nov 24, 2020
@lilic lilic deleted the pod-policy branch November 24, 2020 12:19
@lilic
Copy link
Contributor Author

lilic commented Nov 24, 2020

@simonpasquier thanks!

should we get an official ack from OpenShift architects?

We got an 👍 from Derek already, before that on forum-arch folks said yes to it, on both the name and this approach and it can be used by other user workloads in the future. Just location was never agreed upon, so for now we create it, and if other components want to consume it we can move it. SGTY?

@simonpasquier
Copy link
Contributor

@lilic this is fine by me then! I wasn't sure reading the enhancement proposal and comments that everything was already agreed :)

@lilic
Copy link
Contributor Author

lilic commented Nov 25, 2020

@simonpasquier Yep was lots of comments :) openshift/enhancements#369 (comment) and openshift/enhancements#369 (comment) are the main comments, everything else is specific to the priority class being applied to other components or its location. There was also a lot of out of band discussions on slack.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants