automated resource request scaling #628
Conversation
@dhellmann: GitHub didn't allow me to request PR reviews from the following users: browsell. Note that only openshift members and repo collaborators can review this PR, and authors cannot review their own PRs. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Force-pushed 005b965 → 290b755
/priority important-soon
Force-pushed e9cca10 → 684e6c5
/retest
Force-pushed 684e6c5 → e61d7e7
Force-pushed e61d7e7 → 151c4a0
I have updated the enhancement with more details in the proposal section. Please take another look.
Force-pushed 151c4a0 → 2fdf015
> 3. Have a new operator in the release payload perform the scaling for
>    control plane operators
>
>    It would use the new API field to determine how to scale (see above).
We already have an operator for this: https://github.com/openshift/cluster-resource-override-admission-operator. We can improve it and add new API fields.
OK, so we can modify this section to say the operator doesn't need to be in the release payload, but that it would need to be installed separately, and link to the existing operator.
We'll also need to say that each openshift operator will need to make the appropriate adjustments to ignore changes to the resource requests, to avoid contention with the cluster-resource-override-admission webhook. It looks like the way to enable the admission webhook is to add a label to the namespace and then update the code to ignore resource requests on operands when comparing for equality. Is that right?
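To sketch what "ignore resource requests when comparing for equality" could look like, here is a minimal illustration in Python (the actual operators are written in Go; the dict-based container specs and helper names here are purely hypothetical):

```python
def strip_resources(container: dict) -> dict:
    """Return a copy of a container spec with the resources stanza removed,
    so that webhook-mutated requests/limits do not trigger a reconcile."""
    stripped = dict(container)
    stripped.pop("resources", None)
    return stripped


def specs_equal_ignoring_resources(desired: list, actual: list) -> bool:
    """Compare container lists for equality, ignoring resource requests/limits
    that the cluster-resource-override admission webhook may have rewritten."""
    return ([strip_resources(c) for c in desired]
            == [strip_resources(c) for c in actual])
```

With this kind of comparison, an operator would not fight the webhook: a pod whose CPU request was scaled down by admission still compares equal to the desired spec.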
@cynepco3hahue The latest draft elevates the cluster-resource-override-admission-operator to the primary approach being proposed. Let me know what you think.
Signed-off-by: Doug Hellmann <[email protected]>
Force-pushed df0a664 → 251add0
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by:
The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
The latest draft of this enhancement reflects the changes to the approach that we came up with today, using https://github.com/openshift/cluster-resource-override-admission. There are still some details to work out, but I think the basics are all here now. All of the old approaches have been folded into the alternatives section at the end of the document. @derekwaynecarr @browsell @markmc please take a look and let me know if I've missed anything.
I forgot to mention that I left the changes in a separate patch in case that made reviewing easier. I will squash the commits before we merge this enhancement.
Force-pushed 56a59a3 → c969eba
Signed-off-by: Doug Hellmann <[email protected]>
Force-pushed c969eba → ceeca77
> a tight timeline for delivering the work described in this
> enhancement, that may help with scheduling.
>
> 1. Add a new field to `ClusterResourceOverride` API with a name like
For the general case where a pod has a limit set, we would need to decide the precedence if both cpuRequestPercent and cpuRequestToLimitPercent are set.
Good point. It seems like the limit-based scaling (the existing behavior) should take precedence, and the new field should be ignored if the old field is present. What do you think?
Agreed.
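The precedence agreed above can be sketched as follows. This is an illustrative Python function, not the operator's Go implementation; the field semantics (existing limit-based scaling wins, the new request-scaling field is ignored when both are set) come from the discussion, while the function name and millicore arithmetic are assumptions:

```python
def effective_cpu_request(request_m, limit_m,
                          cpu_request_to_limit_percent=None,
                          cpu_request_percent=None):
    """Compute the CPU request in millicores after the override is applied.

    Limit-based scaling (the existing cpuRequestToLimitPercent behavior)
    takes precedence; the new cpuRequestPercent field is ignored when the
    old field is present and a limit is set.
    """
    if cpu_request_to_limit_percent is not None and limit_m is not None:
        return limit_m * cpu_request_to_limit_percent // 100
    if cpu_request_percent is not None:
        return request_m * cpu_request_percent // 100
    return request_m
```

For example, with a 1000m limit, a 100m request, and both fields set to 50 and 25 respectively, the limit-based rule wins and the request becomes 500m.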
/cc @rphillips @sjenning
This enhancement doesn't appear to mention the runlevel restrictions created in the clusterresourceoverride's MutatingWebhookConfiguration, which allow us to ensure we can re-bootstrap a cluster that has been shut down. We label namespaces in openshift with This includes 36 out of 61 openshift-* namespaces on our 4.7 CI cluster. Removing this restriction to shrink the pod resource requests/limits would be fraught. One example flow that will work in a demo and be at risk in a production cluster is

We created the runlevels and skips to ensure that we did not wedge in such a situation. While it may be possible to tune the labeling slightly tighter for a single-node use case (I don't know one way or the other), an accident at this level can result in situations where clusters cannot be safely restarted. Knowing that more than half of the namespaces we have today will not be restricted, and knowing that we need a way to scale resource requests and limits on pods in these 36 namespaces and on static pods, having a single solution for the additional 25 namespaces doesn't seem so bad.
This enhancement presents an alternative approach to replace openshift#628. Signed-off-by: Doug Hellmann <[email protected]>
Closing this in favor of #703
This enhancement describes an approach to allow us to scale the
resource requests for the control plane services to reduce consumption
for constrained environments. This will be especially useful for
single-node production deployments, where the user wants to reserve
most of the CPU resources for their own workloads and needs to
configure OpenShift to run on a fixed number of CPUs within the host.
/cc @smarterclayton @derekwaynecarr @markmc @browsell