- Release Signoff Checklist
- Summary
- Motivation
- Proposal
- Design Details
- Production Readiness Review Questionnaire
- Implementation History
- Drawbacks
- Alternatives
- Infrastructure Needed (Optional)
Items marked with (R) are required prior to targeting to a milestone / release.
- (R) Enhancement issue in release milestone, which links to KEP dir in kubernetes/enhancements (not the initial KEP PR)
- (R) KEP approvers have approved the KEP status as `implementable`
- (R) Design details are appropriately documented
- (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
- e2e Tests for all Beta API Operations (endpoints)
- (R) Ensure GA e2e tests meet requirements for Conformance Tests
- (R) Minimum Two Week Window for GA e2e tests to prove flake free
- (R) Graduation criteria is in place
- (R) all GA Endpoints must be hit by Conformance Tests
- (R) Production readiness review completed
- (R) Production readiness review approved
- "Implementation History" section is up-to-date for milestone
- User-facing documentation has been created in kubernetes/website, for publication to kubernetes.io
- Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes
Horizontal Pod Autoscaler (HPA) automatically scales the number of pods in any resource that supports the `scale` subresource, based on observed CPU utilization (or, with custom metrics support, on some other application-provided metrics), from one to many replicas. This proposal adds support for scaling from zero to many replicas and back to zero for object and external metrics.
With the addition of scaling based on object and external metrics, it became possible to automatically adjust the number of running replicas based on an application-provided metric. A typical use case for this is scaling the number of queue consumers based on the length of the consumed queue.
In the case of a frequently idle queue or a less latency-sensitive workload, there is no need to run a replica at all times; instead, you want to dynamically scale to zero replicas, especially if those replicas have high resource requests. If replicas are scaled to 0, HPA also needs the ability to scale up once messages are available.
- Provide scaling to zero replicas for object and external metrics
- Provide scaling from zero replicas for object and external metrics
- Provide scaling to/from zero replicas for resource metrics
- Provide request buffering at the Kubernetes Service level
Allow the HPA to scale from and to zero using `minReplicas: 0` when explicitly enabled with a flag.
As the operator of a video processing pipeline, I would like to reduce costs. While video processing is CPU intensive, it is not a latency-sensitive workload. Therefore I want my video processing workers to be created only when there is actually a video to process, and terminated afterwards.
Currently, disabling HPA is possible by manually setting the scaled resource to `replicas: 0`. This works because the HPA itself can never reach this state. As `replicas: 0` is now a possible state when using `minReplicas: 0`, it can no longer be used to differentiate between an HPA that was manually disabled and one that automatically scaled to zero.
Additionally, the `replicas: 0` state is problematic because updating an HPA object's `minReplicas` from `0` to `1` behaves differently depending on the current state. If `replicas` was `0` during the update, HPA will be disabled for the resource; if it was `> 0`, HPA will continue with the new `minReplicas` value.
To resolve this issue, this KEP introduces an `enableScaleToZero` property to explicitly enable/disable scaling from/to zero.
From a UX perspective, the two-stage opt-out/opt-in for scale to zero might feel a bit tedious, but the only other available option seems to be deprecating the implicit HPA pause on `replicas: 0`. While this might provide an improved UX, it would require a full deprecation cycle (12 months) before graduating this feature from alpha to beta.
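To make the ambiguity concrete, here is a minimal sketch of the pause check the controller would need; `scalingDisabled` and its parameters are hypothetical names that only illustrate the semantics, not the controller's actual code:

```go
// scalingDisabled reports whether autoscaling should be treated as paused
// for the target. Today, replicas == 0 alone means "paused"; once scale to
// zero is possible, zero replicas is also a legitimate autoscaled state, so
// the controller must additionally consult minReplicas and the new opt-in.
func scalingDisabled(currentReplicas, minReplicas int32, scaleToZeroEnabled bool) bool {
	if currentReplicas != 0 {
		return false
	}
	// replicas == 0 only means "manually disabled" when the HPA is not
	// allowed to reach zero on its own.
	return !(scaleToZeroEnabled && minReplicas == 0)
}
```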
We would add `EnableScaleToZero *bool` to the HPA `spec.behavior`.
```go
type HorizontalPodAutoscalerBehavior struct {
	ScaleUp           *HPAScalingRules
	ScaleDown         *HPAScalingRules
	EnableScaleToZero *bool
}
```
```go
type HorizontalPodAutoscalerSpec struct {
	ScaleTargetRef CrossVersionObjectReference
	MinReplicas    *int32
	MaxReplicas    int32
	Metrics        []MetricSpec
	Behavior       *HorizontalPodAutoscalerBehavior
}
```
The `EnableScaleToZero` field controls whether `MinReplicas` can be set to `>= 0` (`true`, new behavior) or `>= 1` (`false`, current behavior). The default will be `false` to preserve the current behavior.
If `EnableScaleToZero` has been enabled, it can only be disabled when the scaled resource has at least one replica running and `MinReplicas` is `>= 1`.
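A rough sketch of these validation rules, assuming a hypothetical `validateScaleToZero` helper (the real checks would live in the API server's HPA validation, gated on the feature gate):

```go
import "errors"

// validateScaleToZero sketches the proposed constraints: MinReplicas may
// only be 0 while EnableScaleToZero is true, and EnableScaleToZero may
// only be switched off again once the target has at least one replica
// running and MinReplicas has been raised to at least 1.
func validateScaleToZero(oldSpec, newSpec *HorizontalPodAutoscalerSpec, currentReplicas int32) error {
	enabled := newSpec.Behavior != nil &&
		newSpec.Behavior.EnableScaleToZero != nil &&
		*newSpec.Behavior.EnableScaleToZero

	if !enabled && newSpec.MinReplicas != nil && *newSpec.MinReplicas < 1 {
		return errors.New("minReplicas must be >= 1 unless enableScaleToZero is true")
	}

	wasEnabled := oldSpec != nil && oldSpec.Behavior != nil &&
		oldSpec.Behavior.EnableScaleToZero != nil &&
		*oldSpec.Behavior.EnableScaleToZero
	if wasEnabled && !enabled {
		if currentReplicas < 1 || newSpec.MinReplicas == nil || *newSpec.MinReplicas < 1 {
			return errors.New("enableScaleToZero can only be disabled with >= 1 replica running and minReplicas >= 1")
		}
	}
	return nil
}
```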
Most logic related to this KEP is contained in the HPA controller, so testing the various `minReplicas`, `replicas`, and `enableScaleToZero` combinations should be achievable with unit tests.
Additionally, integration tests should be added for enabling scale to zero: one test setting `enableScaleToZero: true` and `minReplicas: 0`, then waiting for `replicas` to become `0`; and another test increasing `minReplicas` to `1`, observing that `replicas` becomes `1` again, and then setting `enableScaleToZero: false`.
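For illustration, the transitions above could be covered by a table-driven unit test along these lines, reusing the hypothetical `validateScaleToZero` sketch from the design details:

```go
import "testing"

// TestScaleToZeroTransitions exercises the minReplicas / enableScaleToZero
// transitions described in the test plan against the validation sketch.
func TestScaleToZeroTransitions(t *testing.T) {
	zero, one := int32(0), int32(1)
	on, off := true, false

	cases := []struct {
		name            string
		oldEnable       *bool
		newEnable       *bool
		minReplicas     *int32
		currentReplicas int32
		wantErr         bool
	}{
		{"opt in and allow minReplicas: 0", nil, &on, &zero, 1, false},
		{"minReplicas: 0 without opt-in", nil, nil, &zero, 1, true},
		{"opt out while still scaled to zero", &on, &off, &one, 0, true},
		{"opt out after scaling back to one", &on, &off, &one, 1, false},
	}
	for _, tc := range cases {
		t.Run(tc.name, func(t *testing.T) {
			oldSpec := &HorizontalPodAutoscalerSpec{
				Behavior: &HorizontalPodAutoscalerBehavior{EnableScaleToZero: tc.oldEnable},
			}
			newSpec := &HorizontalPodAutoscalerSpec{
				MinReplicas: tc.minReplicas,
				Behavior:    &HorizontalPodAutoscalerBehavior{EnableScaleToZero: tc.newEnable},
			}
			if err := validateScaleToZero(oldSpec, newSpec, tc.currentReplicas); (err != nil) != tc.wantErr {
				t.Fatalf("got err %v, wantErr %v", err, tc.wantErr)
			}
		})
	}
}
```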
- Implement the `enableScaleToZero` property
- Ensure that all `minReplicas` state transitions from `0` to `1` are working as expected
- Allowing time for feedback
- E2E tests are passing without flakiness
As this KEP changes the allowed values for `minReplicas`, special care is required in the downgrade case so as not to prevent updates to HPA objects using `minReplicas: 0`. The alpha code has accepted `minReplicas: 0` with the flag enabled or disabled since Kubernetes version 1.16, so downgrades to any version >= 1.16 aren't an issue.
The new flag `enableScaleToZero` defaults to `false`, which matches the previous behavior. The flag should be disabled before downgrading, as otherwise the HPA for deployments with zero replicas will be disabled until replicas have been raised explicitly to at least `1`.
- Feature gate (also fill in values in `kep.yaml`)
  - Feature gate name: `HPAScaleToZero`
  - Components depending on the feature gate: `kube-apiserver`
- Other
  - Describe the mechanism: When the `HPAScaleToZero` feature gate is enabled, HPA supports scaling to zero pods based on object or external metrics. HPA remains active as long as at least one metric value is available (a sketch of this decision follows after the enablement questions below).
- Will enabling / disabling the feature require downtime of the control plane?
  No.
- Will enabling / disabling the feature require downtime or reprovisioning of a node? (Do not assume the Dynamic Kubelet Config feature is enabled.)
  No.
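A minimal sketch of the scale-to-zero decision described under "Describe the mechanism" above; `desiredReplicas` and its parameters are hypothetical stand-ins, not the controller's actual functions:

```go
// desiredReplicas returns the proposed replica count and whether the HPA
// should act at all. With no metric values available the controller stays
// passive rather than scaling on missing data; otherwise it takes the
// highest per-metric recommendation, clamped to [minReplicas, maxReplicas].
func desiredReplicas(recommendations []int32, minReplicas, maxReplicas int32) (int32, bool) {
	if len(recommendations) == 0 {
		return 0, false // no metric value available: remain passive
	}
	desired := recommendations[0]
	for _, r := range recommendations[1:] {
		if r > desired {
			desired = r
		}
	}
	if desired < minReplicas {
		desired = minReplicas // with scale to zero enabled this bound can be 0
	}
	if desired > maxReplicas {
		desired = maxReplicas
	}
	return desired, true
}
```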
Any change of default behavior may be surprising to users or break existing automations, so be extremely careful here.
HPA creation/update with `minReplicas: 0` is no longer rejected if the `enableScaleToZero` field is set to `true`.
Describe the consequences on existing workloads (e.g., if this is a runtime feature, can it break the existing applications?).
Yes. To downgrade the cluster to a version that does not support the scale-to-zero feature:
- Make sure there are no HPA objects with `minReplicas=0` and `maxReplicas=0`. Here is a one-liner to update them to `1`:

  ```console
  $ kubectl get hpa --all-namespaces --no-headers=true | awk '{if($6==0) printf "kubectl patch hpa/%s --namespace=%s -p \"{\\\"spec\\\":{\\\"minReplicas\\\":1,\\\"maxReplicas\\\":1}}\"\n", $2, $1 }' | sh
  ```
- Disable the `HPAScaleToZero` feature gate.
- In case step 1 has been omitted, workloads might be stuck with `replicas: 0` and need to be manually scaled up to `replicas: 1` to re-enable autoscaling.
Nothing, the feature can be re-enabled without problems.
There are currently unit tests for the alpha cases, and tests are planned to be added for the new functionality.
As this is a new field, every usage is opt-in. If the Kubernetes version is downgraded, workloads currently scaled to 0 might need to be manually scaled to 1, as the controller would otherwise treat them as paused.
There are no expected side effects when the rollout fails, as the new `enableScaleToZero` flag should only be enabled once the version upgrade has completed and should be disabled before attempting a rollback. In case this is missed, HPA for deployments with zero replicas will be disabled until replicas have been raised explicitly to at least `1`.
If workloads aren't scaled up from 0 despite the scaling condition being met, an operator should roll back this feature and manually scale affected workloads back to `1`.
Not yet, as no implementation is available.
Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?
Once this is in beta, the alpha flag can be removed.
The feature is used if workloads are scaled to zero by the autoscaling controller.
Similar to how autoscaling is confirmed today.
No changes to the autoscaling SLOs.
What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?
No changes to the autoscaling SLIs.
Are there any missing metrics that would be useful to have to improve observability of this feature?
No, in regards to this KEP.
The addition has the same dependencies as the current autoscaling controller.
No, the amount of autoscaling related API calls will remain unchanged. No other components are affected.
No, this only modifies the existing API types.
No, the amount of autoscaling related cloud provider calls will remain unchanged. No other components are affected.
Yes, one additional boolean field.
Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs?
No, there are no visible latency changes expected for existing autoscaling operations.
Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, ...) in any components?
No, there are no visible changes expected for existing autoscaling operations.
- (2019/02/25) Original design doc: kubernetes/kubernetes#69687 (comment)
- (2019/07/16) Alpha implementation (kubernetes/kubernetes#74526) merged for Kubernetes 1.16