- Release Signoff Checklist
- Summary
- Motivation
- Proposal
- Design Details
- Production Readiness Review Questionnaire
- Implementation History
- Drawbacks
- Alternatives
- Infrastructure Needed (Optional)
Items marked with (R) are required prior to targeting to a milestone / release.
- (R) Enhancement issue in release milestone, which links to KEP dir in kubernetes/enhancements (not the initial KEP PR)
- (R) KEP approvers have approved the KEP status as `implementable`
- (R) Design details are appropriately documented
- (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
- e2e Tests for all Beta API Operations (endpoints)
- (R) Ensure GA e2e tests meet requirements for Conformance Tests
- (R) Minimum Two Week Window for GA e2e tests to prove flake free
- (R) Graduation criteria is in place
- (R) all GA Endpoints must be hit by Conformance Tests
- (R) Production readiness review completed
- (R) Production readiness review approved
- "Implementation History" section is up-to-date for milestone
- User-facing documentation has been created in kubernetes/website, for publication to kubernetes.io
- Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes
Horizontal Pod Autoscaler (HPA) automatically scales the number of pods in any resource that supports the `scale` subresource, based on observed CPU utilization (or, with custom metrics support, on some other application-provided metrics), from one to many replicas. This proposal adds support for scaling from zero to many replicas and back to zero for object and external metrics.
With the addition of scaling based on object and external metrics, it became possible to automatically adjust the number of running replicas based on an application-provided metric. A typical use case for this is scaling the number of queue consumers based on the length of the consumed queue.
In the case of a frequently idle queue or a less latency-sensitive workload, there is no need to run a replica at all times; instead, you want to dynamically scale to zero replicas, especially if those replicas have high resource requests. If replicas are scaled to 0, HPA also needs the ability to scale up once messages are available.
- Provide scaling to zero replicas for object and external metrics
- Provide scaling from zero replicas for object and external metrics
- Provide scaling to/from zero replicas for resource metrics
- Provide request buffering at the Kubernetes Service level
Allow the HPA to scale from and to zero using `minReplicas: 0` when explicitly enabled with a flag.
As the operator of a video processing pipeline, I would like to reduce costs. While video processing is CPU intensive, it is not a latency-sensitive workload. Therefore I want my video processing workers to be created only when there is actually a video to process, and terminated afterwards.
Currently, disabling HPA is possible by manually setting the scaled resource to `replicas: 0`. This works because the HPA itself can never reach this state. As `replicas: 0` is now a possible state when using `minReplicas: 0`, it can no longer be used to differentiate between an HPA that was manually disabled and one that automatically scaled to zero.
Additionally, the `replicas: 0` state is problematic because updating an HPA object's `minReplicas` from `0` to `1` behaves differently depending on the current state. If `replicas` was `0` during the update, HPA will be disabled for the resource; if it was `> 0`, HPA will continue with the new `minReplicas` value.
To resolve this issue, this KEP introduces an `enableScaleToZero` property to explicitly enable/disable scaling from/to zero.
From a UX perspective, the two-stage opt-out/opt-in for scale to zero might feel a bit tedious, but the only other available option seems to be deprecating the implicit HPA pause on `replicas: 0`. While this might provide an improved UX, it would require a full deprecation cycle (12 months) before graduating this feature from alpha to beta.
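To make the ambiguity concrete, here is a minimal sketch of the pause check the controller would need; `scalingDisabled` and its parameters are hypothetical names that only illustrate the semantics, not the controller's actual code:

```go
// scalingDisabled reports whether autoscaling should be treated as paused
// for the target. Today, replicas == 0 alone means "paused"; once scale to
// zero is possible, zero replicas is also a legitimate autoscaled state, so
// the controller must additionally consult minReplicas and the new opt-in.
func scalingDisabled(currentReplicas, minReplicas int32, scaleToZeroEnabled bool) bool {
	if currentReplicas != 0 {
		return false
	}
	// replicas == 0 only means "manually disabled" when the HPA is not
	// allowed to reach zero on its own.
	return !(scaleToZeroEnabled && minReplicas == 0)
}
```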
We would add `EnableScaleToZero *bool` to the HPA `spec.behavior`.
```go
type HorizontalPodAutoscalerBehavior struct {
	ScaleUp           *HPAScalingRules
	ScaleDown         *HPAScalingRules
	EnableScaleToZero *bool
}
```
```go
type HorizontalPodAutoscalerSpec struct {
	ScaleTargetRef CrossVersionObjectReference
	MinReplicas    *int32
	MaxReplicas    int32
	Metrics        []MetricSpec
	Behavior       *HorizontalPodAutoscalerBehavior
}
```
The `EnableScaleToZero` field controls whether `MinReplicas` can be set to `>= 0` (`true`, new behavior) or `>= 1` (`false`, current behavior). The default will be `false` to preserve the current behavior.
If `EnableScaleToZero` has been enabled, it can only be disabled when the scaled resource has at least one replica running and `MinReplicas` is `>= 1`.
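A rough sketch of these validation rules, assuming a hypothetical `validateScaleToZero` helper (the real checks would live in the API server's HPA validation, gated on the feature gate):

```go
import "errors"

// validateScaleToZero sketches the proposed constraints: MinReplicas may
// only be 0 while EnableScaleToZero is true, and EnableScaleToZero may
// only be switched off again once the target has at least one replica
// running and MinReplicas has been raised to at least 1.
func validateScaleToZero(oldSpec, newSpec *HorizontalPodAutoscalerSpec, currentReplicas int32) error {
	enabled := newSpec.Behavior != nil &&
		newSpec.Behavior.EnableScaleToZero != nil &&
		*newSpec.Behavior.EnableScaleToZero

	if !enabled && newSpec.MinReplicas != nil && *newSpec.MinReplicas < 1 {
		return errors.New("minReplicas must be >= 1 unless enableScaleToZero is true")
	}

	wasEnabled := oldSpec != nil && oldSpec.Behavior != nil &&
		oldSpec.Behavior.EnableScaleToZero != nil &&
		*oldSpec.Behavior.EnableScaleToZero
	if wasEnabled && !enabled {
		if currentReplicas < 1 || newSpec.MinReplicas == nil || *newSpec.MinReplicas < 1 {
			return errors.New("enableScaleToZero can only be disabled with >= 1 replica running and minReplicas >= 1")
		}
	}
	return nil
}
```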
Most logic related to this KEP is contained in the HPA controller, so testing the various `minReplicas`, `replicas`, and `enableScaleToZero` combinations should be achievable with unit tests.
Additionally, integration tests should be added for enabling scale to zero: one test setting `enableScaleToZero: true` and `minReplicas: 0`, then waiting for `replicas` to become `0`; and another test increasing `minReplicas` to `1`, observing that `replicas` becomes `1` again, and then setting `enableScaleToZero: false`.
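For illustration, the transitions above could be covered by a table-driven unit test along these lines, reusing the hypothetical `validateScaleToZero` sketch from the design details:

```go
import "testing"

// TestScaleToZeroTransitions exercises the minReplicas / enableScaleToZero
// transitions described in the test plan against the validation sketch.
func TestScaleToZeroTransitions(t *testing.T) {
	zero, one := int32(0), int32(1)
	on, off := true, false

	cases := []struct {
		name            string
		oldEnable       *bool
		newEnable       *bool
		minReplicas     *int32
		currentReplicas int32
		wantErr         bool
	}{
		{"opt in and allow minReplicas: 0", nil, &on, &zero, 1, false},
		{"minReplicas: 0 without opt-in", nil, nil, &zero, 1, true},
		{"opt out while still scaled to zero", &on, &off, &one, 0, true},
		{"opt out after scaling back to one", &on, &off, &one, 1, false},
	}
	for _, tc := range cases {
		t.Run(tc.name, func(t *testing.T) {
			oldSpec := &HorizontalPodAutoscalerSpec{
				Behavior: &HorizontalPodAutoscalerBehavior{EnableScaleToZero: tc.oldEnable},
			}
			newSpec := &HorizontalPodAutoscalerSpec{
				MinReplicas: tc.minReplicas,
				Behavior:    &HorizontalPodAutoscalerBehavior{EnableScaleToZero: tc.newEnable},
			}
			if err := validateScaleToZero(oldSpec, newSpec, tc.currentReplicas); (err != nil) != tc.wantErr {
				t.Fatalf("got err %v, wantErr %v", err, tc.wantErr)
			}
		})
	}
}
```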
- Implement the `enableScaleToZero` property
- Ensure that all `minReplicas` state transitions from `0` to `1` are working as expected
- Allowing time for feedback
- E2E tests are passing without flakiness
As this KEP changes the allowed values for `minReplicas`, special care is required in the downgrade case so as not to prevent updates to HPA objects using `minReplicas: 0`. The alpha code has accepted `minReplicas: 0` with the flag enabled or disabled since Kubernetes version 1.16, so downgrades to any version >= 1.16 aren't an issue.
The new flag `enableScaleToZero` defaults to `false`, which matches the previous behavior. The flag should be disabled before downgrading, as otherwise the HPA for deployments with zero replicas will be disabled until replicas have been raised explicitly to at least `1`.
- Feature gate (also fill in values in `kep.yaml`)
  - Feature gate name: `HPAScaleToZero`
  - Components depending on the feature gate: `kube-apiserver`
- Other
  - Describe the mechanism: When the `HPAScaleToZero` feature gate is enabled, HPA supports scaling to zero pods based on object or external metrics. HPA remains active as long as at least one metric value is available (a sketch of this decision follows after the enablement questions below).
- Will enabling / disabling the feature require downtime of the control plane?
  No.
- Will enabling / disabling the feature require downtime or reprovisioning of a node? (Do not assume the Dynamic Kubelet Config feature is enabled.)
  No.
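A minimal sketch of the scale-to-zero decision described under "Describe the mechanism" above; `desiredReplicas` and its parameters are hypothetical stand-ins, not the controller's actual functions:

```go
// desiredReplicas returns the proposed replica count and whether the HPA
// should act at all. With no metric values available the controller stays
// passive rather than scaling on missing data; otherwise it takes the
// highest per-metric recommendation, clamped to [minReplicas, maxReplicas].
func desiredReplicas(recommendations []int32, minReplicas, maxReplicas int32) (int32, bool) {
	if len(recommendations) == 0 {
		return 0, false // no metric value available: remain passive
	}
	desired := recommendations[0]
	for _, r := range recommendations[1:] {
		if r > desired {
			desired = r
		}
	}
	if desired < minReplicas {
		desired = minReplicas // with scale to zero enabled this bound can be 0
	}
	if desired > maxReplicas {
		desired = maxReplicas
	}
	return desired, true
}
```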
Any change of default behavior may be surprising to users or break existing automations, so be extremely careful here.
HPA creation/update with `minReplicas: 0` is no longer rejected if the `enableScaleToZero` field is set to `true`.
Describe the consequences on existing workloads (e.g., if this is a runtime feature, can it break the existing applications?).
Yes. To downgrade the cluster to a version that does not support the scale-to-zero feature:
- Make sure there are no HPA objects with `minReplicas=0` and `maxReplicas=0`. Here is a one-liner to update them to `1`:

  ```console
  $ kubectl get hpa --all-namespaces --no-headers=true | awk '{if($6==0) printf "kubectl patch hpa/%s --namespace=%s -p \"{\\\"spec\\\":{\\\"minReplicas\\\":1,\\\"maxReplicas\\\":1}}\"\n", $2, $1 }' | sh
  ```
- Disable the `HPAScaleToZero` feature gate.
- In case step 1 has been omitted, workloads might be stuck with `replicas: 0` and need to be manually scaled up to `replicas: 1` to re-enable autoscaling.
Nothing, the feature can be re-enabled without problems.
There are currently unit tests for the alpha cases, and tests are planned to be added for the new functionality.
As this is a new field, every usage is opt-in. If the Kubernetes version is downgraded, workloads currently scaled to 0 might need to be manually scaled to 1, as the controller would otherwise treat them as paused.
There are no expected side effects when the rollout fails, as the new `enableScaleToZero` flag should only be enabled once the version upgrade has completed and should be disabled before attempting a rollback. In case this is missed, HPA for deployments with zero replicas will be disabled until replicas have been raised explicitly to at least `1`.
If workloads aren't scaled up from 0 despite the scaling condition being met, an operator should roll back this feature and manually scale affected workloads back to `1`.
Not yet, as no implementation is available.
Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?
Once this is in beta, the alpha flag can be removed.
The feature is used if workloads are scaled to zero by the autoscaling controller.
Similar to how autoscaling is confirmed today.
No changes to the autoscaling SLOs.
What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?
No changes to the autoscaling SLIs.
Are there any missing metrics that would be useful to have to improve observability of this feature?
No, in regards to this KEP.
The addition has the same dependencies as the current autoscaling controller.
No, the amount of autoscaling related API calls will remain unchanged. No other components are affected.
No, this only modifies the existing API types.
No, the amount of autoscaling related cloud provider calls will remain unchanged. No other components are affected.
Yes, one additional boolean field.
Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs?
No, there are no visible latency changes expected for existing autoscaling operations.
Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, ...) in any components?
No, there are no visible changes expected for existing autoscaling operations.
- (2019/02/25) Original design doc: kubernetes/kubernetes#69687 (comment)
- (2019/07/16) Alpha implementation (kubernetes/kubernetes#74526) merged for Kubernetes 1.16