Auto-scale vtgate with HPA #598

Open
wants to merge 7 commits into
base: main
4 changes: 4 additions & 0 deletions Makefile
@@ -77,3 +77,7 @@ vtorc-vtadmin-test: build e2e-test-setup
unmanaged-tablet-test: build e2e-test-setup
echo "Running Unmanaged Tablet test"
test/endtoend/unmanaged_tablet_test.sh

hpa-test: build e2e-test-setup
echo "Running HPA test"
test/endtoend/hpa_test.sh
338 changes: 338 additions & 0 deletions deploy/crds/planetscale.com_vitesscells.yaml

Large diffs are not rendered by default.

328 changes: 328 additions & 0 deletions deploy/crds/planetscale.com_vitessclusters.yaml

Large diffs are not rendered by default.

8 changes: 7 additions & 1 deletion deploy/role.yaml
@@ -78,4 +78,10 @@ rules:
resources:
- jobs
verbs:
- '*'
- '*'
- apiGroups:
- autoscaling
resources:
- horizontalpodautoscalers
verbs:
- '*'
111 changes: 111 additions & 0 deletions docs/api/index.html
@@ -506,6 +506,78 @@ <h3 id="planetscale.com/v2.VitessCluster">VitessCluster
</tr>
</tbody>
</table>
<h3 id="planetscale.com/v2.AutoscalerSpec">AutoscalerSpec
</h3>
<p>
(<em>Appears on:</em>
<a href="#planetscale.com/v2.VitessCellGatewaySpec">VitessCellGatewaySpec</a>)
</p>
<p>
<p>AutoscalerSpec defines the vtgate&rsquo;s pod autoscaling specification.</p>
</p>
<table class="table table-striped">
<thead class="thead-dark">
<tr>
<th>Field</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>
<code>minReplicas</code></br>
<em>
int32
</em>
</td>
<td>
<p>MinReplicas is the minimum number of instances of vtgate to run in
this cell when autoscaling is enabled.</p>
</td>
</tr>
<tr>
<td>
<code>maxReplicas</code></br>
<em>
int32
</em>
</td>
<td>
<p>MaxReplicas is the maximum number of instances of vtgate to run in
this cell when autoscaling is enabled.</p>
</td>
</tr>
<tr>
<td>
<code>behavior</code></br>
<em>
<a href="https://v1-18.docs.kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#horizontalpodautoscalerbehavior-v2-autoscaling">
Kubernetes autoscaling/v2.HorizontalPodAutoscalerBehavior
</a>
</em>
</td>
<td>
<em>(Optional)</em>
</td>
</tr>
<tr>
<td>
<code>metrics</code></br>
<em>
<a href="https://v1-18.docs.kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#metricspec-v2-autoscaling">
[]Kubernetes autoscaling/v2.MetricSpec
</a>
</em>
</td>
<td>
<em>(Optional)</em>
<p>Metrics provides a customizable way to configure HPA metrics.
Currently, the only supported custom metric is type=Pod.
To scale on common resource metrics, use TargetCPUUtilization or TargetMemoryUtilization instead.</p>
</td>
</tr>
</tbody>
</table>
<h3 id="planetscale.com/v2.AzblobBackupLocation">AzblobBackupLocation
</h3>
<p>
@@ -3337,6 +3409,21 @@ <h3 id="planetscale.com/v2.VitessCellGatewaySpec">VitessCellGatewaySpec
</tr>
<tr>
<td>
<code>autoscaler</code></br>
<em>
<a href="#planetscale.com/v2.AutoscalerSpec">
AutoscalerSpec
</a>
</em>
</td>
<td>
<em>(Optional)</em>
<p>Autoscaler specifies the pod autoscaling configuration to use
for the vtgate workload.</p>
</td>
</tr>
<tr>
<td>
<code>resources</code></br>
<em>
<a href="https://v1-18.docs.kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#resourcerequirements-v1-core">
@@ -3614,6 +3701,30 @@ <h3 id="planetscale.com/v2.VitessCellGatewayStatus">VitessCellGatewayStatus
<p>ServiceName is the name of the Service for this cell&rsquo;s vtgate.</p>
</td>
</tr>
<tr>
<td>
<code>labelSelector</code></br>
<em>
string
</em>
</td>
<td>
<p>LabelSelector is required by the Scale subresource, which is used by
HorizontalPodAutoscaler when reading pod metrics.</p>
</td>
</tr>
<tr>
<td>
<code>replicas</code></br>
<em>
int32
</em>
</td>
<td>
<p>Replicas is required by the Scale subresource, which is used by
HorizontalPodAutoscaler to determine the current number of replicas.</p>
</td>
</tr>
</tbody>
</table>
<h3 id="planetscale.com/v2.VitessCellImages">VitessCellImages
36 changes: 36 additions & 0 deletions pkg/apis/planetscale/v2/vitesscell_types.go
@@ -17,6 +17,7 @@ limitations under the License.
package v2

import (
autoscalingv2 "k8s.io/api/autoscaling/v2"
corev1 "k8s.io/api/core/v1"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)
@@ -39,6 +40,7 @@ import (
// just like a Deployment can manage Pods that run on multiple Nodes.
// +kubebuilder:resource:path=vitesscells,shortName=vtc
// +kubebuilder:subresource:status
// +kubebuilder:subresource:scale:specpath=.spec.gateway.replicas,statuspath=.status.gateway.replicas,selectorpath=.status.gateway.labelSelector
type VitessCell struct {
metav1.TypeMeta `json:",inline"`
metav1.ObjectMeta `json:"metadata,omitempty"`
@@ -117,12 +119,39 @@ type VitessCellImages struct {
Vtgate string `json:"vtgate,omitempty"`
}

// AutoscalerSpec defines the vtgate's pod autoscaling specification.
type AutoscalerSpec struct {
// MinReplicas is the minimum number of instances of vtgate to run in
// this cell when autoscaling is enabled.
// +kubebuilder:validation:Minimum=0
MinReplicas *int32 `json:"minReplicas,omitempty"`

// MaxReplicas is the maximum number of instances of vtgate to run in
// this cell when autoscaling is enabled.
// +kubebuilder:validation:Minimum=0
MaxReplicas *int32 `json:"maxReplicas,omitempty"`

// +optional
Behavior *autoscalingv2.HorizontalPodAutoscalerBehavior `json:"behavior,omitempty"`

// Metrics provides a customizable way to configure HPA metrics.
// Currently, the only supported custom metric is type=Pod.
// To scale on common resource metrics, use TargetCPUUtilization or TargetMemoryUtilization instead.
// +optional
Metrics []autoscalingv2.MetricSpec `json:"metrics,omitempty"`
}

// VitessCellGatewaySpec specifies the per-cell deployment parameters for vtgate.
type VitessCellGatewaySpec struct {
// Replicas is the number of vtgate instances to deploy in this cell.
// +kubebuilder:validation:Minimum=0
Replicas *int32 `json:"replicas,omitempty"`

// Autoscaler specifies the pod autoscaling configuration to use
// for the vtgate workload.
// +optional
Autoscaler *AutoscalerSpec `json:"autoscaler,omitempty"`

// Resources determines the compute resources reserved for each vtgate replica.
Resources corev1.ResourceRequirements `json:"resources,omitempty"`

@@ -252,6 +281,13 @@ type VitessCellGatewayStatus struct {
Available corev1.ConditionStatus `json:"available,omitempty"`
// ServiceName is the name of the Service for this cell's vtgate.
ServiceName string `json:"serviceName,omitempty"`
// LabelSelector is required by the Scale subresource, which is used by
// HorizontalPodAutoscaler when reading pod metrics.
LabelSelector string `json:"labelSelector,omitempty"`
// Replicas is required by the Scale subresource, which is used by
// HorizontalPodAutoscaler to determine the current number of replicas.
// +kubebuilder:validation:Minimum=0
Replicas int32 `json:"replicas,omitempty"`
}

// VitessCellStatus defines the observed state of VitessCell
43 changes: 43 additions & 0 deletions pkg/apis/planetscale/v2/zz_generated.deepcopy.go

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

35 changes: 35 additions & 0 deletions pkg/controller/vitesscell/reconcile_vtgate.go
@@ -20,6 +20,7 @@ import (
"context"

appsv1 "k8s.io/api/apps/v1"
autoscalingv2 "k8s.io/api/autoscaling/v2"
corev1 "k8s.io/api/core/v1"
"k8s.io/apimachinery/pkg/runtime"
apitypes "k8s.io/apimachinery/pkg/types"
@@ -161,6 +162,10 @@ func (r *ReconcileVitessCell) reconcileVtgate(ctx context.Context, vtc *planetsc
curObj := obj.(*appsv1.Deployment)

status := &vtc.Status.Gateway
if replicas := curObj.Spec.Replicas; replicas != nil {
status.Replicas = *replicas
}
status.LabelSelector = curObj.Spec.Selector.String()
if available := conditions.Deployment(curObj.Status.Conditions, appsv1.DeploymentAvailable); available != nil {
status.Available = available.Status
}
@@ -170,5 +175,35 @@ func (r *ReconcileVitessCell) reconcileVtgate(ctx context.Context, vtc *planetsc
resultBuilder.Error(err)
}

var wantHpa bool
var hpaSpec *vtgate.HpaSpec

if vtc.Spec.Gateway.Autoscaler != nil {
wantHpa = vtc.Spec.Gateway.Autoscaler.MaxReplicas != nil
hpaSpec = &vtgate.HpaSpec{
Labels: labels,
MinReplicas: vtc.Spec.Gateway.Autoscaler.MinReplicas,
MaxReplicas: vtc.Spec.Gateway.Autoscaler.MaxReplicas,
Behavior: vtc.Spec.Gateway.Autoscaler.Behavior,
Metrics: vtc.Spec.Gateway.Autoscaler.Metrics,
}
}

// Reconcile vtgate HorizontalPodAutoscaler.
err = r.reconciler.ReconcileObject(ctx, vtc, key, labels, wantHpa, reconciler.Strategy{
Kind: &autoscalingv2.HorizontalPodAutoscaler{},

New: func(key client.ObjectKey) runtime.Object {
return vtgate.NewHorizontalPodAutoscaler(key, hpaSpec)
},
UpdateInPlace: func(key client.ObjectKey, obj runtime.Object) {
newObj := obj.(*autoscalingv2.HorizontalPodAutoscaler)
vtgate.UpdateHorizontalPodAutoscaler(newObj, hpaSpec)
},
})
if err != nil {
resultBuilder.Error(err)
}
Comment on lines +193 to +206

The HorizontalPodAutoscaler resource is not something that the operator, or even the ReconcileVitessCell object, knows about. This leads to the error below, which can be found in the vitess-operator logs.

 pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:229: Failed to watch *v2.HorizontalPodAutoscaler: failed to list *v2.HorizontalPodAutoscaler: horizontalpodautoscalers.autoscaling is forbidden: User "system:serviceaccount:default:vitess-operator" cannot list resource "horizontalpodautoscalers" in API group "autoscaling" in the namespace "default"

We must give the ReconcileVitessCell object the proper information to start watching this resource: this can be done by adding *v2.HorizontalPodAutoscaler to the watchResources slice in the vitesscell pkg.

Moreover, the Kubernetes role needs to be updated so that the operator's service account is permitted to access this resource through the API. This can be done by adding the following snippet to the ./deploy/role.yaml file.

- apiGroups:
  - autoscaling
  resources:
  - horizontalpodautoscalers
  verbs:
  - '*'
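
As a rough illustration of why the operator's list call is rejected without this rule, here is a minimal Go sketch of RBAC rule matching. `PolicyRule` and `Allows` are hypothetical stand-ins written for this example, not the `k8s.io/api/rbac/v1` types or the real Kubernetes authorizer, and the `batch` API group used for the existing `jobs` rule is an assumption. Resource names, subresources, and non-resource URLs are ignored.

```go
package main

import "fmt"

// PolicyRule is a simplified, hypothetical model of one entry under
// "rules:" in a Role manifest. It is not the k8s.io/api/rbac/v1 type.
type PolicyRule struct {
	APIGroups []string
	Resources []string
	Verbs     []string
}

// matches reports whether want appears in have, treating "*" as a wildcard.
func matches(have []string, want string) bool {
	for _, h := range have {
		if h == "*" || h == want {
			return true
		}
	}
	return false
}

// Allows reports whether any rule grants verb on resource in apiGroup.
func Allows(rules []PolicyRule, apiGroup, resource, verb string) bool {
	for _, r := range rules {
		if matches(r.APIGroups, apiGroup) &&
			matches(r.Resources, resource) &&
			matches(r.Verbs, verb) {
			return true
		}
	}
	return false
}

func main() {
	// Existing rule for jobs; the "batch" group is assumed for illustration.
	jobs := PolicyRule{
		APIGroups: []string{"batch"},
		Resources: []string{"jobs"},
		Verbs:     []string{"*"},
	}
	// The rule from the snippet above.
	hpa := PolicyRule{
		APIGroups: []string{"autoscaling"},
		Resources: []string{"horizontalpodautoscalers"},
		Verbs:     []string{"*"},
	}

	before := []PolicyRule{jobs}
	after := []PolicyRule{jobs, hpa}

	fmt.Println("before:", Allows(before, "autoscaling", "horizontalpodautoscalers", "list"))
	fmt.Println("after: ", Allows(after, "autoscaling", "horizontalpodautoscalers", "list"))
}
```

In this model the first check fails and the second succeeds, which corresponds to the "forbidden" message in the log above going away once the rule is added.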


Without this change, creating a cluster does not fail, since vitess-operator can still run, but no HPA is created:

$ kubectl get hpa
No resources found in default namespace.

With this change, however, vitess-operator is healthy and the HPA is created:

$ kubectl get hpa
NAME                            REFERENCE                                  TARGETS         MINPODS   MAXPODS   REPLICAS   AGE
example-zone1-vtgate-bc6cde92   VitessCell/example-zone1-vtgate-bc6cde92   <unknown>/10%   1         10        0          33s

Note that the target is unknown because my metrics-server was disabled at the time.
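
For context on why the `<unknown>` target matters: the HPA needs a current metric value to compute a replica count. The sketch below applies the scaling rule described in the Kubernetes HPA documentation, `ceil(currentReplicas * currentMetric / targetMetric)`, with the documented default tolerance of 0.1, then clamps the result to the configured bounds. The function name `desiredReplicas` and its parameters are hypothetical, and the real controller handles more cases (missing metrics, unready pods, multiple metrics, scaling behavior policies).

```go
package main

import (
	"fmt"
	"math"
)

// desiredReplicas is a simplified sketch of the HPA scaling rule:
// ceil(current * currentMetric / targetMetric), clamped to [minR, maxR].
// Ratios within the tolerance of 1.0 leave the replica count unchanged.
func desiredReplicas(current int32, currentMetric, targetMetric float64, minR, maxR int32) int32 {
	const tolerance = 0.1 // documented Kubernetes default; assumed here

	ratio := currentMetric / targetMetric
	desired := current
	if math.Abs(ratio-1.0) > tolerance {
		desired = int32(math.Ceil(float64(current) * ratio))
	}

	// Clamp to the bounds taken from minReplicas and maxReplicas.
	if desired < minR {
		desired = minR
	}
	if desired > maxR {
		desired = maxR
	}
	return desired
}

func main() {
	// Bounds and target match the kubectl output above: 1 to 10 pods, 10% target.
	fmt.Println(desiredReplicas(3, 25, 10, 1, 10)) // load above target: scale up
	fmt.Println(desiredReplicas(4, 5, 10, 1, 10))  // load below target: scale down
	fmt.Println(desiredReplicas(8, 30, 10, 1, 10)) // result capped at maxReplicas
}
```

With metrics-server disabled there is no `currentMetric` to feed into this calculation, so the HPA has nothing to act on until metrics become available.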


return resultBuilder.Result()
}
2 changes: 2 additions & 0 deletions pkg/controller/vitesscell/vitesscell_controller.go
@@ -37,6 +37,7 @@ import (
"sigs.k8s.io/controller-runtime/pkg/reconcile"
"sigs.k8s.io/controller-runtime/pkg/source"

autoscalingv2 "k8s.io/api/autoscaling/v2"
planetscalev2 "planetscale.dev/vitess-operator/pkg/apis/planetscale/v2"
"planetscale.dev/vitess-operator/pkg/operator/environment"
"planetscale.dev/vitess-operator/pkg/operator/metrics"
@@ -60,6 +61,7 @@ var log = logrus.WithField("controller", "VitessCell")
var watchResources = []client.Object{
&corev1.Service{},
&appsv1.Deployment{},
&autoscalingv2.HorizontalPodAutoscaler{},

&planetscalev2.EtcdLockserver{},
}
9 changes: 6 additions & 3 deletions pkg/controller/vitesscluster/reconcile_cells.go
@@ -152,9 +152,12 @@ func updateVitessCellInPlace(key client.ObjectKey, vtc *planetscalev2.VitessCell
// Update labels, but ignore existing ones we don't set.
update.Labels(&vtc.Labels, newCell.Labels)

// We allow immediate update of replica counts for stateless workloads,
// like Deployment does.
vtc.Spec.Gateway.Replicas = newCell.Spec.Gateway.Replicas
// Only update replicas if autoscaling is disabled.
if vtc.Spec.Gateway.Autoscaler == nil || vtc.Spec.Gateway.Autoscaler.MaxReplicas == nil {
// We allow immediate update of replica counts for stateless workloads,
// like Deployment does.
vtc.Spec.Gateway.Replicas = newCell.Spec.Gateway.Replicas
}
}
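
The intent stated in the comment above ("Only update replicas if autoscaling is disabled") can be sketched in isolation as follows. This is a minimal illustration, assuming autoscaling counts as enabled under the same condition used for `wantHpa` in reconcile_vtgate.go (an autoscaler block with `maxReplicas` set). The types `AutoscalerSpec` and `GatewaySpec` are trimmed-down stand-ins, and `autoscalingEnabled`, `syncReplicas`, and `int32Ptr` are hypothetical helpers, not functions from the operator.

```go
package main

import "fmt"

// AutoscalerSpec and GatewaySpec are trimmed-down stand-ins for the
// API types in this change, reduced to the fields used here.
type AutoscalerSpec struct {
	MinReplicas *int32
	MaxReplicas *int32
}

type GatewaySpec struct {
	Replicas   *int32
	Autoscaler *AutoscalerSpec
}

// autoscalingEnabled mirrors the wantHpa condition: an autoscaler
// block is present and maxReplicas is set.
func autoscalingEnabled(g GatewaySpec) bool {
	return g.Autoscaler != nil && g.Autoscaler.MaxReplicas != nil
}

// syncReplicas copies the desired replica count only when autoscaling
// is disabled, so a value written through the scale subresource is
// not overwritten on the next reconcile.
func syncReplicas(cur *GatewaySpec, desired GatewaySpec) {
	if autoscalingEnabled(*cur) {
		return
	}
	cur.Replicas = desired.Replicas
}

// int32Ptr is a small helper for building pointer fields.
func int32Ptr(v int32) *int32 { return &v }

func main() {
	// Autoscaled gateway: the HPA scaled it to 5, the template says 2.
	scaled := GatewaySpec{
		Replicas:   int32Ptr(5),
		Autoscaler: &AutoscalerSpec{MaxReplicas: int32Ptr(10)},
	}
	syncReplicas(&scaled, GatewaySpec{Replicas: int32Ptr(2)})
	fmt.Println("autoscaled replicas:", *scaled.Replicas)

	// Manually sized gateway: the template value is applied immediately.
	manual := GatewaySpec{Replicas: int32Ptr(5)}
	syncReplicas(&manual, GatewaySpec{Replicas: int32Ptr(2)})
	fmt.Println("manual replicas:", *manual.Replicas)
}
```

Keeping the check in one helper means the HPA reconcile path and the replica update path cannot drift apart on what "enabled" means.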

func updateVitessCell(key client.ObjectKey, vtc *planetscalev2.VitessCell, vt *planetscalev2.VitessCluster, parentLabels map[string]string, cell *planetscalev2.VitessCellTemplate) {