Only scale up zone after all leader zone replicas are ready #164

Merged: 7 commits, Aug 13, 2024
2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -2,6 +2,8 @@

## main / unreleased

* [FEATURE] Only scale up once all of the leader `StatefulSet` replicas are ready. #164

## v0.17.1

* [ENHANCEMENT] prepare-downscale admission webhook: undo prepare-shutdown calls if adding the `last-downscale` annotation fails. #151
10 changes: 8 additions & 2 deletions README.md
@@ -33,7 +33,11 @@ For each **rollout group**, the operator **guarantees**:

The operator can also optionally coordinate scaling up and down of `StatefulSets` that are part of the same `rollout-group` based on the `grafana.com/rollout-downscale-leader` annotation. When using this feature, the `grafana.com/min-time-between-zones-downscale` label must also be set on each `StatefulSet`.

This can be useful for automating the tedious scaling of stateful services like Mimir ingesters. Making use of this feature requires adding a few annotations and labels to configure how it works.

If the `grafana.com/rollout-leader-ready` annotation is set to `true` on a follower `StatefulSet`, the operator will only scale up the follower once all replicas in the leader `StatefulSet` are `ready`. This ensures that the follower zone does not scale up until the leader zone is completely stable.

Example usage for a multi-AZ ingester group:

- For `ingester-zone-a`, add the following:
- Labels:
@@ -47,7 +51,8 @@ This can be useful for automating the tedious scaling of stateful services like
- `grafana.com/min-time-between-zones-downscale=12h` (change the value here to an appropriate duration)
- `grafana.com/prepare-downscale=true` (to allow the service to be notified when it will be scaled down)
- Annotations:
    - `grafana.com/rollout-downscale-leader=ingester-zone-a` (zone `b` will follow zone `a`, after a delay)
- `grafana.com/rollout-leader-ready=true` (zone `b` will only scale up once all replicas in zone `a` are ready)
- `grafana.com/prepare-downscale-http-path=ingester/prepare-shutdown` (to call a specific endpoint on the service)
- `grafana.com/prepare-downscale-http-port=80` (to call a specific endpoint on the service)
- For `ingester-zone-c`, add the following:
@@ -56,6 +61,7 @@ This can be useful for automating the tedious scaling of stateful services like
- `grafana.com/prepare-downscale=true` (to allow the service to be notified when it will be scaled down)
- Annotations:
- `grafana.com/rollout-downscale-leader=ingester-zone-b` (zone `c` will follow zone `b`, after a delay)
- `grafana.com/rollout-leader-ready=true` (zone `c` will only scale up once all replicas in zone `b` are ready)
- `grafana.com/prepare-downscale-http-path=ingester/prepare-shutdown` (to call a specific endpoint on the service)
- `grafana.com/prepare-downscale-http-port=80` (to call a specific endpoint on the service)
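Putting the labels and annotations above into manifest form, the zone-b `StatefulSet` metadata might look like this (a sketch only: the `StatefulSet` name, `rollout-group` value, and replica count are illustrative, and the keys are taken from the README lists above):

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: ingester-zone-b
  labels:
    rollout-group: ingester
    grafana.com/min-time-between-zones-downscale: 12h
    grafana.com/prepare-downscale: "true"
  annotations:
    grafana.com/rollout-downscale-leader: ingester-zone-a
    grafana.com/rollout-leader-ready: "true"
    grafana.com/prepare-downscale-http-path: ingester/prepare-shutdown
    grafana.com/prepare-downscale-http-port: "80"
spec:
  replicas: 3
```

Note that Kubernetes annotation values are strings, so boolean-like and numeric values (`"true"`, `"80"`) must be quoted.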

3 changes: 3 additions & 0 deletions pkg/config/config.go
@@ -28,6 +28,9 @@ const (
// RolloutDownscaleLeaderAnnotationKey is the name of the leader statefulset that should be used to determine
// the number of replicas in a follower statefulset.
RolloutDownscaleLeaderAnnotationKey = "grafana.com/rollout-downscale-leader"
	// RolloutLeaderReadyAnnotationKey controls whether a follower only scales up once the leader's `ready` replicas match its desired replicas.
RolloutLeaderReadyAnnotationKey = "grafana.com/rollout-leader-ready"
RolloutLeaderReadyAnnotationValue = "true"

rolloutMirrorReplicasFromResourceAnnotationKeyPrefix = "grafana.com/rollout-mirror-replicas-from-resource"
// RolloutMirrorReplicasFromResourceNameAnnotationKey -- when set (together with "kind" and optionally "api-version" annotations), rollout-operator sets number of
10 changes: 9 additions & 1 deletion pkg/controller/replicas.go
@@ -21,7 +21,15 @@ func desiredStsReplicas(group string, sts *v1.StatefulSet, all []*v1.StatefulSet

leaderReplicas := *leader.Spec.Replicas
if leaderReplicas > followerReplicas {
// Handle scale-up scenarios
annotations := sts.GetAnnotations()
onlyWhenReady, ok := annotations[config.RolloutLeaderReadyAnnotationKey]
if ok && onlyWhenReady == config.RolloutLeaderReadyAnnotationValue {
// We only scale up once all of the leader pods are ready. Otherwise we do nothing.
if leaderReplicas != leader.Status.ReadyReplicas {
return followerReplicas, nil
}
}
return leaderReplicas, nil
}
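The gating rule above can be distilled into a small pure function (a simplified sketch, not the operator's actual code: it takes the relevant replica counts directly instead of reading annotations, and omits the operator's separate time-based downscale checks):

```go
package main

import "fmt"

// desiredFollowerReplicas mirrors the scale-up gate: when gateOnReady is
// set, a follower only grows to match the leader once every leader
// replica reports ready. Scale-down passes through unconditionally here.
func desiredFollowerReplicas(leaderSpec, leaderReady, follower int32, gateOnReady bool) int32 {
	scalingUp := leaderSpec > follower
	if scalingUp && gateOnReady && leaderReady != leaderSpec {
		return follower // hold steady: the leader is not fully ready yet
	}
	return leaderSpec
}

func main() {
	fmt.Println(desiredFollowerReplicas(4, 3, 2, true))  // leader not ready: follower stays at 2
	fmt.Println(desiredFollowerReplicas(4, 4, 2, true))  // leader fully ready: follower scales to 4
	fmt.Println(desiredFollowerReplicas(4, 3, 2, false)) // gate disabled: scales up immediately
}
```

This matches the behavior exercised by the tests below: an unready leader holds the follower, a fully ready leader releases the scale-up, and a missing annotation (gate disabled) scales up immediately.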

71 changes: 71 additions & 0 deletions pkg/controller/replicas_test.go
@@ -211,6 +211,77 @@ func TestReconcileStsReplicas(t *testing.T) {
require.Equal(t, int32(4), replicas)
})

t.Run("scale up only when all leader replicas are ready", func(t *testing.T) {
sts1 := mockStatefulSet("test-zone-a", withReplicas(4, 3))
sts2 := mockStatefulSet("test-zone-b", withReplicas(2, 2), withAnnotations(map[string]string{
config.RolloutDownscaleLeaderAnnotationKey: "test-zone-a",
config.RolloutLeaderReadyAnnotationKey: config.RolloutLeaderReadyAnnotationValue,
}))
sts3 := mockStatefulSet("test-zone-c", withReplicas(1, 1), withAnnotations(map[string]string{
config.RolloutDownscaleLeaderAnnotationKey: "test-zone-b",
config.RolloutLeaderReadyAnnotationKey: config.RolloutLeaderReadyAnnotationValue,
}))

replicas, err := desiredStsReplicas("test", sts2, []*v1.StatefulSet{sts1, sts2, sts3}, log.NewNopLogger())
require.NoError(t, err)
require.Equal(t, int32(2), replicas, "no change in replicas because leader isn't ready yet")

sts1 = mockStatefulSet("test-zone-a", withReplicas(4, 4))
replicasB, err := desiredStsReplicas("test", sts2, []*v1.StatefulSet{sts1, sts2, sts3}, log.NewNopLogger())
require.NoError(t, err)
require.Equal(t, int32(4), replicasB, "ready to scale zone-b")

sts2 = mockStatefulSet("test-zone-b", withReplicas(replicasB, 2))
replicasC, err := desiredStsReplicas("test", sts3, []*v1.StatefulSet{sts1, sts2, sts3}, log.NewNopLogger())
require.NoError(t, err)
require.Equal(t, int32(1), replicasC, "no change in replicas because zone-b isn't ready yet")

sts2 = mockStatefulSet("test-zone-b", withReplicas(replicasB, replicasB))
replicasC, err = desiredStsReplicas("test", sts3, []*v1.StatefulSet{sts1, sts2, sts3}, log.NewNopLogger())
require.NoError(t, err)
require.Equal(t, int32(4), replicasC, "ready to scale zone-c")
})

t.Run("scale up ignoring ready replicas", func(t *testing.T) {
sts1 := mockStatefulSet("test-zone-a", withReplicas(4, 2))
sts2 := mockStatefulSet("test-zone-b", withReplicas(3, 3), withAnnotations(map[string]string{
config.RolloutDownscaleLeaderAnnotationKey: "test-zone-a",
}))

replicas, err := desiredStsReplicas("test", sts2, []*v1.StatefulSet{sts1, sts2}, log.NewNopLogger())
require.NoError(t, err)
require.Equal(t, int32(4), replicas)
})

t.Run("scaling down always occurs to desired replicas", func(t *testing.T) {
sts1 := mockStatefulSet("test-zone-a", withReplicas(2, 1))
sts2 := mockStatefulSet("test-zone-b", withReplicas(3, 3), withAnnotations(map[string]string{
config.RolloutDownscaleLeaderAnnotationKey: "test-zone-a",
config.RolloutLeaderReadyAnnotationKey: config.RolloutLeaderReadyAnnotationValue,
}))

replicas, err := desiredStsReplicas("test", sts2, []*v1.StatefulSet{sts1, sts2}, log.NewNopLogger())
require.NoError(t, err)
require.Equal(t, int32(2), replicas)
})

t.Run("do not scale up follower if not all leader replicas are ready", func(t *testing.T) {
sts1 := mockStatefulSet("test-zone-a", withReplicas(6, 4))
sts2 := mockStatefulSet("test-zone-b", withReplicas(3, 3), withAnnotations(map[string]string{
config.RolloutDownscaleLeaderAnnotationKey: "test-zone-a",
config.RolloutLeaderReadyAnnotationKey: config.RolloutLeaderReadyAnnotationValue,
}))

replicas, err := desiredStsReplicas("test", sts2, []*v1.StatefulSet{sts1, sts2}, log.NewNopLogger())
require.NoError(t, err)
require.Equal(t, int32(3), replicas, "no change in replicas because leader isn't ready yet")

sts1 = mockStatefulSet("test-zone-a", withReplicas(6, 6))
replicas, err = desiredStsReplicas("test", sts2, []*v1.StatefulSet{sts1, sts2}, log.NewNopLogger())
require.NoError(t, err)
require.Equal(t, int32(6), replicas, "ready to scale zone-b")
})

t.Run("scale down min time error", func(t *testing.T) {
downscale1 := time.Now().UTC().Round(time.Second).Add(-72 * time.Hour)
downscale2 := time.Now().UTC().Round(time.Second).Add(-60 * time.Hour)