Merge remote-tracking branch 'upstream/main' into merge-upstream-v0.20.0 #7

Merged
merged 27 commits into from
Oct 18, 2024
Commits
ce7b413 | CHANGELOG for license change (#142) | johannaratliff, Apr 4, 2024
afc5577 | Update dependencies (#144) | andyasp, May 2, 2024
ea17193 | Cut v0.15.0 (#145) | andyasp, May 8, 2024
33c4fcf | Invoke DELETE on pod prepare-downscale path if any POSTs failed (#146) | seizethedave, May 15, 2024
1a50079 | Adjust changelog for v0.16.0 (#147) | seizethedave, May 22, 2024
54b71c1 | Swap base image from alpine to distroless (#149) | andyasp, May 30, 2024
3b0040a | Add request UID to webhook logs. (#150) | seizethedave, Jun 4, 2024
0b18d84 | Check for non-updated replicas during down-scale in zoneTracker (#141) | JordanRushing, Jun 5, 2024
b36d4bc | Include username in 'handling request' log. (#152) | seizethedave, Jun 6, 2024
fdfe18c | Support percentages in rollout-max-unavailable annotation (#153) | pstibrany, Jun 11, 2024
6b427ca | Fix unbalanced pairs in log. (#154) | seizethedave, Jun 13, 2024
5dce3cc | Allow delayed downscale of subset of pods (#156) | pstibrany, Jun 17, 2024
f5bef38 | Update changelog for v0.17.0. (#157) | pstibrany, Jun 17, 2024
dddb21d | Admission webhook: Undo prepare-shutdown calls if last-downscale fail… | seizethedave, Jun 21, 2024
53c59f6 | Prep changelog for 0.17.1. (#158) | seizethedave, Jun 25, 2024
fa52527 | Only scale up zone after all leader zone replicas are ready (#164) | jhesketh, Aug 13, 2024
b21cc68 | Prepare 0.18 release (#166) | jhesketh, Aug 13, 2024
eaf0138 | Update release doc (#167) | jhesketh, Aug 14, 2024
5be56c9 | Update dependencies (#165) | andyasp, Aug 15, 2024
2608ac1 | Update Go to 1.23 (#168) | andyasp, Aug 15, 2024
f40e769 | Make patching of reference resource optional (#169) | pstibrany, Aug 28, 2024
a37d1cf | Prepare v0.19.0. (#170) | pstibrany, Aug 28, 2024
3a78df9 | Shorten `grafana.com/rollout-mirror-replicas-from-resource-write-back… | pstibrany, Aug 28, 2024
ef21e37 | Release v0.19.1. (#172) | pstibrany, Aug 28, 2024
75e10d6 | Update dependencies (#174) | andyasp, Sep 30, 2024
1cb25f6 | Cut v0.20.0 (#175) | andyasp, Oct 7, 2024
7d10753 | Merge remote-tracking branch 'upstream/main' into merge-upstream-v0.20.0 | yuchen-db, Oct 9, 2024
14 changes: 7 additions & 7 deletions .github/workflows/ci.yaml
@@ -10,7 +10,7 @@ jobs:
- uses: actions/checkout@v4
- uses: actions/setup-go@v5
with:
go-version: '1.22'
go-version: '1.23'
check-latest: true
- run: make rollout-operator

@@ -20,7 +20,7 @@ jobs:
- uses: actions/checkout@v4
- uses: actions/setup-go@v5
with:
go-version: '1.22'
go-version: '1.23'
check-latest: true
- run: make test
- run: make test-boringcrypto
@@ -31,7 +31,7 @@ jobs:
- uses: actions/checkout@v4
- uses: actions/setup-go@v5
with:
go-version: '1.22'
go-version: '1.23'
check-latest: true
- run: make build-image
- run: make integration
@@ -42,7 +42,7 @@ jobs:
- uses: actions/checkout@v4
- uses: actions/setup-go@v5
with:
go-version: '1.22'
go-version: '1.23'
check-latest: true
- run: make build-image-boringcrypto
- run: make integration
@@ -53,9 +53,9 @@ jobs:
- uses: actions/checkout@v4
- uses: actions/setup-go@v5
with:
go-version: '1.22'
go-version: '1.23'
check-latest: true
- uses: golangci/golangci-lint-action@v4
- uses: golangci/golangci-lint-action@v6
with:
version: v1.56
version: v1.60.1
args: --timeout=5m
59 changes: 59 additions & 0 deletions CHANGELOG.md
@@ -2,6 +2,65 @@

## main / unreleased

## v0.20.0

* [ENHANCEMENT] Updated dependencies, including: #174
* `github.com/prometheus/client_golang` from `v1.19.1` to `v1.20.4`
* `github.com/prometheus/common` from `v0.55.0` to `v0.59.1`
* `k8s.io/api` from `v0.30.3` to `v0.31.1`
* `k8s.io/apimachinery` from `v0.30.3` to `v0.31.1`
* `k8s.io/client-go` from `v0.30.3` to `v0.31.1`
* `sigs.k8s.io/controller-runtime` from `v0.18.5` to `v0.19.0`

## v0.19.1

* [CHANGE] Renamed `grafana.com/rollout-mirror-replicas-from-resource-write-back-status-replicas` annotation to `grafana.com/rollout-mirror-replicas-from-resource-write-back`, because it was too long (over 64 chars). #171
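Kubernetes caps the name segment of an annotation key (the part after the optional `prefix/`) at 63 characters, and the prefix, when present, must be a DNS subdomain of at most 253 characters. The old key's name segment, `rollout-mirror-replicas-from-resource-write-back-status-replicas`, is 64 characters long; the renamed `rollout-mirror-replicas-from-resource-write-back` is 48. A minimal, length-only sketch of that rule follows; `keyLengthOK` is a hypothetical helper, not rollout-operator code, and it skips the character-set checks that `validation.IsQualifiedName` in `k8s.io/apimachinery` also applies.

```go
package main

import "strings"

// Kubernetes length limits for keys of the form "[prefix/]name".
const (
	maxKeyPrefixLen = 253 // prefix must be a DNS subdomain
	maxKeyNameLen   = 63  // name segment after the optional prefix
)

// keyLengthOK is a hypothetical helper that checks only the length rules for
// an annotation key; character-set validation is intentionally omitted.
func keyLengthOK(key string) bool {
	parts := strings.Split(key, "/")
	var name string
	switch len(parts) {
	case 1:
		name = parts[0]
	case 2:
		prefix := parts[0]
		if prefix == "" || len(prefix) > maxKeyPrefixLen {
			return false
		}
		name = parts[1]
	default:
		return false // at most one "/" is allowed
	}
	return name != "" && len(name) <= maxKeyNameLen
}
```

Applied to the two keys, the old one is rejected and the renamed one passes.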

## v0.19.0

* [ENHANCEMENT] Updated dependencies, including: #165
* `github.com/prometheus/client_golang` from `v1.19.0` to `v1.19.1`
* `github.com/prometheus/common` from `v0.53.0` to `v0.55.0`
* `golang.org/x/sync` from `v0.7.0` to `v0.8.0`
* `k8s.io/api` from `v0.30.0` to `v0.30.3`
* `k8s.io/apimachinery` from `v0.30.0` to `v0.30.3`
* `k8s.io/client-go` from `v0.30.0` to `v0.30.3`
* `sigs.k8s.io/controller-runtime` from `v0.18.1` to `v0.18.5`
* [ENHANCEMENT] Update Go to `1.23`. #168
* [ENHANCEMENT] When mirroring replicas of statefulset, rollout-operator can now skip writing back number of replicas to reference resource, by setting `grafana.com/rollout-mirror-replicas-from-resource-write-back-status-replicas` annotation to `false`. #169

## v0.18.0

* [FEATURE] Optionally only scale-up a `StatefulSet` once all of the leader `StatefulSet` replicas are ready. Enable with `grafana.com/rollout-upscale-only-when-leader-ready` annotation set to `true`. #164

## v0.17.1

* [ENHANCEMENT] prepare-downscale admission webhook: undo prepare-shutdown calls if adding the `last-downscale` annotation fails. #151

## v0.17.0

* [CHANGE] The docker base images are now based off distroless images rather than Alpine. #149
* The standard base image is now `gcr.io/distroless/static-debian12:nonroot`.
* The boringcrypto base image is now `gcr.io/distroless/base-nossl-debian12:nonroot` (for glibc).
* [ENHANCEMENT] Include unique IDs of webhook requests in logs for easier debugging. #150
* [ENHANCEMENT] Include k8s operation username in request debug logs. #152
* [ENHANCEMENT] `rollout-max-unavailable` annotation can now be specified as percentage, e.g.: `rollout-max-unavailable: 25%`. Resulting value is computed as `floor(replicas * percentage)`, but is never less than 1. #153
* [ENHANCEMENT] Delayed downscale of statefulset can now reduce replicas earlier, if subset of pods at the end of statefulset have already reached their delay. #156
* [BUGFIX] Fix a mangled error log in controller's delayed downscale code. #154

## v0.16.0

* [ENHANCEMENT] If the POST to prepare-shutdown fails for any replica, attempt to undo the operation by issuing an HTTP DELETE to prepare-shutdown for all target replicas. #146

## v0.15.0

* [CHANGE] Rollout-operator is now released under an Apache License 2.0. #139, #140
* [ENHANCEMENT] Updated dependencies, including: #144
* `github.com/prometheus/common` from `v0.49.0` to `v0.53.0`
* `k8s.io/api` from `v0.29.2` to `v0.30.0`
* `k8s.io/apimachinery` from `v0.29.2` to `v0.30.0`
* `k8s.io/client-go` from `v0.29.2` to `v0.30.0`

## v0.14.0

* [FEATURE] Rollout-operator can now "mirror" replicas of statefulset from any reference resource. `status.replicas` field of reference resource is kept up-to-date with current number of replicas in target statefulset. #129
12 changes: 4 additions & 8 deletions Dockerfile
@@ -1,4 +1,6 @@
FROM golang:1.22-bookworm AS build
ARG BASEIMAGE

FROM golang:1.23-bookworm AS build

ARG TARGETOS
ARG TARGETARCH
@@ -8,17 +10,11 @@ COPY . /src/rollout-operator
WORKDIR /src/rollout-operator
RUN GOOS=${TARGETOS} GOARCH=${TARGETARCH} make ${BUILDTARGET}

FROM alpine:3.19
RUN apk add --no-cache ca-certificates gcompat
FROM ${BASEIMAGE}

COPY --from=build /src/rollout-operator/rollout-operator /bin/rollout-operator
ENTRYPOINT [ "/bin/rollout-operator" ]

# Create rollout-operator user to run as non-root.
RUN addgroup -g 10000 -S rollout-operator && \
adduser -u 10000 -S rollout-operator -G rollout-operator
USER rollout-operator:rollout-operator

ARG revision
LABEL org.opencontainers.image.title="rollout-operator" \
org.opencontainers.image.source="https://github.com/grafana/rollout-operator" \
12 changes: 8 additions & 4 deletions Makefile
@@ -15,6 +15,10 @@ GOARCH ?= $(shell go env GOARCH)
DONT_FIND := -name vendor -prune -o -name .git -prune -o -name .cache -prune -o -name .pkg -prune
GO_FILES := $(shell find . $(DONT_FIND) -o -type f -name '*.go' -print)

BASE_IMAGE=gcr.io/distroless/static-debian12:nonroot
# Boringcrypto has a different base image for glibc
BORINGCRYPTO_BASE_IMAGE=gcr.io/distroless/base-nossl-debian12:nonroot

.DEFAULT_GOAL := rollout-operator

# Adapted from https://www.thapaliya.com/en/writings/well-documented-makefiles/
@@ -31,23 +35,23 @@ rollout-operator-boringcrypto: $(GO_FILES) ## Build the rollout-operator binary

.PHONY: build-image
build-image: clean ## Build the rollout-operator image
docker buildx build --load --platform linux/amd64 --build-arg revision=$(GIT_REVISION) -t rollout-operator:latest -t rollout-operator:$(IMAGE_TAG) .
docker buildx build --load --platform linux/amd64 --build-arg revision=$(GIT_REVISION) --build-arg BASEIMAGE=$(BASE_IMAGE) -t rollout-operator:latest -t rollout-operator:$(IMAGE_TAG) .

.PHONY: build-image-boringcrypto
build-image-boringcrypto: clean ## Build the rollout-operator image with boringcrypto
# Tags with the regular image repo for integration testing
docker buildx build --load --platform linux/amd64 --build-arg revision=$(GIT_REVISION) --build-arg BUILDTARGET=rollout-operator-boringcrypto -t rollout-operator:latest -t rollout-operator:$(IMAGE_TAG) .
docker buildx build --load --platform linux/amd64 --build-arg revision=$(GIT_REVISION) --build-arg BASEIMAGE=$(BORINGCRYPTO_BASE_IMAGE) --build-arg BUILDTARGET=rollout-operator-boringcrypto -t rollout-operator:latest -t rollout-operator:$(IMAGE_TAG) .

.PHONY: publish-images
publish-images: publish-standard-image publish-boringcrypto-image ## Build and publish both the standard and boringcrypto images

.PHONY: publish-standard-image
publish-standard-image: clean ## Build and publish only the standard rollout-operator image
docker buildx build --push --platform linux/amd64,linux/arm64 --build-arg revision=$(GIT_REVISION) --build-arg BUILDTARGET=rollout-operator -t $(IMAGE_PREFIX)/rollout-operator:$(IMAGE_TAG) .
docker buildx build --push --platform linux/amd64,linux/arm64 --build-arg revision=$(GIT_REVISION) --build-arg BASEIMAGE=$(BASE_IMAGE) --build-arg BUILDTARGET=rollout-operator -t $(IMAGE_PREFIX)/rollout-operator:$(IMAGE_TAG) .

.PHONY: publish-boringcrypto-image
publish-boringcrypto-image: clean ## Build and publish only the boring-crypto rollout-operator image
docker buildx build --push --platform linux/amd64,linux/arm64 --build-arg revision=$(GIT_REVISION) --build-arg BUILDTARGET=rollout-operator-boringcrypto -t $(IMAGE_PREFIX)/rollout-operator-boringcrypto:$(IMAGE_TAG) .
docker buildx build --push --platform linux/amd64,linux/arm64 --build-arg revision=$(GIT_REVISION) --build-arg BASEIMAGE=$(BORINGCRYPTO_BASE_IMAGE) --build-arg BUILDTARGET=rollout-operator-boringcrypto -t $(IMAGE_PREFIX)/rollout-operator-boringcrypto:$(IMAGE_TAG) .

.PHONY: test
test: ## Run tests
24 changes: 17 additions & 7 deletions README.md
@@ -4,7 +4,7 @@ This operator coordinates the rollout of pods between different StatefulSets wit

## How updates work

The operator coordinates the rollout of pods belonging to `StatefulSets` with the `rollout-group` label and updates strategy set to `OnDelete`. The label value should identify the group of StatefulSets to which the StatefulSet belongs to. Make sure the statefulset has a label `name` in its `spec.template`, the operator uses it to find pods belonging to it.
The operator coordinates the rollout of pods belonging to `StatefulSets` with the `rollout-group` label and updates strategy set to `OnDelete`. The label value should identify the group of StatefulSets to which the StatefulSet belongs to. Make sure the StatefulSet has a label `name` in its `spec.template`, as the operator uses it to find pods belonging to it.

For example, given the following StatefulSets in a namespace:
- `ingester-zone-a` with `rollout-group: ingester`
@@ -25,15 +25,19 @@ For each **rollout group**, the operator **guarantees**:
1. Pods in a StatefulSet are rolled out if and only if all pods in all other StatefulSets of the same group are `Ready` (otherwise it will start or continue the rollout once this check is satisfied)
1. Pods are rolled out if and only if all StatefulSets in the same group have `OnDelete` update strategy (otherwise the operator will skip the group and log an error)
1. The maximum number of not-Ready pods in a StatefulSet doesn't exceed the value configured in the `rollout-max-unavailable` annotation (if not set, it defaults to `1`). Values:
- `<= 0`: invalid (will default to `1` and log a warning)
- `1`: pods are rolled out sequentially
- `> 1`: pods are rolled out in parallel (honoring the configured number of max unavailable pods)
- `<= 0`: invalid (will default to `1` and log a warning)
- `1`: pods are rolled out sequentially
- `> 1`: pods are rolled out in parallel (honoring the configured number of max unavailable pods)
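The `rollout-max-unavailable` rules listed above, together with the percentage form from the v0.17.0 changelog entry (`floor(replicas * percentage)`, never less than 1), can be condensed into one resolver. A minimal sketch; `maxUnavailable` is a hypothetical helper, it accepts only whole-number percentages, and returning an error for unparsable values is an assumption rather than documented behaviour.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// maxUnavailable is a hypothetical helper that resolves a
// rollout-max-unavailable annotation value for a StatefulSet with the given
// number of replicas:
//   - "N":  N pods, where N <= 0 is invalid and falls back to 1
//   - "P%": floor(replicas * P / 100) pods, but never fewer than 1
func maxUnavailable(value string, replicas int) (int, error) {
	if pct, isPercent := strings.CutSuffix(value, "%"); isPercent {
		p, err := strconv.Atoi(pct) // whole-number percentages only (assumption)
		if err != nil {
			return 0, fmt.Errorf("invalid percentage %q: %w", value, err)
		}
		// Integer division truncates, which equals floor for non-negative values.
		n := replicas * p / 100
		if n < 1 {
			n = 1
		}
		return n, nil
	}
	n, err := strconv.Atoi(value)
	if err != nil {
		return 0, fmt.Errorf("invalid value %q: %w", value, err)
	}
	if n <= 0 {
		// Documented as invalid; the operator falls back to 1 and logs a warning.
		return 1, nil
	}
	return n, nil
}
```

For example, `25%` on a 10-replica StatefulSet resolves to 2 pods at a time, while on 3 replicas `floor(0.75) = 0` is raised to 1.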

## How scaling up and down works

The operator can also optionally coordinate scaling up and down of `StatefulSets` that are part of the same `rollout-group` based on the `grafana.com/rollout-downscale-leader` annotation. When using this feature, the `grafana.com/min-time-between-zones-downscale` label must also be set on each `StatefulSet`.

This can be useful for automating the tedious scaling of stateful services like Mimir ingesters. Making use of this feature requires adding a few annotations and labels to configure how it works. Examples for a multi-AZ ingester group are given below.
This can be useful for automating the tedious scaling of stateful services like Mimir ingesters. Making use of this feature requires adding a few annotations and labels to configure how it works.

If the `grafana.com/rollout-upscale-only-when-leader-ready` annotation is set to `true` on a follower `StatefulSet`, the operator will only scale up the follower once all replicas in the leader `StatefulSet` are `ready`. This ensures that the follower zone does not scale up until the leader zone is completely stable.

Example usage for a multi-AZ ingester group:

- For `ingester-zone-a`, add the following:
- Labels:
@@ -47,7 +51,8 @@ This can be useful for automating the tedious scaling of stateful services like
- `grafana.com/min-time-between-zones-downscale=12h` (change the value here to an appropriate duration)
- `grafana.com/prepare-downscale=true` (to allow the service to be notified when it will be scaled down)
- Annotations:
- `grafana.com/rollout-downscale-leader=ingester-zone-a` (zone `b` will follow zone `a`, after a delay)
- `grafana.com/rollout-downscale-leader=ingester-zone-a` (zone `b` will follow zone `a`, after a delay)
- `grafana.com/rollout-upscale-only-when-leader-ready=true` (zone `b` will only scale up once all replicas in zone `a` are ready)
- `grafana.com/prepare-downscale-http-path=ingester/prepare-shutdown` (to call a specific endpoint on the service)
- `grafana.com/prepare-downscale-http-port=80` (to call a specific endpoint on the service)
- For `ingester-zone-c`, add the following:
@@ -56,6 +61,7 @@ This can be useful for automating the tedious scaling of stateful services like
- `grafana.com/prepare-downscale=true` (to allow the service to be notified when it will be scaled down)
- Annotations:
- `grafana.com/rollout-downscale-leader=ingester-zone-b` (zone `c` will follow zone `b`, after a delay)
- `grafana.com/rollout-upscale-only-when-leader-ready=true` (zone `c` will only scale up once all replicas in zone `b` are ready)
- `grafana.com/prepare-downscale-http-path=ingester/prepare-shutdown` (to call a specific endpoint on the service)
- `grafana.com/prepare-downscale-http-port=80` (to call a specific endpoint on the service)
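The `grafana.com/rollout-upscale-only-when-leader-ready` gate amounts to checking the leader's ready count before letting the follower grow. A minimal sketch, assuming the follower's target is simply the leader's replica count; `statefulSetState` and `followerTargetReplicas` are hypothetical names, and downscale delays such as `grafana.com/min-time-between-zones-downscale` are deliberately left out.

```go
package main

// statefulSetState is a hypothetical, trimmed-down view of a StatefulSet.
type statefulSetState struct {
	Replicas      int32 // desired replicas (spec.replicas)
	ReadyReplicas int32 // Ready pods (status.readyReplicas)
}

// followerTargetReplicas returns the replica count the follower should move
// to. It follows the leader, except that when upscaleOnlyWhenLeaderReady is
// set, a scale-up is held back until every leader replica is Ready.
// Scale-downs are returned unchanged here; this sketch does not model the
// operator's downscale delays.
func followerTargetReplicas(leader, follower statefulSetState, upscaleOnlyWhenLeaderReady bool) int32 {
	desired := leader.Replicas
	isUpscale := desired > follower.Replicas
	leaderReady := leader.ReadyReplicas >= leader.Replicas
	if isUpscale && upscaleOnlyWhenLeaderReady && !leaderReady {
		return follower.Replicas // leader zone not yet stable: hold
	}
	return desired
}
```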

@@ -66,8 +72,12 @@ Rollout-operator can use custom resource with `scale` and `status` subresources
* `grafana.com/rollout-mirror-replicas-from-resource-name`
* `grafana.com/rollout-mirror-replicas-from-resource-kind`
* `grafana.com/rollout-mirror-replicas-from-resource-api-version`
* `grafana.com/rollout-mirror-replicas-from-resource-write-back`

These annotations must be set on StatefulSet that rollout-operator will scale (ie. target statefulset). Number of replicas in target statefulset will follow replicas in reference resource (from `scale` subresource), while reference resource's `status` subresource will be updated with current number of replicas in target statefulset.
These annotations must be set on StatefulSet that rollout-operator will scale (ie. target statefulset).
Number of replicas in target statefulset will follow replicas in reference resource (from `scale` subresource).
Reference resource's `status` subresource will be updated with current number of replicas in target statefulset,
unless explicitly disabled by setting `grafana.com/rollout-mirror-replicas-from-resource-write-back` annotation to `false`.

This is similar to using `grafana.com/rollout-downscale-leader`, but reference resource can be any kind of resource, not just statefulset. Furthermore `grafana.com/min-time-between-zones-downscale` is not respected when using scaling based on reference resource.
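Per reconcile, the mirroring annotations above reduce to two decisions: copy the reference resource's `scale` replicas onto the target StatefulSet, and decide whether to write the target's current replica count back to the reference's `status`. A minimal sketch; `mirrorPlan` and `planMirror` are hypothetical, and treating only the exact value `false` as disabling write-back is an assumption.

```go
package main

// writeBackAnnotation is the annotation named in the README text above.
const writeBackAnnotation = "grafana.com/rollout-mirror-replicas-from-resource-write-back"

// mirrorPlan is a hypothetical description of what one reconcile pass does.
type mirrorPlan struct {
	TargetReplicas int32 // new spec.replicas for the target StatefulSet
	WriteBack      bool  // whether to update the reference resource's status
	StatusReplicas int32 // value to write to the reference's status.replicas
}

// planMirror makes the target follow the reference's scale replicas and
// writes the target's current count back unless the annotation is "false".
func planMirror(annotations map[string]string, referenceScaleReplicas, targetCurrentReplicas int32) mirrorPlan {
	return mirrorPlan{
		TargetReplicas: referenceScaleReplicas,
		WriteBack:      annotations[writeBackAnnotation] != "false",
		StatusReplicas: targetCurrentReplicas,
	}
}
```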

3 changes: 3 additions & 0 deletions RELEASE.md
@@ -12,3 +12,6 @@
```bash
$ IMAGE_TAG="${tag}" make publish-images
```
1. Update Helm Chart
- Repository https://github.com/grafana/helm-charts/tree/main/charts/rollout-operator
- [Example PR](https://github.com/grafana/helm-charts/pull/3177/files)
4 changes: 2 additions & 2 deletions cmd/rollout-operator/instrumentation.go
@@ -45,11 +45,11 @@ func newInstrumentedRouter(metrics *metrics) (*mux.Router, http.Handler) {
router := mux.NewRouter()

httpMiddleware := []middleware.Interface{
middleware.Tracer{
middleware.RouteInjector{
RouteMatcher: router,
},
middleware.Tracer{},
middleware.Instrument{
RouteMatcher: router,
Duration: metrics.RequestDuration,
RequestBodySize: metrics.ReceivedMessageSize,
ResponseBodySize: metrics.SentMessageSize,