Reduce memory usage of cainjector by only caching the metadata of Secret resources #7161

wallrj · 2024-07-09T14:04:25Z

Update cainjector to use metadata-only caching for Secret resources:

- To reduce the memory use of cainjector, by caching only the metadata of Secret resources in memory.
- To reduce the load on the K8S API server when cainjector starts up, caused by the initial listing of full Secret resources in the cluster.

With ~100 MiB of Secrets in the cluster, this reduces the maximum resident set size (RSS) of cainjector at startup from ~290M to ~46M:

Before: "rss_max_kilobytes": 289268
After: "rss_max_kilobytes": 45652

Behind the scenes, the reflector asks the server for a PartialObjectMetadataList projection instead of the full SecretList, which includes the Secret data.
You can see the reduction in the response size using curl as follows:

# Create a 1 MiB file, the maximum size of a Secret
dd if=/dev/zero bs=1048576 count=1 of=f1

# Create ~100 MiB of Secrets, 5 at a time
echo -n {0..99} | xargs -d ' ' -P5 -I{} kubectl create secret generic s-$RANDOM-{} --from-file=f1

# Get API server connection parameters
kubectl get cm -n kube-public kube-root-ca.crt  -o jsonpath='{ .data.ca\.crt }' > ca.crt
export URL="$(kubectl cluster-info | grep 'Kubernetes control plane' | grep -Eo 'https://[a-z0-9.]+:[0-9]+')"
export TOKEN="$(kubectl create token -n cert-manager cert-manager-cainjector)"

curl "${URL}/api/v1/secrets" \
     -fsSL \
     --cacert ca.crt \
     -H "Authorization: Bearer ${TOKEN}" \
     -H "Accept: application/json" \
    | dd of=/dev/null
...
274790+1 records in
274790+1 records out
140692829 bytes (141 MB, 134 MiB) copied, 1.76449 s, 79.7 MB/s

curl "${URL}/api/v1/secrets" \
     -fsSL \
     --cacert ca.crt \
     -H "Authorization: Bearer ${TOKEN}" \
     -H "Accept: application/json;as=PartialObjectMetadataList;g=meta.k8s.io;v=v1" \
    | dd of=/dev/null
...
167+1 records in
167+1 records out
85806 bytes (86 kB, 84 KiB) copied, 0.404721 s, 212 kB/s
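
The same content negotiation can be sketched in Go with only the standard library. This is a minimal illustration, not the reflector's actual code: the type `partialObjectMetadataList`, the functions `listSecretMetadata` and `newFakeAPIServer`, and the fake server response are all assumptions made for the example. Only the request path and the `Accept` header value are taken from the curl commands above.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// Accept header value copied from the second curl command above.
const partialMetadataAccept = "application/json;as=PartialObjectMetadataList;g=meta.k8s.io;v=v1"

// partialObjectMetadataList is a hypothetical, trimmed-down view of the
// response: it decodes only the kind and the namespace/name of each item.
type partialObjectMetadataList struct {
	Kind  string `json:"kind"`
	Items []struct {
		Metadata struct {
			Namespace string `json:"namespace"`
			Name      string `json:"name"`
		} `json:"metadata"`
	} `json:"items"`
}

// listSecretMetadata lists Secrets but asks the server for metadata only.
func listSecretMetadata(client *http.Client, baseURL, token string) (*partialObjectMetadataList, error) {
	req, err := http.NewRequest(http.MethodGet, baseURL+"/api/v1/secrets", nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Authorization", "Bearer "+token)
	req.Header.Set("Accept", partialMetadataAccept)

	resp, err := client.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("unexpected status: %s", resp.Status)
	}

	var list partialObjectMetadataList
	if err := json.NewDecoder(resp.Body).Decode(&list); err != nil {
		return nil, err
	}
	return &list, nil
}

// newFakeAPIServer stands in for a real API server so the sketch runs
// without a cluster. It only answers requests that carry the Accept header.
func newFakeAPIServer() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Header.Get("Accept") != partialMetadataAccept {
			http.Error(w, "metadata-only Accept header expected", http.StatusNotAcceptable)
			return
		}
		w.Header().Set("Content-Type", "application/json")
		fmt.Fprint(w, `{"kind":"PartialObjectMetadataList","apiVersion":"meta.k8s.io/v1",`+
			`"items":[{"metadata":{"namespace":"default","name":"s-1"}}]}`)
	}))
}

func main() {
	srv := newFakeAPIServer()
	defer srv.Close()

	list, err := listSecretMetadata(srv.Client(), srv.URL, "fake-token")
	if err != nil {
		panic(err)
	}
	fmt.Println(list.Kind, len(list.Items), list.Items[0].Metadata.Name)
}
```

Against a real cluster the client would also need to trust the cluster CA (the `--cacert ca.crt` part of the curl commands), which this sketch leaves out.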

📖 Read about "Alternate representations of resources" and "Graduate Server-side Get and Partial Objects to GA".

Background

A user reports cainjector crashing on startup because the initial list of Secrets results in an internal server error from the EKS API server:
- CAInjector entering crashloop with "timed out waiting for cache to be synced" #7147
Users report that cert-manager (not cainjector specifically) causes their K8S API servers to crash, probably because, at the time, both the cert-manager controller and cainjector were attempting to list all Secrets (including data) when they start up:
- Cert-manager causes API server panic on clusters with more than 20000 secrets. #3748
@irbe wrote about the advantages of using metadata-only caching in the memory-management design document:
- https://github.com/cert-manager/cert-manager/blob/master/design/20221205-memory-management.md#partial-object-metadata

Fixes: #7147, #3748

/kind feature

Reduce the memory usage of `cainjector`, by only caching the metadata of Secret resources.
Reduce the load on the K8S API server when `cainjector` starts up, by only listing the metadata of Secret resources.

Testing

Measuring peak memory use at startup

I used time to measure the maximum resident set size of cainjector when it starts up against a cluster containing ~100 MiB of Secrets.
I created a cluster, loaded it with ~100 MiB of Secrets, and then ran cainjector under time on my laptop to measure its resource usage for a few seconds as it started up.
Leader election is disabled, so that cainjector can immediately begin listing and watching resources.

# Create cluster
kind create cluster

# Install cert-manager CRDs only (cainjector watches Certificate resources)
kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.15.1/cert-manager.crds.yaml

# Create a 1 MiB file, the maximum size of a Secret
dd if=/dev/zero bs=1048576 count=1 of=f1

# Create ~100Mi of Secrets, 5 at a time
echo -n {0..99} | xargs -d ' ' -P5 -I{} kubectl create secret generic s-$RANDOM-{} --from-file=f1

# Configure `time` to record the maximum resident set size and a bunch of other measurements
# https://www.man7.org/linux/man-pages/man1/time.1.html#GNU_VERSION
export TIME='{
    "cpu_seconds_user": %U,
    "cpu_seconds_system": %S,
    "time_elapsed": "%E",
    "cpu_percent": "%P",
    "text_kilobytes": %X,
    "data_kilobytes": %D,
    "rss_max_kilobytes": %M,
    "fs_inputs": %I,
    "fs_outputs": %O,
    "page_faults_major": %F,
    "page_faults_minor": %R,
    "swaps": %W
}'

function measure() {
    gitref=$1
    outfile=$2

    # Checkout the git reference
    # https://stackoverflow.com/a/45967995
    git fetch origin "$gitref" && git checkout FETCH_HEAD
    # Build the cainjector
    make _bin/server/cainjector-linux-amd64

    # Run it locally for 3 seconds and measure resource usage using time.
    /usr/bin/time -o "$outfile" -- \
                  _bin/server/cainjector-linux-amd64 --kubeconfig ~/.kube/config -v1 --leader-elect=false &
    sleep 3
    killall cainjector-linux-amd64
    wait
}

measure master master.json
measure pull/7161/head branch.json

diff -u master.json branch.json

--- master.json 2024-07-10 12:14:07.863996263 +0100
+++ branch.json 2024-07-10 12:14:13.354828683 +0100
@@ -1,14 +1,14 @@
 {
-    "cpu_seconds_user": 0.30,
-    "cpu_seconds_system": 0.09,
+    "cpu_seconds_user": 0.05,
+    "cpu_seconds_system": 0.02,
     "time_elapsed": "0:03.00",
-    "cpu_percent": "13%",
+    "cpu_percent": "2%",
     "text_kilobytes": 0,
     "data_kilobytes": 0,
-    "rss_max_kilobytes": 289268,
+    "rss_max_kilobytes": 45652,
     "fs_inputs": 0,
     "fs_outputs": 0,
     "page_faults_major": 0,
-    "page_faults_minor": 3486,
+    "page_faults_minor": 2008,
     "swaps": 0
 }
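
For a sense of scale, the figures reported in this description can be turned into ratios. The following is a small Go sketch that uses only numbers copied from the `time` output and from the curl output; the helper names `reductionPercent` and `ratio` are illustrative.

```go
package main

import "fmt"

// reductionPercent returns how much smaller after is than before, in percent.
func reductionPercent(before, after float64) float64 {
	return (before - after) / before * 100
}

// ratio returns how many times larger before is than after.
func ratio(before, after float64) float64 {
	return before / after
}

func main() {
	// Maximum RSS in kilobytes, from the `time` output on master and on this branch.
	const rssMaster, rssBranch = 289268.0, 45652.0
	// Response sizes in bytes, from the two curl commands in the description.
	const fullList, metadataList = 140692829.0, 85806.0

	fmt.Printf("max RSS reduced by %.1f%% (%.1fx smaller)\n",
		reductionPercent(rssMaster, rssBranch), ratio(rssMaster, rssBranch))
	fmt.Printf("list response is %.0fx smaller\n", ratio(fullList, metadataList))
}
```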

Benchmarks

I ran tlspk-bench to create 1000 RSA 2048 Certificates to compare the memory usage of cert-manager from master and this branch.
1000 RSA 2048 Secrets amount to ~6 MB of data, so the difference in memory usage is not dramatic.

$ kubectl get secret -l controller.cert-manager.io/fao -A -o json | dd bs=1M of=/dev/null
0+95 records in
0+95 records out
6200133 bytes (6.2 MB, 5.9 MiB) copied, 0.621156 s, 10.0 MB/s

But it is visible in the graphs below.

[Graph: memory usage on master]
[Graph: memory usage on this branch]

Signed-off-by: Richard Wall <[email protected]>

wallrj · 2024-07-10T11:31:55Z

pkg/controller/cainjector/setup.go

📖 Documentation for builder.OnlyMetadata

This documentation brings back memories... I might have been the one who added these two emojis in the comments: kubernetes-sigs/controller-runtime#1747 😅

wallrj · 2024-07-10T11:42:21Z

cmd/cainjector/app/controller.go

📖 Documentation for client.CacheOptions.DisableFor.

wallrj · 2024-07-10T11:46:14Z

pkg/controller/cainjector/reconciler.go

I changed the signature of owningCertForSecret so that it can be passed either:

- a metav1.PartialObjectMetadata, when called from the Watch map function, which now operates on a metadata-only informer cache, or
- a corev1.Secret, when called from the certificateSource.ReadCA function, where the full Secret (including data) has been read directly from the API server.
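
One way to let a single function accept both kinds of object is to depend only on metadata accessors. The sketch below illustrates that idea and is not the code from this PR: `objectMeta`, `partialObjectMetadata`, `secret`, `metaObject` and `namespacedName` are simplified stand-ins for the Kubernetes types, and the real owningCertForSecret may have a different signature and logic. The annotation name is the one mentioned in the code comments of this PR.

```go
package main

import "fmt"

// Annotation that links a Secret to the Certificate that created it.
const certificateNameAnnotation = "cert-manager.io/certificate-name"

// objectMeta is a stand-in for metav1.ObjectMeta.
type objectMeta struct {
	Namespace   string
	Name        string
	Annotations map[string]string
}

func (m *objectMeta) GetNamespace() string              { return m.Namespace }
func (m *objectMeta) GetAnnotations() map[string]string { return m.Annotations }

// partialObjectMetadata is a stand-in for metav1.PartialObjectMetadata:
// metadata only, no Secret data.
type partialObjectMetadata struct {
	objectMeta
}

// secret is a stand-in for corev1.Secret: metadata plus data.
type secret struct {
	objectMeta
	Data map[string][]byte
}

// metaObject is the hypothetical subset of accessors the lookup needs.
// Both stand-in types satisfy it through the embedded objectMeta.
type metaObject interface {
	GetNamespace() string
	GetAnnotations() map[string]string
}

// namespacedName is a stand-in for types.NamespacedName.
type namespacedName struct {
	Namespace string
	Name      string
}

// owningCertForSecret returns the Certificate named by the annotation,
// or nil if the object does not carry it. It never touches Secret data.
func owningCertForSecret(obj metaObject) *namespacedName {
	name, ok := obj.GetAnnotations()[certificateNameAnnotation]
	if !ok || name == "" {
		return nil
	}
	return &namespacedName{Namespace: obj.GetNamespace(), Name: name}
}

func main() {
	annotations := map[string]string{certificateNameAnnotation: "my-ca"}

	// As seen by the Watch map function: metadata only.
	fromWatch := &partialObjectMetadata{
		objectMeta: objectMeta{Namespace: "default", Name: "my-ca-tls", Annotations: annotations},
	}
	// As read directly from the API server: metadata and data.
	fromAPI := &secret{
		objectMeta: objectMeta{Namespace: "default", Name: "my-ca-tls", Annotations: annotations},
		Data:       map[string][]byte{"ca.crt": []byte("PEM data")},
	}

	fmt.Println(*owningCertForSecret(fromWatch) == *owningCertForSecret(fromAPI))
}
```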

maelvls · 2024-07-11T09:04:28Z

Hey, well done with the benchmarks on this PR. Super happy to see more data-driven approaches like this one. And thank you for providing the detailed instructions for reproducing your experiments. 👏

> To reduce the load on the K8S API server when cainjector starts up, caused by the initial listing of Secret resources in the cluster.

Are there any downsides to disabling caching? When using Upbound's Crossplane, at least 900 CRDs are installed and are expected to be CA-injected by cert-manager's cainjector. Would disabling be less favorable in that case? I imagine that on startup all the secrets will need to be listed to check that they match the contents of the CA injected in the CRDs.

> 1000 RSA 2048 Secrets amounts to ~6MB of data so the difference in memory usage is not dramatic.

I think the improvement would be much more dramatic in a cluster with lots of large Helm release secrets hanging around. This is something you will often see in prod clusters.

> kubectl get secret -l controller.cert-manager.io/fao -A -o json | dd bs=1M of=/dev/null

Are you sure the controller.cert-manager.io/fao label trick was implemented in the cert-manager cainjector? I think it was only implemented in the cert-manager controller.

I haven't reviewed the code yet, I'll come back to it later.

wallrj · 2024-07-12T09:50:32Z

@maelvls I added some example curl commands to the description showing the difference in response size when you send the PartialObjectMetadataList content negotiation header.

maelvls · 2024-07-12T09:58:21Z

Richard presented this PR at yesterday's dev biweekly. Some notes:

- Richard made me aware of the accept header:
  ```
  accept: application/json;as=PartialObjectMetadataList;g=meta.k8s.io;v=v1
  ```
  (side note: this feature isn't documented anywhere... I had to dig into the PR that introduced the feature to learn its syntax)
- Although cainjector lists all the secrets during the initial "list" HTTP call, it doesn't have to process the contents of these secrets, just their metadata. That should prevent the cainjector from getting OOM-killed due to the memory surge that only happens on startup.
- On the topic of the downsides of disabling caching, I agree that even with over 1000 CRDs, the decrease in memory usage is much preferable to having a cache. Caching would only be useful if we were accessing or listing the secrets often, which we realistically don't do.
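
The trade-off discussed in these notes can be illustrated with a toy model in Go: the cache holds only metadata, and every read of Secret data costs one uncached request to the API server. Everything here is hypothetical (`fakeAPIServer`, `metadataCache`, `newMetadataCache`, `readCA`); it models the idea only and is not how cainjector or controller-runtime implement it.

```go
package main

import (
	"errors"
	"fmt"
)

// secret and secretMetadata are simplified stand-ins for corev1.Secret and
// metav1.PartialObjectMetadata.
type secret struct {
	Name string
	Data map[string][]byte
}

type secretMetadata struct {
	Name string
}

// fakeAPIServer stands in for the Kubernetes API server and counts how
// often a full Secret (including data) is read.
type fakeAPIServer struct {
	secrets   map[string]secret
	fullReads int
}

// listMetadata returns metadata only, like a PartialObjectMetadataList.
func (s *fakeAPIServer) listMetadata() []secretMetadata {
	out := make([]secretMetadata, 0, len(s.secrets))
	for _, sec := range s.secrets {
		out = append(out, secretMetadata{Name: sec.Name})
	}
	return out
}

// getSecret returns the full Secret and records the uncached read.
func (s *fakeAPIServer) getSecret(name string) (secret, error) {
	s.fullReads++
	sec, ok := s.secrets[name]
	if !ok {
		return secret{}, errors.New("secret not found: " + name)
	}
	return sec, nil
}

// metadataCache holds metadata only, so its size does not depend on the
// size of the Secret data in the cluster.
type metadataCache map[string]secretMetadata

// newMetadataCache fills the cache from a metadata-only list.
func newMetadataCache(api *fakeAPIServer) metadataCache {
	cache := metadataCache{}
	for _, m := range api.listMetadata() {
		cache[m.Name] = m
	}
	return cache
}

// readCA bypasses the cache and reads the data from the API server.
func readCA(api *fakeAPIServer, name string) ([]byte, error) {
	sec, err := api.getSecret(name)
	if err != nil {
		return nil, err
	}
	return sec.Data["ca.crt"], nil
}

func main() {
	api := &fakeAPIServer{secrets: map[string]secret{
		"my-ca-tls": {Name: "my-ca-tls", Data: map[string][]byte{"ca.crt": []byte("PEM data")}},
	}}

	cache := newMetadataCache(api)
	fmt.Printf("cached %d Secret(s), full reads so far: %d\n", len(cache), api.fullReads)

	ca, err := readCA(api, "my-ca-tls")
	if err != nil {
		panic(err)
	}
	fmt.Printf("read %d bytes of CA data, full reads: %d\n", len(ca), api.fullReads)
}
```

In this model, populating the cache never transfers Secret data, and the cost of skipping a data cache is one API request per read, which stays small as long as reads are infrequent.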

maelvls · 2024-07-12T10:00:39Z

pkg/controller/cainjector/reconciler.go

+// NOTE: "owning" here does not mean [ownerReference][1], because
+// cert-manager does not set the ownerReference of the Secret,
+// unless the [`--enable-certificate-owner-ref` flag is true][2].


Thanks for the clarification!!

maelvls · 2024-07-12T10:05:42Z

pkg/controller/cainjector/sources.go

+	// Only use Secrets that have been created by this Certificate.
+	// The Secret must have a `cert-manager.io/certificate-name` annotation
+	// value matching the name of this Certificate..


Thanks again for adding these bits of information! Nit:

Suggested change

-	// Only use Secrets that have been created by this Certificate.
-	// The Secret must have a `cert-manager.io/certificate-name` annotation
-	// value matching the name of this Certificate..
+	// Only use Secrets that have been created by this Certificate.
+	// The Secret must have a `cert-manager.io/certificate-name` annotation
+	// value matching the name of this Certificate.

maelvls

I have reviewed the changes; they look straightforward. Thank you again for having spent the time writing additional comments. These are super useful, especially regarding "owners" vs. ownerReferences.

/lgtm
/approve
/hold in case you want to fix the nit

cert-manager-prow · 2024-07-12T10:08:44Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: maelvls

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [maelvls]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

wallrj · 2024-07-12T10:16:55Z

Thanks @maelvls

I'll fix that nit in another PR to save another round of e2e tests and review.

/unhold

wallrj force-pushed the 7147-cainjector-metadata-only-cache branch from 04f1386 to a376b7b July 9, 2024 21:03

wallrj added 2 commits July 10, 2024 10:07

Reduce memory usage by only caching the metadata of Secret resources

8f9ccf3

Signed-off-by: Richard Wall <[email protected]>

make go-tidy

15084fd

Signed-off-by: Richard Wall <[email protected]>

wallrj force-pushed the 7147-cainjector-metadata-only-cache branch from a376b7b to 15084fd July 10, 2024 09:07

cert-manager-prow bot added the size/L label (100-499 changed lines, ignoring generated files) and removed the size/M label (30-99 changed lines, ignoring generated files) Jul 10, 2024

Update the memory-management design document

961e81b

Signed-off-by: Richard Wall <[email protected]>

cert-manager-prow bot added the kind/design label Jul 10, 2024

wallrj commented Jul 10, 2024

wallrj changed the title ~~WIP: Reduce memory usage by only caching the metadata of Secret resources~~ Reduce memory usage by only caching the metadata of Secret resources Jul 10, 2024

cert-manager-prow bot removed the do-not-merge/work-in-progress label Jul 10, 2024

wallrj requested a review from maelvls July 10, 2024 13:06

maelvls reviewed Jul 12, 2024

maelvls approved these changes Jul 12, 2024

cert-manager-prow bot added the do-not-merge/hold and lgtm labels Jul 12, 2024

cert-manager-prow bot added the approved label Jul 12, 2024

cert-manager-prow bot removed the do-not-merge/hold label Jul 12, 2024

cert-manager-prow bot merged commit c746fdf into cert-manager:master Jul 12, 2024
7 checks passed

wallrj changed the title ~~Reduce memory usage by only caching the metadata of Secret resources~~ Reduce memory usage of cainjector by only caching the metadata of Secret resources Jul 12, 2024

wallrj deleted the 7147-cainjector-metadata-only-cache branch July 12, 2024 13:42

wallrj mentioned this pull request Jul 16, 2024

Cert-manager causes API server panic on clusters with more than 20000 secrets. #3748

Closed

maelvls mentioned this pull request Jul 22, 2024

fix/vc-31703-agent-memory-startup-spikes jetstack/jetstack-secure#525

Merged
