Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce memory usage of cainjector by only caching the metadata of Secret resources #7161

Conversation

wallrj
Copy link
Member

@wallrj wallrj commented Jul 9, 2024

Update cainjector to use metadata-only caching for Secret resources:

  1. To reduce memory use of cainjector, by only caching metadata of Secret resources in-memory.
  2. To reduce the load on the K8S API server when cainjector starts up, caused by the initial listing of full Secret resources in the cluster.

With 100M of Secrets in the cluster, this reduces the maximum memory usage from ~290M to ~46M:

  • Before: "rss_max_kilobytes": 289268
  • After: "rss_max_kilobytes": 45652

Behind the scenes the reflector is asking the server for a PartialObjectMetadataList projection instead of the full data SecretList.
You can see the reduction in the response size using curl as follows:

# Create 1Mi file. The maximum size of a Secret
dd if=/dev/zero bs=1048576 count=1 of=f1

# Create 100M of secrets
echo -n {0..99} | xargs -d ' ' -P5 -I{} kubectl create secret generic s-$RANDOM-{} --from-file=f1

# Get API server connection parameters
kubectl get cm -n kube-public kube-root-ca.crt  -o jsonpath='{ .data.ca\.crt }' > ca.crt
export URL="$(kubectl cluster-info | grep 'Kubernetes control plane' | egrep -o 'https://[a-z0-9.]+:[0-9]+')"
export TOKEN="$(kubectl create token -n cert-manager cert-manager-cainjector)"
curl "${URL}/api/v1/secrets" \
     -fsSL \
     --cacert ca.crt \
     -H "Authorization: Bearer ${TOKEN}" \
     -H "Accept: application/json" \
    | dd of=/dev/null
...
274790+1 records in
274790+1 records out
140692829 bytes (141 MB, 134 MiB) copied, 1.76449 s, 79.7 MB/s
curl "${URL}/api/v1/secrets" \
     -fsSL \
     --cacert ca.crt \
     -H "Authorization: Bearer ${TOKEN}" \
     -H "Accept: application/json;as=PartialObjectMetadataList;g=meta.k8s.io;v=v1" \
    | dd of=/dev/null
...
167+1 records in
167+1 records out
85806 bytes (86 kB, 84 KiB) copied, 0.404721 s, 212 kB/s

📖 Read about Alternate representations of resources
and Graduate Server-side Get and Partial Objects to GA

Background

Fixes: #7147, #3748

/kind feature

Reduce the memory usage of `cainjector`, by only caching the metadata of Secret resources.
Reduce the load on the K8S API server when `cainjector` starts up, by only listing the metadata of Secret resources.

Testing

Measuring peak memory use at startup

I used time to measure the maximum resident set size of cainjector when it starts up with a cluster containing 100M of Secrets.
I create a cluster, load it with 100M of Secrets and then run time cainjector on my laptop to measure its resource usage for a few seconds, as it starts up.
Leader election is disabled, so that cainjector can immediately begin listing and watching resources.

# Create cluster
kind create cluster

# Install cert-manager CRDs only (cainjector watches Certificate resources)
kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.15.1/cert-manager.crds.yaml

# Create 1Mi file. The maximum size of a Secret
dd if=/dev/zero bs=1048576 count=1 of=f1

# Create ~100Mi of Secrets, 5 at a time
echo -n {0..99} | xargs -d ' ' -P5 -I{} kubectl create secret generic s-$RANDOM-{} --from-file=f1

# Configure `time` to record the maximum resident set size and a bunch of other measurements
# https://www.man7.org/linux/man-pages/man1/time.1.html#GNU_VERSION
export TIME='{
    "cpu_seconds_user": %U,
    "cpu_seconds_system": %S,
    "time_elapsed": "%E",
    "cpu_percent": "%P",
    "text_kilobytes": %X,
    "data_kilobytes": %D,
    "rss_max_kilobytes": %M,
    "fs_inputs": %I,
    "fs_outputs": %O,
    "page_faults_major": %F,
    "page_faults_minor": %R,
    "swaps": %W
}'

function measure() {
    gitref=$1
    outfile=$2

    # Checkout the git reference
    # https://stackoverflow.com/a/45967995
    git fetch origin "$gitref" && git checkout FETCH_HEAD
    # Build the cainjector
    make _bin/server/cainjector-linux-amd64

    # Run it locally for 3 seconds and measure resource usage using time.
    /usr/bin/time -o "$outfile" -- \
                  _bin/server/cainjector-linux-amd64 --kubeconfig ~/.kube/config -v1 --leader-elect=false &
    sleep 3
    killall cainjector-linux-amd64
    wait
}

measure master master.json
measure pull/7161/head branch.json

diff -u master.json branch.json
--- master.json 2024-07-10 12:14:07.863996263 +0100
+++ branch.json 2024-07-10 12:14:13.354828683 +0100
@@ -1,14 +1,14 @@
 {
-    "cpu_seconds_user": 0.30,
-    "cpu_seconds_system": 0.09,
+    "cpu_seconds_user": 0.05,
+    "cpu_seconds_system": 0.02,
     "time_elapsed": "0:03.00",
-    "cpu_percent": "13%",
+    "cpu_percent": "2%",
     "text_kilobytes": 0,
     "data_kilobytes": 0,
-    "rss_max_kilobytes": 289268,
+    "rss_max_kilobytes": 45652,
     "fs_inputs": 0,
     "fs_outputs": 0,
     "page_faults_major": 0,
-    "page_faults_minor": 3486,
+    "page_faults_minor": 2008,
     "swaps": 0
 }

Benchmarks

I ran tlspk-bench to create 1000 RSA 2048 Certificates to compare the memory usage of cert-manager from master and this branch.
1000 RSA 2048 Secrets amounts to ~6MB of data so the difference in memory usage is not dramatic.

$ kubectl get secret -l controller.cert-manager.io/fao -A -o json | dd bs=1M of=/dev/null
0+95 records in
0+95 records out
6200133 bytes (6.2 MB, 5.9 MiB) copied, 0.621156 s, 10.0 MB/s

But it is visible in the graphs below.

  • master
    image

  • branch
    image

@cert-manager-prow cert-manager-prow bot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/feature Categorizes issue or PR as related to a new feature. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. dco-signoff: yes Indicates that all commits in the pull request have the valid DCO sign-off message. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jul 9, 2024
@wallrj wallrj force-pushed the 7147-cainjector-metadata-only-cache branch from 04f1386 to a376b7b Compare July 9, 2024 21:03
@wallrj wallrj force-pushed the 7147-cainjector-metadata-only-cache branch from a376b7b to 15084fd Compare July 10, 2024 09:07
@cert-manager-prow cert-manager-prow bot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jul 10, 2024
@cert-manager-prow cert-manager-prow bot added the kind/design Categorizes issue or PR as related to design. label Jul 10, 2024
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

📖 Documentation for builder.OnlyMetadata

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This documentation brings back memories... I might have been the one who added these two emojis in the comments: kubernetes-sigs/controller-runtime#1747 😅

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

📖 Documentation for client.CacheOptions.DisableFor.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed the signature of owningCertForSecret so that it can be passed

  • a metav1.PartialObjectMetadata when called from the Watch map function, which is now operating on a metadata-only informer cache
  • a corev1.Secret when called from the certificateSource.ReadCA function, where the full Secret (including data) has been read direct from the API server.

@wallrj wallrj changed the title WIP: Reduce memory usage by only caching the metadata of Secret resources Reduce memory usage by only caching the metadata of Secret resources Jul 10, 2024
@cert-manager-prow cert-manager-prow bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jul 10, 2024
@wallrj wallrj requested a review from maelvls July 10, 2024 13:06
@maelvls
Copy link
Member

maelvls commented Jul 11, 2024

Hey, well done with the benchmarks on this PR. Super happy to see more data-driven approaches like this one. And thank you for providing the detailed instructions for reproducing your experiments. 👏

To reduce the load on the K8S API server when cainjector starts up, caused by the initial listing of Secret resources in the cluster.

Are there any downsides to disabling caching? When using Upbound's Crossplane, at least 900 CRDs are installed and are expected to be CA-injected by cert-manager's cainjector. Would disabling be less favorable in that case? I imagine that on startup all the secrets will need to be listed to check that they match the contents of the CA injected in the CRDs.

1000 RSA 2048 Secrets amounts to ~6MB of data so the difference in memory usage is not dramatic.

I think the improvement would be much more dramatic in a cluster with lots of large Helm release secrets hanging around. This is something you will often see in prod clusters.

kubectl get secret -l controller.cert-manager.io/fao -A -o json | dd bs=1M of=/dev/null

Are you sure the controller.cert-manager.io/fao annotation trick was implemented in the cert-manager cainjector? I think it was only implemented in the cert-manager controller.

I haven't reviewed the code yet, I'll come back to it later.

@wallrj
Copy link
Member Author

wallrj commented Jul 12, 2024

@maelvls I added some example curl commands to the description showing the difference in response size when you send the PartialObjectMetadataList content negotiation header.

@maelvls
Copy link
Member

maelvls commented Jul 12, 2024

Richard presented this PR to yesterday's dev biweekly, some notes:

  • Richard made me aware of the accept header:

    accept: application/json;as= PartialObjectMetadataList;g=meta.k8s.io;v=v1
    

    (side note: this feature isn't documented anywhere... I had to dig into the PR that introduced the feature to learn its syntax)

  • Although cainjector lists all the secrets during the initial "list" HTTP call, it doesn't have to process the contents of these secrets, just their metadata. That should prevent the cainjector from getting oomkilled due to the memory surge that only happens on startup.

  • On the topic of the downsides of disabling caching, I agree that even with over 1000 CRDs, the decrease in memory usage is much preferable over having a cache. Caching would only be useful in case we were accessing or listing the secrets often, which we realistically don't do.

Comment on lines +170 to +172
// NOTE: "owning" here does not mean [ownerReference][1], because
// cert-manager does not set the ownerReference of the Secret,
// unless the [`--enable-certificate-owner-ref` flag is true][2].
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the clarification!!

Comment on lines +120 to +122
// Only use Secrets that have been created by this Certificate.
// The Secret must have a `cert-manager.io/certificate-name` annotation
// value matching the name of this Certificate..
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks again for adding these bits of information! Nit:

Suggested change
// Only use Secrets that have been created by this Certificate.
// The Secret must have a `cert-manager.io/certificate-name` annotation
// value matching the name of this Certificate..
// Only use Secrets that have been created by this Certificate.
// The Secret must have a `cert-manager.io/certificate-name` annotation
// value matching the name of this Certificate.

Copy link
Member

@maelvls maelvls left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have reviewed the changes, the changes look straightforward. Thank you again for having spent the time writing additional comments. These are super useful, especially regarding "owners" vs. ownerReferences.

/lgtm
/approve
/hold in case you want to fix the nit

@cert-manager-prow cert-manager-prow bot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. lgtm Indicates that a PR is ready to be merged. labels Jul 12, 2024
@cert-manager-prow
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: maelvls

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@cert-manager-prow cert-manager-prow bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 12, 2024
@wallrj
Copy link
Member Author

wallrj commented Jul 12, 2024

Thanks @maelvls

I'll fix that nit in another PR to save another round of e2e tests and review.

/unhold

@cert-manager-prow cert-manager-prow bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 12, 2024
@cert-manager-prow cert-manager-prow bot merged commit c746fdf into cert-manager:master Jul 12, 2024
7 checks passed
@wallrj wallrj changed the title Reduce memory usage by only caching the metadata of Secret resources Reduce memory usage of cainjector by only caching the metadata of Secret resources Jul 12, 2024
@wallrj wallrj deleted the 7147-cainjector-metadata-only-cache branch July 12, 2024 13:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. dco-signoff: yes Indicates that all commits in the pull request have the valid DCO sign-off message. kind/design Categorizes issue or PR as related to design. kind/feature Categorizes issue or PR as related to a new feature. lgtm Indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

CAInjector entering crashloop with "timed out waiting for cache to be synced"
2 participants