
🐛 Use separate cache for partial metadata watches on secrets to include all secrets #10633

Merged

Conversation

@chrischdi (Member) commented May 16, 2024

What this PR does / why we need it:

This PR introduces a separate cache that is used in the clusterresourceset_controller for watching Secrets.

Previously, the WatchesMetadata call for Secrets in the clusterresourceset_controller inherited the LabelSelector configured in main.go:

https://github.com/kubernetes-sigs/cluster-api/blob/main/main.go#L322-L329

This label selector is passed through to controller-runtime and applied to the informer created for the watch.

Secrets referenced by ClusterResourceSets may apply to multiple clusters, so the label selector may not even be present on the Secrets they refer to.
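
For illustration, here is a minimal sketch of the approach (the package, helper name, and options are assumptions for this sketch, not the exact code merged in this PR): create a dedicated cache without the manager-level label selector and register it with the manager so it is started and stopped together with it.

```go
package setup // hypothetical package for this sketch

import (
	"sigs.k8s.io/controller-runtime/pkg/cache"
	"sigs.k8s.io/controller-runtime/pkg/manager"
)

// setupPartialSecretCache is a hypothetical helper: it builds a dedicated cache
// that is NOT restricted by the manager-level label selector, so metadata-only
// watches on Secrets can see all Secrets in the cluster.
func setupPartialSecretCache(mgr manager.Manager) (cache.Cache, error) {
	partialSecretCache, err := cache.New(mgr.GetConfig(), cache.Options{
		Scheme: mgr.GetScheme(),
		Mapper: mgr.GetRESTMapper(),
		// Intentionally no DefaultLabelSelector here, unlike the manager's main cache.
	})
	if err != nil {
		return nil, err
	}

	// Register the cache with the manager so it is started and stopped with it.
	if err := mgr.Add(partialSecretCache); err != nil {
		return nil, err
	}
	return partialSecretCache, nil
}
```

The controller can then use this dedicated cache as the source for its metadata-only Secret watch instead of the manager's default cache.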

Which issue(s) this PR fixes:
Fixes #10557

/area clusterresourceset

@k8s-ci-robot k8s-ci-robot added area/clusterresourceset Issues or PRs related to clusterresourcesets cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels May 16, 2024
@chrischdi chrischdi changed the title 🐛 crs: use separate cache for partial metadata watches on secrets to in… 🐛 crs: use separate cache for partial metadata watches on secrets to include all secrets May 16, 2024
@sbueringer (Member)

Very nice!

@chrischdi chrischdi force-pushed the pr-crs-watch-partial-all-secrets branch from 9fde0e2 to 4f90185 on May 21, 2024 08:48
@chrischdi chrischdi added the tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges. label May 21, 2024
@sbueringer (Member)

@chrischdi can you please check the unit tests?

@sbueringer (Member)

/test pull-cluster-api-e2e-main

@sbueringer (Member) left a comment

Just a few nits. Sorry for the nitpicking; I'm just playing around a bit with generics and trying to find the simplest implementation.

Otherwise all good. I also tested it and it works perfectly (inspected the caches at runtime).

@chrischdi (Member, Author)

/test pull-cluster-api-e2e-main

Comment on lines 193 to 195
// secretToExtensionConfigFunc returns a func which maps a secret to ExtensionConfigs with the corresponding
// InjectCAFromSecretAnnotation to reconcile them on updates of the secrets.
func (r *Reconciler) secretToExtensionConfigFunc(ctx context.Context, o *metav1.PartialObjectMetadata) []reconcile.Request {
@sbueringer (Member) commented May 27, 2024

Can we revert this (func name + godoc) entirely to what is on main? I think the godoc is not correct anymore (+ the func name is a bit inconsistent now with how we usually call these funcs)

@sbueringer (Member)

Last nit from my side

/assign @fabriziopandini

@chrischdi (Member, Author)

/test pull-cluster-api-e2e-main

@sbueringer (Member)

Thank you very much!
/lgtm

Let's get some additional reviews if possible, just in case I'm missing something
/assign @fabriziopandini @vincepri

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 28, 2024
@k8s-ci-robot (Contributor)

LGTM label has been added.

Git tree hash: fd0b933f00763538f0332835600823f4a8a7933d

@fabriziopandini (Member) left a comment

Nice change!

main.go
// This way the watch does not use the LabelSelector defined at the cache which
// would filter to secrets having the cluster label, because secrets referred
// by ClusterResourceSet or ExtensionConfig are not specific to a single cluster.
partialSecretCache, err := cache.New(mgr.GetConfig(), cache.Options{
Member:

q: should this be allSecretCache instead of partialSecretCache? (nothing in the definition points to "partial")
q: is there a way to make sure this cache is used only for Secrets? (I think not, but maybe we can enforce this with a DefaultTransform func that always returns an error)
q: should we use TransformStripManagedFields for Secrets? (not necessary, but it doesn't hurt)

cc @sbueringer

Member Author:

The reason I named it partialSecretCache is that we intend to use it only for PartialObjectMetadata watches/objects. Maybe I should add that information to the comment?

Member Author:

I like the idea of adding a DefaultTransform func and implemented it.

This way we can make sure the cache is not misused 👍
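
For readers following along, a minimal sketch of such a guard (the transform name and the exact check are assumptions for illustration, not the code merged in this PR) using the cache's DefaultTransform option:

```go
package setup // hypothetical package for this sketch

import (
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	toolscache "k8s.io/client-go/tools/cache"
)

// secretsOnlyTransform is a hypothetical DefaultTransform for the dedicated cache:
// it rejects any object that is not a metadata-only v1 Secret, so the cache cannot
// silently be used for other GroupVersionKinds. Returning an error makes the
// reflector log and retry; panicking instead (as discussed below) fails loudly.
var secretsOnlyTransform toolscache.TransformFunc = func(obj interface{}) (interface{}, error) {
	partial, ok := obj.(*metav1.PartialObjectMetadata)
	if !ok {
		return nil, fmt.Errorf("cache expected to only get PartialObjectMetadata, got %T", obj)
	}
	if partial.Kind != "Secret" || partial.APIVersion != "v1" {
		return nil, fmt.Errorf("cache expected to only get Secrets, got %s", partial.GroupVersionKind())
	}
	return obj, nil
}
```

This would be wired up via cache.Options{DefaultTransform: secretsOnlyTransform, ...} when creating the dedicated cache; for the managedFields question above, controller-runtime's cache package also provides a TransformStripManagedFields helper that could be combined with it.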

Member:

What happens if we hit the error cases now? (Does the controller fail? Do we get not-founds on Get? Anything else?)

Is the behavior good enough to make sure we never use the cache for the wrong purpose? Or will it just mean that some of our code doesn't work?

Member:

I'm wondering if it would be better to just panic if someone tries to use this cache for the wrong GVK.

Member Author:

Currently it only logs the error and keeps retrying:

E0604 05:56:24.840511      17 reflector.go:150] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:232: Failed to watch *v1.PartialObjectMetadata: unable to sync list result: couldn't enqueue object: cache expected to only get Secrets, got &TypeMeta{Kind:ConfigMap,APIVersion:v1,}

I'll adjust it to panic instead.

Member Author:

Now it observes the panic, prints the trace, and gets stuck (which I think is better than just logging the error):

E0604 05:58:32.951254      62 runtime.go:79] Observed a panic: &errors.errorString{s:"cache expected to only get Secrets, got &TypeMeta{Kind:ConfigMap,APIVersion:v1,}"} (cache expected to only get Secrets, got &TypeMeta{Kind:ConfigMap,APIVersion:v1,})
goroutine 313 [running]:
k8s.io/apimachinery/pkg/util/runtime.logPanic({0x271ef40, 0x40003a6bf0})
	/Users/schlotterc/go/pkg/mod/k8s.io/[email protected]/pkg/util/runtime/runtime.go:75 +0xdc
k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0x0})
	/Users/schlotterc/go/pkg/mod/k8s.io/[email protected]/pkg/util/runtime/runtime.go:49 +0xb4
panic({0x271ef40?, 0x40003a6bf0?})
	/Users/schlotterc/.bin/go-archive/go1.22.1.darwin-arm64/src/runtime/panic.go:770 +0xf0
main.setupReconcilers.func1({0x2aaede0, 0x40008f37a0})
	/Users/schlotterc/go/src/sigs.k8s.io/cluster-api/main.go:455 +0x2f0
k8s.io/client-go/tools/cache.(*DeltaFIFO).queueActionLocked(0x4000362f20, {0x2aeb978, 0x8}, {0x2aaede0, 0x40008f37a0})
	/Users/schlotterc/go/pkg/mod/k8s.io/[email protected]/tools/cache/delta_fifo.go:456 +0x15c
k8s.io/client-go/tools/cache.(*DeltaFIFO).Replace(0x4000362f20, {0x4000292280, 0x13, 0x13}, {0x40007e99b8, 0x4})
	/Users/schlotterc/go/pkg/mod/k8s.io/[email protected]/tools/cache/delta_fifo.go:641 +0x390
k8s.io/client-go/tools/cache.(*Reflector).syncWith(0x4000988a80, {0x4000292140, 0x13, 0x13}, {0x40007e99b8, 0x4})
	/Users/schlotterc/go/pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:706 +0x1d0
k8s.io/client-go/tools/cache.(*Reflector).list(0x4000988a80, 0x4000a1ae40)
	/Users/schlotterc/go/pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:577 +0xe68
k8s.io/client-go/tools/cache.(*Reflector).ListAndWatch(0x4000988a80, 0x4000a1ae40)
	/Users/schlotterc/go/pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:353 +0x344
k8s.io/client-go/tools/cache.(*Reflector).Run.func1()
	/Users/schlotterc/go/pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:298 +0x30
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x40009ede38)
	/Users/schlotterc/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/backoff.go:226 +0x48
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0x4000a59e38, {0x2d710c0, 0x4000290b90}, 0x1, 0x4000a1ae40)
	/Users/schlotterc/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/backoff.go:227 +0xa0
k8s.io/client-go/tools/cache.(*Reflector).Run(0x4000988a80, 0x4000a1ae40)
	/Users/schlotterc/go/pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:297 +0x24c
k8s.io/apimachinery/pkg/util/wait.(*Group).StartWithChannel.func1()
	/Users/schlotterc/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:55 +0x34
k8s.io/apimachinery/pkg/util/wait.(*Group).Start.func1()
	/Users/schlotterc/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:72 +0xa8
created by k8s.io/apimachinery/pkg/util/wait.(*Group).Start in goroutine 305
	/Users/schlotterc/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:70 +0xc4
2024-06-04T05:58:32Z error layer=rpc writing response:write tcp [::1]:30000->[::1]:46760: use of closed network connection

Member:

Thx!

@k8s-ci-robot k8s-ci-robot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. and removed lgtm "Looks good to me", indicates that a PR is ready to be merged. labels May 30, 2024
@chrischdi chrischdi force-pushed the pr-crs-watch-partial-all-secrets branch from 53006e4 to 1dd1d9e on May 31, 2024 09:05
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 31, 2024
@chrischdi (Member, Author)

/test pull-cluster-api-e2e-main

@chrischdi (Member, Author)

Cosmetics:

/override pull-cluster-api-apidiff-main

@k8s-ci-robot (Contributor)

@chrischdi: chrischdi unauthorized: /override is restricted to Repo administrators.

In response to this:

Cosmetics:

/override pull-cluster-api-apidiff-main

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot (Contributor)

@chrischdi: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| pull-cluster-api-apidiff-main | d58580c | link | false | /test pull-cluster-api-apidiff-main |

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@sbueringer (Member)

Thank you!

/lgtm

/assign @fabriziopandini

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 4, 2024
@k8s-ci-robot (Contributor)

LGTM label has been added.

Git tree hash: bcdf8982b46ba4accb1c3f9268684e250b1639af

@vincepri (Member) left a comment

/approve

@k8s-ci-robot (Contributor)

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: vincepri

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 14, 2024
@k8s-ci-robot k8s-ci-robot merged commit 9a2d8cd into kubernetes-sigs:main Jun 14, 2024
19 of 20 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v1.8 milestone Jun 14, 2024
@sbueringer sbueringer changed the title 🐛 crs: use separate cache for partial metadata watches on secrets to include all secrets 🐛 Use separate cache for partial metadata watches on secrets to include all secrets Jul 19, 2024
Labels

- approved: Indicates a PR has been approved by an approver from all required OWNERS files.
- area/clusterresourceset: Issues or PRs related to clusterresourcesets.
- cncf-cla: yes: Indicates the PR's author has signed the CNCF CLA.
- lgtm: "Looks good to me", indicates that a PR is ready to be merged.
- size/L: Denotes a PR that changes 100-499 lines, ignoring generated files.
- tide/merge-method-squash: Denotes a PR that should be squashed by tide when it merges.
Development

Successfully merging this pull request may close these issues:

- CRS associated secret updates will not trigger CRS Reconcile
5 participants