ImageRepository manifests ignoring spec.secretRef changes #2286
Comments
I ran a test to try to confirm this report, and so far the issue seems to exist at least as far back as Flux 0.24.1. In my case I removed a Applied on a fresh cluster with 0.24.1 (or even just deleting the earlier GitRepository first) the first of these commits, I would have expected to remove the fields which Flux applied in earlier versions, but they are not removed. The second commit is definitely invalid according to the
If I delete the
I'm going to start checking older versions of Flux, from around 0.18 (when server-side apply was first used), to see whether they all mirror this behavior or whether I can bisect to the change that actually introduced it. It does not seem like correct behavior, and I'm assuming this information will be helpful now that I've reproduced the issue. Thanks for your report @stvnksslr, I'll let you know what I find out 👍
There is definitely a regression between 0.18.3 and 0.24.1; I'm working on narrowing it down now. In Flux 0.18.3, this commit definitely removes the
I found an error in my testing approach. Something has changed, and now that I've corrected my error I no longer seem to have a repro for this issue. Even on the latest version, where I thought I was able to reproduce this issue easily on the first try, it no longer seems to cause a problem at all. The GitRepository updates and removes its I'll spend some more time on it to see if I can see what I was seeing again, since it would be very helpful to have a clean repro for some of the server-side apply issues we've seen reported for a while now.
Hmm, it seems the issue reproduces cleanly on a new v0.25.1 installation, that is, a fresh cluster which has only just had v0.25.1 installed on it, without upgrading from any earlier version. If I create a GitRepository or some other resource with a If I upgrade from one version to the next, the issue no longer reproduces reliably. I am still trying to isolate the first version that had this issue. It takes some time because I have to destroy the entire cluster and try again with each suspect version on a fresh cluster; it almost seems as though it matters which version of Flux is upgraded to which.
v0.23.0 and prior do not seem to have this bug. I can bootstrap Flux fresh on a new cluster with this version, then remove the secretRef, then edit the live resource on the cluster and see the The issue described here seems to reproduce for me in v0.24.0 and later; all of them suffer from it when installed on a fresh cluster. But for some reason, it couldn't be reproduced in every cluster.
This is related to fluxcd/kustomize-controller#486: you can't delete fields added by kubectl. You need to clear the managed fields first.
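The exact command suggested in that comment is not preserved here. As a minimal sketch of what "clearing the managed fields" can look like, the Go snippet below builds a JSON Patch (RFC 6902) body that replaces `metadata.managedFields` with a list holding a single empty entry, which is the shape the Kubernetes server-side apply documentation describes for resetting field ownership. The helper name `managedFieldsResetPatch` is hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// patchOp is a single RFC 6902 JSON Patch operation.
type patchOp struct {
	Op    string      `json:"op"`
	Path  string      `json:"path"`
	Value interface{} `json:"value"`
}

// managedFieldsResetPatch (hypothetical helper) returns a JSON Patch body
// that overwrites metadata.managedFields with a list containing one empty
// entry, so that previously recorded field managers are dropped.
func managedFieldsResetPatch() (string, error) {
	ops := []patchOp{{
		Op:    "replace",
		Path:  "/metadata/managedFields",
		Value: []map[string]interface{}{{}},
	}}
	b, err := json.Marshal(ops)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	patch, err := managedFieldsResetPatch()
	if err != nil {
		panic(err)
	}
	// The printed body could be passed to `kubectl patch --type=json -p`.
	fmt.Println(patch)
}
```

The printed body is what would be handed to `kubectl patch <resource> --type=json -p '<body>'`. Whether this helps depends on the object actually having `managedFields` entries to reset.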
The repro was done on a fresh K8s 1.23.1 cluster which has never been touched by kubectl, with only Flux 0.25.1 ever installed on it. I'll try it again today to confirm for sure, but this doesn't jibe with the managed fields explanation if it's supposed to have been fixed in 0.25.
The managed fields issue is still open, but we’ve updated the Kubernetes packages, which could improve the situation.
Wouldn't users only experience this bug if they were upgrading from an older version of Flux (earlier than SSA) to a newer one? I don't understand how I would have come into any fields managed by kubectl, when this resource was added by Flux and synced through the gitrepo on the latest version of Flux, and this cluster has never been touched by kubectl except for a completely unrelated kubectl apply to add the I have been able to confirm again that K8s 1.23.1 with Flux 0.25.1, never operated on any earlier version, apparently has this issue, and the managedFields patch approach offered above has no effect, as it appears the resource has no managedFields to patch and remove when it is opened with
Can you post here the object with the managed fields when it gets in that state where removing the spec.secretRef from Git has no effect?
There are no managed fields... (I pasted some managed fields a moment ago, but I grabbed them from a different resource.) Not sure how this gitrepo came to be managed by Flux but without any managedFields. Some other resources, like traefik-crds, do have managedFields set on them. But this resource for some reason does not.
Here is the resource in context that comes up missing any (I have it deployed from the flux-system Kustomization)
I found the issue: image-reflector-controller acts like kubectl when it adds finalizers and becomes the manager of all the fields inside
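To make that diagnosis concrete, here is a toy model in Go of field ownership as the comment above describes it. It is a deliberately simplified sketch, not the real Kubernetes server-side apply algorithm: an apply only removes fields that the applying manager still owns, and a full-object update is modelled as taking ownership of every field on the object. The types, the helper `secretRefSurvives`, and the field values are all hypothetical; only the controller names come from this thread.

```go
package main

import "fmt"

// object is a toy resource: field path -> value, plus the manager that
// currently owns each field.
type object struct {
	fields map[string]string
	owner  map[string]string
}

func newObject() *object {
	return &object{fields: map[string]string{}, owner: map[string]string{}}
}

// apply mimics a server-side apply in this toy model: fields the manager
// owned but no longer sends are removed, fields it sends become its own.
// Fields owned by other managers are left untouched.
func (o *object) apply(manager string, desired map[string]string) {
	for path, m := range o.owner {
		if _, stillWanted := desired[path]; m == manager && !stillWanted {
			delete(o.fields, path)
			delete(o.owner, path)
		}
	}
	for path, v := range desired {
		o.fields[path] = v
		o.owner[path] = manager
	}
}

// update mimics a full-object Update as described in the thread: the
// caller ends up as the owner of every field present on the object.
func (o *object) update(manager string) {
	for path := range o.fields {
		o.owner[path] = manager
	}
}

// secretRefSurvives replays the scenario from this issue and reports
// whether spec.secretRef is still on the object after it is removed in Git.
func secretRefSurvives(finalizerViaUpdate bool) bool {
	obj := newObject()
	obj.apply("kustomize-controller", map[string]string{
		"spec.image":     "example/app", // hypothetical value
		"spec.secretRef": "regcred",     // hypothetical value
	})
	if finalizerViaUpdate {
		obj.fields["metadata.finalizers"] = "example-finalizer"
		obj.update("image-reflector-controller")
	}
	// Git no longer contains spec.secretRef.
	obj.apply("kustomize-controller", map[string]string{"spec.image": "example/app"})
	_, ok := obj.fields["spec.secretRef"]
	return ok
}

func main() {
	fmt.Println("finalizer added via Update:", secretRefSurvives(true))
	fmt.Println("no Update in between:", secretRefSurvives(false))
}
```

In this model the field survives only when the update happened in between, which matches the symptom reported here: reconciliation succeeds, but the removed `spec.secretRef` stays on the cluster.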
To solve this issue we need to replace everywhere (in all our controllers)
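The exact replacement call is not preserved in this thread, but later comments say the controllers stopped using Update when applying finalizers. The general idea can be sketched with a JSON merge patch (RFC 7386) that carries only `metadata.finalizers`, so the request never mentions `spec` at all. The helper `finalizerMergePatch` is hypothetical, and the finalizer name used in `main` is an assumption for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// finalizerMergePatch (hypothetical helper) returns a JSON merge patch body
// that sets only metadata.finalizers, adding the given finalizer if it is
// not already present. Nothing under spec is included in the request.
func finalizerMergePatch(existing []string, finalizer string) (string, error) {
	// Copy so the caller's slice is never modified.
	finalizers := append([]string{}, existing...)
	present := false
	for _, f := range finalizers {
		if f == finalizer {
			present = true
			break
		}
	}
	if !present {
		finalizers = append(finalizers, finalizer)
	}
	body := map[string]interface{}{
		"metadata": map[string]interface{}{
			"finalizers": finalizers,
		},
	}
	b, err := json.Marshal(body)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	// The finalizer name is an assumption used only for this example.
	patch, err := finalizerMergePatch(nil, "finalizers.fluxcd.io")
	if err != nil {
		panic(err)
	}
	fmt.Println(patch)
}
```

Because the body contains nothing but the finalizer list, the controller sending it has no reason to be recorded as a manager of the `spec` fields, unlike a full-object update.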
I can submit PRs with this simple change for all repos to close this issue, 🌮 That's a very obscure problem and I'm glad you could see it. I should be able to confirm pretty quickly if this change in source-controller fixes the issue I'm finding.
Thank you, I'm able to confirm the new I'm going to be looking at: to understand whatever I need to know to tell users how they can get out of this pickle if they've encountered it already.
It looks like fluxcd/pkg#209 and fluxcd/source-controller#542 together are not quite yet enough to liberate resources which were once "update"d to set their I deployed the We will need something else to tell users who hit this issue, or ideally something we can bake into the next release so they don't have to fix it on every given resource for themselves. (We are likely to get some questions from users after they merge this fix, however we can accomplish that, as latent changes that may have been stuck unapplied are resolved.)
We could write a But first we need to get rid of
I made some PRs to stop using Update when applying finalizers in Helm Controller and Kustomize Controller, mirroring your source-controller PR, which both appear to be the only controllers besides source-controller that do this for their finalizers. (Thanks for merging them 👍 ) Is it only finalizer
All updates besides the one that finalizes a resource before deletion.
Hey all, looks like this is fixed as of v0.26.0. Upgraded this morning and then removed the fields and everything worked as expected. Thanks @kingdonb and @stefanprodan!
Great news! Thanks @jmriebold for confirming the fix.
Describe the bug
We noticed this issue after updating to v0.25.1; it is not currently affecting one of our other clusters, which is on v0.24.1.
When making changes to our ImageRepository manifests, we noticed that, despite the reconciliation passing without issue, the spec.secretRef field was not affected. Example:
Git Version
Cluster Version
Steps to reproduce
Expected behavior
I expect that when removing the spec.secretRef for the sync process to remove it on the cluster as well or error if there is a reason it cannot be edited/applied.
Screenshots and recordings
No response
OS / Distro
N/A
Flux version
v0.25.1
Flux check
► checking prerequisites
✔ Kubernetes 1.21.5 >=1.19.0-0
► checking controllers
✔ helm-controller: deployment ready
► ghcr.io/fluxcd/helm-controller:v0.15.0
✔ image-automation-controller: deployment ready
► ghcr.io/fluxcd/image-automation-controller:v0.19.0
✔ image-reflector-controller: deployment ready
► ghcr.io/fluxcd/image-reflector-controller:v0.15.0
✔ kustomize-controller: deployment ready
► ghcr.io/fluxcd/kustomize-controller:v0.19.0
✔ notification-controller: deployment ready
► ghcr.io/fluxcd/notification-controller:v0.20.1
✔ source-controller: deployment ready
► ghcr.io/fluxcd/source-controller:v0.20.1
✔ all checks passed
Git provider
gitlab
Container Registry provider
ECR
Additional context
No response
Code of Conduct