Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug] The tenant's StatefulSet replicates the Tenant's metadata #2318

Closed
aqeelat opened this issue Sep 15, 2024 · 8 comments · Fixed by #2337
Closed

[Bug] The tenant's StatefulSet replicates the Tenant's metadata #2318

aqeelat opened this issue Sep 15, 2024 · 8 comments · Fixed by #2337
Assignees

Comments

@aqeelat
Copy link

aqeelat commented Sep 15, 2024

When the operator syncs the tenant, it copies the labels and annotations from the Tenant to the StatefulSet, causing undesirable side effects.

Expected Behavior

Similar to how the Service metadata are synced, the labels and annotations that should be cascaded to the StatefulSet should be explicitly specified in the CRD.

Current Behavior

// Copy labels and annotations from the Tenant.Spec.Metadata
ssMeta.Labels = t.ObjectMeta.Labels
ssMeta.Annotations = t.ObjectMeta.Annotations

Possible Solution

  • Adding a section to the CRD called poolsMetadata for the labels and annotations that are shared across all pools.
  • Adding a boolean to the CRD called cascadeTenantMetadata that controls the lines in the snippet above.

Steps to Reproduce (for bugs)

  1. Create Tenant object with labels
  2. Watch the labels get cascaded to the StatefulSet

Context

I opened #2287 before investigating the issue but now I think I have more context to discuss the issue.

Because the operator cascades the argocd label to the stateful set, cilium thinks that this label is a security-relevant label, and uses it to create endpoints for the pods and the identity object that governs these endpoints. The issue appears because the identity object has the argocd label, which makes argo thinks that it is a top level object, but when it compares it with the actual application manifest, it does not find it there. Therefore, argo will think that this object was removed from the manifest and will try to remove it from the cluster as well.

We're using many other operators such as Percona, ECK, PGO, but we only see this behavior with the Minio Operator.
Once this issue is fixed, the operator can be declared GitOps friendly.

Regression

The issue first appeared in #295

Your Environment

  • Version used (minio-operator): 6.0.2
  • Environment name and version (e.g. kubernetes v1.17.2): GKE 1.29
  • Server type and version:
  • Operating System and version (uname -a):
  • Link to your deployment file:
@ramondeklein
Copy link
Contributor

AFAIK, there is no official statement whether labels and annotations should be propagated to child objects. Some people prefer this behavior (link), but a Kubernetes dev mentions that it's not desirable that labels propagate (link).

Not propagating labels and annotations would be an easy change and we should add a statefulsetMetadata field that will hold the annotations/labels for the generated statefulset. However this may break existing installations, so we can't make this change that easy. It would require a migration path:

  1. Add the preventMetadataPropagation field (defaults to false) to the Tenant CRD that will disable propagation of annotation/labels.
  2. Add the statefulsetMetadata field in the Tenant CRD that will add annotations/labels to the generated statefulset.

In the next major release of the operator the preventMetadataPropagation field can be removed, because we can only have these kind of breaking changes in major releases.

@aqeelat
Copy link
Author

aqeelat commented Oct 3, 2024

I agree with you on the solution. Let me know if there's anything I can do to help.

@cesnietor
Copy link
Contributor

cesnietor commented Oct 3, 2024

we do bunch of fetching and queries using labels as selectors for queries removing them will indeed break stuff.

@cesnietor
Copy link
Contributor

@ramondeklein @dvaldivia please discuss this together to figure out a proper solutions since this might break existing integrations.

@ramondeklein
Copy link
Contributor

Alternative fix, see #2337

@aqeelat
Copy link
Author

aqeelat commented Oct 15, 2024

@ramondeklein This is half a solution. In the new major version, this is how it will look like (with adding {} as the default value).
But a corrective action should be taken to prepare for it (i.e. the preventMetadataPropagation)

In other words, the propagation should be opt-out in v6 and opt-in in v7.

Still, I think your PR should be merged.

@aqeelat
Copy link
Author

aqeelat commented Oct 15, 2024

@ramondeklein Once your PR is merged, we could bring up the discussion on the path to make {} as the default value.

@ramondeklein
Copy link
Contributor

@aqeelat If we could start from scratch, then {} would be the default value. But we don't want to break existing installation from customers that upgrade the operator. This opt-out should work fine and I prefer doing this to a new property that opts-out.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants