
Create a separate crds helm chart #677

Open
andrewdinunzio opened this issue Mar 9, 2023 · 29 comments
Labels
chart:operator (Issue related to opentelemetry-operator helm chart), chart proposal

Comments

@andrewdinunzio commented Mar 9, 2023

This issue has shifted focus: it now tracks creating a dedicated helm chart for the opentelemetry CRDs.

@andrewdinunzio andrewdinunzio changed the title Consider adding a clusterrole for custom resources with aggregate-to-view Consider adding a clusterrole for custom resources with aggregate-to-{view/edit} Mar 9, 2023
@TylerHelmuth added the chart:operator and chart proposal labels Mar 13, 2023
@Allex1 (Contributor) commented Mar 14, 2023

@andrewdinunzio it seems your issue has two proposals in it:

  1. Make a separate chart just for CRs, which makes sense to me. Would you be able to work on that?

  2. Add a read-only cluster role for the CRs so that everyone can at least see their configs.
    To me this does not really make sense, as Kubernetes clusters vary wildly between companies/vendors. Imagine single-tenant vs. multi-tenant k8s clusters, where even read-only access for all users at the cluster level is undesirable. Maintaining all these use cases would require a lot of time.
    Instead, we provide flags to disable the existing role/clusterrole (which includes all the rules you may need), and you can add your custom RBAC by sub-charting this chart in your environment.
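A sketch of that sub-charting approach might look like this (the value keys and names below are assumptions; check the chart's actual values.yaml):

```yaml
# values.yaml of a hypothetical internal wrapper chart:
opentelemetry-operator:
  clusterRole:
    create: false   # assumed knob: turn off the chart's built-in clusterrole

---
# templates/rbac.yaml of the wrapper chart: your own, narrower role
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: otel-cr-reader   # hypothetical name
rules:
  - apiGroups: ["opentelemetry.io"]
    resources: ["opentelemetrycollectors"]
    verbs: ["get", "list", "watch"]
```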

@andrewdinunzio (Author) commented Mar 17, 2023

I don't have much experience with multi-tenant clusters, but I figured that in that case: 1) tenants would probably be isolated by namespace, and within their tenant namespace they would create a namespaced RoleBinding to the view ClusterRole for just that namespace, giving them view permissions in their namespace only; and 2) if someone can already view Deployments, Services, etc. (i.e. they have the view role), I didn't see why they shouldn't also be able to see OTel custom resources.
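For reference, the original proposal amounts to shipping a ClusterRole that carries the standard aggregation label, so its rules get folded into the built-in view role (the names here are illustrative, not from the chart):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: otel-aggregate-to-view   # illustrative name
  labels:
    # standard label picked up by the built-in "view" aggregated role
    rbac.authorization.k8s.io/aggregate-to-view: "true"
rules:
  - apiGroups: ["opentelemetry.io"]
    resources: ["opentelemetrycollectors", "instrumentations"]
    verbs: ["get", "list", "watch"]
```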

W.r.t. a crds chart, I can certainly help with that when I find time, though I've never used the operator SDK before, so I'll need to familiarize myself with it and with how it could be automated as part of the releases, etc.

We'd also need to see what everyone thinks with respect to versioning a CRDs chart. We could follow something like linkerd, which uses proper semver for its CRDs chart as its own entity, or we could do what I'd prefer and pin the CRDs chart releases (and versions) to the main app chart. The only reason the CRDs need their own chart is that helm handles CRDs badly; otherwise they would probably be packaged alongside the operator that, well, operates on them. I prefer this approach because if the versions are pinned, there would be no question as to which CRDs chart version to install (and it'd be super easy for pipelines to handle: just install the same CRDs chart version as the app chart).
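Under the pinned scheme, the two Chart.yaml files would simply move in lockstep (chart names and versions below are made up for illustration):

```yaml
# opentelemetry-operator-crds/Chart.yaml (hypothetical)
apiVersion: v2
name: opentelemetry-operator-crds
version: 0.56.0   # bumped on every release, even if no CRDs changed
---
# opentelemetry-operator/Chart.yaml
apiVersion: v2
name: opentelemetry-operator
version: 0.56.0   # always matches the CRDs chart version
```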

@Allex1 (Contributor) commented Mar 20, 2023

they would probably be isolated by namespace and within their tenant namespace they would bind a normal namespaced Role to the view ClusterRole just for that namespace.

I see a few use cases:

  1. You create a rolebinding in your own ns on the provided clusterrole and get all the permissions, but just in your ns.
  2. You disable the provided clusterrole, create your own with more/fewer permissions, and bind it at the cluster/ns level.
  3. Different user types can access the same ns, in which case your proposal makes sense, but I still think this should be handled by the cluster administrator and not in the main chart. Most charts assume the user has full control in order to tune settings for their environment.
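Use case 1 above is just a namespaced RoleBinding against the cluster-scoped role (all names are placeholders):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: otel-tenant-access   # placeholder
  namespace: tenant-a        # permissions apply only in this namespace
subjects:
  - kind: Group
    name: tenant-a-devs      # placeholder tenant group
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: opentelemetry-operator   # the chart-provided clusterrole (name assumed)
  apiGroup: rbac.authorization.k8s.io
```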

or we could do what I'd prefer and pin the crds chart releases (and versions) with the main app chart

If I'm reading this correctly, you're proposing that the crd chart be a sub-chart of the operator chart, which makes sense to me as well.

@andrewdinunzio (Author) commented Mar 20, 2023

I see, so if I understand your point correctly, having aggregate-to-view/aggregate-to-edit roles could be a useful addition, but you would want them disabled by default? Or are you saying that if we were to go this route, we might as well just make use of the ClusterRoles already present in the chart?

or we could do what I'd prefer and pin the crds chart releases (and versions) with the main app chart

If I'm reading this correctly, you're proposing that the crd chart be a sub-chart of the operator chart, which makes sense to me as well.

Not a subchart, since there are still potential issues with that. Helm aggregates all resources from dependencies and the "main" chart together before applying anything, and then applies them in order based on Kind. Unfortunately, there is an indeterminate amount of time between when CRDs are applied and when they can be used to create/update resources of that type. It seems this is due to a stale cache, which helm v3 addressed by adding the crds/ folder for managing CRDs. That invalidates the cache, but it has other issues, like not being able to add new CRDs as part of an upgrade. So not only is your list of CRDs permanently static, but if someone else were to use this chart as a dependency (as part of an upgrade where they previously did not have this dependency), none of the CRDs would be installed.

I went down the rabbit hole of looking into helm hooks as possible solutions too, but every approach seems to have issues, except having a separate chart dedicated to CRDs and installed separately (not as a dependency). By pinning the crds chart releases and versions to the main app chart, I simply mean that the CRDs chart would get a version bump even if nothing in it changed, just so that it always maintains parity with the main app chart, since essentially we are installing one "component".
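The resulting install flow would look something like this (the chart repo and release names are hypothetical):

```shell
# Install the CRDs chart first, as its own release...
helm install otel-crds open-telemetry/opentelemetry-crds --version 0.56.0

# ...then the app chart at the matching pinned version
helm install otel-operator open-telemetry/opentelemetry-operator --version 0.56.0
```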

The only scenario I can see where this version pinning would be a bad thing would be if multiple different charts had dependencies on the OTel CRDs, then it would be tougher for the non-operator chart to know which version to use.

It could be argued that that is more of a reason to use semver for the CRDs charts, but I'm not sure... there's a few points I think that factor into this:

  • helm is not a package manager. It doesn't know how to manage cases where multiple charts have the same dependency
  • Charts also can't reference the crds chart as dependencies anyway due to the issues mentioned above.
  • Since helm doesn't enforce safe shared dependencies, it is difficult for multiple charts to coordinate and find a common compatible crd chart version
    • Just get the latest semver-compatible CRD chart version, install that, then everything should be fine
  • But what happens during a breaking change?
    • All dependencies would have to upgrade to not use any deprecated/removed APIs before upgrading the CRDs chart
    • If v1.16.3 adds my.crd/v2 (and deprecates my.crd/v1) and v2.0.0 of my crds chart removes my.crd/v1, I need to find all charts that are still using resources from my.crd/v1. Unfortunately there's no way to get that from semver.
    • Since the crds chart isn't in the "dependencies" of charts that depend on it, we'd have to manually search for all the old/deprecated versions, see which chart(s) they belong to, upgrade each chart so it only uses CRDs in the latest compatible crds chart (note I did not say "upgrade to the latest compatible crds chart")
    • Only THEN can we upgrade to the breaking change version of the CRDs chart.
  • This workflow of finding charts using old CRD API versions, upgrading them so that they use newer ones, then upgrading the CRDs chart is equivalent to what we have to do with version pinning anyway.
    • Upgrading the operator would still take the "lead" in managing the CRD versions
  • There won't ever be a breaking CRD chart change without a corresponding breaking operator chart change
    • The only breaking changes that CRDs charts should make are for removing deprecated API versions. Breaking changes to alpha versions are expected.
    • Such a breaking change in the CRDs chart would be a breaking change in the main app chart (every time we bump the crds chart major version, we have to bump the main app chart major version)
    • Since the CRDs are the API for the operator, I can't see a scenario where there would be a breaking change in the operator chart that users of CRDs would NOT be interested in.
      • Edit* I guess it's possible for there to be a config/env change for the operator itself that doesn't impact the lifecycle of CRs, but I can't think of a case where such breaking changes would be necessary
      • Even if it was necessary, it doesn't change the fact that if there was a breaking change in the operator, users of the CRDs still need to make sure it's still compatible.
    • Therefore, charts that depend on CRD charts should also be concerned with breaking changes for the operator chart.
    • Essentially, this would mean the major versions of the operator chart and major versions of the CRDs chart would be pinned anyway.

I don't know if this makes sense, but that's my 2c

Edit* Counterpoint to the above: CRDs are meant to specify only an API, not an implementation. It's possible that one could install the OpenTelemetry CRDs but use a different implementation for managing those custom resources. I can't think of anything that does that except maybe the Gateway API, which is now a standard k8s API. Err, actually, VictoriaMetrics does this with the Prometheus CRDs too... Idk, maybe semver for the CRDs chart is actually a good idea, and k8s just needs more tooling in the "package management" space.

@andrewdinunzio andrewdinunzio changed the title Consider adding a clusterrole for custom resources with aggregate-to-{view/edit} Create a separate crds helm chart Mar 20, 2023
@andrewdinunzio (Author):

Renaming this issue to focus on the CRDs helm chart; I can make a separate issue for the other topic if desired.

@Allex1 (Contributor) commented Mar 21, 2023

I see, so if I understand your point correctly, having an aggregate-to-view/aggregate-to-edit could be a useful addition but you would want it to be disabled by default? Or are you saying that if we were to go this route, we might as well just make use of the ClusterRoles already present in the chart?

I'm actually saying that the current clusterrole/binding, and the knobs to turn it on/off, should be enough for any user to achieve their goals. The aggregate-to-view/aggregate-to-edit roles are useful, but they should be created on the user side, as they are custom to each user's needs/use cases. This is usually done by creating an internal chart that sub-charts the upstream one (which assumes you are a cluster admin) and adds custom configs on top.

On the crds part you really went down the rabbit hole (thanks for that), and my opinion might disappoint you :) but here goes:

After reading all of your research, I'm inclined to avoid creating a separate crds chart and instead follow the kube-prometheus-stack model: pinning the helm chart version to the operator/crds version manually.

You either include the crds when installing the chart for the first time (default helm behaviour) or sideload them through a different process that deploys cluster-level resources (CI/CD or other scripts) and pass --skip-crds when deploying the operator chart.

For safe upgrades, there is a step that manually applies the new crds before major/breaking helm chart changes.
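Concretely, the kube-prometheus-stack-style upgrade flow looks roughly like this (the CRD path and release names are placeholders):

```shell
# Apply the CRDs for the target chart version first (path is a placeholder)
kubectl apply --server-side -f ./crds/

# Then upgrade the chart itself, skipping its bundled CRDs
helm upgrade --install otel-operator open-telemetry/opentelemetry-operator --skip-crds
```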

@povilasv (Contributor):

I agree with @Allex1 on this one too; I would avoid a CRD chart given the described helm issues.

@andrewdinunzio (Author) commented Mar 24, 2023

I should make it clear that having a dedicated crds chart does fix the helm issues mentioned above, provided it is installed separately before the main chart.

But I'm on board with otel following the kube-prometheus-stack approach. It does make using common templates in a pipeline a bit more difficult, but as long as all CRDs are included in all releases, even if they don't change, the pipelines can just apply the entire folder and the CRDs will be in the expected state.

For kps, I had to make a script that inspects the helm chart, pulls the appVersion out of Chart.yaml, then downloads all the CRDs for that app version (substituting that app version into the URLs in the README), and I can do something similar for this.
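A rough sketch of such a script, with an entirely hypothetical download URL (the real URLs would come from the chart's README):

```shell
#!/usr/bin/env bash
set -euo pipefail

# Pull the appVersion the chart was released against
APP_VERSION=$(yq -r '.appVersion' Chart.yaml)

# Fetch each CRD manifest for that app version (URL is a placeholder)
for crd in opentelemetrycollectors instrumentations; do
  curl -fsSL -o "crds/${crd}.yaml" \
    "https://example.com/crds/v${APP_VERSION}/${crd}.yaml"
done
```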

@Allex1 (Contributor) commented Feb 9, 2024

Meanwhile, kps separated the crds into a sub-chart.

@andrewdinunzio (Author) commented Mar 26, 2024

Ohh, they take advantage of the cache invalidation provided by the crds directory in a subchart, and I guess putting this into the charts directory as a subchart somehow gets around the inability to add new CRDs / new versions of CRDs as well?

Edit* just tested, still can't add new CRDs if using the crds dir, even copying the way kps does it.

@jaronoff97 (Contributor):

So we're running into a flavor of this right now with #1164. We need to be able to upgrade the CRD to contain the new conversion webhook, but that has an expectation that the operator is available in a specific namespace.

apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  annotations:
    cert-manager.io/inject-ca-from: opentelemetry-operator-system/opentelemetry-operator-serving-cert
    controller-gen.kubebuilder.io/version: v0.14.0
  labels:
    app.kubernetes.io/name: opentelemetry-operator
  name: opentelemetrycollectors.opentelemetry.io
spec:
  conversion:
    strategy: Webhook
    webhook:
      clientConfig:
        service:
          name: opentelemetry-operator-webhook-service
          namespace: opentelemetry-operator-system
          path: /convert
      conversionReviewVersions:
      - v1alpha1
      - v1beta1

Note the service name and namespace here.

As a result of this, I propose that we create a new chart that contains the CRDs within the templates/ folder. This would accomplish two things:

  1. Upgrades for the CRDs will be handled automatically
  2. The conversion webhook in the CRD can reference the name and namespace of the operator installation

Although this is a bit different from what other charts do today, it would allow for a much simpler installation overall. The operator chart would then have the CRD chart as a dependency. I would be interested in hearing potential concerns with this approach; those concerns are explained well in the original helm proposal.

We should make this explicit for the chart:

If there was a way to require a one-to-one relationship between a controller and a CRD, upgrading would not be problematic. If it were the case that a CRD must have exactly one controller, and that the lifecycle of the two was tightly coupled, upgrade would be trivial.

There may be some cases where a user wants more than a single otel operator (a multi-tenant installation where WATCH_NAMESPACE is set for each operator installation), in which case that user would probably install three charts: the CRDs, the first operator, and the second operator. In the operator installations, the user would disable the CRD installation by flipping a value controlling the dependencies.
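With the CRDs rendered from templates/, the conversion webhook's clientConfig could be templated instead of hard-coded (the template helper name below is an assumption, not the chart's actual helper):

```yaml
spec:
  conversion:
    strategy: Webhook
    webhook:
      clientConfig:
        service:
          # Rendered per release, so any install name/namespace works
          name: '{{ include "opentelemetry-operator.fullname" . }}-webhook-service'
          namespace: '{{ .Release.Namespace }}'
          path: /convert
```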

@jaronoff97 (Contributor) commented May 2, 2024

FWIW, we do this extensively internally at my current role. It also appears that the following groups do this too:

Use this link for more

@TylerHelmuth (Member):

@open-telemetry/helm-approvers please review, as this is currently blocking the operator chart upgrade.

I am most curious about the Istio solution of optionally including the CRDs in a template file via .Files. This would prevent us from needing to make and maintain another chart.

@andrewdinunzio (Author) commented May 2, 2024

Putting crds in the templates causes a race condition where, on the first installation, it might try to create CRs before the CRDs are created (due to the stale cache issue linked above that was "fixed" by the crds folder). On upgrades, the stale cache can cause similar issues. Unless you mean putting the crds in a configmap and applying them in a helm hook job like Istio used to do, though they switched away from that for some reason, I think.

I guess it's not an issue if you don't install any CRs, but if that ever becomes the case it would be. It would also be an issue for anyone trying to add this chart as a subchart/dependency while adding CRs as templates.

@jaronoff97 (Contributor):

The CRDs should always be created first, no? This is the sort order for helm.

@TylerHelmuth (Member):

There are no plans to allow the operator chart to install CRs, that will be the work of the opentelemetry-kube-stack chart.

@TylerHelmuth (Member):

Also we'll need to think about how these changes will affect that chart

@andrewdinunzio (Author) commented May 2, 2024

The CRDs are created first, but due to the stale discovery cache it still sometimes throws an error while creating the CRs.

We add the otel operator chart as a dependency and create CRs in our own chart, so it would affect us 😅
But we have a workaround where we just pull all the CRDs and install them separately first.

@jaronoff97 (Contributor):

@TylerHelmuth either way, for the opentelemetry-kube-stack I was going to do this anyway and fail open. The CRDs should be installed first, and I can test that there.

@jaronoff97 (Contributor):

Okay, I've looked into this a bit more, and I do believe having an isolated helm chart for the CRDs is the way to go here. While the precedent I linked above includes a few operators that embed the CRDs in the templates, there are also many that simply embed the CRDs in their own chart:

There are a few benefits to having this separate chart:

  • Decrease release size – CRDs would not be included in the helm release but will still be installed on initial install
  • Ensure correct order of installation (right now there could technically be a race!)
  • Follow the convention set by helm

I tested this locally by using the subchart in the opentelemetry-kube-stack's dependencies; this resulted in a clean and quick installation without needing any other work.

As far as the plan goes for the operator's chart to use this:

  1. Merge the isolated chart
  2. Use the isolated chart as a subchart of the kube-stack chart
  3. Wait a few releases until v1alpha1 support is dropped
  4. Embed the crds chart in the operator chart the same way it's done for the kube-stack chart (doing this wouldn't break existing installs, as we have the helm DO NOT DELETE policy on the CRDs)
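The DO NOT DELETE policy referenced in step 4 is the standard helm resource-policy annotation carried on each CRD:

```yaml
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: opentelemetrycollectors.opentelemetry.io
  annotations:
    # Helm leaves this resource in place when the release is deleted or replaced
    helm.sh/resource-policy: keep
```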

Does this seem reasonable? cc @swiatekm-sumo @TylerHelmuth @open-telemetry/helm-approvers

@TylerHelmuth (Member):

@jaronoff97 that seems reasonable, but I'm confused about the comparison to kube-prometheus-stack. It looks like kube-prometheus-stack made a subchart within kube-prometheus-stack so that others can't use it directly.

Are you proposing a standalone CRDs chart so that both opentelemetry-operator AND opentelemetry-kube-stack chart can use them?

@jaronoff97 (Contributor):

Yes, that's what I was recommending here... there's maybe a case to be made for all of these to exist only within the kube-stack chart, but that seems like a challenging deprecation to me. I wanted a single crds definition spot that can be used in multiple charts owned by us.

@TylerHelmuth (Member):

If we follow the pattern set by kube-prometheus-stack, we'd eventually deprecate the operator chart in favor of only opentelemetry-kube-stack. If that ends up being the case, we wouldn't need a standalone CRDs chart anymore; we could use the nested chart technique that kube-prometheus-stack uses.

I wonder which is better:

  1. Creating a standalone CRDs chart (that might get deprecated in the future) that both the operator and opentelemetry-kube-stack charts use
  2. Doing the nesting technique in opentelemetry-kube-stack and then either doing nothing in the operator chart (continuing to use the template technique we introduced) or moving to a nested strategy there as well

Wait a few releases until v1alpha1 support is dropped

Do we have an expected roadmap for how long that'll be?

@jaronoff97 (Contributor):

I would be fine doing 2; it feels like the best of these options for the short term without locking us into anything in the long term. I'm not sure how many releases until v1alpha1 support is dropped; we should discuss at the SIG next week. FWIW, 2 is what I have in #1166 right now and it works well :) cc @swiatekm-sumo what do you think about these options?

@swiatekm (Contributor) commented Jun 4, 2024

Does the nested chart technique let us do templating on the CRDs? If not, then I'd prefer to stay with the current approach in the operator chart.

I'm in favor of doing 1, because we have some consumers of the operator chart who want to depend on it while also shipping otel CRs, and it would make life easier for them if they could depend on a crd chart instead of shipping the crds on their own.

@TylerHelmuth (Member) commented Jun 4, 2024

I'm in favor of doing 1, because we have some consumers of the operator chart who want to depend on it while also shipping otel CRs, and it would make life easier for them if they could depend on a crd chart instead of shipping the crds on their own.

@swiatekm-sumo option 2 would also support this, I believe: you could configure the values.yaml of opentelemetry-kube-stack to install nothing but the CRDs, and then separately install the opentelemetry-operator chart with crds.create disabled.
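That setup would look roughly like this (crds.create is mentioned above; the kube-stack component keys are assumptions):

```yaml
# opentelemetry-kube-stack values: install CRDs only (component key assumed)
collector:
  enabled: false

---
# opentelemetry-operator values, installed as a separate release:
crds:
  create: false
```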

@swiatekm (Contributor) commented Jun 6, 2024

I'm in favor of doing 1, because we have some consumers of the operator chart who want to depend on it while also shipping otel CRs, and it would make life easier for them if they could depend on a crd chart instead of shipping the crds on their own.

@swiatekm-sumo option 2 would also support this I believe - you could configure the values.yaml of opentelemetry-kube-stack to install nothing but the CRDs, and then separately install the opentelemetry-operator chart with crds.create disabled.

That's a bit convoluted, but the use case is inherently advanced, so I'm ok with it.

@kristeey commented Sep 3, 2024

Is there any progress in this discussion about the direction this is going? I use the opentelemetry-operator chart as a subchart and create OpenTelemetryCollector resources as part of the parent chart. As of now we are fine with using the opentelemetry-kube-stack to install the CRDs, but it feels a bit disjointed.
Is it recommended to switch to opentelemetry-kube-stack altogether, instead of using the opentelemetry-operator chart and creating otel resources separately?

@TylerHelmuth (Member):

Is it recommended to switch to opentelemetry-kube-stack altogether, instead of using the opentelemetry-operator chart and creating otel resources separately?

That is the direction we're headed in
