Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Adding PodMonitor resources #2104

Closed
wants to merge 1 commit into from

Conversation

jtyr
Copy link
Contributor

@jtyr jtyr commented Jun 16, 2022

This PR is adding PodMonitor resources into the Helm chart so Gatekeeper can be automatically monitored by Prometheus Operator or similar Prometheus-compatible monitoring system like VictoriaMetrics. Gatekeeper metrics are required by the OPA Gatekeeper Grafana Dashboard (mentioned in #2070) used with opa-scodeboard, the Prometheus exporter for OPA Gatekeeper.

@codecov-commenter
Copy link

codecov-commenter commented Jun 16, 2022

Codecov Report

Merging #2104 (2679bf9) into master (d3e62d0) will increase coverage by 0.03%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master    #2104      +/-   ##
==========================================
+ Coverage   54.35%   54.38%   +0.03%     
==========================================
  Files         111      111              
  Lines        9478     9478              
==========================================
+ Hits         5152     5155       +3     
+ Misses       3935     3933       -2     
+ Partials      391      390       -1     
Flag Coverage Δ
unittests 54.38% <ø> (+0.03%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...onstrainttemplate/constrainttemplate_controller.go 54.91% <0.00%> (-0.24%) ⬇️
pkg/readiness/list.go 91.17% <0.00%> (+11.76%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d3e62d0...2679bf9. Read the comment docs.

@sozercan
Copy link
Member

@jtyr thanks for the PR! since these are 3rd party project integrations, https://github.com/open-policy-agent/contrib might be a more suitable place. Please see #659 for more context.

@jtyr
Copy link
Contributor Author

jtyr commented Jun 17, 2022

@sozercan Thank you for you comment. I understand that you don't want to take responsibility for components that you don't develop. But Helm charts exist to provide user-friendly way of using your product. That should include support for industry-leading integrations. Gatekeeper exposes Prometheus metrics and Prometheus Operator is used everywhere where people are serious about monitoring and automation. By not supporting Prometheus Operator in your Helm chart, you are showing your power users that you don't care about them and force them to add the support over and over again by themselves.

@praveenjindal62
Copy link

@jtyr Why not use ServiceMonitor instead of PodMonitor ? Many popular charts I have seen are exposing metrics using ServiceMonitor and I guess this make more sense to look for service which is already exposing ports on Pods then to create another object which does some amount work that Service already doing (i.e. keep looking for new pods and create endpoints). Although PodMonitors make more sense where ServiceMonitor are not at all right options, for ex - if only some pods of a deployment or sts or daemonset need to be used to expose metrics. But that is not the case with gatekeeper. Also, I know that current helm charts not exposing metrics port on pod through service but those can be modified easily to do same. Let me know your thoughts?

@sozercan I completely agree with @jtyr comment. Helm charts are exposing metrics through a separate port but there is no provision to expose those metrics to Prometheus out of the box. I would also vote to add ServiceMonitor/PodMonitor to these charts irrespective of these being handled by third party which would increase the overall usability and developer experience as well. Third party objects can be added as disabled by default so users are not impacted.

@jtyr
Copy link
Contributor Author

jtyr commented Jun 21, 2022

@praveenjindal62 I have chosen to use PodMonitor as there is no Service for the Audit Controller and the Service for the Controller Manager doesn't expose the metrics port. At the end, ServiceMonitor and PodMonitor provides the same result. So unless there is a reason for the non-existing Service to be created and the existing Service to be modified with the metrics port, there is no reason to choose ServiceMonitor over PodMonitor.

@ritazh
Copy link
Member

ritazh commented Jun 22, 2022

Would creating another helm chart that references the Gatekeeper chart as a dependency help in this case? https://helm.sh/docs/helm/helm_dependency/ That way there's a separation of concerns while users benefit from the added features.

@jtyr
Copy link
Contributor Author

jtyr commented Jun 22, 2022

@ritazh That's exactly how we are doing it now because the PodMonitor resources are not part of the gatekeeper chart and that's exactly the reason why I have created this PR so we don't have to maintain an extra chart only to be able to monitor Gatekeeper. I can imagine plenty of users are doing the same thing and by accepting this PR, you will make all of them very happy.

@jtyr jtyr force-pushed the jtyr-podmon branch 2 times, most recently from b088352 to 5b91be4 Compare June 27, 2022 15:38
@sozercan
Copy link
Member

@jtyr since this can be a separate chart with a gk chart dependency, would maintaining this chart in https://github.com/open-policy-agent/contrib work? reasoning is these resources will change over time, and since there are no tests, it will eventually be broken since core maintainers are not familiar with these resources to be able to maintain these in the long term.

@jtyr
Copy link
Contributor Author

jtyr commented Jun 28, 2022

@sozercan Creating additional Helm chart would work but it brings the burden of updating the dependency version with every GK release. It would be much easier to have the support for PodMonitor built into the gatekeeper chart. The installation of PodMonitor resources is disabled by default so even if there would be an an issue with resource definition, most people won't notice it. On top of that, any breaking changes to the PodMonitor resource must include change of the apiVersion of the CRD. Those changes are advertised by a deprecation message for several Kubernetes release cycles before they are removed permanently. That gives plenty of time to rise a PR to reflect this change here.

@mrueg
Copy link
Contributor

mrueg commented Jun 28, 2022

From my personal experience these servicemonitor/podmonitor objects are pretty static and there's not a lot of change to be expected soon. They are also quite common in clusters with prometheus-operator enabled.
If it helps the conversation, they could stay disabled by default and the Readme could mention that they are "alpha" or "community" supported resources.

@maxsmythe
Copy link
Contributor

maxsmythe commented Jun 30, 2022

Ultimately this PR touches on a governance principle that is necessary to keep the project maintainable. As a rule, Gatekeeper does not introduce third-party infrastructure dependencies since doing so risks increasing the maintenance burden of the project over time, may lead to incompatibilities with other users' infrastructure, and opens up thorny questions of which projects "deserve" inclusion and which don't, a question we don't have the overhead to handle.

brings the burden of updating the dependency version with every GK release

is an excellent example of the complexity we're trying to avoid. Prometheus configurations are very stable, but there are other solutions, such as the OpenTelemetry collector. Should that also be supported? If not, why are Prometheus users more privileged? OpenTelemetry is a CNCF project, should that count for something? Now we'd need to track compatibility for 2 projects. We'd also probably need to bundle in cert-manager, so that's at least 3 projects, maybe more.

To provide a truly streamlined experience, we'd also need to support multiple permutations of how these projects can be used. Frankly we don't have the capacity, and since there are already solutions for this problem, a better investment of development time for our users would be to spend it working on bugs and/or features specific to the project.

You mention we serve Prometheus metrics, which is a fair point, but:

  • If we didn't serve metrics, users would be unable to use Prometheus with the project. Lack of OOTB config does not prevent usage.
  • We serve metrics via OpenCensus, which allows us to export metrics via several protocols
  • There is a difference between providing opportunities for interoperability between third-party projects and being opinionated as to how those third-party projects should be configured/deployed
  • We take a similar stance with cert-manager where we aim to be compatible, but not prescriptive in how it is deployed

We are definitely open to making interoperability easier where possible. This PR, which would have made it easier to use cert-manager with the Helm chart (which we couldn't merge due to the lack of DCO and the author no longer responding), is a good example of a neutral compromise -- it makes Gatekeeper more configurable in ways that cert-manager requires without assuming cert-manager.

I'm wondering if a similar approach could be employed here? Is there a need to make it easy to deploy additional resources alongside the chart?

@jtyr
Copy link
Contributor Author

jtyr commented Jul 5, 2022

I understand your reasons, @maxsmythe. I think that the solution could be to make the support for all those third-party resources (Porometheus, CertManager, ...) disabled by default and document that they are community driven so if something doesn't work, the community must fix it. Then if something stops working, the community will create a PR to fix it, so you can focus on your GK development only. I believe this would be better solution than providing no support at all and force many people to come up with their own solution.

@sozercan
Copy link
Member

sozercan commented Jul 5, 2022

@jtyr is there a reason you wouldn't want this to be part of https://github.com/open-policy-agent/contrib which are community-supported integrations? We can link to known community-supported integrations from GK docs so folks can find these.

@jtyr
Copy link
Contributor Author

jtyr commented Jul 7, 2022

I have explained reasons for that in one of my comments above. Basically there are two main reasons:

  1. The contrib repo doesn't work like Helm registry. We could make charts directory there and create GitHub Workflows to release charts from that directory using this action.
  2. Even if the contrib repo would work like Helm registry, the Helm chart that would wrap the GK Helm chart (have it as the dependency) would have to be updated with every time new GK Helm chart is released to keep the dependency version up-to-date. That's why it would be more practical to have thePodMonitor resources directly in the GK Helm chart.

@thomasmckay
Copy link
Contributor

2. Even if the `contrib` repo would work like Helm registry, the Helm chart that would wrap the GK Helm chart (have it as the dependency) would have to be updated with every time new GK Helm chart is released to keep the dependency version up-to-date. That's why it would be more practical to have the`PodMonitor` resources directly in the GK Helm chart.

This does not seem like a great burden and simply puts the responsibility to do this onto the community users. A GK release checkbox could be to open an issue against contrib to update the version.

@stale
Copy link

stale bot commented Sep 5, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Sep 5, 2022
@stale stale bot closed this Sep 20, 2022
@JBOClara
Copy link

Could we reopen ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants