Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Document how to use pprof which was introduced in controller-runtime v0.15.x #3338

Closed
3 tasks
zqzten opened this issue Apr 17, 2023 · 10 comments
Closed
3 tasks
Assignees
Labels
help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. kind/documentation Categorizes issue or PR as related to documentation. kind/feature Categorizes issue or PR as related to a new feature. triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

@zqzten
Copy link
Member

zqzten commented Apr 17, 2023

What do you want to happen?

As pprof support been introduced to controller-runtime, we can follow up to make it available (and visible) to end users of kubebuilder. This issue tracks the tasks needed TBD, please comment if I miss or get wrong on anything.

  • upgrade controller-runtime to v0.15.x
  • add a flag to configure the pprof serving address in main.go of kubebuilder scaffold
  • address the security issues (possible sensitive information leak) on exposing pprof in comments or the kubebuilder book

For the third task, I'm not sure what best practice we shall provide to end users so this is RFC.

For now, I can think up two possible ways:

  • defaulting the pprof bind address to 127.0.0.1:xxxx
  • protect the pprof endpoint by kube-rbac-proxy just like the metrics does (this would require more tasks TBD)

Please comment your preference or other opinions if applicable, thanks!

Extra Labels

/kind documentation

@zqzten zqzten added the kind/feature Categorizes issue or PR as related to a new feature. label Apr 17, 2023
@k8s-ci-robot k8s-ci-robot added the kind/documentation Categorizes issue or PR as related to documentation. label Apr 17, 2023
@camilamacedo86

This comment was marked as resolved.

@camilamacedo86 camilamacedo86 added help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. and removed help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. labels May 4, 2023
@zqzten

This comment was marked as resolved.

@camilamacedo86
Copy link
Member

camilamacedo86 commented May 10, 2023

HI @zqzten,

It is great that you want to help users and Operator authors with this feature.

The primary reason for not enabling pprof in production (or in default scaffolds) revolves around two key concerns: Security and Performance.

  • Security: The pprof endpoints can expose potentially sensitive runtime details, such as the number of goroutines, the memory usage, CPU usage, and more. This could provide a vector for malicious parties to gather information about the application and potentially exploit it. By enabling pprof by default, you could inadvertently expose these endpoints to the public, posing a serious security risk.

  • Performance: While pprof is an excellent tool for diagnosing performance issues, it is not without cost. Profiling can consume considerable CPU and memory resources, especially when continuously enabled in a production environment. This could lead to performance degradation and impact the scalability of your application.

These concerns are less pronounced in a development environment, and the benefits of having pprof enabled (e.g., easier debugging, performance tuning) may outweigh the potential drawbacks. However, in a production setting, the risks usually outweigh the benefits.

However, we cannot add it to the default scaffolds done by Kubebuilder because it is not recommended for production and would not be a good practice. If we provided it in the default scaffolds, authors would need to remove or disable the feature prior to adding it on production, which in reality would not happen and would be a big ask to do so.

Following the comments inline.

On the contrary, pprof is a must have when we want to inspect performance problems such as memory leak in production, as kubernetes-sigs/controller-runtime#1943 (comment)

In the controller-runtime library, which underpins Kubebuilder, pprof debugging endpoints are not enabled by default. This is an intentional design decision, made to prioritize security and performance for users, particularly those who are new to Kubebuilder. Experienced developers and operators who understand the implications and need to use pprof for performance tuning or debugging can choose to enable it explicitly. This is a stark contrast to including it in the default Kubebuilder scaffolds, which primarily cater to those who are just starting out and may not fully understand the implications of enabling pprof by default.

Also, as an evidence, k8s components such as kube-controller-manager also enables pprof by default.

You're correct. kube-controller-manager, which is a part of the Kubernetes control plane, does enable pprof endpoints by default. It's important to understand why and under what circumstances it might be considered acceptable.

As part of the Kubernetes control plane, the kube-controller-manager runs on the master nodes of a Kubernetes cluster. The master nodes are not typically exposed to the public internet, and they should be protected by various security measures. This means that, in a correctly configured cluster, the pprof endpoints of the kube-controller-manager should not be accessible to untrusted entities.

Enabling pprof by default on the kube-controller-manager allows Kubernetes developers and administrators to debug performance issues directly on running clusters. This can be very helpful for diagnosing problems in a live environment. However, it is still a potential security risk if an attacker were able to gain access to the master nodes, and I believe that in many environments, system admins will disable that if/when the env is not in a protected or disconnect network (airgap envs)

Conclusion: IMO:

That being said, it's not that you can't enable pprof in a Kubebuilder scaffold; it's just not recommended for production use, hence not included by default. In controlled circumstances, such as a secure and isolated environment, with appropriate monitoring and precautions, you could consider enabling pprof for specific profiling tasks. But such use should be temporary and closely supervised.

Asking DevOps, who uses kubebuilder, to be aware that the default scaffolds expose their security this way is inappropriate and would be a big ask.

Options to add the feature on Kubebuilder:

Allowing the scaffolds via an optional plugin

That could fit in an optional plugin. But, the code implementation to achieve this goal shows to be very small, and the effort to maintain the plugin would not justify what it could bring alone. However, if you have a proposed plugin solution that involves the pproff and other helpers' features that justify that it also might be a great fit Please, see the plugins section and check that we have optional plugins which can be used to add specific scaffolds to do the project.

Provide guidance and tutorials:

It would be great if we have a doc with the whole guide on how to implement its usage, what is possible to achieve with it and how.

Alternative Option to provide the scaffold/code:

It might be acceptable if you have any proposed achievable idea that is not enabling it in the default scaffolds. Anyway, the code also should be shipped with documentation explaining how and when to use it.

I hope that makes sense.
Also, please feel free to contribute within PRs and propose solutions that fit and are aligned with the above points.

@zqzten

This comment was marked as resolved.

@camilamacedo86

This comment was marked as resolved.

@camilamacedo86 camilamacedo86 added help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. triage/accepted Indicates an issue or PR is ready to be actively worked on. labels Aug 17, 2023
@camilamacedo86 camilamacedo86 changed the title [Umbrella & RFC] pprof support Document how to use pprof which was introduced in controller-runtime v0.15.x Sep 16, 2023
@camilamacedo86
Copy link
Member

The comments above were hidden for simplicity. While we've decided to document its usage rather than include it in the default scaffold, you can refer to the hidden comments for more context on this decision. Nevertheless, introducing it as an optional plugin for the scaffold could be a viable alternative. This option would certainly be a beneficial addition.

To address this issue we must:

  • Create a doc under the reference to let users know how to use this option
  • Optionally we could address it via an Optional Plugin to update the default scaffold within this option. (I would recommend we doc first and then work in the PR with the proposal solution for it as a plugin as follow up)

@TAM360
Copy link
Contributor

TAM360 commented Aug 23, 2024

/assign

@TAM360
Copy link
Contributor

TAM360 commented Aug 24, 2024

@camilamacedo86 Won't our documentation vary based on if and how we incorporate the support for pprof in Kubebuilder?

@camilamacedo86
Copy link
Member

camilamacedo86 commented Aug 24, 2024

It is supported since it is a feature in the controller-runtime.
The idea here is just add a doc under references to let people know how to use and when to use as its implications.
We cannot add it in the default scaffold because it is not recommended for prod envs, see;

#3338 (comment)
(I re-open it just to share why we cannot add in the default scaffold)

So, in the doc we need to also share this info to allow people aware of.

@camilamacedo86
Copy link
Member

It is done now !
Closing it

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. kind/documentation Categorizes issue or PR as related to documentation. kind/feature Categorizes issue or PR as related to a new feature. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
None yet
Development

No branches or pull requests

4 participants