[WIP] Allow/Expose options to enable distributed tracing in components as features are added upstream #725
[APPROVALNOTIFIER] This PR is NOT APPROVED.
This pull-request has been approved by:
The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing `/approve` in a comment.
Branch updated from 81d43ee to cd23b82.
Only getting back to this now, thanks, and sorry I had not seen/remembered this. Mind if I add you as a co-author to this enhancement so that you can add to it? You will have valuable insight, as you've also been working on tracing with the scheduler.
Inactive enhancement proposals go stale after 28d of inactivity. See https://github.com/openshift/enhancements#life-cycle for details. Mark the proposal as fresh by commenting `/remove-lifecycle stale`. If this proposal is safe to close now, please do so with `/close`.
/lifecycle stale
/remove-lifecycle stale
Branch updated from bf674af to 72adc2a.
### Kube APIServer

* Add the TracingConfiguration file flag to OpenShift
What will create the configuration file, the new operator?
This proposal doesn't include a new operator. The tracing-config file with defaults will be added to the kube-apiserver-operator managed manifests. It will be considered only if the APIServerTracing kube-apiserver feature gate is enabled. By default, the sampling rate will be 0, so even with the gate enabled, spans will only be emitted for requests that arrive already sampled (that is, carrying a sampled trace context from the caller). This keeps tracing overhead to a minimum. The sampling rate should be configurable.
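The "sampling rate 0 still emits sampled requests" behaviour can be modelled roughly as below. This is a minimal sketch, assuming the kube-apiserver wraps a trace-ID ratio sampler inside a parent-based sampler (our reading of the upstream API server tracing work, not something this proposal specifies). `shouldSample`, `maxRatePerMillion`, and the parameters are hypothetical names, and the threshold arithmetic follows the general approach of OpenTelemetry's ratio sampler rather than any OpenShift code.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// maxRatePerMillion is the upper bound of a per-million rate (100% sampling).
const maxRatePerMillion = 1_000_000

// shouldSample is a hypothetical, simplified parent-based sampler wrapped
// around a trace-ID ratio sampler. It only illustrates the decision logic.
func shouldSample(hasParent, parentSampled bool, traceID [16]byte, ratePerMillion uint64) bool {
	// A request that arrives with a trace context inherits the caller's decision,
	// regardless of the configured rate.
	if hasParent {
		return parentSampled
	}
	// Root requests fall back to the configured ratio; 0 never starts a trace.
	if ratePerMillion == 0 {
		return false
	}
	if ratePerMillion > maxRatePerMillion {
		ratePerMillion = maxRatePerMillion
	}
	// Treat the low 8 bytes of the (random) trace ID as a 63-bit number and
	// sample it if it falls below a threshold proportional to the rate.
	x := binary.BigEndian.Uint64(traceID[8:]) >> 1
	threshold := uint64(float64(ratePerMillion) / maxRatePerMillion * float64(uint64(1)<<63))
	return x < threshold
}

func main() {
	id := [16]byte{15: 0x2a} // arbitrary example trace ID
	const rate = 0           // the default sampling rate proposed above

	fmt.Println("root request:          ", shouldSample(false, false, id, rate))
	fmt.Println("sampled parent context:", shouldSample(true, true, id, rate))
	fmt.Println("unsampled parent:      ", shouldSample(true, false, id, rate))
}
```

With a rate of 0, only the second case emits a span, which is why the default keeps overhead low while still letting an upstream caller opt a specific request into tracing.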
- "@husky-parul"
- "@damemi"
reviewers:
- TBD
I think we'll need reviewers from the API server, etcd, and node teams.
### Non-Goals

The OpenTelemetry Collector operator for OpenShift will not be part of core OpenShift. Instead, the operator is available
on the OperatorHub in the OpenShift console, or can be deployed manually.
Which team will be maintaining that operator?
The OpenTelemetry Collector Operator is already available in OperatorHub.
Aside from OperatorHub, there is also a packaged operator from Red Hat.
to enable OpenTelemetry tracing in version 1.22. A KEP is under review and work is underway to instrument kubelet.
A POC has been created with kube-scheduler. With these components instrumented, it will be possible to view traces with
CRI-O <-> Kubelet <-> Kube-Apiserver <-> ETCD. At this point, there is much to gain in instrumenting
other components and extending the OpenTelemetry train to give a complete view of the system.
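A trace can span CRI-O, kubelet, kube-apiserver, and etcd only because each hop forwards a trace context to the next. The sketch below illustrates that idea with the W3C Trace Context `traceparent` header format (`version-traceid-parentid-flags`): every hop keeps the trace ID and sampling flag and mints a new span ID. It is a simplified, hand-rolled illustration under that assumption; the real components rely on OpenTelemetry propagators (and, for etcd, gRPC metadata) rather than code like this, and `traceContext`, `parseTraceparent`, `childOf`, and the hop list are hypothetical.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
	"strings"
)

// traceContext holds the fields of a W3C `traceparent` header.
type traceContext struct {
	TraceID string // 32 hex chars, shared by every span in one trace
	SpanID  string // 16 hex chars, identifies the span that made the call
	Sampled bool   // bit 0 of the trace-flags field
}

// parseTraceparent accepts only version 00 and checks field lengths and hex
// encoding; it is a simplified parser, not a complete implementation of the spec.
func parseTraceparent(h string) (traceContext, error) {
	p := strings.Split(h, "-")
	if len(p) != 4 || p[0] != "00" || len(p[1]) != 32 || len(p[2]) != 16 || len(p[3]) != 2 {
		return traceContext{}, fmt.Errorf("malformed traceparent %q", h)
	}
	for _, field := range p[1:3] {
		if _, err := hex.DecodeString(field); err != nil {
			return traceContext{}, fmt.Errorf("malformed traceparent %q: %w", h, err)
		}
	}
	flags, err := hex.DecodeString(p[3])
	if err != nil {
		return traceContext{}, fmt.Errorf("malformed traceparent %q: %w", h, err)
	}
	return traceContext{TraceID: p[1], SpanID: p[2], Sampled: flags[0]&0x01 == 0x01}, nil
}

// String renders the context back into traceparent form.
func (tc traceContext) String() string {
	flags := "00"
	if tc.Sampled {
		flags = "01"
	}
	return fmt.Sprintf("00-%s-%s-%s", tc.TraceID, tc.SpanID, flags)
}

// childOf starts a new span in the same trace: the trace ID and sampling
// decision are inherited, only the span ID is freshly generated.
func childOf(parent traceContext) traceContext {
	b := make([]byte, 8)
	if _, err := rand.Read(b); err != nil {
		panic(err)
	}
	return traceContext{TraceID: parent.TraceID, SpanID: hex.EncodeToString(b), Sampled: parent.Sampled}
}

func main() {
	// Example header from the W3C Trace Context specification.
	tc, err := parseTraceparent("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")
	if err != nil {
		panic(err)
	}
	// Hypothetical hop order; each component forwards a child context downstream.
	for _, hop := range []string{"kubelet", "kube-apiserver", "etcd"} {
		tc = childOf(tc)
		fmt.Printf("%-15s traceparent: %s\n", hop, tc)
	}
}
```

Every printed header shares the trace ID `4bf92f3577b34da6a3ce929d0e0e4736`, which is what lets a tracing backend stitch the per-component spans into one end-to-end view.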
Which components need to be instrumented for a minimum viable version of tracing?
This proposal is to allow tracing in components that are instrumented upstream. No downstream instrumentation will be added to OpenShift components that have an upstream counterpart. Instead, as tracing instrumentation is added upstream and that version is brought into OpenShift, slight configuration changes might be required for OpenShift administrators to enable tracing (such as awareness of the kube-apiserver's tracing configuration file, or adding flags). Tracing is not enabled by default in any component, upstream or in OpenShift, and there are no plans to change that. Individual component owners can already add tracing instrumentation if they choose to.
I will clarify that within the proposal, thanks!
💯 I think this is a really great idea!
Inactive enhancement proposals go stale after 28d of inactivity. See https://github.com/openshift/enhancements#life-cycle for details. Mark the proposal as fresh by commenting `/remove-lifecycle stale`. If this proposal is safe to close now, please do so with `/close`.
/lifecycle stale
/remove-lifecycle stale
Rotten enhancement proposals close after 7d of inactivity. See https://github.com/openshift/enhancements#life-cycle for details. Reopen the proposal by commenting `/reopen`.
/close
@openshift-bot: Closed this PR.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/remove-lifecycle stale
/reopen
@sallyom: Reopened this PR.
@sallyom: The following test failed; say `/retest` to rerun all failed tests.
in the kubelet, as well as in the main branch of CRI-O. Tracing can be added as an option in the OpenShift
deployment configurations for these components, but this will require minor updates to deployment manifests.
In distributed systems, tracing gives valuable data that logs and metrics cannot provide. This enhancement
Suggested change:
In distributed systems, tracing not only provides valuable data that logs and metrics cannot, but also simplifies correlation between the different signals. This enhancement
tracking events across service boundaries. Furthermore, tracing can shrink the time it takes to diagnose issues,
giving useful information and pinpointing problems without the need for extra code. Upstream, etcd has been
instrumented to export gRPC traces. CRI-O is also adding instrumentation. Kubernetes API server added the option
to enable OpenTelemetry tracing in version 1.22. A KEP is under review and work is underway to instrument kubelet.
> A KEP is under review and work is underway to instrument kubelet.

Generally coming in v1.27, right?
Signed-off-by: Sally O'Malley <[email protected]>
Signed-off-by: Parul Singh <[email protected]>
Co-authored-by: husky-parul <[email protected]>
Co-authored-by: damemi <[email protected]>
Inactive enhancement proposals go stale after 28d of inactivity. See https://github.com/openshift/enhancements#life-cycle for details. Mark the proposal as fresh by commenting `/remove-lifecycle stale`. If this proposal is safe to close now, please do so with `/close`.
/lifecycle stale
Stale enhancement proposals rot after 7d of inactivity. See https://github.com/openshift/enhancements#life-cycle for details. Mark the proposal as fresh by commenting `/remove-lifecycle rotten`. If this proposal is safe to close now, please do so with `/close`.
/lifecycle rotten
Rotten enhancement proposals close after 7d of inactivity. See https://github.com/openshift/enhancements#life-cycle for details. Reopen the proposal by commenting `/reopen`.
/close
@openshift-bot: Closed this PR.
(automated message) This pull request is closed with lifecycle/rotten. It does not appear to be linked to a valid Jira ticket. Should the PR be reopened, updated, and merged? If not, removing the lifecycle/rotten label will tell this bot to ignore it in the future.
Enhancement to add the option of distributed tracing in OpenShift components as tracing features are added upstream. Currently, Kube-APIServer, Kubelet, CRI-O, and Etcd have experimental distributed tracing.