Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Allow/Expose options to enable distributed tracing in components as features are added upstream #725

Closed
wants to merge 2 commits into from

Conversation

sallyom
Copy link
Contributor

@sallyom sallyom commented Apr 7, 2021

Enhancement to add the option of distributed tracing in OpenShift components as tracing features are added upstream. Currently Kube-APIServer, Kubelet, CRI-O, and Etcd have experimental distributed tracing.

@openshift-ci-robot openshift-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 7, 2021
@openshift-ci-robot
Copy link

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To complete the pull request process, please assign dgoodwin after the PR has been reviewed.
You can assign the PR to them by writing /assign @dgoodwin in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@sallyom sallyom force-pushed the opentelemetry branch 2 times, most recently from 81d43ee to cd23b82 Compare June 9, 2021 16:35
@sallyom sallyom changed the title WIP distributed tracing doc distributed tracing Jun 9, 2021
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 9, 2021
Copy link

@damemi damemi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks awesome @sallyom, excited to see tracing moving along 🙂

idk if it's of any help, but my previous enhancement proposal for this has some more use case details (at least for the scheduler). Feel free to use stuff from there if you need it #279

@sallyom
Copy link
Contributor Author

sallyom commented Aug 4, 2021

This looks awesome @sallyom, excited to see tracing moving along

idk if it's of any help, but my previous enhancement proposal for this has some more use case details (at least for the scheduler). Feel free to use stuff from there if you need it #279

Only getting back to this now, thanks, and sorry I had not seen/remembered this - mind if I add you as a co-author to this enhancement and allow you to add to it? You will have valuable insight as you've been working with tracing with the scheduler, also.

@openshift-bot
Copy link

Inactive enhancement proposals go stale after 28d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Mark the proposal as fresh by commenting /remove-lifecycle stale.
Stale proposals rot after an additional 30d of inactivity and eventually close.
Exclude this proposal from closing by commenting /lifecycle frozen.

If this proposal is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 22, 2021
@damemi
Copy link

damemi commented Sep 22, 2021

/remove-lifecycle stale

@openshift-ci openshift-ci bot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 22, 2021
@sallyom sallyom force-pushed the opentelemetry branch 3 times, most recently from bf674af to 72adc2a Compare October 11, 2021 17:10

### Kube APIServer

* Add the TracingConfiguration file flag to OpenShift
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What will create the configuration file, the new operator?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this proposal doesn't include a new operator. The tracing-config file with defaults will be added to kube-apiserver-operator manged manifests. It will be considered only if the APIServerTracing kube-apiserver feature-gate is enabled. By default, the sampling rate will be 0, so if enabled, only the spans from sampled requests will be emitted. This will keep the tracing overhead to a minimum. The sampling rate should be configurable.

- "@husky-parul"
- "@damemi"
reviewers:
- TBD
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we'll need reviewers from the API server, etcd, and node teams.

### Non-Goals

The OpenTelemetry Collector operator for OpenShift will not be part of core OpenShift. Instead, the operator is available
on the OperatorHub in the OpenShift console, or, can be deployed manually.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which team will be maintaining that operator?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The OpenTelemetry Collector Operator is already available in OperatorHub

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aside from Operatorhub there is also a packaged operator from RH.

to enable OpenTelemetry tracing in version 1.22. A KEP is under review and work is underway to instrument kubelet.
A POC has been created with kube-scheduler. With these components instrumented, it will be possible to view traces with
CRI-O <-> Kubelet <-> Kube-Apiserver <-> ETCD. At this point, there is much to gain in instrumenting
other components and extending the OpenTelemetry train to give a complete view of the system.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which components need to be instrumented for a minimum viable version of tracing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This proposal is to allow tracing in components that are instrumented upstream. No downstream instrumentation will be added to OpenShift components with an upstream counterpart. Instead, as tracing instrumentation is added and that version is brought into OpenShift, slight configuration changes might be required in order for OpenShift administrators to enable tracing (such as knowledge of a trace-config file with the kube APIServer, or addition of flags). Tracing is not and has no plans to be enabled by default in any component upstream or in OpenShift. Individual component owners can already add tracing instrumentation if they choose to.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will clarify that within the proposal, thanks!

@sallyom sallyom changed the title distributed tracing Allow options to enable distributed tracing in components as features are added upstream Oct 20, 2021
@sallyom sallyom changed the title Allow options to enable distributed tracing in components as features are added upstream Allow/Expose options to enable distributed tracing in components as features are added upstream Oct 20, 2021
Copy link
Contributor

@lilic lilic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💯 I think this is a really great idea!

@openshift-bot
Copy link

Inactive enhancement proposals go stale after 28d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Mark the proposal as fresh by commenting /remove-lifecycle stale.
Stale proposals rot after an additional 7d of inactivity and eventually close.
Exclude this proposal from closing by commenting /lifecycle frozen.

If this proposal is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 17, 2021
@damemi
Copy link

damemi commented Nov 17, 2021

/remove-lifecycle stale

@openshift-bot
Copy link

Rotten enhancement proposals close after 7d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Reopen the proposal by commenting /reopen.
Mark the proposal as fresh by commenting /remove-lifecycle rotten.
Exclude this proposal from closing again by commenting /lifecycle frozen.

/close

@openshift-ci openshift-ci bot closed this Dec 29, 2021
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Dec 29, 2021

@openshift-bot: Closed this PR.

In response to this:

Rotten enhancement proposals close after 7d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Reopen the proposal by commenting /reopen.
Mark the proposal as fresh by commenting /remove-lifecycle rotten.
Exclude this proposal from closing again by commenting /lifecycle frozen.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@sallyom
Copy link
Contributor Author

sallyom commented Feb 8, 2022

/remove-lifecycle stale

@sallyom
Copy link
Contributor Author

sallyom commented Mar 14, 2023

/reopen
/remove-lifecycle rotten

@openshift-ci openshift-ci bot reopened this Mar 14, 2023
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Mar 14, 2023

@sallyom: Reopened this PR.

In response to this:

/reopen
/remove-lifecycle rotten

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci openshift-ci bot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Mar 14, 2023
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Mar 14, 2023

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign dtantsur for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Mar 14, 2023

@sallyom: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/markdownlint 72adc2a link true /test markdownlint

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@sallyom sallyom changed the title Allow/Expose options to enable distributed tracing in components as features are added upstream [WIP] Allow/Expose options to enable distributed tracing in components as features are added upstream Mar 14, 2023
@sallyom sallyom marked this pull request as draft March 14, 2023 16:07
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 14, 2023
in the kubelet, as well as in the main branch of CRI-O. Tracing can be added as an option in the OpenShift
deployment configurations for these components but will require minor updates to deployment manifests.

In distributed systems tracing gives valuable data that logs and metrics cannot provide. This enhancement
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
In distributed systems tracing gives valuable data that logs and metrics cannot provide. This enhancement
In distributed systems, tracing not only provides valuable data that logs and metrics cannot. It also simplifies the correlation between the different signals. This enhancement

tracking events across service boundaries. Furthermore, tracing can shrink the time it takes to diagnose issues,
giving useful information and pinpointing problems without the need for extra code. Upstream, etcd has been
instrumented to export gRPC traces. CRI-O is also adding instrumentation. Kubernetes API server added the option
to enable OpenTelemetry tracing in version 1.22. A KEP is under review and work is underway to instrument kubelet.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A KEP is under review and work is underway to instrument kubelet.

Generally coming in v1.27, right?

### Non-Goals

The OpenTelemetry Collector operator for OpenShift will not be part of core OpenShift. Instead, the operator is available
on the OperatorHub in the OpenShift console, or, can be deployed manually.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aside from Operatorhub there is also a packaged operator from RH.

Signed-off-by: Sally O'Malley <[email protected]>
Signed-off-by: Parul Singh <[email protected]>

Co-authored-by: husky-parul <[email protected]>
Co-authored-by: damemi <[email protected]>
@openshift-bot
Copy link

Inactive enhancement proposals go stale after 28d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Mark the proposal as fresh by commenting /remove-lifecycle stale.
Stale proposals rot after an additional 7d of inactivity and eventually close.
Exclude this proposal from closing by commenting /lifecycle frozen.

If this proposal is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 12, 2023
@openshift-bot
Copy link

Stale enhancement proposals rot after 7d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Mark the proposal as fresh by commenting /remove-lifecycle rotten.
Rotten proposals close after an additional 7d of inactivity.
Exclude this proposal from closing by commenting /lifecycle frozen.

If this proposal is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

@openshift-ci openshift-ci bot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Apr 19, 2023
@openshift-bot
Copy link

Rotten enhancement proposals close after 7d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Reopen the proposal by commenting /reopen.
Mark the proposal as fresh by commenting /remove-lifecycle rotten.
Exclude this proposal from closing again by commenting /lifecycle frozen.

/close

@openshift-ci openshift-ci bot closed this Apr 27, 2023
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Apr 27, 2023

@openshift-bot: Closed this PR.

In response to this:

Rotten enhancement proposals close after 7d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Reopen the proposal by commenting /reopen.
Mark the proposal as fresh by commenting /remove-lifecycle rotten.
Exclude this proposal from closing again by commenting /lifecycle frozen.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@dhellmann
Copy link
Contributor

(automated message) This pull request is closed with lifecycle/rotten. It does not appear to be linked to a valid Jira ticket. Should the PR be reopened, updated, and merged? If not, removing the lifecycle/rotten label will tell this bot ignore it in the future.

1 similar comment
@dhellmann
Copy link
Contributor

(automated message) This pull request is closed with lifecycle/rotten. It does not appear to be linked to a valid Jira ticket. Should the PR be reopened, updated, and merged? If not, removing the lifecycle/rotten label will tell this bot ignore it in the future.

@dhellmann
Copy link
Contributor

(automated message) This pull request is closed with lifecycle/rotten. It does not appear to be linked to a valid Jira ticket. Should the PR be reopened, updated, and merged? If not, removing the lifecycle/rotten label will tell this bot to ignore it in the future.

@dhellmann dhellmann removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Jun 30, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants