Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optionally supply tenancy when reporting spans v2 #3750

Closed
wants to merge 3 commits into from

Conversation

esnible
Copy link
Contributor

@esnible esnible commented Jun 9, 2022

Replacement for #3742

This PR attempts to address #3742 (comment)

I am creating this as a draft, for direction, to make sure I understand the requirement before doing all of the necessary work.

@codecov
Copy link

codecov bot commented Jun 9, 2022

Codecov Report

Merging #3750 (acbfdf3) into main (1ed6491) will decrease coverage by 0.04%.
The diff coverage is 90.16%.

@@            Coverage Diff             @@
##             main    #3750      +/-   ##
==========================================
- Coverage   97.49%   97.45%   -0.05%     
==========================================
  Files         271      271              
  Lines       16004    16056      +52     
==========================================
+ Hits        15603    15647      +44     
- Misses        317      323       +6     
- Partials       84       86       +2     
Impacted Files Coverage Δ
cmd/agent/app/reporter/grpc/builder.go 95.16% <ø> (ø)
cmd/query/app/server.go 94.44% <ø> (ø)
model/converter/thrift/zipkin/to_domain.go 100.00% <ø> (ø)
pkg/config/tenancy/tenancy.go 100.00% <ø> (ø)
cmd/agent/app/reporter/grpc/reporter.go 92.77% <85.71%> (-7.23%) ⬇️
cmd/agent/app/proxy_builders.go 100.00% <100.00%> (ø)
cmd/agent/app/reporter/grpc/collector_proxy.go 100.00% <100.00%> (ø)
cmd/agent/app/reporter/grpc/flags.go 100.00% <100.00%> (ø)
cmd/query/app/http_handler.go 100.00% <100.00%> (ø)
pkg/config/tlscfg/cert_watcher.go 92.55% <0.00%> (-2.13%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 1ed6491...acbfdf3. Read the comment docs.

@yurishkuro
Copy link
Member

Could you describe what solution specifically you're implementing? There seems to be a mix of things, but the main issue with agent being multi-tenant does not seem to be addressed.

@esnible
Copy link
Contributor Author

esnible commented Jun 9, 2022

I'd like to operate a cloud-based service that consumes OpenTelemetry traces and provides a Jaeger UI and gRPC query API.

I have a reverse proxy that implements single-sign-on and adds a header, x-tenant, when talking to the OTel collector and to Jaeger query.

My goal is to allow different groups of users to interact with the same Jaeger and see different data.

I did a proof-of-concept at #3605 with a multi-tenant memory storage and partial implementation of multi-tenant collecting and query. I am now trying to complete that work correctly, in small PRs.

After #3688 the tenant from the PostSpans header is passed along to storage. While working on a similar PR to pass the header from the query API to the spanstore I noticed Jaeger's logs reporting that if multitenant collection was enabled the opentracing middleware spans for query were being rejected because they lacked tenancy.

I believe the self-traces from Jaeger's HTTP server "belong" to the service operator, not the service user. This is why I created #3742 , to supply a service-operator tenant for those spans.

You correctly pointed out that I wasn't thinking about reporting in general. I only thought of correcting the span dropping introduced by my tenant checking on PostSpans. It does make sense to let the developer choose if reporting is done based on the active tenant or an "internal" tenant.

I've spent a day looking at the code and creating draft PRs. I tried a hack where I encode the tenant as a tag.

A clean solution will require either a multitenant opentracing.Tracer, or Jaeger to report queries using a different mechanism. I lack community experience and don't know which design is preferred.

@yurishkuro
Copy link
Member

yurishkuro commented Jun 10, 2022

Basically, in order to support self-reported spans, we have these options with additional work in each:

  • continue using Jaeger Go SDK sending spans over UDP:
    • Thrift data format must be extended to capture tenant
    • Jaeger Go SDK must be extended to populate the tenant in Thrift packets
  • continue using Jaeger Go SDK but switch to sending spans over HTTP or gRPC:
    • Jaeger Go SDK must be extended to populate the tenant header in HTTP/gRPC requests
  • switch to use OTEL SDK
    • I would also switch to use gRPC sending directly to collector
    • (not sure if already done) OTEL SDK must support sending x-tenant header
    • we can keep the existing OpenTracing instrumentation and use an OTEL bridge, to minimize the migration work

Could we just ignore the problem of self-reported spans for now and focus more on the production-quality deployments by enabling collector and query service?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants