Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FastApi auto-instrumentation send extra spans for a single API execution #831

Closed
RunzhongHuang opened this issue Dec 10, 2021 · 22 comments · Fixed by #2802
Closed

FastApi auto-instrumentation send extra spans for a single API execution #831

RunzhongHuang opened this issue Dec 10, 2021 · 22 comments · Fixed by #2802
Labels
bug Something isn't working

Comments

@RunzhongHuang
Copy link

Hi open-telemetry community,

I have a python project with fastapi auto-instrumentation which export the spans to zipkin server, and I have a simple "GET" API which returns a "hello". But I notice it generates 3 spans instead of 1.

image

As you can see, the top one is related to the request I send out. But the other 2 child spans, I am not sure why they are there.

Describe your environment
python = 3.8
opentelemetry-api = 1.7.1
opentelemetry-sdk = 1.7.1
opentelemetry-exporter-zipkin = 1.7.1
opentelemetry-instrumentation-fastapi = 0.26b1
fastapi = 0.70.0
uvicorn = 0.15.0

Steps to reproduce

from opentelemetry import trace
from opentelemetry.exporter.zipkin.json import ZipkinExporter
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace.export import SimpleSpanProcessor
from fastapi import FastAPI
from opentelemetry.instrumentation.fastapi import FastAPIInstrumentor
import uvicorn

app = FastAPI()

tracer = trace.get_tracer(__name__)

zipkin_exporter = ZipkinExporter(
    endpoint="https://bolt-msl-zipkin-server.eks-dev-us-east-1.container.spglobal.com:30443/api/v2/spans",
)
span_processor = SimpleSpanProcessor(zipkin_exporter)
trace_provider = TracerProvider(resource=Resource.create({"service.name": "open-telemetry-test"}))
trace_provider.add_span_processor(span_processor)

@app.get('/opentelemetry-fastapi')
def about():
   return "hello"


FastAPIInstrumentor.instrument_app(app,
                                   tracer_provider=trace_provider)

if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8000)

Here is the REST request I made:

curl --location --request GET 'http://localhost:8000/opentelemetry-fastapi'

Here is the JSON span metadata:

[{
	"traceId": "7e5511d0c28934bf480bc7f28ba2d34d",
	"parentId": "76122e06995e41b6",
	"id": "a920c88a2a64c8e2",
	"name": "/opentelemetry-fastapi http send",
	"timestamp": 1639091853270289,
	"duration": 1002,
	"localEndpoint": {
		"serviceName": "open-telemetry-test"
	},
	"tags": {
		"otel.library.name": "opentelemetry.instrumentation.asgi",
		"otel.library.version": "0.26b1",
		"service.name": "open-telemetry-test",
		"telemetry.sdk.language": "python",
		"telemetry.sdk.name": "opentelemetry",
		"telemetry.sdk.version": "1.7.1",
		"type": "http.response.body"
	},
	"debug": true
}, {
	"traceId": "7e5511d0c28934bf480bc7f28ba2d34d",
	"id": "76122e06995e41b6",
	"kind": "SERVER",
	"name": "/opentelemetry-fastapi",
	"timestamp": 1639091853034614,
	"duration": 289682,
	"localEndpoint": {
		"serviceName": "open-telemetry-test"
	},
	"tags": {
		"correlation-id": "111222333444555",
		"http.flavor": "1.1",
		"http.host": "127.0.0.1:8000",
		"http.method": "GET",
		"http.route": "/opentelemetry-fastapi",
		"http.scheme": "http",
		"http.server_name": "localhost:8000",
		"http.status_code": "200",
		"http.target": "/opentelemetry-fastapi",
		"http.url": "http://127.0.0.1:8000/opentelemetry-fastapi",
		"http.user_agent": "PostmanRuntime/7.28.4",
		"net.host.port": "8000",
		"net.peer.ip": "127.0.0.1",
		"net.peer.port": "55402",
		"otel.library.name": "opentelemetry.instrumentation.asgi",
		"otel.library.version": "0.26b1",
		"region": "us-east-1",
		"service.name": "open-telemetry-test",
		"telemetry.sdk.language": "python",
		"telemetry.sdk.name": "opentelemetry",
		"telemetry.sdk.version": "1.7.1"
	},
	"debug": true
}, {
	"traceId": "7e5511d0c28934bf480bc7f28ba2d34d",
	"parentId": "76122e06995e41b6",
	"id": "edd83d58575f0811",
	"name": "/opentelemetry-fastapi http send",
	"timestamp": 1639091853053617,
	"localEndpoint": {
		"serviceName": "open-telemetry-test"
	},
	"tags": {
		"http.status_code": "200",
		"otel.library.name": "opentelemetry.instrumentation.asgi",
		"otel.library.version": "0.26b1",
		"service.name": "open-telemetry-test",
		"telemetry.sdk.language": "python",
		"telemetry.sdk.name": "opentelemetry",
		"telemetry.sdk.version": "1.7.1",
		"type": "http.response.start"
	},
	"debug": true
}]

What is the expected behavior?
Exactly one span related to the API get executed.

What is the actual behavior?
Besides the span relates to the API got executed, there are 2 extra spans are generated.

Additional context

I am suspecting this is related to the ASGI interfaces since the extra span has "send" in it which related to sending response back. But shouldn't it be a single "send" instead two?

I feel like there could some missing link regarding certain concept. I am looking forward to hearing from your suggestions.

Thanks for your time!

@mohdnr
Copy link

mohdnr commented Dec 19, 2021

i'm also experiencing the same issue when using fastapi + xray.

image

@owais
Copy link
Contributor

owais commented Dec 20, 2021

At first glance it seems to be the case because of the async nature of FastAPI. If you look at type attribute, it hints that two distinct events are being recorded.

		"type": "http.response.start"

and

		"type": "http.response.body"

It appears that first span represents sending back headers or initiating a response while the second one represending sending the response body.

@mohdnr
Copy link

mohdnr commented Dec 21, 2021

Just reviewed my logs and i'm seeing what you mean. Is the practise to tie all of these together downstream where you're storing your data ?

type: STRING(http.response.start)
type: STRING(http.response.start)
type: STRING(http.request)
type: STRING(http.response.body)
type: STRING(http.response.body)
type: STRING(http.response.body)
type: STRING(http.disconnect)

@owais
Copy link
Contributor

owais commented Dec 21, 2021

They are all part of the same trace so generally backends do store them together or close to each other if that is what you meant.

@mohdnr
Copy link

mohdnr commented Dec 21, 2021

They are all part of the same trace so generally backends do store them together or close to each other if that is what you meant.

That makes sense, thanks for your help! I see them all grouped when i added Xray as a data source in Grafana.

@raphaelauv
Copy link

if you want disable them , we would need that -> #815

@RunzhongHuang
Copy link
Author

@owais thanks for the explanation about these types, and I totally understand why the internal asgi interactions are captured within the same trace (e.g., debugging purpose).

I think the reason I raise this question is because these extra spans become confusing in the zipkin GUI (in the example I provided only has 2 extra spans, however in one of my another project it generated more estra since it not only serves the incoming request, but also makes external requests). And I would like to know if there is a way to suppress these 'internal subtle' spans in order to have a cleaner service trace chain? Or the way I coded is not correct.

@raphaelauv thank you very much! I will keep an eye on the status of the PR #815

@RunzhongHuang
Copy link
Author

Hi @raphaelauv , I noticed the PR #815 is closed without merge. I assume that meaning this little patch is not available in the release. Then I would like to ask what could be the other possible approachs to disabled these internal spans?

@nils-borrmann-y42
Copy link
Contributor

nils-borrmann-y42 commented May 3, 2022

I stumbled accross the same issue and managed to get rid of these extra spans fairly easily by overriding two methods in the OpenTelemetryMiddleware:

from opentelemetry.instrumentation.asgi import OpenTelemetryMiddleware

class MyOpenTelemetryMiddleware(OpenTelemetryMiddleware):

    def _get_otel_receive(self, server_span_name, scope, receive):
        return receive

    def _get_otel_send(self, server_span, server_span_name, scope, send):
        @wraps(send)
        async def otel_send(message):
            if message["type"] == "http.response.start":
                set_status_code(server_span, message["status"])
            elif message["type"] == "websocket.send":
                set_status_code(server_span, 200)

            propagator = get_global_response_propagator()
            if propagator:
                propagator.inject(
                    message,
                    context=set_span_in_context(
                        server_span, trace.context_api.Context()
                    ),
                    setter=asgi_setter,
                )

            await send(message)

        return otel_send

# in create_app()
app = Quart(__name__)
app.asgi_app = MyOpenTelemetryMiddleware(app.asgi_app)

@blueswen
Copy link
Contributor

blueswen commented May 4, 2022

I stumbled accross the same issue and managed to get rid of these extra spans fairly easily by overriding two methods in the OpenTelemetryMiddleware:

class MyOpenTelemetryMiddleware(OpenTelemetryMiddleware):

    def _get_otel_receive(self, server_span_name, scope, receive):
        return receive

    def _get_otel_send(self, server_span, server_span_name, scope, send):
        @wraps(send)
        async def otel_send(message):
            if message["type"] == "http.response.start":
                set_status_code(server_span, message["status"])
            elif message["type"] == "websocket.send":
                set_status_code(server_span, 200)

            propagator = get_global_response_propagator()
            if propagator:
                propagator.inject(
                    message,
                    context=set_span_in_context(
                        server_span, trace.context_api.Context()
                    ),
                    setter=asgi_setter,
                )

            await send(message)

        return otel_send

@nils-borrmann-y42 thanks for the workaround, it works perfectly. The CancelledError Exception issue #467 in internal span under python 3.8 could also fixed with this workaround.

@RunzhongHuang
Copy link
Author

Hi @blueswen and @nils-borrmann-y42. Thanks for your reply and confirmation in this issue. I have a quick question regard this overrided "OpenTelemetryMiddleware".

The "FastAPIInstrumentor" class didn't expose any parameters to pass "OpenTelemetryMiddleware" object. It directly add the "OpenTelemetryMiddleware" as the middle ware. May I ask how you utilize your "MyOpenTelemetryMiddleware" ? Is there anywhere that missed here?

image

@blueswen
Copy link
Contributor

This gist is my workaround, which copy the entire FastAPIInstrumentor Code then add custom middleware (line from 167) and import some required modules (line from 148).

@RunzhongHuang , as you said, there is no parameters to pass "OpenTelemetryMiddleware" object. So I thinks this is the only way to do the hack.

@nils-borrmann-y42
Copy link
Contributor

I actually only used it for the quart framework, where the OpenTelemetryMiddleware is used directly. Looks like it is a bit more complicated for fastapi.

@RunzhongHuang
Copy link
Author

Thank you @blueswen and @nils-borrmann-y42! I thought the same and seem no other way around, let me test it out on my project.

@xkortex
Copy link

xkortex commented Sep 29, 2022

This feature of creating a span for the initial ASGI scope plus the send/receive events is simultaneously really helpful (I used this to discover some requests were timing out after receive but before send) but also really noisy (multipart messages send one receive span for each packet received, which can get really noisy when a single trace has hundreds of spans. It would be lovely to have more configurability as to turning these internal spans on/off, or controlling when they trigger.

@ehsannm
Copy link

ehsannm commented Oct 21, 2022

Thumbs up for add more configuration. I need to use the hack that @blueswen shared. Looking forward to add it as a configuration parameter. It seems a very quick and small fix, but ticket is open for over a year now

@srikanthccv
Copy link
Member

You are always welcome to send a fix :)

@nils-borrmann-y42
Copy link
Contributor

@srikanthccv Someone already did and the PR was rejected: #815

@srikanthccv
Copy link
Member

It wasn't rejected. It was closed by author without any further explanation. I don't know why they decided to do that. One can drop the spans by other means, such as as custom processor or sampler but I think the configurability on instrumentation is also fine especially because dropping span which is used for propagation breaks the tree structure of trace.

@outstanding1301
Copy link

I did it by implementing SpanProcessor

by overriding on_end of BatchSpanProcessor.
skip when span kind is INTERNAL and attributes.type is in (http.request, http.response.start, http.response.body)

    def on_end(self, span: ReadableSpan) -> None:
        if span.kind == SpanKind.INTERNAL and (
                span.attributes.get('type', None) in ('http.request',
                                                      'http.response.start',
                                                      'http.response.body')
        ):
            return
        super().on_end(span=span)

Full Code

from opentelemetry.sdk.trace import ReadableSpan
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.trace import SpanKind


class MySpanProcessor(BatchSpanProcessor):
    def on_end(self, span: ReadableSpan) -> None:
        if span.kind == SpanKind.INTERNAL and (
                span.attributes.get('type', None) in ('http.request',
                                                      'http.response.start',
                                                      'http.response.body')
        ):
            return
        super().on_end(span=span)

and attach it to provider

provider = TracerProvider(resource=resource)
processor = MySpanProcessor(span_exporter=exporter)
provider.add_span_processor(span_processor=processor)

@nirfuzz
Copy link

nirfuzz commented Apr 21, 2024

any updates on this?

@wuranxu
Copy link

wuranxu commented Oct 17, 2024

I did it by implementing SpanProcessor

by overriding on_end of BatchSpanProcessor. skip when span kind is INTERNAL and attributes.type is in (http.request, http.response.start, http.response.body)

    def on_end(self, span: ReadableSpan) -> None:
        if span.kind == SpanKind.INTERNAL and (
                span.attributes.get('type', None) in ('http.request',
                                                      'http.response.start',
                                                      'http.response.body')
        ):
            return
        super().on_end(span=span)

Full Code

from opentelemetry.sdk.trace import ReadableSpan
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.trace import SpanKind


class MySpanProcessor(BatchSpanProcessor):
    def on_end(self, span: ReadableSpan) -> None:
        if span.kind == SpanKind.INTERNAL and (
                span.attributes.get('type', None) in ('http.request',
                                                      'http.response.start',
                                                      'http.response.body')
        ):
            return
        super().on_end(span=span)

and attach it to provider

provider = TracerProvider(resource=resource)
processor = MySpanProcessor(span_exporter=exporter)
provider.add_span_processor(span_processor=processor)

Thanks! the attributed named "type" has been changed to "asgi.event.type", it works well now~

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment