Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

opentelemetry-instrumentation-flask error during _teardown_request if request spawns child threads and copy_current_request_context is used #1653

Closed
hangonlyra opened this issue Feb 9, 2023 · 2 comments · Fixed by #1654
Labels
bug Something isn't working

Comments

@hangonlyra
Copy link
Contributor

hangonlyra commented Feb 9, 2023

Describe your environment
Python 3.9

opentelemetry-instrumentation-flask==0.36b0
Flask==2.2.2 (but happens on older versions as well)

Steps to reproduce

from concurrent.futures import ThreadPoolExecutor, as_completed
from random import randint

from flask import Flask, copy_current_request_context

from opentelemetry.instrumentation.flask import FlaskInstrumentor

app = Flask(__name__)

def do_random_stuff():
    @copy_current_request_context
    def inner():
        return randint(0, 100)
    return inner

@app.route("/")
def hello_world():
    count = 4
    executor = ThreadPoolExecutor(count)
    futures = []
    for _ in range(count):
        futures.append(executor.submit(do_random_stuff()))

    for future in as_completed(futures):
        print(future.result())

    return "Hello, World!"

FlaskInstrumentor.instrument_app(app)

Run it with
opentelemetry-instrument --traces_exporter console --metrics_exporter console flask run

What is the expected behavior?
Normal Open Telemetry traces being outputted when accessed via HTTP

What is the actual behavior?

[2023-02-09 10:11:16,480] ERROR in app: Exception on / [GET]
Traceback (most recent call last):
  File "~/otelbug/env/lib/python3.9/site-packages/flask/app.py", line 2525, in wsgi_app
    response = self.full_dispatch_request()
  File "~/otelbug/env/lib/python3.9/site-packages/flask/app.py", line 1822, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "~/otelbug/env/lib/python3.9/site-packages/flask/app.py", line 1820, in full_dispatch_request
    rv = self.dispatch_request()
  File "~/otelbug/env/lib/python3.9/site-packages/flask/app.py", line 1796, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
  File "~/otelbug/app.py", line 25, in hello_world
    print(future.result())
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/concurrent/futures/_base.py", line 439, in result
    return self.__get_result()
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/concurrent/futures/_base.py", line 391, in __get_result
    raise self._exception
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "~/otelbug/env/lib/python3.9/site-packages/flask/ctx.py", line 182, in wrapper
    return ctx.app.ensure_sync(f)(*args, **kwargs)
  File "~/otelbug/env/lib/python3.9/site-packages/flask/ctx.py", line 432, in __exit__
    self.pop(exc_value)
  File "~/otelbug/env/lib/python3.9/site-packages/flask/ctx.py", line 399, in pop
    self.app.do_teardown_request(exc)
  File "~/otelbug/env/lib/python3.9/site-packages/flask/app.py", line 2373, in do_teardown_request
    self.ensure_sync(func)(exc)
  File "~/otelbug/env/lib/python3.9/site-packages/opentelemetry/instrumentation/flask/__init__.py", line 441, in _teardown_request
    activation.__exit__(None, None, None)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/contextlib.py", line 126, in __exit__
    next(self.gen)
ValueError: generator already executing

Additional context
This is very much a corner case.

It happens because of

  1. flask_request_environ[_ENVIRON_ACTIVATION_KEY] = activation in _before_request for opentelemetry.instrumentation.flask
  2. @copy_current_request_context then copies the entire request context into the children thread.
  3. This gets executed on the children threads once all the request context is copied over.
     activation = flask.request.environ.get(_ENVIRON_ACTIVATION_KEY)
     if not activation:
            # This request didn't start a span, maybe because it was created in
            # a way that doesn't run `before_request`, like when it is created
            # with `app.test_request_context`.
            return
    if exc is None:
        activation.__exit__(None, None, None)

A very easy fix is

flask_request_environ[_ENVIRON_ACTIVATION_KEY] = (threading.get_ident(), activation)
...
thread_id, activation = flask.request.environ.get(_ENVIRON_ACTIVATION_KEY)
if not activation or thread_id != threading.get_ident() :
    # This request didn't start a span, maybe because it was created in
    # a way that doesn't run `before_request`, like when it is created
    # with `app.test_request_context`.
    return
if exc is None:
    activation.__exit__(None, None, None)
@hangonlyra hangonlyra added the bug Something isn't working label Feb 9, 2023
@hangonlyra hangonlyra changed the title opentelemetry-instrumentation-flask error during _teardown_request if request spawns child threads opentelemetry-instrumentation-flask error during _teardown_request if request spawns child threads and copy_current_request_context is used Feb 9, 2023
@matthewgrossman
Copy link
Contributor

I think this is the same issue I reported here: #1551, with a similar suggestion for solving:

Somehow indicate in the environ which flask-req-ctx-id ended up creating the span. When calling _teardown_request, escape early if the current flask-req-ctx doesn't match the original req-ctx that created the span in the first place

The reason I can't align on using the thread_id is because my issue doesn't actually occur due to separate threads. I'm using gevent/greenlet to spawn new pieces of work, which would all have the same thread_id in your solution (I think?).

Greenlets do have a way of specifying their identity similarly, however: https://www.gevent.org/api/gevent.greenlet.html#gevent.Greenlet.minimal_ident. I wonder if instead we should provide someway in flask-instrumentor how to identify the running context (via an id-getter, or something), to make this generic

@matthewgrossman
Copy link
Contributor

Strike that comment actually; in the gevent services I run, we always monkey_patch at service-startup. Part of this job is to patch some important globals, like threading.get_ident(). This solution should work for me too

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
2 participants