asyncio.TaskGroup may silently discard request to run a task #116048

arthur-tacca · 2024-02-28T14:08:52Z

Bug report

Bug description:

As I understand it, asyncio.TaskGroup should never silently lose a task. Any time you use TaskGroup.create_task(), one of these two things happens:

- The coroutine you pass runs to the end, with the TaskGroup context manager waiting until this happens. The task may be cancelled, for example because another task in the group raised an exception, but the coroutine "sees" this (it receives the cancellation exception), so it can always handle it if needed, and the task group still waits for all of that to finish.
- The TaskGroup.create_task() method raises a RuntimeError because it is not possible to start the task. This happens when the task group is not running (because the context manager hasn't been entered or has already exited), or when it is in the process of shutting down (because one of the tasks in it has finished with an exception).

(Disclaimer: My background is as a user of Trio, where these are definitely the only two possibilities. The main difference is that starting a task in an active but cancelled Trio nursery will start a task and cancel it immediately, which allows it to run to its first unshielded await, rather than raising an exception from start_soon(). But that's a small design difference. The point is that you are still guaranteed one of the two possibilities.)

However, a task can be silently lost if the task group it is in gets shut down before a recently-created task has a chance to get scheduled at all, as in this example:

async with asyncio.TaskGroup() as tg:
    tg.create_task(my_fn(3))
    raise RuntimeError

This snippet seems a bit silly because the task group gets shut down by an exception from the same child task that is spawning a new sibling. But the same situation can happen when an uncaught exception gets thrown by one child at roughly the same time that another child has spawned a sibling. (I came across this issue by launching a task from an inner task group while the outer task group was in the process of shutting down.)

Overall, this follows from this behaviour of asyncio tasks:

t = asyncio.create_task(my_fn())
t.cancel()

This will not run my_fn() at all, not even to the first await within it. This is despite the fact that the docs for asyncio.Task.cancel() say:

This arranges for a CancelledError exception to be thrown into the wrapped coroutine on the next cycle of the event loop. The coroutine then has a chance to clean up or even deny the request ...
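
The following sketch makes that observable. The ran list and the body of my_fn are assumptions added for illustration, and the default task factory is assumed:

```python
import asyncio

ran = []  # Would receive an entry if the body of my_fn() started.


async def my_fn():
    ran.append("started")
    await asyncio.sleep(0)


async def main():
    t = asyncio.create_task(my_fn())
    t.cancel()  # Requested before the task has had its first step.
    try:
        await t
    except asyncio.CancelledError:
        pass
    return t.cancelled()


was_cancelled = asyncio.run(main())
print(ran, was_cancelled)
```

The task ends up cancelled, but the coroutine body never gets the chance to clean up or deny the request, because it never begins executing.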

I looked over old issues to see if this had been reported and found #98275, which suggested changing the docs to warn about this, but it has since been closed.

Personally, I would say that the behaviour of create_task and Task.cancel() is incorrect, but looking back at that discussion I can see that this is a matter of opinion. However, I think the task group behaviour really does need to be fixed. It's hard to see how this could be done reliably with the current task behaviour, which I think lends some weight to the view that the task behaviour really is the underlying issue.

Perhaps, as a compromise, there could be a new parameter asyncio.create_task(..., run_to_first_await=False), which can be set to True by TaskGroup and other users wanting robust cancellation detection?

CPython versions tested on:

3.12

Operating systems tested on:

Windows


arthur-tacca · 2024-02-28T19:50:10Z

Related: #115957

gvanrossum · 2024-03-01T19:50:07Z

I don't feel like introducing flags for alternate semantics. When you create a coroutine and immediately throw an exception into it, the coroutine (which hasn't started executing its body yet) will be cancelled without getting a chance to clean up. Those are the semantics of coroutines defined with async def and the asyncio module faithfully follows this principle.
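
The coroutine-level behaviour described here can be seen without asyncio at all. This is a minimal sketch; body and log are hypothetical names used only for illustration:

```python
async def body(log):
    try:
        log.append("entered")
    finally:
        log.append("cleanup")


log = []
coro = body(log)
try:
    # Throwing into a coroutine that has not started raises at the very top
    # of the function, before the try block is entered.
    coro.throw(RuntimeError("thrown before the first step"))
except RuntimeError:
    pass

print(log)
```

Neither "entered" nor "cleanup" is recorded, because the try/finally is never reached.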

It's fine to add some documentation for this, but I'm not sure that the various create_task() functions and methods are the place for that -- the behavior is inherent in async def so should be clarified there (if it isn't already).

arthur-tacca · 2024-03-03T19:41:00Z

I do see your point: if an abstraction has very similar semantics to the layer below, it's often simplest to declare the semantics are exactly the same and be done with it. But I think most of asyncio's users will miss this detail, even with the suggested documentation note. I certainly missed it even though I was specifically looking for race conditions.

I'm mainly concerned about task groups because the whole point of them is to protect you from edge cases when there are exceptions. But you can use them to write simple code that looks clearly correct, and could be correct (and should be IMO), but suffers from this issue. Like this (full demo):

async def use_connection(conn):
    try:
        await conn.use()  # Might raise exception
    finally:
        conn.close()

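async def process_data(item_iter):
    async with asyncio.TaskGroup() as tg:
        async for item in item_iter:
            conn = create_connection(item)
            # If the group aborts before this task first runs, the body of
            # use_connection() never executes, so conn is never closed.
            tg.create_task(use_connection(conn))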

Even if you do realise that the above code is broken (perhaps by finding out the hard way), there's no really simple fix for it. (Of course you could move connection creation into use_connection(), but that only works because it's a toy example; in a real program, that might be impossible without significant restructuring.) The best I can manage involves changing the driver function and wrapping the worker function, like this:

async def use_connection(conn):
    try:
        await conn.use()  # Might raise exception
    finally:
        conn.close()

async def use_connection_wrapper(conn, open_connections):
    # Once the task has started, use_connection()'s finally block owns the close.
    open_connections.remove(conn)
    await use_connection(conn)

async def process_data(item_iter):
    open_connections = set()
    try:
        async with asyncio.TaskGroup() as tg:
            async for item in item_iter:
                conn = create_connection(item)
                open_connections.add(conn)
                tg.create_task(use_connection_wrapper(conn, open_connections))
    finally:
        for conn in open_connections:
            conn.close()

If you look at this through the eyes of an asyncio user who doesn't know every low-level detail, it's hard to see the second snippet as anything other than a hack to work around a bug in asyncio. It's certainly not the natural way to write it.

arthur-tacca · 2024-03-04T10:34:34Z

Although I've not tested it, I believe this issue also applies to asyncio.start_server when used with task groups:

async def handle_connection(reader: asyncio.StreamReader, writer: asyncio.StreamWriter):
    try:
        await asyncio.sleep(1)  # ... use connection ...
    finally:
        writer.transport.abort()

async def run_server(port):
    async with asyncio.TaskGroup() as tg:
        def handle_in_taskgroup(r, w):
            tg.create_task(handle_connection(r, w))
        server = await asyncio.start_server(handle_in_taskgroup, port=port)
        tg.create_task(server.serve_forever())

This is not a surprise, as it follows the same pattern as the snippet in my previous comment. It could be worked around in the same way, but again I think that is very non-obvious.

It would be convenient, regardless of safety, if you could specify a task group to start_server (like the handler_nursery= parameter to trio.serve_tcp()):

await asyncio.start_server(handle_connection, port=port, task_group=tg)

That could be used as a reason not to fix this issue in general, because this form of start_server() would give an opportunity to fix this in start_server() itself. But I think this still illustrates how pervasive this type of problem is.

gvanrossum · 2024-03-04T20:03:26Z

Let's move the feature request for start_server() to a different issue (and wait until this one has settled).

I feel that the best way forward for you, and for others who feel this is a misfeature of coroutine design that asyncio should explicitly work around, is to turn on eager tasks. This is available from 3.12 onward. If eager tasks don't fit your usage pattern, please explain.

arthur-tacca · 2024-03-07T15:17:19Z

It feels a little wrong to use an optimisation (at least I assumed that's what it was for) that happens to have a semantic side effect as a workaround for this, but to be fair I can't think of any specific problem with it.

The original comment in the issue for that, #97696, actually mentioned this specific case, but I think it would be very hard for most asyncio users to figure this out for themselves. My preference would be for a warning in the TaskGroup docs along the lines of "warning: task groups are unsafe / broken unless you use the magic incantation asyncio.get_running_loop().set_task_factory(asyncio.eager_task_factory) at the start of your program." But I understand that something more balanced might be more appropriate.

arthur-tacca · 2024-03-07T15:18:44Z

Oh and there's an existing discussion on Discuss of the API change for start_server() so I posted about that there.

arthur-tacca added the type-bug label Feb 28, 2024

AlexWaygood added the topic-asyncio label Feb 28, 2024

arthur-tacca mentioned this issue Apr 12, 2024

RuntimeWarning: coroutine method 'aclose' of '...' was never awaited when breaking out of async for #117536

Closed
