Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Client.close() might not finish #9690

Closed
3 tasks done
rrika opened this issue Dec 18, 2023 · 17 comments
Closed
3 tasks done

Client.close() might not finish #9690

rrika opened this issue Dec 18, 2023 · 17 comments
Labels
invalid This is not right.

Comments

@rrika
Copy link

rrika commented Dec 18, 2023

Summary

await Client.close() can hang

Reproduction Steps

The beginning of Client.close looks like this:

    async def close(self) -> None:
        if self._closed:
            return
        self._closed = True
        await self._connection.close()
        # ...

Once it reaches the first 'await', execution can switch to the task running Client.connect, leave the while not self.is_closed() loop, leave the runner.run() context and shut down the asyncio loop. Anything awaiting the Client.close is now stuck.

This is relevant when using discord.py from the REPL where the main-thread is blocked on keyboard input, but websocket communication should continue. At exit it's necessary to close the connection from outside of the Client.connect thread/loop.

Offering a version of close() that isn't async would help.

Minimal Reproducible Code

import threading, asyncio, logging
import discord

token = # redacted

runner = asyncio.Runner()
def r(f):
	return asyncio.run_coroutine_threadsafe(f, runner.get_loop()).result()

logging.basicConfig(level=logging.DEBUG)
intents = discord.Intents()
intents.guilds = True
cl = discord.Client(intents=intents)

runner.run(cl.login(token))

t = threading.Thread(target=runner.run, args=(cl.connect(),))
t.start()

r(cl.wait_until_ready())

print("okay, now shut down")

r(cl.close())

Expected Results

Program ends

Actual Results

Program hangs

Intents

guilds

System Information

Checklist

  • I have searched the open issues for duplicates.
  • I have shown the entire traceback, if possible.
  • I have removed my token from display, if visible.

Additional Context

No response

@rrika rrika added the unconfirmed bug A bug report that needs triaging label Dec 18, 2023
@No767
Copy link

No767 commented Dec 18, 2023

Client.close closes the gateway connection and logs the bot off. I suspect the issue lies with your usage of threading, not with discord.py

@rrika
Copy link
Author

rrika commented Dec 18, 2023

please think about it. at the absolute minimum this is a documentation issue, the close() method lists no restrictions like "must only be (indirectly) called from connect()"

@No767
Copy link

No767 commented Dec 18, 2023

Could you explain more by "must be only be (indirectly) called from connect()"? The regular flow of using Client.start does call connect, which connections to the gateway and logs in.

@rrika
Copy link
Author

rrika commented Dec 18, 2023

I consider code in methods like on_ready/on_message "called from connect()". This happens through await self.ws.poll_event() within connect() as far as I can tell. Because it is called/awaited from connect() it is sure to have completed before connect() leaves the loop. The same doesn't hold for a Client.close() that doesn't happen in response to a discord event, but is added to the loop externally (like through run_coroutine_threadsafe)

To clarify this doesn't have anything to do with using Client.start vs. Client.connect I have rewritten the repro to use start (and also added a bit of synchronization to make the repro more reliable)

import threading, asyncio, logging
import discord

token = # redacted

runner = asyncio.Runner()
def r(f):
	return asyncio.run_coroutine_threadsafe(f, runner.get_loop()).result()

logging.basicConfig(level=logging.DEBUG)
intents = discord.Intents()
intents.guilds = True
cl = discord.Client(intents=intents)

may_wait_until_ready = threading.Event()
async def setup_hook():
	may_wait_until_ready.set()
cl.setup_hook = setup_hook

t = threading.Thread(target=runner.run, args=(cl.start(token),))
print("start")
t.start()

print("wait for may_wait_until_ready")
may_wait_until_ready.wait()

print("wait_until_ready")
r(cl.wait_until_ready())

print("okay, now shut down")
r(cl.close())

When Ctrl-C is pressed the following is shown:

KeyboardInterrupt
ERROR:asyncio:Unclosed connector
connections: ['[(<aiohttp.client_proto.ResponseHandler object at 0x7f1f3e9ddb00>, 256520.165383778)]']
connector: <aiohttp.connector.TCPConnector object at 0x7f1f3ea31250>
ERROR:asyncio:Task was destroyed but it is pending!
task: <Task pending name='Task-12' coro=<Client.close() done, defined at venv/lib/python3.11/site-packages/discord/client.py:722> wait_for=<Future pending cb=[Task.task_wakeup()]> cb=[_chain_future.<locals>._call_set_state() at /usr/lib/python3.11/asyncio/futures.py:394]>

[EDIT: simplified the code by removing the cl_start_token function]

@Rapptz Rapptz added invalid This is not right. and removed unconfirmed bug A bug report that needs triaging labels Dec 18, 2023
@Rapptz
Copy link
Owner

Rapptz commented Dec 18, 2023

Your reporting code just seems to be misuse and breakage of asyncio's threadsafe guarantees.

Consider the following calls:

import threading, asyncio, logging
import discord

token = # redacted

runner = asyncio.Runner()
def r(f):
	return asyncio.run_coroutine_threadsafe(f, runner.get_loop()).result()

logging.basicConfig(level=logging.DEBUG)
intents = discord.Intents()
intents.guilds = True
cl = discord.Client(intents=intents)

runner.run(cl.login(token))  # 1

t = threading.Thread(target=runner.run, args=(cl.connect(),))  # 2
t.start()

r(cl.wait_until_ready())  # 3

print("okay, now shut down")

r(cl.close())  # 4

You create an event loop over at call 1 in the "main thread" and this is where the entirety of the thread unsafe loop state is held. Over at call 2 you call connect in thread 2 without a corresponding threadsafe call. In call 3 you use the event loop threadsafe call in the same thread that it's already being called. You do the same thing in call 4.

Essentially you have this backwards. You need to do something like this:

runner = asyncio.Runner()
def r(f):
	return asyncio.run_coroutine_threadsafe(f, runner.get_loop()).result()

logging.basicConfig(level=logging.DEBUG)
intents = discord.Intents()
intents.guilds = True
cl = discord.Client(intents=intents)

runner.run(cl.login(token))  # 1

t = threading.Thread(target=r, args=(cl.connect(),))  # 2
t.start()

runner.run(cl.wait_until_ready())  # 3

print("okay, now shut down")

runner.run(cl.close())  # 4

In this code calls 1, 3, and 4 are done in the main thread using the runner in the thread the loop was created on and call 2 uses the threadsafe call as intended. In this example code the exit works as expected.

@Rapptz Rapptz closed this as not planned Won't fix, can't repro, duplicate, stale Dec 18, 2023
@rrika
Copy link
Author

rrika commented Dec 18, 2023

Hi, first of all, thanks for taking the time to look at this.

I understand that run_coroutine_threadsafe needs to be used whenever adding tasks from a thread that is not the one currently executing Runner.run. But I don't see anything in the python documentation that says that all invocations of run need to come from the same thread. My understanding is that all those calls are separate, and that calls 3 and 4 on the main thread work because they're relative to where the loop is currently running (which is the second thread, as launched by 2).

Either way though, I posted a revised repro in a comment that doesn't split up login+connect over two invocations of Runner.run. That should work based on what you said, but still doesn't.

The last bit of code you posted terminates, but by virtue of not having any true multi-threading. Anything between 3 and 4 will block heartbeats. Replacing print("okay, now shut down") with time.sleep(120) will suppress the DEBUG:discord.gateway:Keeping shard ID None websocket alive with sequence 2. messages that I else see.
So this doesn't help with the scenario of wanting to start/use/stop the discord client in a thread that is not the main/REPL one [from the main one].

@Rapptz
Copy link
Owner

Rapptz commented Dec 19, 2023

But I don't see anything in the python documentation that says that all invocations of run need to come from the same thread.

The documentation mentions that the event loop objects aren't thread safe in multiple places. asyncio is meant to only execute in one thread with the thread-safe methods being an escape hatch. Literally every single class in the asyncio module explicitly says "this class is not thread safe".

I do not see your revised code anywhere. Note that in an asyncio capable REPL such as IPython or python -m asyncio you should be able to use discord.py and await just fine without needing to do any type of thread work. To launch the connect loop in the background (if you need it) you can use loop.create_task.

@rrika
Copy link
Author

rrika commented Dec 19, 2023

I do not see your revised code anywhere.

It's right above your first comment. #9690 (comment)

I didn't know about -m asyncio that's cool.

@Rapptz
Copy link
Owner

Rapptz commented Dec 19, 2023

That code is still incorrect for the same reason I gave.

@rrika
Copy link
Author

rrika commented Dec 19, 2023

That code is still incorrect for the same reason I gave.

I don't see how.

You create an event loop over at call 1 in the "main thread" and this is where the entirety of the thread unsafe loop state is held.

I don't think the thread of creation matters, only the thread that is currently executing run(). I have adapted the repro once more to show that even when creation and running happen on thread 2, the same issue is visible.

In call 3 you use the event loop threadsafe call in the same thread that it's already being called.

I'm not sure what you mean here, the loop is -not- on the same thread.

import threading, asyncio, logging
import discord

token = # redacted

logging.basicConfig(level=logging.DEBUG)
intents = discord.Intents()
intents.guilds = True
cl = discord.Client(intents=intents)

runner = None
def create_loop_and_run():
	global runner
	runner = asyncio.Runner()
	runner.run(cl.start(token))

may_wait_until_ready = threading.Event()
async def setup_hook():
	may_wait_until_ready.set()
cl.setup_hook = setup_hook

def r(f):
	return asyncio.run_coroutine_threadsafe(f, runner.get_loop()).result()

t = threading.Thread(target=create_loop_and_run)
print("start")
t.start()

print("wait for may_wait_until_ready")
may_wait_until_ready.wait()

print("wait_until_ready")
r(cl.wait_until_ready())

print("okay, now shut down")
r(cl.close())

Regardless of this example, would you agree that there is an undocumented requirement on the use of the close() method? Something like "Note: close() might complete after start(). The caller is responsible for keeping the event loop running until the end of both."

@Rapptz
Copy link
Owner

Rapptz commented Dec 20, 2023

I don't see how.

I'm not sure how to explain it any better than I already did.

I don't think the thread of creation matters, only the thread that is currently executing run(). I have adapted the repro once more to show that even when creation and running happen on thread 2, the same issue is visible.

There seems to be a misunderstanding on how event loops work and how they interact with asyncio objects and the asyncio.Runner and asyncio.run. I guess that's fair since these concepts have been abstracted away.

The library maintains a loop reference and expects it to be the running loop. asyncio.run and creating a new asyncio.Runner create a new event loop which means that all futures, tasks, and loop references are different. Which means that await points are waiting for a different non-executing event loop.

Regardless of this example, would you agree that there is an undocumented requirement on the use of the close() method? Something like "Note: close() might complete after start(). The caller is responsible for keeping the event loop running until the end of both."

There is no extraneous requirement for close(). From looking at the code I see that close() might not work as expected if loop initialisation is not done but your examples don't trigger this condition. So far every example here has been misuse of asyncio itself.

@rrika
Copy link
Author

rrika commented Dec 20, 2023

The library maintains a loop reference

Oh, you mean discordpy, not asyncio. When I said "I don't think the thread of creation matters" I meant Runner.__init__, not Client.__init__. Looking at Client though, it doesn't initialize self.loop until _async_setup_hook, by which time it's already in the correct loop. HTTPClient and ConnectionState don't seem to do anything with the active loop in their constructors either. To be sure however I adapted the repro to construct the Client object on the second thread as well. I still see the same hang.

Which means that await points are waiting for a different non-executing event loop.

There's only one loop in this case though, no? Also, python checks that awaits happen to Futures of matching loops regardless of whether debug=True is set on the loop (see Task.__step in asyncio/tasks.py, if futures._get_loop(result) is not self._loop:) and I haven't seen any such errors from Python.

import threading, asyncio, logging
import discord

token = # redacted

cl = None
runner = None
may_wait_until_ready = threading.Event()

def discord_thread():
	global cl, runner

	logging.basicConfig(level=logging.DEBUG)
	intents = discord.Intents()
	intents.guilds = True
	cl = discord.Client(intents=intents)
	async def setup_hook():
		may_wait_until_ready.set()
	cl.setup_hook = setup_hook

	runner = asyncio.Runner(debug=True)
	runner.run(cl.start(token))

def r(f):
	return asyncio.run_coroutine_threadsafe(f, runner.get_loop()).result()

t = threading.Thread(target=discord_thread)
print("start")
t.start()

print("wait for may_wait_until_ready")
may_wait_until_ready.wait()

print("wait_until_ready")
r(cl.wait_until_ready())

print("okay, now shut down")
r(cl.close()) # close gets stuck in "await self.ws.close(code=1000)"

@roundedrectangle
Copy link

roundedrectangle commented Oct 16, 2024

Hello @rrika. Have you fixed the problem with ws.close getting stuck? My code also does so, and throws CancelledError on the ws.close line. Other awaitable Client methods work properly in my code, and no errors are thrown with them

@rrika
Copy link
Author

rrika commented Oct 16, 2024

so, this is very ugly and but it's the workaround I settled on. I also haven't updated my version of discord.py since writing this. so it might not work anymore.

import threading, asyncio, logging
import discord

token = ...
server = ...
channel = ...

runner = asyncio.Runner()
def r(f):
	return asyncio.run_coroutine_threadsafe(f, runner.get_loop()).result()

logging.basicConfig(level=logging.DEBUG)
intents = discord.Intents()
intents.guilds = True
cl = discord.Client(intents=intents)

close_task = None
may_wait_until_ready = threading.Event()

async def setup_hook():
	may_wait_until_ready.set()
cl.setup_hook = setup_hook

async def cl_connect():
	global close_task
	await cl.start(token)
	print("exited connect")
	if close_task:
		await close_task
		print("awaited close task")
	else:
		print("no close task")

async def cl_close():
	print("cl.close")
	await cl.close()
	print("cl.close done")

def runner_run():
	runner.run(cl_connect())
	print("exited runner")


t = threading.Thread(target=runner_run)
t.start()

print("wait for may_wait_until_ready")
may_wait_until_ready.wait()

print("wait_until_ready")
r(cl.wait_until_ready())

main_thread = threading.main_thread()

def wait_for_main():
	global close_task
	main_thread.join()
	print("done, close client")
	close_task = runner.get_loop().create_task(cl_close()) # QUESTIONABLE, see comments below

threading.Thread(target=wait_for_main).start()

# whatever
sv = cl.get_guild(server)
ch = sv.get_channel(channel)
m = r(ch.fetch_message(msgid)) # do it on the remote thread with r(...)

To the best of my memory:

  1. the main thread is free to do whatever it wants. in this case it creates the discord.Client object but as discussed above this can happen anywhere
  2. a second thread runs the event loop, and in that loop runs cl.start -except- it doesn't do so directly but uses a wrapper cl_connect which makes sure that upon exit it waits for any task stuck in cl.close before returning and thereby shutting down the loop
  3. to find that closing task cl_connect uses a global variable close_task which is written to by wait_for_main which lurks on a third thread, waiting for the shutdown of the primary thread.

So then during shutdown:

  1. Thread 1 is blocking on keyboard input (REPL prompt). Thread 2 is blocking on incoming network traffic, and the pipe that lets it receive more work from thread 1. Thread 3 is blocking on the shutdown of thread 1.
  2. User hits Ctrl-C. Thread 1 terminates.
  3. Thread 3 continues, creates the cl_close task, and terminates. (I don't remember why it's using create_task instead of run_coroutine_threadsafe.)
  4. cl.close runs on thread 2, sets self._closed = True on the client and executes await self._connection.close() which eventually passes execution back to the main discord client task on thread 2
  5. The main task leaves its while not self.is_closed() loop, returns to cl_connect where it will await close_task
  6. The close task finishes
  7. The client task finishes
  8. The loop shuts down
  9. Thread 2 shuts down
  10. Python shuts down

@mikeshardmind
Copy link
Contributor

so, this is very ugly and but it's the workaround I settled on. I also haven't updated my version of discord.py since writing this. so it might not work anymore.

This working at all would be entirely implementation detail, and is outside of the guarantees asyncio actually makes for thread-safety. I would strongly suggest avoiding use of asyncio event loops in threads without taking the time to read what is and isn't guaranteed and ensure asyncio resources are created according to those guarantees.

@roundedrectangle
Copy link

Thank you, I will try your workaround tomorrow

@rrika
Copy link
Author

rrika commented Oct 16, 2024

If I attempted this again I'd do the launch of cl_close in two stages. First get over to the correct loop with run_coroutine_threadsafe or call_soon_threadsafe. Then once there wrap the cl.close in a task that can be awaited between cl.start and the end of the loop.
It'd also be a good idea to move runner and client creation over to the thread that will actually be using it, at least for clarity.

Repository owner locked as resolved and limited conversation to collaborators Oct 16, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
invalid This is not right.
Projects
None yet
Development

No branches or pull requests

5 participants