Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

C API deadlock in hyper_executor::poll_next #3369

Closed
jsha opened this issue Oct 23, 2023 · 3 comments · Fixed by #3370
Closed

C API deadlock in hyper_executor::poll_next #3369

jsha opened this issue Oct 23, 2023 · 3 comments · Fixed by #3370
Labels
A-ffi Area: ffi (C API) C-bug Category: bug. Something is wrong. This is bad! E-easy Effort: easy. A task that would be a great starting point for a new contributor.

Comments

@jsha
Copy link
Contributor

jsha commented Oct 23, 2023

Version: current master, dd638b5
Platform: Linux 6.2.0-34-generic #34-Ubuntu SMP PREEMPT_DYNAMIC Mon Sep 4 13:06:55 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

In curl/curl#11203 (comment), @marlonbaeten reported that removing hyper_clientconn_free(client); from curl's c-hyper code caused curl to never send the request.

I did a little poking around and I believe what's happening is a deadlock. I'm not sure yet why freeing the client triggers the deadlock, but I can see in the Hyper code where it happens.

In hyper_executor::poll_next, self.driver is locked and then drain_queue is called :

hyper/src/ffi/task.rs

Lines 129 to 134 in dd638b5

match Pin::new(&mut *self.driver.lock().unwrap()).poll_next(&mut cx) {
Poll::Ready(val) => return val,
Poll::Pending => {
// Check if any of the pending tasks tried to spawn
// some new tasks. If so, drain into the driver and loop.
if self.drain_queue() {

drain_queue then tries to lock self.driver a second time:

hyper/src/ffi/task.rs

Lines 151 to 156 in dd638b5

let mut queue = self.spawn_queue.lock().unwrap();
if queue.is_empty() {
return false;
}
let driver = self.driver.lock().unwrap();

The docs for Mutex::lock say:

The exact behavior on locking a mutex in the thread which already holds the lock is left unspecified. However, this function will not return on the second call (it might panic or deadlock, for example).

Steps to reproduce:

Build Hyper's FFI library using this patch to add some println debugging lines:

diff --git a/src/ffi/task.rs b/src/ffi/task.rs
index e75dfc1a..e868f39d 100644
--- a/src/ffi/task.rs
+++ b/src/ffi/task.rs
@@ -131,6 +131,7 @@ impl hyper_executor {
                 Poll::Pending => {
                     // Check if any of the pending tasks tried to spawn
                     // some new tasks. If so, drain into the driver and loop.
+                    eprintln!("draining queue??");
                     if self.drain_queue() {
                         continue;
                     }
@@ -149,11 +150,14 @@ impl hyper_executor {
 
     fn drain_queue(&self) -> bool {
         let mut queue = self.spawn_queue.lock().unwrap();
+        eprintln!("locked spawn queue :D");
         if queue.is_empty() {
             return false;
         }
 
+        eprintln!("trying to lock driver. this might deadlock :/");
         let driver = self.driver.lock().unwrap();
+        eprintln!("locked driver! :D :D :D");
 
         for task in queue.drain(..) {
             driver.push(task);

Build curl against that version of Hyper's FFI library, using this patch against curl:

diff --git a/lib/c-hyper.c b/lib/c-hyper.c
index 5726ff1cc..7a5e764c8 100644
--- a/lib/c-hyper.c
+++ b/lib/c-hyper.c
@@ -1214,9 +1214,6 @@ CURLcode Curl_http(struct Curl_easy *data, bool *done)
   }
   sendtask = NULL; /* ownership passed on */
 
-  hyper_clientconn_free(client);
-  client = NULL;
-
   if((httpreq == HTTPREQ_GET) || (httpreq == HTTPREQ_HEAD)) {
     /* HTTP GET/HEAD download */
     Curl_pgrsSetUploadSize(data, 0); /* nothing */

Observe this output, followed by a hang:

$ ~/curl/src/curl https://example.com/
locked spawn queue :D
trying to lock driver. this might deadlock :/
locked driver! :D :D :D
locked spawn queue :D
trying to lock driver. this might deadlock :/
locked driver! :D :D :D
draining queue??
locked spawn queue :D
trying to lock driver. this might deadlock :/
@jsha jsha added the C-bug Category: bug. Something is wrong. This is bad! label Oct 23, 2023
@seanmonstar
Copy link
Member

Wow, thanks so much for the write-up, and finding this deadlock! Fixing that up shouldn't be too complicated.

@seanmonstar seanmonstar added E-easy Effort: easy. A task that would be a great starting point for a new contributor. A-ffi Area: ffi (C API) labels Oct 23, 2023
@seanmonstar
Copy link
Member

Oh, you even submitted a PR already, double thanks!

seanmonstar pushed a commit that referenced this issue Oct 23, 2023
poll_next locks the driver, and also calls drain_queue while holding
that lock. Since drain_queue locks the driver too, that results in a
deadlock.

To fix, unlock the driver before calling drain_queue.

Closes #3369
0xE282B0 pushed a commit to 0xE282B0/hyper that referenced this issue Jan 12, 2024
poll_next locks the driver, and also calls drain_queue while holding
that lock. Since drain_queue locks the driver too, that results in a
deadlock.

To fix, unlock the driver before calling drain_queue.

Closes hyperium#3369
@chebbyChefNEQ
Copy link

chebbyChefNEQ commented Jan 12, 2024

. nvm

0xE282B0 pushed a commit to 0xE282B0/hyper that referenced this issue Jan 16, 2024
poll_next locks the driver, and also calls drain_queue while holding
that lock. Since drain_queue locks the driver too, that results in a
deadlock.

To fix, unlock the driver before calling drain_queue.

Closes hyperium#3369

Signed-off-by: Sven Pfennig <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-ffi Area: ffi (C API) C-bug Category: bug. Something is wrong. This is bad! E-easy Effort: easy. A task that would be a great starting point for a new contributor.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants