Add server config option to spawn handlers detached from client disconnect behavior #701

jgallagher · 2023-06-12T15:26:01Z

Adds a new default_handler_disposition field to both ConfigDropshot and ServerConfig. The default value (set in ConfigDropshot's Default impl) is HandlerDisposition::CancelOnDisconnect, which matches dropshot's current behavior: if a client disconnects, the future returned by the endpoint handler is dropped (and therefore cancelled). If the config is instead HandlerDisposition::DetachFromClient, all endpoint handler futures are tokio::spawn()'d, so they always run to completion.
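For readers skimming the thread, here is a rough sketch of the shape of the option as described above. It is not the code from the diff: `SketchConfig` and `survives_disconnect` are hypothetical names made up for illustration, and the real config struct has other fields that are omitted here.

```rust
/// Hypothetical mirror of the option described in this PR (illustration only).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum HandlerDisposition {
    /// Drop (and therefore cancel) the handler future if the client disconnects.
    CancelOnDisconnect,
    /// Spawn the handler future so it runs to completion regardless of the client.
    DetachFromClient,
}

impl Default for HandlerDisposition {
    // As described above, the default preserves the existing behavior.
    fn default() -> Self {
        HandlerDisposition::CancelOnDisconnect
    }
}

/// Stand-in for the server config; every other field is left out of this sketch.
#[derive(Clone, Debug, Default, PartialEq, Eq)]
pub struct SketchConfig {
    pub default_handler_disposition: HandlerDisposition,
}

/// Hypothetical helper: does a handler keep running after a client disconnect?
pub fn survives_disconnect(cfg: &SketchConfig) -> bool {
    matches!(
        cfg.default_handler_disposition,
        HandlerDisposition::DetachFromClient
    )
}
```

A consumer that wants the detached behavior would set the field explicitly rather than relying on `Default`.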

I phrased this field as default_handler_disposition under the assumption that we'd (eventually) provide a way for individual endpoints to override this choice, but am very much not wed to the naming here - any suggestions are welcome.

davepacheco

For a change like this, I think it'd be useful to test-integrate it into Omicron before we land it here. What do you think? (I'm afraid you're going to wind up undoing some changes I did recently to remove ..Default::default() from a whole bunch of ConfigDropshot structures. Sorry about that.)

For the record: I confirmed that the current behavior is expected from hyper. I wanted to confirm that because it does seem counter-intuitive to a bunch of us and if we're going so far out of our way to do things differently, I wanted to make sure we weren't just holding it wrong. (I'm not sure why it's the behavior and I remain frustrated that cancellation appears to be undocumented in the canonical Rust and async/await references.)

dropshot/src/config.rs

dropshot/src/api_description.rs

davepacheco · 2023-06-12T18:50:08Z

dropshot/src/server.rs

+                // Ignore errors from `send()`: if it fails, it's because the
+                // outer `http_request_handle` future was cancelled while
+                // waiting for this result.
+                _ = tx.send(result);


Probably a silly question, but why not use the Future's output as provided by awaiting on the spawn handle?

Given that we're not, is it at least worth warning in this case? (The answer might be "no", but it seems potentially useful to learn this.)

Come to think of it: I feel like it would be useful to get a "warn"-level log message any time a client disconnects while we're running a request handler. What do you think? (That can definitely be deferred, too.)

> Probably a silly question, but why not use the Future's output as provided by awaiting on the spawn handle?

I... have no idea. I think it's just habit to use oneshot channels like this; normally when spawning I think I'm not in a position where I can await the future directly.

I do think adding a warning here would be handy, and I think easy? I'll give it a shot.

Added logging (both here and around the .unwrap() below) in efef6f9.
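To make the two options in this thread concrete, here is a small std-only sketch. It swaps in `std::thread` and `std::sync::mpsc` for tokio tasks and oneshot channels (an assumption made so the example has no dependencies); `via_channel` and `via_join_handle` are hypothetical names. The point it illustrates is the one in the code comment under review: `send()` only fails when the receiving side has already gone away.

```rust
use std::sync::mpsc;
use std::thread;

/// Channel style: the worker pushes its result into a channel.
/// Returns (what the waiter received, whether the send succeeded).
pub fn via_channel(waiter_cancelled: bool) -> (Option<u32>, bool) {
    let (tx, rx) = mpsc::channel::<u32>();
    // Dropping the receiver up front models the outer future being cancelled.
    let rx = if waiter_cancelled {
        drop(rx);
        None
    } else {
        Some(rx)
    };
    // `send` returns Err only if the receiver is gone; the worker reports that
    // instead of treating it as fatal (this is where a warning could be logged).
    let worker = thread::spawn(move || tx.send(6 * 7).is_ok());
    let received = rx.map(|rx| rx.recv().expect("worker sends before exiting"));
    let delivered = worker.join().expect("worker does not panic");
    (received, delivered)
}

/// Join-handle style: the worker's return value comes back through the handle.
pub fn via_join_handle() -> u32 {
    thread::spawn(|| 6 * 7).join().expect("worker does not panic")
}
```

Both styles deliver the same value when the waiter is still around; the difference is only in how the "nobody is listening" case surfaces.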

dropshot/src/server.rs

davepacheco · 2023-06-12T18:58:10Z

dropshot/tests/test_config.rs

+        // We weren't cancelled: mem::forget() drop_counter so its drop impl
+        // doesn't run. This leaks a reference to our drop_count, but we're in a
+        // unit test so won't worry about it.
+        mem::forget(drop_counter);


These two new tests feel more complicated than they need to be (caveat: I'm not sure about that!). Part of it is that it feels a little weird to test Drop so explicitly when it's really a proxy for the behavior we actually care about, especially given that we're using this as an out to bypass the normal Drop behavior.

One implementation would be to just panic at this point. Of course, that wouldn't let you test that when not cancelled it runs to completion, so I guess we could be worried that it's not panicking by accident.

How about this: what if you had a handler with two explicit counters: one bumped before receiving the message from the main task and one bumped after. I think you could use the same handler and ServerContext. And I think you could drive them the same way? You could have a helper function that's given a value for the config, constructs a server with this handler, starts it, runs one request to normal completion, then starts another request but cancels it, and then shuts down the server and waits for it to shut down. Then it could return the final counter values. Besides hopefully being simpler, an advantage would be that we know we're testing exactly the same thing in both cases.

I haven't fleshed this out any more than that so it's definitely possible there's something I'm missing. The code here looks correct. If you think what's here is a better path, that's fine with me.

I took a stab at simplifying them in 8946d44 - I didn't go with the two counters idea, but I think what I did makes what is being tested much more obvious. LMK what you think.
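For reference, the two-counter idea floated above could look roughly like the following. This is a sketch only and is not what landed in 8946d44: it polls the handler future by hand with a no-op waker instead of starting a real server, and it models the detached case by simply continuing to poll. `Mode`, `Counters`, `YieldOnce`, and `run_two_requests` are all hypothetical names.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Waker that does nothing; enough for polling by hand.
struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Pending on the first poll, Ready on the second: a stand-in await point.
struct YieldOnce(bool);
impl Future for YieldOnce {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<()> {
        if self.0 {
            Poll::Ready(())
        } else {
            self.0 = true;
            Poll::Pending
        }
    }
}

#[derive(Default)]
pub struct Counters {
    pub started: AtomicUsize,
    pub finished: AtomicUsize,
}

#[derive(Clone, Copy)]
pub enum Mode {
    CancelOnDisconnect,
    Detached,
}

/// One counter is bumped before the await point, the other after it.
async fn handler(c: Arc<Counters>) {
    c.started.fetch_add(1, Ordering::SeqCst);
    YieldOnce(false).await;
    c.finished.fetch_add(1, Ordering::SeqCst);
}

/// Runs one request to completion and "disconnects" during a second one.
/// Returns the final (started, finished) counter values.
pub fn run_two_requests(mode: Mode) -> (usize, usize) {
    let counters = Arc::new(Counters::default());
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);

    // Request 1: normal completion.
    let mut first = Box::pin(handler(Arc::clone(&counters)));
    while first.as_mut().poll(&mut cx).is_pending() {}

    // Request 2: the client goes away while the handler is mid-await.
    let mut second = Box::pin(handler(Arc::clone(&counters)));
    assert!(second.as_mut().poll(&mut cx).is_pending());
    match mode {
        // Dropping the future cancels it: the second counter is never bumped.
        Mode::CancelOnDisconnect => drop(second),
        // Detached is modelled as "something keeps polling it to completion".
        Mode::Detached => {
            while second.as_mut().poll(&mut cx).is_pending() {}
        }
    }

    (
        counters.started.load(Ordering::SeqCst),
        counters.finished.load(Ordering::SeqCst),
    )
}
```

With this shape the same handler and the same driver are used for both modes, and only the expected `finished` count differs.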

dropshot/tests/test_config.rs

davepacheco

One thing I mentioned above but I think got lost: I think it could be quite valuable if, when using CancelOnDisconnect we emitted a warning message when the Future was being dropped while in a request handler. This can definitely be a follow-on change (and not asking you to do that) but I figured I'd mention it. The easiest way I can think to do this is to store a bit (AtomicBool?) that starts false, gets set to true after we finish the request handler, and then create something in the same scope with a Drop handler (maybe scope_guard can help here?) that checks if the boolean is false? Seems kind of cheesy though.
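A minimal sketch of the flag-plus-Drop idea, assuming a hand-rolled guard rather than scope_guard. Everything here is hypothetical (`CancelGuard`, `mark_finished`, `simulate`), and the "log" is just a shared Vec standing in for a real logger; in the server the guard would live in the scope that awaits the handler.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{Arc, Mutex};

/// Records a warning if it is dropped before `mark_finished` was called.
pub struct CancelGuard {
    finished: AtomicBool,
    log: Arc<Mutex<Vec<String>>>,
}

impl CancelGuard {
    pub fn new(log: Arc<Mutex<Vec<String>>>) -> Self {
        CancelGuard { finished: AtomicBool::new(false), log }
    }

    /// Call this right after the request handler returns.
    pub fn mark_finished(&self) {
        self.finished.store(true, Ordering::SeqCst);
    }
}

impl Drop for CancelGuard {
    fn drop(&mut self) {
        // Still false here means the enclosing scope was torn down early,
        // e.g. because the enclosing future was dropped on client disconnect.
        if !self.finished.load(Ordering::SeqCst) {
            self.log
                .lock()
                .unwrap()
                .push("warn: request handler dropped before completion".to_string());
        }
    }
}

/// Synchronous stand-in for a request: returns whatever was logged.
pub fn simulate(handler_completes: bool) -> Vec<String> {
    let log = Arc::new(Mutex::new(Vec::new()));
    {
        let guard = CancelGuard::new(Arc::clone(&log));
        if handler_completes {
            guard.mark_finished();
        }
    } // guard is dropped here, whether or not the handler finished
    let out = log.lock().unwrap().clone();
    out
}
```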

CHANGELOG.adoc

davepacheco · 2023-06-14T23:09:10Z

dropshot/src/server.rs

+                    error!(request_log, "handler panicked");
+                    panic!("handler panicked before sending response");


I'm not sure how to handle this. It'd be nice to report the error in these cases (or else is it lost completely?). I'm not clear that we can because we don't have anything usefully printable.

This example is interesting:
https://docs.rs/tokio/latest/tokio/task/struct.JoinError.html#examples-1

Maybe that's the way to go?

Nice, that looks great. It's a little more work to get the error out; added in 204c111
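For anyone following along, the pattern in the linked tokio example is to take the panic payload out of the JoinError (`into_panic()`) and hand it to `std::panic::resume_unwind`. The sketch below shows the same idea with `std::thread` standing in for a spawned task (an assumption, to keep it dependency-free); it is not the code from 204c111, and `run_detached` and `panic_message` are hypothetical names.

```rust
use std::any::Any;
use std::panic;
use std::thread;

/// Best-effort extraction of a printable message from a panic payload.
pub fn panic_message(payload: &(dyn Any + Send)) -> String {
    if let Some(s) = payload.downcast_ref::<&'static str>() {
        s.to_string()
    } else if let Some(s) = payload.downcast_ref::<String>() {
        s.clone()
    } else {
        "non-string panic payload".to_string()
    }
}

/// Runs `handler` on another thread and returns its result. If the handler
/// panicked, log what we can and re-raise the original panic in the caller.
pub fn run_detached<T, F>(handler: F) -> T
where
    F: FnOnce() -> T + Send + 'static,
    T: Send + 'static,
{
    match thread::spawn(handler).join() {
        Ok(value) => value,
        Err(payload) => {
            eprintln!("handler panicked: {}", panic_message(&*payload));
            // Propagates the original payload instead of a generic message.
            panic::resume_unwind(payload)
        }
    }
}
```

The advantage over `panic!("handler panicked before sending response")` is that the original payload, and so the original message, is preserved for whoever observes the panic.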


jgallagher · 2023-06-14T23:48:43Z

> One thing I mentioned above but I think got lost: I think it could be quite valuable if, when using CancelOnDisconnect we emitted a warning message when the Future was being dropped while in a request handler. This can definitely be a follow-on change (and not asking you to do that) but I figured I'd mention it. The easiest way I can think to do this is to store a bit (AtomicBool?) that starts false, gets set to true after we finish the request handler, and then create something in the same scope with a Drop handler (maybe scope_guard can help here?) that checks if the boolean is false? Seems kind of cheesy though.

I added a TODO to this effect, but yeah, I'm not sure how best to do this either at the moment. With the default changing it doesn't seem super urgent at the moment.

> It occurred to me as I was writing this that this might be the first runtime dependency on tokio in Dropshot. I think when we wrote the bulk of Dropshot, we just assumed consumers were using tokio, but I don't know if we depended in any way on tokio being the executor. I'm not sure it's possible to use a different executor today. I don't think it's worth spending much time on this now but at some point we should probably figure out what our dependency really is and document that better.

(Lifting this up since github resolved the conversation when I took your CHANGELOG edit) I don't think that's true: we're using at least tokio::net::{TcpListener, TcpStream} and tokio_rustls; I assume both of those require a tokio executor?

davepacheco · 2023-06-14T23:54:12Z

> (Lifting this up since github resolved the conversation when I took your CHANGELOG edit) I don't think that's true: we're using at least tokio::net::{TcpListener, TcpStream} and tokio_rustls; I assume both of those require a tokio executor?

That's possible. It's not obvious to me that they would.

jgallagher added 2 commits June 12, 2023 10:30

Add default_handler_disposition config option

890a4dc

Add tests for handler disposition config option

6d5dee2

jgallagher requested review from davepacheco and ahl June 12, 2023 15:26

update changelog

8073a04

davepacheco reviewed Jun 12, 2023


davepacheco reviewed Jun 12, 2023


jgallagher added 5 commits June 13, 2023 10:26

rename handler disposition -> handler task mode

55b3574

add logging around HandlerTaskMode::Detached failure conditions

efef6f9

greatly simplify handler task mode tests

8946d44

fix broken doc test

17b55c5

fix doc spelling

4f25eba

leftwo mentioned this pull request Jun 13, 2023

Audit crucible for holding mutex across await and other cancelation complications oxidecomputer/crucible#798

Open

jgallagher added 2 commits June 13, 2023 16:42

kebab-case serialization of HandlerTaskMode variants

909a4d7

switch default handler mode to Detached

f4e4c07

jgallagher mentioned this pull request Jun 13, 2023

Make server shutdown wait on any Detached handler futures #702

Merged

davepacheco reviewed Jun 14, 2023


jgallagher and others added 3 commits June 14, 2023 19:45

Update CHANGELOG.adoc

750f3cd

Co-authored-by: David Pacheco <[email protected]>

add TODO for logging CancelOnDisconnect handler drops

0841f64

better panic propagation for Detached

204c111

davepacheco approved these changes Jun 15, 2023


jgallagher merged commit 3a42491 into main Jun 15, 2023

jgallagher deleted the option-to-spawn-handlers branch June 15, 2023 20:46

This was referenced Jun 21, 2023

Deleted instances remain as running in CRDB after propolis zone is already removed oxidecomputer/omicron#3207

Open

transaction_async is not cancel-safe, and bedlam ensues oxidecomputer/async-bb8-diesel#47

Open

davepacheco mentioned this pull request Nov 23, 2023

ServiceZoneRequest is too general oxidecomputer/omicron#4466

Merged

davepacheco mentioned this pull request Feb 23, 2024

end-of-request USDT probes, log entries don't always happen #914

Closed
