[WIP] Support error types other than `dropshot::HttpError` #1164

hawkw · 2024-11-05T23:41:43Z

This still needs some of documentation and API polish, but it works!

Fixes #7
Fixes #39
Fixes #41
Fixes #801

hawkw

Leaving some notes for reviewers.

hawkw · 2024-11-05T23:43:57Z

dropshot/src/server.rs

+    pub(crate) error_handler: Box<
+        dyn Fn(ErrorContext<'_, C>, ServerError) -> Response<Body>
+            + Send
+            + Sync,
+    >,


I'm not sure if this is really the nicest interface for the "unhandled dropshot-internal error" API --- open to suggestions here!

Another option I considered was having the user provide a type when constructing the server, and bounding it with IntoErrorResponse + From<ServerError>, and having the From impl for that type construct the error. That way, we could also be aware of the schema for the global catchall error...but, it's trickier to provide request metadata that way if it's just a From impl. Potentially, we could instead have this handler return a type implementing IntoErrorResponse, but we'd then have to erase that type unless we wanted the server to be generic over it..

See also comments in #1164 (comment)

hawkw · 2024-11-05T23:47:38Z

dropshot/src/error.rs

+pub use crate::router::RouterError;
+
+#[derive(Debug, thiserror::Error)]
+pub enum ServerError {


This is the top-level representation of "all errors generated from within dropshot" which the user can construct their own error responses from using the server-level error handler. Alternatively, if the error handler is not overridden, we will just convert into the stock HttpError for you by default.

I also considered calling this DropshotError --- open to being convinced either way.

Something else I considered was having a fn suggested_status_code() or something on this and the nested child errors, which would return the same status code we use when converting them to HttpErrors. This would be in order to help users construct their own error response types without having to match every possible variant of these errors. You could then use a combination of the suggested_status_code() and to_string() methods to construct a user error type, and potentially match specific variants and handle them differently.

But, this API can be added in a separate PR.

It looks like there is a fn recommended_status_code() -- is that the same thing?

hawkw · 2024-11-05T23:53:09Z

dropshot/src/error.rs

+pub trait IntoErrorResponse: JsonSchema {
+    /// Returns the HTTP status code that should be returned along with this
+    /// error.
+    fn into_error_response(
+        &self,
+        request_id: &str,
+    ) -> http::Response<crate::Body>;
+}


This is the trait that must be implemented by error types returned by endpoint functions. Initially, they were just going to be T: Serialize + JsonSchema + AsStatusCode (where AsStatusCode is a trait where the user can indicate the desired status code for the error, naturally), but I realized that we might also want to provide the request ID when turning user error types into error responses.

I'm wondering about unifying this interface with that of the server's global handler for dropshot-internal errors, but it's tricky: we need to erase the types of the errors returned by individual endpoint functions, in order to allow them to return different error types without forcing the entire server to be generic over every possible error, but we'd like the global error handler function to be able to match on the dropshot-internal errors. I thought about having this function take take the ErrorContext type used by the global error handler thingy, but that's currently generic over the ServerContext, and would therefore make this trait no longer object-safe, defeating the purpose a bit.

This now takes an ErrorContext. I'm somewhat on the fence about that type. Initially, it had the ServerContext passed in as part of it, which made it no longer object-safe and thus not usable for endpoint errors. I realized this wasn't strictly necessary as the error handler can be a closure, so if the user wants to give it some stuff like error metrics etc, they can just move that stuff into the closure.

I don't love having to construct a second RequestInfo for error handling, though. In the case of errors returned by endpoint handler functions, the endpoint already has the request, so it doesn't really need any of that stuff when constructing its errors, but that then forces the error to actually consume it when it's constructed, so you can't just ? an error from some other function and would instead have to map_err it to add context like a request ID. OTTOH, there's an issue around schemas (which we already see with HttpError); HttpError is not the actual type we serialize in error responses, but instead, a HttpErrorResponseBody type which implements Serialize + JsonSchema and includes the request ID. The late-binding of additional request metadata to errors through into_error_response taking request info feels a bit easy for a user to mess up: if you just derived Serialize + JsonSchema for the actual error struct implementing IntoErrorResponse, but what you actually serialize is some other thing that includes a request ID or other metadata, you've generated the wrong schema...

I'd love to hear other people's thoughts on this.

hawkw · 2024-11-05T23:54:58Z

dropshot/src/handler.rs

        async fn handle_request(
            &self,
            rqctx: RequestContext<Context>,
            _param_tuple: ($($T,)*)
        ) -> HttpHandlerResult
        {
            let response: ResponseType =
-                (self)(rqctx, $(_param_tuple.$i,)*).await?;
+                (self)(rqctx, $(_param_tuple.$i,)*).await.map_err(|e| HandlerError::Handler(Box::new(e)))?;


It's necessary to erase the user's error type here, so that handler functions with arbitrary error return values can coexist in the same API.

We could alternatively do that by just going ahead and serializing the error response here. However, I didn't do that, because I thought it would be nicer to be able to give back a thing that also implements fmt::Display, for when we log that the request completes with an error. Thus, the HandlerError enum.

hawkw · 2024-11-05T23:58:30Z

dropshot/src/api_description.rs

+                    .entry(endpoint.error_type.name.clone())
+                    .or_insert_with(|| schema(&mut generator));


We stuff the schemas for each endpoint's error type into a big map, so that we can put the schemas in components.responses in order to allow endpoints that return the same Rust error type to also return the same type in the OpenAPI document.

Unfortunately, though, I think just keying this by the name of the schema is probably not sufficient, because I believe schemars' derive(JsonSchema) just uses the type's name, not its fully qualified name. Which, I think, means that if you have an endpoint that returns a foo::bar::Error and another endpoint that returns a baz::quux::Error, and these are different Rust types, their schemas will clobber each other. The key for the map should probably be the fully-qualified type name including the module path, and we might need to massage that into something we can use as the reference in the OpenAPI doc.

I think this needs to be fixed before this PR is mergeable; I'm leaving this comment to get it written down while I remember.

Went and implemented this in e972811 --- it's a bit hairy because I wanted to generate reasonably nice unambiguous names for the error responses. It turns out that schemars will internally disambiguate schema names by appending a number to them, which I had missed, but it's still necessary to do our own disambiguation as well if we want to reference schemas in components.schemas from components.responses, since we still have to generate the references to them ourselves.

It would certainly be nicer if there was a way to get schemars to use my prettier unambiguous names for the schemas themselves rather than Error1, Error2, but I think this (generated for an API with methods that return latex::Error and flask::Error types) is good enough:

Generated OpenAPI document:

{ "openapi": "3.0.3", "info": { "title": "Custom Error Example", "version": "1.0" }, "paths": { "/pdflatex/{filename}": { "get": { "operationId": "get_pdflatex", "parameters": [ { "in": "path", "name": "filename", "required": true, "schema": { "type": "string" } } ], "responses": { "201": { "description": "successful creation", "content": { "application/json": { "schema": { "$ref": "#/components/schemas/Pdf" } } } }, "4XX": { "$ref": "#/components/responses/LatexError" }, "5XX": { "$ref": "#/components/responses/LatexError" } } } }, "/ye-flask": { "get": { "summary": "Gets ye flask.", "operationId": "get_ye_flask", "responses": { "200": { "description": "successful operation", "content": { "application/json": { "schema": { "$ref": "#/components/schemas/Flask" } } } }, "4XX": { "$ref": "#/components/responses/FlaskError" }, "5XX": { "$ref": "#/components/responses/FlaskError" } } } } }, "components": { "schemas": { "Error": { "description": "Errors returned by pdflatex.\n\nGood luck figuring out what these mean!", "oneOf": [ { "description": "An hbox is overfull.", "type": "object", "properties": { "OverfullHbox": { "type": "object", "properties": { "badness": { "description": "The amount of badness in the overfull hbox.", "type": "integer", "format": "uint", "minimum": 0 } }, "required": [ "badness" ] } }, "required": [ "OverfullHbox" ], "additionalProperties": false }, { "description": "Like an overfull hbox, except the opposite of that.", "type": "object", "properties": { "UnderfullHbox": { "type": "object", "properties": { "badness": { "description": "This one also has badness.", "type": "integer", "format": "uint", "minimum": 0 } }, "required": [ "badness" ] } }, "required": [ "UnderfullHbox" ], "additionalProperties": false } ] }, "Error2": { "type": "object", "properties": { "reason": { "type": "string" } }, "required": [ "reason" ] }, "Flask": { "type": "object", "properties": { "name": { "type": "string" } }, "required": [ "name" ] }, "Pdf": { "type": "object", "properties": { "filename": { "type": "string" } }, "required": [ "filename" ] } }, "responses": { "LatexError": { "description": "Error", "content": { "application/json": { "schema": { "$ref": "#/components/schemas/Error" } } } }, "FlaskError": { "description": "Error", "content": { "application/json": { "schema": { "$ref": "#/components/schemas/Error2" } } } } } } }

As I proposed in [this comment][1]. This is intended to be used by user code that constructs its own error type from Dropshot's errors, to make it easier to get the same status codes that are used by `HttpError` when the custom user error type just structures the response differently, or only wishes to override a small subset of Dropshot errors. Perhaps this should be a trait eventually --- I'm kinda on the fence about this. We may also want to do a similar "recommended headers" thing, since some error conditions are supposed to return specific headers, like which methods are allowed for a Method Not Allowed error code, or the desired protocol to upgrade to for a 426 Upgrade Required error. While I was here, I also changed some error variants to return more correct status codes --- a bunch of stuff currently just returns 400 Bad Request when it should really return a more specific error like 426 Upgrade Required etc. [1]: #1164 (comment)

hawkw · 2024-11-06T19:50:55Z

I think I'm gonna go sketch out the "recommended headers" thing I suggested in f54bca9 as well, while I'm working on the dropshot error types.

hawkw · 2024-11-06T20:38:38Z

Another question is whether we ought to generate the "global error" that's synthesized by the error handler from Dropshot errors in the OpenAPI document for each endpoint, since --- implicitly --- the endpoint can also return that error in the face of e.g. extractor errors. We could potentially generate a oneOf in the OpenAPI document that indicates that the error response can either be the endpoint-specific error or the global error type. But this would require getting rid of the error handler function and instead having the user say "this is the type I want to return, and it can be constructed from a ServerError".

If we went down that route, we could also get rid of the IntoErrorResponse trait and instead require that individual endpoints' error types implement Into<{the global error type}>. This could be useful for cases where you want to wrap endpoint-specific errors in the same metadata everywhere. But, it has the disadvantage of making the OpenAPI document no longer indicate that some errors are returned only by specific endpoints --- instead, every error is a member of every endpoint. Which I really dislike, personally: it would be nice to be able to generate clients that return specific error types depending on which API is called.

davepacheco

Thanks for taking this on. I think this is a great direction but as you said there are some important details still to work out. I have some specific comments below. I'm interested in @ahl's take on the OpenAPI bits.

My biggest questions are around what this looks like for the end user, which I kind of lost track of in all the low-level details here. It'd help to write up the end-user documentation (which, for better or worse, I think the rest of Dropshot puts into the crate-level docs) and the CHANGELOG entry. I think it's also worth spelling out somewhere (even if it's just in a PR comment) how this looks in the OpenAPI doc and maybe generated clients (if it's not obvious).

This is definitely one of those changes where we'll want to do a test-upgrade of Omicron and make sure it does what we think.

When this is closer to ready, I'll be curious if it will address @jclulow's goals, too.

This probably sounds like a lot of work. These aren't all blockers. But this is a pretty substantive change to both the Rust API and the resulting specs and it's important to get right. On the plus side, once we do, this is going to be a really nice improvement to a longstanding pain point!

davepacheco · 2024-11-08T17:04:33Z

dropshot/src/lib.rs

@@ -742,7 +742,7 @@ mod dtrace;
 mod api_description;
 mod body;
 mod config;
-mod error;
+pub mod error;


Are you thinking users would use this as dropshot::error::Something? We could organize stuff that way but it seems different from what we've done with everything else in Dropshot.

I'm fine with changing this to be not-pub and reexporting everything at the root level!

davepacheco · 2024-11-08T17:18:37Z

dropshot/src/extractor/mod.rs

+/// Errors returned by extractors.
+///
+/// Because extractors can be combined (a tuple of types which are extractors is
+/// itself an extractor), all extractors must return the same error type. Thus,
+/// this is an enum of all the possible errors that can be returned by any
+/// extractor.


I'm not sure about this approach here. These are just the extractors that Dropshot provides. But consumers can define their own, too.

Also, it seems like a lot of the actual error cases probably overlap (e.g., you can have "invalid input" in a typed body, path param, query param, etc. and it's not clear a consumer should have to handle these all separately).

nit: it seems like this belongs in dropshot/src/extractor/common.rs (or a new file) -- mod.rs is just a thing that includes other files and "common" is where we put the common stuff here.

I'm not sure about this approach here. These are just the extractors that Dropshot provides. But consumers can define their own, too.

So, one thought is that we could provide an Other(Box<dyn Error + Send + 'static>) variant for user-implemented extractors? I hadn't realized that the extractor traits were meant to be publicly implemented.

Also, it seems like a lot of the actual error cases probably overlap (e.g., you can have "invalid input" in a typed body, path param, query param, etc. and it's not clear a consumer should have to handle these all separately).

So, I also wondered about changing these to be more about the semantic error than the specific extractor that generated it. This approach is partially because some of the extractor errors represent things that have recommended HTTP semantics (e.g. a missing or invalid Content-Type header is supposed to return a 415 Unsupported Media Type status). But, I think we can just special-case those as part of a single ExtractorError that's shared across all extractors. I'll rework this a bit!

I'm not sure about this approach here. These are just the extractors that Dropshot provides. But consumers can define their own, too.

So, one thought is that we could provide an Other(Box<dyn Error + Send + 'static>) variant for user-implemented extractors? I hadn't realized that the extractor traits were meant to be publicly implemented.

That seems like it would technically work but feels like second-classing them a bit. The two things I mentioned combined made me think that organizing these variants by the type of extractor isn't great. That makes me want to go down the other path you mentioned, organizing this more semantically. But did you go down that path and find it problematic, too?

Not exactly, I think organizing the error type variants semantically is fine. What I meant is just that I think it's important that some extractor errors have first class representations (e.g. being enum variants that are easy to match on) because they have specific recommended response status codes/headers. This includes content-type errors and the websocket errors. Basically everything else can probably be grouped into a generic 400 Bad Request + message, or a generic 500 Internal Server Error.

I think the real question becomes: "beyond those errors which we know ought to have specific response statuses etc, what categories of errors might users want to programmatically match on?". Pretty much every error that we return that isn't content type related is "some kind of parse error", so we could have an InvalidInput variant or something, and perhaps there should be an internal error variant for "the extractor is broken". I'm on the fence as to whether those variants ought to be represented by Box<dyn Error> or by dropshot's HttpError...

For the Dropshot-provided errors, I think the two use cases are:

programmatic consumers will treat them either like generic network-like errors (similar to a 500) or an unhandleable error that shouldn't happen (like a 400). Their real goal from this work is being able to get a typed error for the errors returned by handlers, not the Dropshot ones.

the other main consumer is people who want to serve a custom web page to a browser (@jclulow wants this and I think @david-crespo has too). I'm not sure if they need much programmatic information about what went wrong. (They probably do want, like, the URL that was requested, though -- they might want to redirect to something that depends on it.)

Maybe the way to go is to start pretty generic (similar to HttpError: status code, internal message, external message) and extend it based on what people then ask for.

Yeah, that makes sense; I'm inclined to potentially still represent it as an enum internally because I'd like some of the error variants to also be able to provide a "recommended headers" interface lazily, but I think it seems reasonable to wrap an enum behind an opaque type that just says "here's my recommended status code, internal/external message, and a method that adds headers associated with the error to a HeaderMap.

davepacheco · 2024-11-08T17:19:39Z

dropshot/examples/colliding-error-type-names.rs

+    #[endpoint {
+    method = GET,
+    path = "/pdflatex/{filename}",
+}]


nit: formatting is weird here

davepacheco · 2024-11-08T17:21:14Z

dropshot/examples/colliding-error-type-names.rs

+    /// Errors returned by pdflatex.
+    ///
+    /// Good luck figuring out what these mean!


This might just be me but when reading this I found myself confused about what any of this stuff is before I realized that these are just opaque tokens you might know if you'd fought pdflatex recently. Take it or leave it but it might be clearer if the example came from the space that dropshot consumers are likely to know already.

Sure, I'm happy to change the example, or change the docs to make it clearer that I just picked random error strings from my memory, as the errors themselves were supposed to be context-free. I'd be happy to contrive an example that's actually doing some kind of (basic) validation, or something; I started by sketching out examples mostly as a way to ensure that code using the new APIs compiled, and I think it's not a bad idea to go back and make them more useful for actually learning how to use the APIs.

davepacheco · 2024-11-08T17:22:17Z

dropshot/examples/colliding-error-type-names.rs

+        ) -> http::Response<dropshot::Body> {
+            http::Response::builder()
+                .status(http::StatusCode::BAD_REQUEST)
+                .header("x-request-id", ctx.request_id)


I see what you were saying -- yeah it really seems like we want Dropshot to do this rather than expect everybody will remember to do this stuff.

davepacheco · 2024-11-08T17:35:22Z

dropshot/src/server.rs

+    >,
+}
+
+impl<C: ServerContext + std::fmt::Debug> std::fmt::Debug for DropshotState<C> {


nit: could use DebugIgnore if we're just trying to ignore the new field?

good call, didn't realize there was a dependency on that

davepacheco · 2024-11-08T17:35:50Z

dropshot/src/server.rs

+    /// Determines how to construct responses for server errors.
+    // Since most of the complex type here is the `dyn Fn` trait, we can't
+    // easily simplify it by just stuffing it in a type alias --- we'd have to
+    // add our own trait that's implemented for `T: Fn(ErrorContext<'_, C>,


Out of curiosity, why not just make this a trait object? That seems like a clean way to let consumers provide what they want here.

What do you mean? The current approach is a trait object --- it's a Box<dyn Fn(ErrorContext<'_>, ServerError) -> Response<Body> + Send + Sync>. The comment is regarding how clippy dislikes the type of the boxed trait object; we could give the type a shorter name by introducing a trait with a blanket impl (impl<T> ErrorHandler for T where T: Fn(...) -> ...)) and changing the type to Box<dyn ErrorHandler + Send + Sync>, but I didn't love the idea of introducing a trait with a blanket impl just to make the clippy warning go away.

Sorry, what I meant was defining a user-facing trait called ErrorResponseSerializer or something like that with one method which has the same signature as this function and then this would just be Box<dyn ErrorResponseSerializer>. I wouldn't do it for the clippy warning but because it seems a little cleaner. (I don't feel strongly about it.)

Yeah, that was what I was saying I didn't do in the comment. Broadly, my preference has been to avoid such "trait aliases" (traits that are blanket implemented for all T: SomeOtherTrait) in public APIs, since I've seen folks get confused a lot by APIs that expect some trait that you can't actually implement and can only implement by implementing another trait. But, I'm open to being convinced otherwise, if you think that it's better!

I don't mean a trait alias with a blanket impl for functions with this signature. I mean that the consumer would use their own struct MySerializer and explicitly impl ErrorResponseSerializer for MySerializer and impl the trait's function and then they'd pass that struct into the builder that constructs the Dropshot server.

Oh, I see what you mean. Hm, yeah, that could also work.

davepacheco · 2024-11-08T17:40:00Z

dropshot/src/error.rs

+pub use crate::router::RouterError;
+
+#[derive(Debug, thiserror::Error)]
+pub enum ServerError {


It looks like there is a fn recommended_status_code() -- is that the same thing?

davepacheco · 2024-11-08T17:40:53Z

dropshot/src/error.rs

+pub use crate::http_util::PathError;
+pub use crate::router::RouterError;
+
+#[derive(Debug, thiserror::Error)]


It could be because I came to this file last, but I found it easy to get lost in the new error hierarchy. Some doc comments on these types (and maybe a Big Theory Statement about all this?) would probably be helpful.

Yup, definitely agree. I haven't written docs for most of the new stuff yet because I wanted to make sure the API direction was broadly right before spending a bunch of time documenting it, but I'm sorry if I made that hard to review!

davepacheco · 2024-11-08T17:44:18Z

dropshot/src/lib.rs

@@ -742,7 +742,7 @@ mod dtrace;
 mod api_description;


(can't put this comment where it belongs)

I think the crate-level docs need an update to describe the new interface. This is the primary documentation for users about how to define an endpoint and what the restrictions and behaviors are.

davepacheco · 2024-11-08T18:00:09Z

Two more thoughts:

It's worth noting that the changes to the Extractor traits are breaking changes for consumers. That may be unavoidable, given the goals.
Conceivably we could separate some of this into smaller PRs that we could land right away. I'm thinking of the parts that use structured errors for the Router and some other request handling paths. I initially thought that would make the interesting bits of this PR easier to focus on but as I write this I think most of the structured errors are in areas that are tricky (the extractors) or related to this PR (the handler errors) so there may not be that much that could be separated here.

hawkw added 7 commits November 4, 2024 10:12

sketch example (won't compile)

b937d69

enum error types for extractors

a6d4113

mostly working but with some unfinished bits

366002e

more traits to fix request id thing

4c1bfac

fixup some stuff

66e115e

mostly works, needs API polish + OpenAPI stuff

1f6cd7b

generate schemas for user-defined errors

afb3475

hawkw commented Nov 5, 2024

View reviewed changes

hawkw added 4 commits November 6, 2024 09:36

disambiguate colliding error response type names

e972811

clippy placation

b19f539

tweak APIs

507dc18

fix examples

78ecabf

hawkw requested review from ahl, davepacheco and david-crespo November 6, 2024 18:11

hawkw added 8 commits November 6, 2024 10:24

TRYBUILD=overwrite

aae766e

oh there's expectorate too

ca3c8fa

keep the http header errors even though they break Clone/Eq

0a8c2c5

remove inadvertently-added hyphen

dd3cc93

forgot doctests don't run under nextest

3c6bcb7

keep internal errors internal

2b06b58

update tests to expect new statuses

860ad38

rustfmt etc

ed8614b

hawkw mentioned this pull request Nov 7, 2024

router: reduce unnecessary String allocations #1165

Open

davepacheco reviewed Nov 8, 2024

View reviewed changes

		.entry(endpoint.error_type.name.clone())
		.or_insert_with(\|\| schema(&mut generator));

[WIP] Support error types other than dropshot::HttpError #1164

Are you sure you want to change the base?

[WIP] Support error types other than dropshot::HttpError #1164

Conversation

hawkw commented Nov 5, 2024 • edited Loading

hawkw left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

hawkw Nov 6, 2024 • edited Loading

Choose a reason for hiding this comment

hawkw commented Nov 6, 2024

hawkw commented Nov 6, 2024 • edited Loading

davepacheco left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

davepacheco commented Nov 8, 2024

[WIP] Support error types other than `dropshot::HttpError` #1164

[WIP] Support error types other than `dropshot::HttpError` #1164

hawkw commented Nov 5, 2024 •

edited

Loading

hawkw Nov 6, 2024 •

edited

Loading

hawkw commented Nov 6, 2024 •

edited

Loading