-
Notifications
You must be signed in to change notification settings - Fork 76
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Pagination #14
Pagination #14
Conversation
Regarding the openapi issue that I mentioned: take the example of "pagination-multi.rs":
I modified the example to dump the openapi spec:
and here's what it looks like: {
"openapi": "3.0.3",
"info": {
"title": "example",
"contact": {},
"version": "1.0.0"
},
"paths": {
"/projects": {
"get": {
"description": "API endpoint for listing projects\n\n This implementation stores all the projects in a BTreeMap, which makes it\n very easy to fetch a particular range of items based on the key.",
"operationId": "example_list_projects",
"parameters": [
{
"in": "query",
"name": "page_params",
"description": "Specifies either how the client wants to begin a new scan or how to\n resume a previous scan. This field is flattened by serde, so see the\n variants of [`WhichPage`] to see which fields actually show up in the\n serialized form.",
"required": true,
"schema": {
"type": "string"
},
"allowReserved": true,
"style": "form"
},
{
"in": "query",
"name": "limit",
"description": "Optional client-requested limit on page size. Consumers should use\n [`RequestContext::page_limit()`] to access this value.",
"required": true,
"schema": {
"type": "string"
},
"allowReserved": true,
"style": "form"
}
],
"responses": {
"200": {
"description": "TODO: placeholder",
"content": {
"application/json": {
"schema": {
"description": "Client's view of a page of results from a paginated API",
"type": "object",
"properties": {
"items": {
"description": "list of items on this page of results",
"type": "array",
"items": {
"$ref": "#/components/schemas/Project"
}
},
"next_page": {
"description": "token to be used in pagination params for the next page of results",
"type": "string"
}
},
"required": [
"items"
]
}
}
}
}
}
}
}
},
"components": {
"schemas": {
"Project": {
"description": "Item returned by our paginated endpoint\n\nLike anything returned by Dropshot, we must implement `JsonSchema` and `Serialize`. We also implement `Clone` to simplify the example.",
"type": "object",
"properties": {
"mtime": {
"type": "string",
"format": "date-time"
},
"name": {
"type": "string"
}
},
"required": [
"mtime",
"name"
]
}
}
}
} There are a couple of problems here:
|
I'm pretty sure the openapi oddness is related to some shortcuts I took in the parameter processing code. I think we're likely falling into this case: https://github.com/oxidecomputer/dropshot/blob/master/dropshot_endpoint/src/lib.rs#L186 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is beautiful.
Thinking about ways we could potentially make the interface simpler:
- Maybe have
first_page
andnext_page
closures passed to some central interface rather than the endpoint handling it itself? - push the
limit()
call on the iterator into thenew_with_paginator()
method?
Thanks for taking a look! |
I just pushed a pretty big set of changes based on the feedback and more ideas I had to simplify the interface. I don't think there's anything controversial here. Below is a rough summary of changes between commits f4d5e85..fa8865a On the input side:
On the output side:
Other stuff:
What's still remaining:
There's other stuff I'm not planning to do now, but I will file issues about:
@ahl do you want to take another look? |
I'll be interested to know if post-merging with #16 this changes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I realized that consumers should be able to return their own structure that might include the list of returned items and a page token. (For example, they might want to include other metadata like the total number of results.) That's when I realized that we shouldn't conflate the HttpResponseTyped type with the actual data returned. So I got rid of HttpResponseOkPage altogether and moved that functionality to a new type called ResultsPage. Now, you return an HttpResponseOkObject<ResultsPage>. If you want, you can embed ResultsPage in your own type that has more stuff around it (and I've tested that). This also eliminated the awkwardness that HttpResponseOkPage was the only one of the HttpResponseTyped structs that had a constructor.
This makes sense, but I had been hoping to use that special return type as an easy way to statically check if we were dealing with a paginated endpoint. I was hoping to add an OpenAPI extension to the spec output and then modify a language binding generator (e.g. for JS and Rust) to automatically generate iterators.
Statically, I think we've moved to these endpoints being a little harder to identify. We could still look at the type parameter for HttpResponseOkObject
but ResultsPage
could be embedded. I guess we can look for Query<PaginationParams<_,_>>
.
I think it might be nice to be able to provide some sort of compile-time check for these paginated endpoints to confirm that inputs and outputs jived.
To that end: This is off-the-cuff, so may be full of holes. I'm not 100% where this would get us, but I want to write it down so I don't lose it:
- Make
ResultsPage
parameterized onResultsPage<ItemType, PageSelector>
- Have a custom serializer for the token field that would do the stringificating
- Change ResultsPage::new to look like this
impl<ItemType, PageSelector> ResultsPage<ItemType, PageSelector> {
pub fn new<ScanParams>(
items: Vec<ItemType>,
scan_params: &ScanParams,
) -> Result<ResultsPage<ItemType, PageSelector>, HttpError>
where
PageSelector: From<(ItemType, ScanParams>
{
There seems to be a common pattern of the ExScanParamsIncoming
turning into ExScanParams
(with the difference being Optional<..>
). Is there something clever we can do to obviate the need for that?
* pag_params: Query<PaginationParams<MyScanParams, MyPageSelector>>, | ||
* extra_params: Query<MyExtraQueryParams>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see why this works, but this strikes me as odd. I imagined that each of Query
, Path
and Json
(body) would appear at most once. I get why we're doing it, but 1. I wonder if we want to support this and 2. I think we'd have issues with more than 3 Extractors at present.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can definitely see how it's a little strange. On the other hand, it would seem a little strange to me if it didn't work -- that would imply that extractors somehow mutate the request or are otherwise intertwined instead of being self-contained things that construct a higher-level view of something in the request.
With the serde bug, I don't know another way to support taking non-pagination query parameters, which seems pretty useful.
I thought about whether we can avoid parsing the querystring twice. Unfortunately, we don't know until we've parsed the querystring whether it's the WhichPage::First
case, in which case all of the query parameters should be deserialized into the user's struct. But once we've parsed it to determine we're in the WhichPage::First
case, we don't have a way to take the representation we've got and parse it into the user's type. The only way we have to get to the user's struct is to parse the querystring again. Alternatively, we could define an intermediate representation (e.g., BTreeMap<String, String>
) and define a serde Deserializer for that representation that can reconstitute the user's type, but this doesn't feel worth the work right now just to avoid a second parse.
debug: Option<bool>, | ||
} | ||
|
||
/* TODO-coverage check generated OpenAPI spec */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fingers crossed!
Consumers can do a few things:
They basically all seem fine and also imperfect. I think (1) opens the door to various runtime bugs. (2), (3), and (4) all get the job done, at the cost of some extra boilerplate, though that code has to exist somewhere. I like (4) best, and I can do that in the example if that seems better. |
I think that's right. The only thing we'd know about them now is that they take a
Yeah, I'm interested in this as well. I tried a bunch of things (described below for future context) and went in a bunch of circles as I tried to validate them against my design goals. There may well be a way to do this with less boilerplate and/or better compile-time checking, and we should definitely iterate as we find better ways to do it. From talking offline, we'll defer this for now. The rest of this describes my goals and some of the things I tried for reference if we revisit this. Some goals:
My early intuition was that the page of results could be parametrized by the ScanMode and PageSelector types so that you could tell statically if, say, an endpoint told Dropshot to generate a page selector that was different than the one it accepted on input. Similarly, it seems like there's some kind of checking you could do to make sure the page selector makes sense for the scan mode. The very first version I had looked similar to what you describe: This was before the interface supported multiple different scan modes, so I don't think this necessarily helps match up the input and output unless there's something to make sure that the PageSelector incoming is the same one you use in the that was just the combination of ScanMode, PageSelector, and Item types. When you called Something I really didn't like about this approach was that you need a new |
I've pushed a few more changes: one adds automated tests for most of the surface area here, and the other replaces the My plan next is to sync up, retest, potentially add some more openapi tests, and wrap up. |
This PR is a very rough draft implementation of pagination for Dropshot. Briefly: pagination is the API design pattern by which API endpoints (e.g., HTTP resources) that list large collections of objects limit the number of items you can list at a time, forcing clients to make several requests to fetch the complete list. There are lots of good reasons to do this. These are documentation in dropshot/src/pagination.rs under the heading "Background: Why paginate HTTP APIs?".
This has taken a while and it's a lot more complex than I expected. There's definitely more work to be done, but before I go much deeper, I'd like some feedback on the Rust API. Whether it's a thumbs-up, thumbs-down, or concrete suggestions for improvement, I'd like to know what you think!
An important opening question is: why put any of this in Dropshot at all? As far as I can tell, other popular Rust frameworks don't have pagination as a first-class thing. The current Oxide control plane prototype implements paginated resources without any specific support from Dropshot. But this leads to a few issues:
Now, at the REST level, there are many different ways of doing pagination. These too are discussed in the block comment in dropshot/src/pagination.rs, under the heading "Patterns for pagination." There's also a range of complexity needed by clients: the simplest case is a collection of items with a single field you can sort by (e.g., "id", a unique integer, or "name", a unique string). A more complicated case would be a collection of items with several combinations (e.g., "name", a unique string, in either ascending or descending order; or a combination of "mtime" and "name"). I think these more complicated cases are likely more common than they sound.
That background aside, here's how I'd probably go about looking at this. In this PR I've implemented two pagination APIs: a general one and a simpler "marker" one.
examples/pagination-marker.rs
. This uses the "marker" interface to paginate a collection of Projects.examples/pagination-multi.rs
. This is an example of the more complicated case. The documentation at the top of this file also goes into detail about how everything looks to an HTTP client (e.g., the query params and responses).examples/pagination-multi-resources.rs
. This is a more realistic example of an API with multiple resources with similar pagination endpoints. This use-case exposed some ugliness in the previous revisions of this interface.examples/pagination-basic.rs
(again, sorry for the dubious naming here). This uses the general API to paginate in the same way as the marker example does. I included this to show what the simple one is doing under the hood.If you want, check out
dropshot/src/pagination.rs
to see how most of it's implemented. There are also some bits indropshot/src/handler.rs
.Here are my own thoughts about this right now:
I'm interested in any and all feedback, especially on those questions. I really want this to be something easy for people to reach for whenever they build an API that returns a list of things so that pagination becomes the default. But I also want to ensure as much as possible that the resulting APIs are correct, robust, use scalable patterns, etc.
History and status:
2020-07-16 (commit 1791d7a): initial PR
2020-07-17 (commit 1e91d3c):
page_selector_for
from thePaginatedResource
trait. Instead, callers pass a callback where this would go. This seems much more convenient for both simple and more complicated consumers.2020-07-17 (commit f4d5e85): Remove
PaginatedResource
altogether. This removed some boilerplate from the more complicated examples.2020-07-31 (commit fa8865a): Lots of changes, largely to simplify the interface while also making it more flexible. See my comment on the changes.
2020-08-05 (commit 9d6347b): Added automated testing and removed
Incoming
pattern from examples.2020-08-05 (commit 7ba2ee2): Sync'ed up with "master" branch (force push, unfortunately), added an openapi test, and fixed style.