Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Auto merge of #3648 - pietroalbini:seek-crates, r=jtgeibel
Implement seek-based pagination for filterless sortless crates endpoint This PR implements seek-based pagination for parts of the `/api/v1/crates` endpoint, and adds the infrastructure to parse `seek=` parameters in paginated APIs. The reason for this PR is that we have multiple crawlers scraping the whole `/api/v1/crates` endpoint every day, and offset-based pagination is not feasible after a high number of pages. To keep the implementation simple seek-based pagination is only supported for requests matching these constraints: * **No filters:** when the request is filtered the total results count is not the same as the total number of crates, so the current implementation to gather the number of results (`SELECT COUNT(*) FROM crates;`) does not work. This can be lifted in the future by refactoring how we gather the total number of results. * **No sorting other than alphanumeric:** this is just not implemented yet, we'd need to change the condition used to seek to the correct position. Note that for anything that's not alphanumeric we'll need to sort by `($param, id)`, not just `$param`, to ensure the results are consistently ordered (there could be multiple crates with the same download count). * **No search queries:** search queries complicate the SQL query quite a bit, introducing relevance sorting and hardcoding the exact match as the first result. Not worth implementing for the time being IMO. When the endpoint is called with a request that supports seeking, no `meta.prev_page` will be provided and `meta.next_page` will include the `seek` parameter instead of the `page` parameter. `meta.total` will continue to be provided as usual. The value of `seek` is a base64-encoded JSON value. The reason base64 is used is to signal to our API consumers the seek value is an implementation detail and they shouldn't manually create one. This is what other big providers do (for example GitHub). The reason to encode the value in JSON is to support more complex seek keys in the future: right now we're just encoding a bare number, and the JSON representation is the same as the ASCII representation. This PR is fully backward compatible: if clients just rely on `meta.next_page` they will transparently start using seek-based pagination, but offset-based pagination still works. When the `page=` attribute is provided offset-based pagination will be forced, and `meta.next_page` will include `page` instead of `seek`. This way, clients manually creating `?page=` URLs will not break. One unfortunate aspect of this PR is that we still have to provide `meta.total`, as that's required by the Cargo registries API. Getting the total count requires a full scan over the table, which right now in production is done using an index-only scan in ~20ms. The PR is best reviewed commit-by-commit. r? `@jtgeibel`
- Loading branch information