Implement seek-based pagination for filterless sortless crates endpoint #3648
This PR implements seek-based pagination for parts of the `/api/v1/crates` endpoint, and adds the infrastructure to parse `seek=` parameters in paginated APIs. The reason for this PR is that we have multiple crawlers scraping the whole `/api/v1/crates` endpoint every day, and offset-based pagination is not feasible after a high number of pages.

To keep the implementation simple, seek-based pagination is only supported for requests matching these constraints:
- [...] `SELECT COUNT(*) FROM crates;`) does not work. This can be lifted in the future by refactoring how we gather the total number of results.
- [...] `($param, id)`, not just `$param`, to ensure the results are consistently ordered (there could be multiple crates with the same download count).

When the endpoint is called with a request that supports seeking, no
`meta.prev_page` will be provided and `meta.next_page` will include the `seek` parameter instead of the `page` parameter. `meta.total` will continue to be provided as usual.

The value of
`seek` is a base64-encoded JSON value. Base64 is used to signal to our API consumers that the seek value is an implementation detail and that they shouldn't manually create one. This is what other big providers do (for example GitHub). The value is encoded as JSON to support more complex seek keys in the future: right now we're just encoding a bare number, and the JSON representation is the same as the ASCII representation.

This PR is fully backward compatible: if clients just rely on
`meta.next_page` they will transparently start using seek-based pagination, but offset-based pagination still works. When the `page=` parameter is provided, offset-based pagination will be forced, and `meta.next_page` will include `page` instead of `seek`. This way, clients manually creating `?page=` URLs will not break.

One unfortunate aspect of this PR is that we still have to provide
`meta.total`, as that's required by the Cargo registries API. Getting the total count requires a full scan over the table, which right now in production is done using an index-only scan in ~20ms.

The PR is best reviewed commit-by-commit.
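For readers who want a concrete picture of the behaviour described above, here is a minimal Python sketch. It is not the code in this PR. The names `encode_seek`, `decode_seek` and `list_crates` are hypothetical, the crates are an in-memory list ordered by ascending `id`, and the unpadded URL-safe base64 alphabet is an assumption (the description only says "base64").

```python
import base64
import json


def encode_seek(last_id):
    """JSON-serialize the seek key, then base64-encode it."""
    raw = json.dumps(last_id).encode("ascii")
    # Assumption: URL-safe alphabet with the "=" padding stripped.
    return base64.urlsafe_b64encode(raw).decode("ascii").rstrip("=")


def decode_seek(seek):
    """Reverse encode_seek: restore padding, base64-decode, parse the JSON."""
    padded = seek + "=" * (-len(seek) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


def list_crates(crates, per_page, seek=None, page=None):
    """Paginate `crates` (dicts sorted by ascending "id") and build `meta`."""
    total = len(crates)  # stands in for SELECT COUNT(*) FROM crates
    if page is not None:
        # An explicit page= forces offset-based pagination.
        start = (page - 1) * per_page
        rows = crates[start:start + per_page]
        more = start + per_page < total
        next_page = f"?page={page + 1}&per_page={per_page}" if more else None
        prev_page = f"?page={page - 1}&per_page={per_page}" if page > 1 else None
    else:
        # Seek-based: return only rows whose id is past the last one seen.
        last_id = decode_seek(seek) if seek is not None else None
        rows = [c for c in crates if last_id is None or c["id"] > last_id]
        rows = rows[:per_page]
        more = bool(rows) and rows[-1]["id"] < crates[-1]["id"]
        next_page = None
        if more:
            next_page = f"?seek={encode_seek(rows[-1]['id'])}&per_page={per_page}"
        prev_page = None  # not provided when seeking
    meta = {"total": total, "next_page": next_page, "prev_page": prev_page}
    return {"crates": rows, "meta": meta}
```

In this sketch a client that only follows `meta.next_page` never has to decode the seek value, while a client that passes `page=2` itself keeps receiving `page`-based links, which mirrors the backward-compatibility behaviour described above.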
r? @jtgeibel