Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement seek-based pagination for filterless sortless crates endpoint #3648

Merged
merged 3 commits into from
May 27, 2021

Conversation

pietroalbini
Copy link
Member

This PR implements seek-based pagination for parts of the /api/v1/crates endpoint, and adds the infrastructure to parse seek= parameters in paginated APIs. The reason for this PR is that we have multiple crawlers scraping the whole /api/v1/crates endpoint every day, and offset-based pagination is not feasible after a high number of pages.

To keep the implementation simple seek-based pagination is only supported for requests matching these constraints:

  • No filters: when the request is filtered the total results count is not the same as the total number of crates, so the current implementation to gather the number of results (SELECT COUNT(*) FROM crates;) does not work. This can be lifted in the future by refactoring how we gather the total number of results.
  • No sorting other than alphanumeric: this is just not implemented yet, we'd need to change the condition used to seek to the correct position. Note that for anything that's not alphanumeric we'll need to sort by ($param, id), not just $param, to ensure the results are consistently ordered (there could be multiple crates with the same download count).
  • No search queries: search queries complicate the SQL query quite a bit, introducing relevance sorting and hardcoding the exact match as the first result. Not worth implementing for the time being IMO.

When the endpoint is called with a request that supports seeking, no meta.prev_page will be provided and meta.next_page will include the seek parameter instead of the page parameter. meta.total will continue to be provided as usual.

The value of seek is a base64-encoded JSON value. The reason base64 is used is to signal to our API consumers the seek value is an implementation detail and they shouldn't manually create one. This is what other big providers do (for example GitHub). The reason to encode the value in JSON is to support more complex seek keys in the future: right now we're just encoding a bare number, and the JSON representation is the same as the ASCII representation.

This PR is fully backward compatible: if clients just rely on meta.next_page they will transparently start using seek-based pagination, but offset-based pagination still works. When the page= attribute is provided offset-based pagination will be forced, and meta.next_page will include page instead of seek. This way, clients manually creating ?page= URLs will not break.

One unfortunate aspect of this PR is that we still have to provide meta.total, as that's required by the Cargo registries API. Getting the total count requires a full scan over the table, which right now in production is done using an index-only scan in ~20ms.

The PR is best reviewed commit-by-commit.
r? @jtgeibel

This will allow in the future to have code like:

```rust
PaginationOptions::builder()
    .enable_seek(true)
    .gather(req)?;
```
Copy link
Member

@jtgeibel jtgeibel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🥳 This looks great! Just a few minor comments, then feel free to r=me.

src/controllers/helpers/pagination.rs Outdated Show resolved Hide resolved
src/controllers/helpers/pagination.rs Outdated Show resolved Hide resolved
@pietroalbini
Copy link
Member Author

@bors r=jtgeibel

Addressed review comments!

@bors
Copy link
Contributor

bors commented May 27, 2021

📌 Commit 5cfdbca has been approved by jtgeibel

@bors
Copy link
Contributor

bors commented May 27, 2021

⌛ Testing commit 5cfdbca with merge f70ae1e...

@bors
Copy link
Contributor

bors commented May 27, 2021

☀️ Test successful - checks-actions
Approved by: jtgeibel
Pushing f70ae1e to master...

@bors bors merged commit f70ae1e into rust-lang:master May 27, 2021
@pietroalbini pietroalbini deleted the seek-crates branch May 27, 2021 12:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants