Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

invalidate CDN caches after build #1825

Merged
merged 9 commits into from
Sep 25, 2022

Conversation

syphar
Copy link
Member

@syphar syphar commented Sep 8, 2022

This is a follow-up to #1569 and is working towards #1552.

This adds CloudFront invalidation after releases for just invalidating the whole /crate/* prefix after each build.

without any settings it won't do anything.

To keep in mind when reviewing: CloudFront invalidations can be relatively slow, up to 10-15 minutes. I'm not sure if the time depends on the amount of paths / patterns in the request, or the amount of files matching that pattern. My tests were much faster (1 minute max)

original description

This is a follow-up to #1569 and enables us to control caching for /latest/ pages too.

Similarly to versioned pages we can define a normal max-age, combined with stale-while-revalidate.

This also adds CloudFront invalidation after releases. For now the invalidation choices we have are:

  • don't invalidate anything
  • invalidate the whole /krate/* prefix
  • invalidate only the /krate/latest/* prefix.

without any settings it won't do anything, so we can slowly increase caching times and watch what happens.

To keep in mind when we play around with the settings: CloudFront invalidations can be relatively slow, up to 10-15 minutes. I'm not sure if the time depends on the amount of paths / patterns in the request, or the amount of files matching that pattern.

@github-actions github-actions bot added the S-waiting-on-review Status: This pull request has been implemented and needs to be reviewed label Sep 8, 2022
@syphar
Copy link
Member Author

syphar commented Sep 8, 2022

@jsha this could be the next step, what do you think?

src/cache.rs Outdated Show resolved Hide resolved
@jsha
Copy link
Contributor

jsha commented Sep 8, 2022

Without having read the source yet, I like the plan.

One caveat: for correctness, it would be ideal if we could prevent caching of /krate/latest/.../*.js. That encompasses "invocation specific" files like sidebar-itemsXXX.js, source-filesXXX.js, search-indexXXX.js, and implementors/.../trait.YYY.js (where XXX is the resource suffix).

In other words, we should cache the static JS, CSS, and images (already done, for a long time). And we should cache the HTML (this PR). But we shouldn't cache invocation specific JS files, because then a fresh HTML page could get a browser-cached invocation specific JS file, and produce inconsistent/surprising behavior.

As a concrete example, assuming invocation specific JS is cached:

  1. Doc reader visits /krate/latest/krate/bar/fn.foo.html. This triggers a load of /krate/latest/krate/bar/sidebar-itemsXXX.js, which gets browser cached.
  2. Crate author publishes a release that deletes an item (fn qux) from mod bar. This does a CloudFront invalidation.
  3. Doc reader visits /krate/latest/krate/bar/fn.baz.html. This triggers a load of /krate/latest/krate/bar/sidebar-itemsXXX.js from the browser cache, so the sidebar lists qux. However, clicking on qux in the sidebar goes to a 404.

There are ways to solve this in rustdoc, for instance by mixing crate version into the filename for invocation specific files. But for now I think we should consider invocation specific JS files under /krate/latest/ to be non-cacheable, at least for the browser.

The performance impact of not caching these should be pretty low, since they are not on the critical path for page load (at least in current rustdoc).

@syphar
Copy link
Member Author

syphar commented Sep 9, 2022

@jsha perhaps I'm missing something, but if this would be an problem, wouldn't it already be an problem?

Since the resource suffix includes the nightly version (for example: https://docs.rs/itertools/latest/search-index-20211205-1.59.0-nightly-e2116acae.js), in most cases we will get a new path when there is a new release under /latest/. Of course there is an edge-case when I release more than one time on a certain day.

I also remember a discussion about this, if needed I can try to find the issue.

If we want to optimize this, one other option is to only use s-maxage, and rely on our active purge. Only issue with that is that other caches might also cache these, and we don't control them. Sadly CloudFront doesn't have Surrogate-Control like fastly, which would make this really easy to solve.

@jsha
Copy link
Contributor

jsha commented Sep 9, 2022

if this would be an problem, wouldn't it already be an problem?

Oh! I didn't think to look before, but yes, it looks like invocation-specific files (like https://docs.rs/rustls/latest/rustls/sidebar-items-20220517-1.63.0-nightly-4c5f6e627.js) currently get the cache-control: public, max-age=31104000, which means it's already a problem. Though, as you point out, the unlikelihood of doing two releases on the same toolchain with significantly differing (e.g.) sidebars is fairly low.

So, no need to address the problem I mentioned in this PR. Thanks for checking.

@jsha
Copy link
Contributor

jsha commented Sep 9, 2022

Aha, here's the issue from January: #1593

@syphar syphar added S-waiting-on-author Status: This PR is incomplete or needs to address review comments and removed S-waiting-on-review Status: This pull request has been implemented and needs to be reviewed labels Sep 10, 2022
@syphar
Copy link
Member Author

syphar commented Sep 10, 2022

I'll add some more tests , but I would love some feedback around the general idea here

Copy link
Contributor

@jsha jsha left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The approach looks great!

src/web/rustdoc.rs Outdated Show resolved Hide resolved
src/cache.rs Outdated Show resolved Hide resolved
@syphar
Copy link
Member Author

syphar commented Sep 18, 2022

@jsha this is ready for another round, also cc @Nemo157

I'm not 100% certain about the cdn-backend design, I'm happy to take other ideas that make testing it in the builder possible :)

I removed purging only /latest/ since it would have had some edge cases ( like rebuilds of old versions) that would have to be handled correctly. Purging the whole crate is good enough IMO.

@syphar syphar added S-waiting-on-review Status: This pull request has been implemented and needs to be reviewed and removed S-waiting-on-author Status: This PR is incomplete or needs to address review comments labels Sep 18, 2022
@syphar syphar requested a review from jsha September 18, 2022 14:18
Copy link
Member

@jyn514 jyn514 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't we already have a tokio runtime stored somewhere? Why do we need another?

@syphar
Copy link
Member Author

syphar commented Sep 18, 2022

Don't we already have a tokio runtime stored somewhere? Why do we need another?

valid concern!

We had another in the S3 storage backend. In this PR I'm moving the runtime to the context, and the S3 backend will also use the same.

@syphar syphar added S-waiting-on-author Status: This PR is incomplete or needs to address review comments and removed S-waiting-on-review Status: This pull request has been implemented and needs to be reviewed labels Sep 19, 2022
@syphar syphar added S-waiting-on-review Status: This pull request has been implemented and needs to be reviewed and removed S-waiting-on-author Status: This PR is incomplete or needs to address review comments labels Sep 19, 2022
@syphar
Copy link
Member Author

syphar commented Sep 23, 2022

ping @jsha @rust-lang/docs-rs anyone got time for reviewing?

Thinking longer about #1552 (comment)

I can imagine also stale-while-revalidate together with purging would work for /latest/ too, while the downside would be that the browser & other caches also respect this.

Otherwise the default TTL option with no-cache is still a safer option.

Cargo.toml Outdated Show resolved Hide resolved
src/cdn.rs Outdated Show resolved Hide resolved
src/cdn.rs Outdated Show resolved Hide resolved
src/cdn.rs Outdated Show resolved Hide resolved
src/docbuilder/rustwide_builder.rs Outdated Show resolved Hide resolved
src/docbuilder/rustwide_builder.rs Show resolved Hide resolved
@syphar
Copy link
Member Author

syphar commented Sep 23, 2022

@Nemo157 I added the requested changes

@syphar syphar requested review from Nemo157 and removed request for jsha September 23, 2022 18:41
@syphar syphar changed the title optionally set caching headers for /latest/, invalidate caches after build optionally invalidate CDN caches after build Sep 24, 2022
@syphar
Copy link
Member Author

syphar commented Sep 24, 2022

@jsha @Nemo157 fyi,

while working on #1552 (comment) I came to the conclusion that I want a different structure for defining the caching headers, so I'll remove it from this PR and will create a new one for this.

So the main piece remaining here is the CDN invalidation after build

@syphar syphar changed the title optionally invalidate CDN caches after build invalidate CDN caches after build Sep 24, 2022
Copy link
Member

@Nemo157 Nemo157 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, just noticed a tiny test simplification

@syphar syphar force-pushed the latest-version-cache-and-purge branch from 0b4ed16 to 3df6127 Compare September 25, 2022 11:35
@syphar syphar merged commit b194f43 into rust-lang:master Sep 25, 2022
@syphar syphar deleted the latest-version-cache-and-purge branch September 25, 2022 11:36
@github-actions github-actions bot added S-waiting-on-deploy This PR is ready to be merged, but is waiting for an admin to have time to deploy it and removed S-waiting-on-review Status: This pull request has been implemented and needs to be reviewed labels Sep 25, 2022
@syphar syphar removed the S-waiting-on-deploy This PR is ready to be merged, but is waiting for an admin to have time to deploy it label Sep 25, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants