
Allow identification of which speculation rules triggered a speculation #336

Open
domenic opened this issue Sep 12, 2024 · 7 comments


@domenic
Collaborator

domenic commented Sep 12, 2024

A strategy we're seeing deployed increasingly often is for platforms to add very broad speculation rules, and then use server responses to block speculation in cases where the platform is unsure whether it is safe.

However, in some cases the website knows more than the platform it is running on, and is able to guarantee that speculations are safe. In this case the platform's server responses can interfere with the website's own speculation rules.

A possible solution would be to add an identifier to the speculation rules, which is sent along with any speculative load HTTP request. The platform can then reject only speculations that come from its own speculation rules, while letting through any that come from the website.

As one possible API, we could add a top-level key "tag": "any-string" to the speculation rules, and include this information in the HTTP request, e.g. as Sec-Speculation-Rules-Tags: "any-string".

(I thought at first it would be more natural to include this with Sec-Purpose, e.g. as Sec-Purpose: prefetch;tag="any-string". But structured headers don't do nested lists very well.)

Spelling out the whole scenario in that case, we would have something like:

Platform speculation rules

```json
{
  "tag": "awesome-platform",
  "prefetch": [
    {
      "eagerness": "conservative",
      "source": "document",
      "where": { "href_matches": "/*", "relative_to": "document" }
    }
  ]
}
```

Site speculation rules

(The site doesn't add a tag)

```json
{
  "prefetch": [
    {
      "eagerness": "moderate",
      "source": "document",
      "where": { "href_matches": "/*", "relative_to": "document" }
    }
  ]
}
```

Flow before this proposal

The user hovers their mouse over a link to /somewhere. This sends the server the following request header:

```http
Sec-Purpose: prefetch
```

The platform running the server doesn't know whether /somewhere is safe to prefetch, so it responds with an HTTP error code, e.g. a 503. No speculative loading happens. Sad.

Flow after this proposal

The user hovers their mouse over a link to /somewhere. This sends the server the following request headers:

```http
Sec-Purpose: prefetch
Sec-Speculation-Rules-Tags: null
```

The platform running the server doesn't know whether /somewhere is safe to prefetch. But it notices that the Sec-Speculation-Rules-Tags header contains a value other than "awesome-platform" (here, null, meaning a rule with no tag): in other words, this speculation was initiated by something other than Awesome Platform's speculation rules feature. So, it lets the speculation through. Yay!
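The server-side check in the flow above can be sketched in Python. This is a minimal sketch under stated assumptions: the proposed Sec-Speculation-Rules-Tags header is assumed to carry a comma-separated list of quoted strings, with the bare token null standing for an untagged rule (as in the example above), and only that subset of Structured Fields syntax is parsed. `PLATFORM_TAG`, `parse_tags`, and `should_reject` are hypothetical names, not part of any spec.

```python
from typing import Optional

# Hypothetical tag that the platform puts on its own speculation rules.
PLATFORM_TAG = "awesome-platform"


def parse_tags(header_value: str) -> list[Optional[str]]:
    """Parse a simplified Sec-Speculation-Rules-Tags value.

    Assumes a comma-separated list whose items are either quoted strings
    (with no escapes or commas inside) or the token null, meaning "a rule
    with no tag". This is not a full Structured Fields parser.
    """
    tags: list[Optional[str]] = []
    for item in header_value.split(","):
        item = item.strip()
        if not item:
            continue
        if item == "null":
            tags.append(None)
        elif len(item) >= 2 and item.startswith('"') and item.endswith('"'):
            tags.append(item[1:-1])
        else:
            raise ValueError(f"unsupported list item: {item!r}")
    return tags


def should_reject(headers: dict[str, str], known_safe: bool) -> bool:
    """Return True if the platform should answer with an error (e.g. 503)."""
    # HTTP header names are case-insensitive, so normalize them first.
    lowered = {name.lower(): value for name, value in headers.items()}
    if "prefetch" not in lowered.get("sec-purpose", ""):
        return False  # not a speculative request; serve normally
    if known_safe:
        return False  # the platform already knows this URL is safe
    tags_header = lowered.get("sec-speculation-rules-tags")
    if tags_header is None:
        return True  # no tag information: keep today's cautious behavior
    # Reject only if every rule that could have triggered the speculation
    # is the platform's own; a site rule (tagged or untagged) lets it through.
    return all(tag == PLATFORM_TAG for tag in parse_tags(tags_header))
```

When the tags header is absent, the function falls back to the "before this proposal" behavior, so a server could deploy a check like this before browsers send the header.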

@tunetheweb

See also #298 for other use cases for this.

@aseure

aseure commented Sep 12, 2024

We would indeed be interested in such a feature. We currently have to rely on detecting the Sec-Purpose: prefetch header, as well as many other checks, to confirm that the prefetch request should be rejected with a 503. If we had such a tag in the speculation rules our platform injects, we could reduce those checks to a single additional header check.

@SulemanAhmadd

I would like to support this use case from the Cloudflare side (as mentioned by @aseure). We want to allow customers to override our rules if they believe the speculative request (either prefetch or prerender) is safe and should reach their origin server. This approach helps ensure we respect their preferences on a per-page basis. We should consider a solution that can differentiate speculative requests for both inline and header-based speculation rules. The suggested additional tag addresses this need, but we must ensure that it doesn’t contribute to client fingerprinting due to reflecting the arbitrary string value from the client (especially for cross-origin requests).

@jeremyroman
Collaborator

My thinking was very similar to this, though I think this should be a property of the individual rule, as this allows you to distinguish rules which are differently eager or differently aggressive, even if they're in the same ruleset.

```json
{
  "prefetch": [
    {
      "eagerness": "conservative",
      "source": "document",
      "where": { "href_matches": "/*", "relative_to": "document" },
      "tag": "awesome-platform"
    }
  ]
}
```

There's also an interesting question about the case where a speculation is possible due to multiple different rules (or rule sets) with different tags (e.g., a CDN and the site both have rules that permit a speculation). One option would be to look for all of the candidates' tags that are possible and send all of them (including a placeholder for no tag); another would be to establish some kind of priority system among them.
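The first option (send the tags of all candidate rules, with a placeholder for untagged ones) could look roughly like this Python sketch. It is a simplification under stated assumptions: rules are plain dicts mirroring the JSON in this thread, `fnmatchcase` on the URL path stands in for real URL pattern matching of `href_matches`, eagerness and other `where` conditions are ignored, and `collect_tags` is a hypothetical helper.

```python
from fnmatch import fnmatchcase
from typing import Optional
from urllib.parse import urlsplit


def collect_tags(rules: list[dict], url: str) -> list[Optional[str]]:
    """Collect the tags of every rule that matches url.

    None stands for "a matching rule with no tag". Duplicates are dropped,
    keeping first-seen order. Matching is a simplification: fnmatchcase on
    the URL path stands in for real URL pattern matching, and eagerness and
    other "where" conditions are ignored.
    """
    path = urlsplit(url).path
    tags: list[Optional[str]] = []
    for rule in rules:
        pattern = rule.get("where", {}).get("href_matches")
        if pattern is None or not fnmatchcase(path, pattern):
            continue
        tag = rule.get("tag")  # a missing "tag" key yields None
        if tag not in tags:
            tags.append(tag)
    return tags


# The platform rule (tagged) and the site rule (untagged) from this thread,
# reduced to the fields this sketch looks at.
platform_rule = {"where": {"href_matches": "/*"}, "tag": "awesome-platform"}
site_rule = {"where": {"href_matches": "/*"}}

print(collect_tags([platform_rule, site_rule], "https://example.com/somewhere"))
# ['awesome-platform', None]
```

A priority system, the second option, would instead pick a single entry from this list.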

I don't follow what you mean by "[not doing] nested lists very well". I don't feel strongly but Sec-Purpose: prefetch; speculation-rules-tags=("awesome-platform") doesn't seem bonkers. Fine with a separate header field, too, though.

Would these tags only be sent same-origin (or same-site), or would we send them to all origins (since speculations can, in general, target any origin)? I suspect we'd lean toward not sending them cross-site, to reduce the possibility of their use as a tracking vector, even though URL decoration exists. I'm sure that's unfortunate for some uses, but it seems fine for the immediate ones.
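A restrictive policy along these lines can be sketched in Python. As an assumption, this sketch checks same-origin rather than same-site, because a correct same-site check needs the Public Suffix List, which the standard library does not provide; `origin` and `tags_to_send` are hypothetical helpers.

```python
from typing import Optional
from urllib.parse import urlsplit


def origin(url: str) -> tuple[str, str, Optional[int]]:
    """Return (scheme, host, port), filling in default ports for http(s)."""
    parts = urlsplit(url)
    default_ports = {"http": 80, "https": 443}
    port = parts.port or default_ports.get(parts.scheme)
    return (parts.scheme, parts.hostname or "", port)


def tags_to_send(
    document_url: str, target_url: str, tags: list[Optional[str]]
) -> list[Optional[str]]:
    """Hypothetical policy: reveal rule tags only to the document's own origin."""
    if origin(document_url) != origin(target_url):
        # Cross-origin target: send no tags, so a server-chosen string
        # cannot be reflected to another origin.
        return []
    return tags
```

An empty result would mean omitting the header entirely; a same-site variant would relax the comparison to registrable domains.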

@domenic
Collaborator Author

domenic commented Sep 17, 2024

Great to hear that there's interest in this!!

> The suggested additional tag addresses this need, but we must ensure that it doesn’t contribute to client fingerprinting due to reflecting the arbitrary string value from the client (especially for cross-origin requests).

Could you say more about the threat model here? Is it along the lines of what @jeremyroman mentioned, in that it doesn't contribute to fingerprinting in itself, but is a possible additional cross-site communications channel which could be used to pass along fingerprint information gathered elsewhere?

> My thinking was very similar to this, though I think this should be a property of the individual rule, as this allows you to distinguish rules which are differently eager or differently aggressive, even if they're in the same ruleset.

Good point! (Although maybe we could cascade from the top level for convenience?)
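One way such a cascade could work, sketched in Python under a hypothetical rule: a rule's own "tag" wins, otherwise it inherits the ruleset's top-level "tag", otherwise it is untagged. `effective_tags` is a hypothetical helper, and the "aggressive-prefetch" tag in the test is made up for illustration.

```python
from typing import Optional


def effective_tags(ruleset: dict) -> list[tuple[str, Optional[str]]]:
    """Pair each rule's action with its effective tag under a cascade.

    A rule's own "tag" wins; otherwise it inherits the ruleset's top-level
    "tag"; otherwise it has no tag (None).
    """
    default = ruleset.get("tag")
    result: list[tuple[str, Optional[str]]] = []
    for action in ("prefetch", "prerender"):
        for rule in ruleset.get(action, []):
            result.append((action, rule.get("tag", default)))
    return result
```

This keeps the single top-level tag from the original proposal convenient, while still letting individual rules with different eagerness be distinguished.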

> One option would be to look for all of the candidates' tags that are possible and send all of them (including a placeholder for no tag);

That was my original thinking.

> I don't follow what you mean by "[not doing] nested lists very well". I don't feel strongly but Sec-Purpose: prefetch; speculation-rules-tags=("awesome-platform") doesn't seem bonkers.

I believe parameter values must be "bare items", i.e., integers, decimals, strings, tokens, byte sequences, or booleans. They cannot be inner lists.
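To make the distinction concrete: a separate header can carry a top-level Structured Fields List (RFC 8941) holding any number of string items, while a Sec-Purpose parameter could hold only one bare item. The Python sketch below hand-serializes such a list. It covers only strings and the token null, and representing "no tag" as null is an assumption taken from the example in the issue description, not from any spec; the function names are hypothetical.

```python
from typing import Optional


def serialize_sf_string(value: str) -> str:
    """Serialize a Structured Fields String (RFC 8941).

    Only printable ASCII (0x20-0x7E) is allowed; backslash and double
    quote are escaped with a backslash.
    """
    out = ['"']
    for char in value:
        if not 0x20 <= ord(char) <= 0x7E:
            raise ValueError(f"not printable ASCII: {char!r}")
        if char in '\\"':
            out.append("\\")
        out.append(char)
    out.append('"')
    return "".join(out)


def serialize_tags(tags: list[Optional[str]]) -> str:
    """Serialize tags as a top-level Structured Fields List.

    None becomes the token null (an assumed encoding for "no tag").
    """
    return ", ".join(
        "null" if tag is None else serialize_sf_string(tag) for tag in tags
    )
```

For example, tags ["awesome-platform", None] serialize to `"awesome-platform", null`, a value that would be valid as a whole header but not as a single parameter value.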

@SulemanAhmadd

> in that it doesn't contribute to fingerprinting in itself, but is a possible additional cross-site communications channel which could be used to pass along fingerprint information gathered elsewhere?

Exactly, yes. Since the server is allowed to specify any arbitrary value in the tag, reflecting that value in cross-site requests can enable a tracking vector. Imagine an adversary controlling A.example and B.example. Assume that, when the client connects to A.example, the returned speculation ruleset has a tag derived from a client fingerprint. As the client interacts with the page, prefetch/prerender requests to B.example can be triggered by links pre-embedded in the page, and those speculative requests will carry the same fingerprint tag in the Sec-Purpose header when they land on B.example. This can help the adversary confirm that the speculative requests landing on B.example come from the same visitor.

@ErikWitt

We also support this use case from the Speed Kit side.

We use speculation rules, but make sure to serve the HTML only if it is in the cache, so that we never overwhelm the origin server (which is easy to do). Still, sites might use more moderate prerendering via their own speculation rules, and for those we would like to let cache misses through to the origin as well. For that, we would like to identify the prerenders that originated from our speculation rules.

In addition to reading the header in a CDN, we also need to read it in the Service Worker, which should not be an issue for same-origin requests.

The tagging approach sounds valid.

6 participants