Context-aware escaping #181

Open
quasicomputational opened this issue Jan 13, 2020 · 2 comments

@quasicomputational

The HTML spec defines <style> and <script> as 'raw text elements', meaning that escapes in their bodies are not processed (hence, e.g., span:before { content: "Hello!" } cannot be escaped to span:before { content: &quot;Hello!&quot; } - try it in a browser if you don't believe me).

This is a potential footgun with maud: the path of least resistance, letting maud do the escaping, means that scripts and styles get mangled; however, naively using PreEscaped could theoretically introduce an XSS vulnerability, because then nothing checks for an errant </.
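
To make the missing check concrete, here is a minimal sketch of what a guard for raw text bodies could look like. `checked_raw_text` is a hypothetical helper, not part of maud, and it only looks for the closing-tag sequence of the enclosing element; a real implementation would probably need to handle more cases (for example `<!--` inside `<script>`).

```rust
/// Hypothetical helper, not part of maud. Returns the body unchanged when it
/// cannot close the enclosing raw text element early, and an error otherwise.
/// Assumption: rejecting `</tag` (ASCII case-insensitive) is enough for this
/// illustration; it is not a complete treatment of the HTML parsing rules.
fn checked_raw_text<'a>(tag: &str, body: &'a str) -> Result<&'a str, String> {
    let needle = format!("</{}", tag.to_ascii_lowercase());
    if body.to_ascii_lowercase().contains(&needle) {
        Err(format!("body contains a closing sequence for <{}>", tag))
    } else {
        Ok(body)
    }
}

fn main() {
    // Quotes are left alone, so the stylesheet is not mangled.
    let css = r#"span:before { content: "Hello!" }"#;
    assert_eq!(checked_raw_text("style", css), Ok(css));

    // An attempt to break out of the element is rejected instead of emitted.
    let evil = "</STYLE><script>alert(1)</script>";
    assert!(checked_raw_text("style", evil).is_err());
}
```

Only a body that passes such a check would then be wrapped in PreEscaped. Rejecting rather than rewriting keeps the helper simple, at the cost of refusing some harmless inputs.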

This is related to #88. I'm afraid that the HTML syntax is so complicated that there's no way to avoid a certain amount of context-awareness here. I don't know what the ideal API looks like, or even if maud can do much better, but at the very least the docs should point out the footgun here.

@lambda-fairy
Owner

Sorry about the delay.

Yeah, I've come around to accepting that this is an issue. We can make some simplifications, as e.g. the lack of legacy code allows us to simply reject cases we don't understand. But overall this will need a lot of design work.

Relevant links:

@lambda-fairy lambda-fairy changed the title Escaping in <style> and <script> is strictly wrong Context-aware escaping Apr 24, 2021
This was referenced Apr 24, 2021
@lambda-fairy
Owner

lambda-fairy commented Apr 24, 2021

Some notes before I forget:

  • As a prerequisite, we want to validate elements / attributes and throw a hard error when either is unknown. This is because new features are added to HTML all the time, and we can't predict how escaping will work for them.
    • This would imply some configuration (?) for custom elements. Maybe look at how Clippy does config.
  • At minimum, we need: ToHtml/ToAttr, ToText (for <title> and <textarea>), ToUrl (for href and <img src>), ToTrustedUrl (for <script src>), ToStyle/ToStyleAttr, ToScript/ToScriptAttr.
    • Consider whether we need separate traits for "element body" and "attribute value" contexts, or we can get away with one trait for both.
      • Separate traits let us micro-optimize, as e.g. element bodies don't need to escape quotes.
    • Is the Url / TrustedUrl distinction necessary? Go templates don't do it.
  • Consider serde integration for safely embedding JSON.
    • It is possible to embed JSON in JavaScript safely, but I'd rather not encourage it. I'd prefer either <script type="application/json"> or the <data> element instead.
      • See https://stackoverflow.com/q/9320427/617159
      • It's a bit annoying that <data> puts its payload in an attribute, given that both Maud and JSON use double quotes. If people complain, we might need a ToSingleQuotedAttr category 😏
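
As a rough illustration of the "element body" versus "attribute value" point in the notes above, the sketch below uses two free functions instead of traits. The names `escape_body` and `escape_attr` are assumptions made for this example and are not maud's API; the attribute variant assumes the value is always wrapped in double quotes.

```rust
/// Hypothetical: escape text for an element body. Quotes are passed through,
/// which is the micro-optimization mentioned in the notes.
fn escape_body(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for c in input.chars() {
        match c {
            '&' => out.push_str("&amp;"),
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            _ => out.push(c),
        }
    }
    out
}

/// Hypothetical: escape text for a double-quoted attribute value. In addition
/// to the body rules, the double quote itself must be escaped.
fn escape_attr(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for c in input.chars() {
        match c {
            '&' => out.push_str("&amp;"),
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            '"' => out.push_str("&quot;"),
            _ => out.push(c),
        }
    }
    out
}

fn main() {
    let text = r#"a < "b" & c"#;
    println!("{}", escape_body(text));
    println!("{}", escape_attr(text));
}
```

With a single trait for both contexts, every value would have to use the stricter attribute rules; separate traits would let the body case skip the quote handling.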
