Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Disable parsing of raw HTML #225

Open
will-hart opened this issue Oct 26, 2018 · 10 comments
Open

Disable parsing of raw HTML #225

will-hart opened this issue Oct 26, 2018 · 10 comments

Comments

@will-hart
Copy link

Is there a way to disable parsing of raw HTML altogether? I know I can override specific tags but I'd like to automatically escape HTML characters without transforming the data stored in my database.

@quantizor
Copy link
Owner

Not currently, but it's something that could probably be added.

@fastfedora
Copy link

It would be nice to combine escaping HTML elements with a whitelist of HTML elements that are allowed to be parsed; anything else would be escaped. This would provide additional safety when displaying user-generated input.

@fastfedora
Copy link

Another feature might be a sanitization function that can be run before the attributes are passed to the parsed element. While you can always provide custom components for everything, one function could handle both things like URL sanitization for non-javascript: URLs and content filtering (curse words, etc).

Something along the lines of sanitize(node, rule, element) that would return an updated node. So an option might be:

sanitize: (node, rule, element) => (element == 'a' ? { ...node, target: customSanitizeUrl(node.target) } : node)

Or:

sanitize: (node, rule, element) => ({ ...node, content: bleepCurseWords(node.content) })

Then in the code, where it

   footnoteReference: {
      match: inlineRegex(FOOTNOTE_REFERENCE_R),
      order: PARSE_PRIORITY_HIGH,
      parse(capture /*, parse*/) {
        return {
          content: capture[1],
          target: `#${capture[1]}`,
        };
      },
      react(node, output, state) {
        const sanitizedNode = sanitize(node, 'footnoteReference', 'a');

        return (
          <a key={state.key} href={sanitizeUrl(sanitizedNode.target)}>
            <sup key={state.key}>{sanitizedNode.content}</sup>
          </a>
        );
      },
    },

This isn't the right issue for this, but it's a broader issue of how to handle user-generated content. I like this library, but I'm switching to react-markdown because it has better support for displaying user-generated content. markdown-to-jsx looks like a great library for internal content. To make it safely support user-generated content, I think you need:

  • Disable automatic HTML parsing
  • Whitelist of HTML to allow parsing for (optional)
  • Whitelist of allowable URLs and/or sanitization function to prevent bad links (not just JS links)

Hopefully this comment has been helpful in that.

@rescribet
Copy link

Possibly relevant package https://github.com/cure53/DOMPurify

@quantizor
Copy link
Owner

That lib is bigger than markdown-to-jsx itself unfortunately.

Adding some basic config to just disable the HTML parsing rules should be relatively straightforward and it would just end up in the generated markdown as plain text.

@rahulgi
Copy link

rahulgi commented Oct 8, 2020

Should this issue should be closed now after #278?

@stephan-noel
Copy link
Contributor

First, please excuse my lack of security knowledge 🙂 . I have a problem that optionally disabling parsing raw HTML right now will also disable my custom components.

const options = {overrides: {MyCustomComponent: MyCustomComponent}};

<MyCustomComponent/> // This no longer works if I disable parsing raw HTML.

But what if I want to disable parsing raw HTML only (ie, like <script> tags) but allow custom components to still work?

As stated here #307 (comment) I'd have to allow raw HTML but use something like dompurify, which is a bit heavy (larger than this library itself).

My question is, can we allow disabling raw HTML, but allow custom MDX components or somehow make it an option, maybe on disableParsingRawHTML itself?

@AJamesPhillips
Copy link

@stephan-noel you can use a custom override just for script tags, for example:

const value = `Hello<div style="color: red;">World</div><script src="evil.com">Bad script</script>`

const MARKDOWN_OPTIONS = {
    overrides:
    {
        // If there is any text inside the script tag then render this, otherwise render nothing.
        script: (props: { children: string }) => props.children,
    },
}

<Markdown options={MARKDOWN_OPTIONS}>
    {value}
</Markdown>

Would just render as:
image
But the script tag would be missing from the resulting html.

@quantizor
Copy link
Owner

Hmm we should be discarding script tags entirely as they're obviously a malicious vector

@AJamesPhillips
Copy link

Ah that's interesting. Yes I'm running 7.1.6 and without disableParsingRawHTML: true then the script's src attribute is rendered and is requested by the page.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants