Detect HTML/JS injection attacks and warn users pro-actively #2886

Closed
joshgoebel opened this issue Nov 22, 2020 · 54 comments · Fixed by #3057
Labels
big picture Policy or high level discussion discuss/propose Proposal for a new feature/direction enhancement An enhancement or new feature parser

@joshgoebel
Member

joshgoebel commented Nov 22, 2020

Is your request related to a specific problem you're having?

#2884
ccampbell/rainbow#249

This comes up again and again (though thankfully not TOO often). Beginners are VERY confused about this whole HTML escaping thing.

The solution you'd prefer / feature you'd like to see added...

While we can't do anything smart about this yet (because we unfortunately allow HTML inside code blocks for "clever" users) I'd like this to change with v11. With v11 we should drop this HTML pass-thru behavior and move it to a plugin (making it very much opt-in). The default behavior should be that HTML is silently dropped and I'd even consider adding some sort of error:

[code block]
WARNING.
Are you missing a bunch of HTML code you expected to see here? 
Your HTML wasn't properly escaped and that can lead to serious
security issues. _Learn More_

Properly escape your code and the highlighting you expect will kick in.
[/code block]

This would of course be a breaking change so we'd need to wait until v11. For 95% of users I can't see the downside to this and it seems we could potentially educate and prevent a lot of harm. Someone wanting the HTML to pass thru would install a plug-in and thus change the behavior, get the old behavior back, etc.

Any alternative solutions you considered...

Silent dropping but no error... but that just leads to support issues... I suppose we could log the error to the console vs actually showing it on the webpage.

@joshgoebel joshgoebel added the enhancement An enhancement or new feature label Nov 22, 2020
@joshgoebel joshgoebel changed the title [Request] ... Proposal: Detect HTML injection attacks and warn users pro-actively Nov 22, 2020
@joshgoebel joshgoebel added big picture Policy or high level discussion parser discuss/propose Proposal for a new feature/direction labels Nov 22, 2020
@joshgoebel joshgoebel added this to the 11.0 milestone Nov 22, 2020
@joshgoebel
Member Author

@RunDevelopment Would love if you have any thought on this from a Prism.js perspective.

@joshgoebel joshgoebel changed the title Proposal: Detect HTML injection attacks and warn users pro-actively Detect HTML/JS injection attacks and warn users pro-actively Nov 22, 2020
@RunDevelopment

Adding a console warning is a good idea IMO.
Just make sure you only log once, otherwise constantly re-rendering React projects will get their console spammed. There might also be a problem with SSR in general, so there should be an option to turn off the warning.
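The log-once suggestion could look roughly like the sketch below. It is only an illustration: `warnOnce`, its `silent` option, and the injectable `log` parameter are hypothetical names, not part of Highlight.js or Prism.

```javascript
// Hypothetical helper (not a Highlight.js API): warn at most once per message.
const alreadyWarned = new Set();

function warnOnce(message, { silent = false, log = console.warn } = {}) {
  // `silent` stands in for an opt-out setting, e.g. for SSR builds.
  if (silent || alreadyWarned.has(message)) return false;
  alreadyWarned.add(message);
  log(message);
  return true; // true means the message was actually logged
}
```

Repeated renders that call `warnOnce` with the same message would then produce a single console entry instead of one per render.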


Honestly, I don't understand this merging but that's probably the HTML pass-thru you mentioned. (Is this behavior documented anywhere?)

For Prism, we have Keep Markup to implement an HTML pass-thru but this can actually cause some issues (PrismJS/prism#2384).
HTML pass-thru seems to be on by default in Highlight.js. Did this cause problems with double (SSR+client-side) highlighting in the past?

@joshgoebel
Member Author

joshgoebel commented Nov 22, 2020

Just make sure you only log once, otherwise constantly re-rendering React projects will get their console spammed.

This only matters on the client-side when retrieving content from an HTML element... it doesn't have any relevance to SSR at all unless I'm missing something. The only case we're trying to prevent here is client-side:

Here is an example of how script works:

<pre><code>
<script>alert("hi");</script>
</code></pre>

...

<script>
hljs.highlightOnLoad(); // will fire us up and highlight every code block
</script>

Now that code block is VERY dangerous because that code will execute. It should be escaped... and if this was a more complex example then all sorts of harm could result... so I'm saying with a major version release a breaking change that simply DISALLOWS HTML inside the code block... it must be raw text or an error... the idea being that in development people would catch this problem (because of the visible error) and it would never make it out into the wild (where someone is going to exploit it).
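For reference, escaping the example above before it reaches the page might look like this minimal sketch. `escapeHtml` is a hypothetical helper written for illustration (server-side templating engines usually ship an equivalent); it is not a Highlight.js API.

```javascript
// Hypothetical helper: escape the characters that are special in HTML text and attributes.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;") // must run first so later entities are not double-escaped
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// The dangerous snippet from the example, escaped before it is placed in the markup.
const unsafe = '<script>alert("hi");</script>';
const safeBlock = `<pre><code>${escapeHtml(unsafe)}</code></pre>`;
console.log(safeBlock);
```

With the content escaped, the browser renders the script tag as visible text inside the code block rather than executing it.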

so there should be an option to turn off the warning.

We could add allowUnsafeHTML or some such - but I'm not sure why it's needed. But I'm honestly thinking of entirely breaking the highlighting functionality if we see HTML and they haven't enabled that (or aren't using a plugin that allows for it).

@joshgoebel
Member Author

joshgoebel commented Nov 22, 2020

Honestly, I don't understand this merging but that's probably the HTML pass-thru you mentioned. (Is this behavior documented anywhere?)

Not really, but the idea is simple. We pass HTML thru without touching it in the same position in the content:

var x;
<span class="important">var y;</span>

We will highlight this as such (in the browser):

<span class="keyword">var</span> x;
<span class="important"><span class="keyword">var</span> y;</span>

We added keyword wrappers but the HTML passes thru unmolested... making it useful for things like highlighting a line or a piece of code. It's definitely an edge case but one some people really want - but it's a ton of complexity.


Occasionally we get an issue here "where did all my HTML go" and this issue was meant to perhaps stave off such issues by building that "education" into the library itself.

@joshgoebel
Member Author

The code someone would use to do Highlight.js SSR is very different than the code they'd use on the client. Most people probably do one or the other not both.

@RunDevelopment

My concern is that the server might pass already highlighted code to the client, the client rehydrates the pages, and that causes re-highlighting. But maybe that's not a problem?

@joshgoebel
Member Author

joshgoebel commented Nov 22, 2020

My concern is that the server might pass already highlighted code to the client, the client rehydrates the pages, and that causes re-highlighting. But maybe that's not a problem?

That would be user error. And that already doesn't work. IF someone was using it on both sides then you'd just add hljs class to the SSR HTML and the client-side would know it had already been rendered and leave it alone. I think it's far more common that people do 100% rendering either server-side (then you don't need JS on the client) or client-side (because it's so fast that there is no point in wasting server time on it).
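One way to picture the "already rendered" check is the sketch below. It assumes, as described above, that server-rendered blocks carry the hljs class. `blocksNeedingHighlight` is a hypothetical helper, and the browser usage shown in the trailing comment assumes a global `hljs` object.

```javascript
// Hypothetical filter: keep only the blocks that do not carry the marker class yet.
function blocksNeedingHighlight(blocks, markerClass = "hljs") {
  return blocks.filter((block) => !block.classList.contains(markerClass));
}

// In a browser this might be used as follows (assumes a global `hljs`):
//   const blocks = Array.from(document.querySelectorAll("pre code"));
//   blocksNeedingHighlight(blocks).forEach((el) => hljs.highlightElement(el));
```

Blocks that the server already highlighted are skipped, so the client never re-highlights markup that already contains highlighting spans.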

@5ko

5ko commented May 30, 2021

Hi. I'm not sure if I should post a message to a closed issue or open a new issue, but a link to here came into my JS console.

I suspect the warning is incorrectly triggered in some cases, where the HTML is correctly escaped. See for example: documenting PHP code that includes HTML strings, all correctly escaped and without a danger to the person reading the page: https://www.pmwiki.org/wiki/PmWiki/WikiStyles#highlight

We are using a custom function that finds the elements that need to be highlighted, sets their classes and then calls highlightBlock -- and now highlightElement -- to highlight them.

Thank you for looking into it -- please let me know if I can give any relevant information.

@joshgoebel
Member Author

joshgoebel commented May 30, 2021

Your very first snippet has embedded unescaped HTML - two hyperlinks. We have no way to know if that might be malicious code or not - code blocks are expected to contain only text, not HTML. If you know for sure you can disable the warning (see documentation), but those hyperlinks will still be removed - we do not process HTML by default any longer.

If you need the HTML to be preserved you should see the issue on this subject and may need to use a plugin now: #2889

[screenshot]

@5ko

5ko commented May 30, 2021

Aha, thank you very much!! Indeed they are automated links to our internal variable documentation. I'll review it and decide what to do. Many thanks!

@ramennbowls

@joshgoebel I wanted to use <textarea> to easily display HTML code without having to convert the angle brackets into entity codes (< to &lt;), but whenever I use <textarea> it throws this security warning, yet there is no unescaped HTML?

This is the simple code that's triggering it. Am I doing something wrong, or is this a false alarm?
<pre><code class="html"><textarea><h1>Header 1</h1></textarea></code></pre>

@joshgoebel
Member Author

joshgoebel commented Jun 1, 2021

Please make a JS fiddle or a concrete real example I can view in the web browser and I'll take a look. 

@ramennbowls

I created a simple example on JSFiddle and it throws the same error "One of your code blocks includes unescaped HTML. This is a potentially serious security risk."

https://jsfiddle.net/sz39oedx/

@joshgoebel
Member Author

joshgoebel commented Jun 1, 2021

<pre><code class="html"><textarea><h1>Header 1</h1></textarea></code></pre>

In this example <textarea> itself is an unescaped HTML tag. All HTML inside code must be properly escaped to avoid the warning. You really should simply properly escape your HTML instead of trying to find ways around it.

@joshgoebel
Member Author

joshgoebel commented Nov 18, 2021

All you did (AFAICT) is avoid calling our code that would show the warning - so now you could still be injecting HTML, but you won't get any warning. If you're using highlightAll() and getting a warning the solution is to fix your HTML, not use your own custom JS to avoid the warning.

See my example and try something similar with your own code... do you see the alert? If so then you're vulnerable.

@unfor19

unfor19 commented Nov 18, 2021

@joshgoebel I'm very intrigued, here's what my HTML looks like after the rendering

Rendered raw HTML

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code class="hljs language-bash">$ <span class="hljs-built_in">cat</span> /etc/passwd | grep <span class="hljs-string">"<span class="hljs-subst">$(whoami)</span>"</span>
myuser:x:1000:1000:,,,:/home/myuser:/bin/bash
</code></pre></div></div>

This is how it appears on the page

image

Does this mean I didn't really fix the issue?

Btw, I changed the code from hljs.highlightElement(el); to hljs.highlightAll(); and it still works with no warnings

<script defer>
  document.addEventListener("DOMContentLoaded", (event) => {
    document.querySelectorAll("pre code").forEach((el) => {
      el.innerHTML = el.textContent;
    });
    hljs.highlightAll();
  });
</script>

@joshgoebel
Member Author

joshgoebel commented Nov 18, 2021

after the rendering

I find this problematic in that it's far too vague... what matters is the RAW HTML coming from the web server... if "after render" includes after JS has been run then it's almost meaningless to tell you anything useful since the entire page could potentially now be different (modified by JS).

There is no HTML in your example (escaped or not) so it's hard to know... You need to add some HTML to your "pre-render" files:

<pre><code>
<h1>I'm big BAD HTML</h1>
<script>alert("your hacked");</script>
</code></pre>

If it makes it thru (to the browser), then you have a potential problem. Highlight.js isn't necessary to do any of this testing.

https://owasp.org/www-project-web-security-testing-guide/latest/4-Web_Application_Security_Testing/11-Client-side_Testing/03-Testing_for_HTML_Injection

@unfor19

unfor19 commented Nov 18, 2021

@joshgoebel This was very helpful! Thank you, you're right :)

image

@nilslindemann

@joshgoebel

I understand that the problem is, that this data we hand over to the highlightElement function, which you mentioned above, is supposed to be an element with just text content.

I wonder, is this Stack Overflow answer a solution for this problem? For example ...

Example exploit (not harmful)

... When I replace on that site, in the developer tools (local overrides enabled), in the index.js, the line ...

$(".result-html").html(mdHtml.render(source));

... with

$(".result-html").html(mdHtml.render(new DOMParser().parseFromString(source, "text/html").documentElement.textContent));

... according to that SO answer, and reload the page, then the script is not executed anymore.

What is your opinion on this? Does this solve the problem? And if so, couldn't you devs just do this in the location where you log the warning message?

Greetings, Nils

@joshgoebel
Member Author

joshgoebel commented Nov 29, 2021

couldn't you devs just do this in the location where you log the warning message?

How? The page is rendered long before we have control. At that point it's too late. The HTML has already been rendered, and any injected code has already done any damage it intended to do. Again, see the sample above. If you can show me some way to fix that with PURE JS I'll be very impressed indeed.

is this Stack Overflow answer a solution for this problem?

I tend to dislike 'cut and paste' solutions where implementors [those doing the cutting and pasting] don't have a full understanding of the actual problem. Please see: https://stackoverflow.com/questions/64772302/is-parsing-html-with-domparser-safe-from-xss with some thoughts on DOMParser in this context. It seems perhaps adequate if one uses it properly and entirely inadequate if not... and I don't think it does anything (in and of itself) to make it clear which is which... unlike say an HTML.sanitize() function... which has a SINGULAR purpose, etc...

If one doesn't understand these problems fully it's probably best they read until they learn, or use a platform/library that already solves these problems for them.

@nilslindemann

nilslindemann commented Dec 2, 2021

couldn't you devs just do this in the location where you log the warning message?

I just see, your library already reads the node's children as textContent. So my above comment was unnecessary. Your lib does what it can do.

I am not sure if the additional warning message helps. It made me think your library does not read the children as textContent, for whatever reason, so I did domNode.textContent = domNode.textContent before I did highlightHTML(domNode), which obviously is unnecessary work. unfor19's comment also indicates this misunderstanding.

Yes, you are absolutely right, the server should properly filter and escape. And testing is always good.

Regarding the SO question linked by you, the idea is of course to parse the Html string with the DOMParser (which does not execute script tags while parsing), and then get the desired part as .textContent, not as .firstChild (Or, if the elements are needed, to DOMPurify.sanitize() the Html string, and set someElem.innerHTML to it, and optionally do hljs.configure({ignoreUnescapedHTML: true});).

@joshgoebel
Member Author

joshgoebel commented Dec 2, 2021

I am not sure if the additional warning message helps. It made me think your library does not read the children as textContent,

I'm not sure why, nothing it says implies that as far as I can see.

"One of your code blocks includes unescaped HTML. This is a potentially serious security risk."

It purposely doesn't say "don't worry we use textContent" because that is irrelevant - as already pointed out - and could imply a false sense of security. We only use textContent (and show a warning) to purposely BREAK the HTML content to make them aware of the issue. This does not solve the issue, but hopefully shines some light on it in development rather than production.

If you have better verbiage feel free to suggest.

In version 12 this will be a hard error and we will not render the content at all (unless someone sets a special flag). This feature can also be enabled in version 11 today to allow errors to be caught more easily in development.

the idea is of course to parse the Html string with the DOMParser

I understand the idea myself.

@nilslindemann

nilslindemann commented Dec 3, 2021

If you have better verbiage feel free to suggest.

Ok, I would suggest replacing

console.warn("One of your code blocks includes unescaped HTML. This is a potentially serious security risk.");
console.warn("https://github.com/highlightjs/highlight.js/issues/2886");
console.warn(element);

with

console.warn("One of your code blocks contains unescaped HTML elements.");
console.warn("This allows attackers to execute malicious Javascript code (XSS Attack).");
console.warn("The library will remove all inner HTML elements by reading them as textContent.");
console.warn("This does NOT protect you from these XSS attacks, as this HTML (including Javascript and CSS)");
console.warn("gets executed by your browser when it loads the page.");
console.warn("In version 12, this library will refuse to highlight such elements, unless you set a special flag.");
console.warn("Currently you can disable this warning by doing `hljs.configure({ignoreUnescapedHTML: true});`");
console.warn("See also https://github.com/highlightjs/highlight.js/issues/2886");
console.warn("The element in question is:");
console.warn(element);

I would also suggest renaming the function from highlightElement to highlightTextContents.

I understand the idea myself.

That's nice, but you are not the only one reading this thread. A lot of people will, as you point them here. In that context, I may once again point to Preventing XSS Attacks, which is a readable introduction into the topic.

@joshgoebel
Member Author

joshgoebel commented Dec 3, 2021

I think it was implied I meant "in the same amount of words, give or take". :-) Console warnings aren't the place to write a book IMHO... I think I may just replace the link to this issue with a new link to (just created):

https://github.com/highlightjs/highlight.js/wiki/security

I don't feel it's our job to explain XSS/HTML injection in detail (though perhaps I'm wrong on this), but we can certainly link to those resources. This is something that IMHE requires research and understanding or someone is going to get it wrong. We can be the beginning of someone's learning journey, but probably not the end.

@nilslindemann

nilslindemann commented Dec 3, 2021

@joshgoebel Man, thank you, for investing your time in this. This is a good wiki entry. It explains short and clearly what the issue is, including a code example. It states your position on this. It links to helpful locations. One could not wish more. I also like that you included the link to the Acunetix article, which is written in a simple language, and will help beginners to grasp the basic principles of how to defend against XSS. People will have their questions answered with this wiki entry.

@y377

y377 commented Dec 12, 2021

@fractalhq make sure you know what you're doing (regarding sanitization), and then use this config option.

hljs.configure({ ignoreUnescapedHTML: true })

You are a real big boss (in Chinese, big boss means very powerful);
I am still confused about how jekyll's markdown syntax fits perfectly with highlight. Originally, the console throws a lot of warnings. I also plan to enable Markdown writing, but when I saw your answer, the warning was solved perfectly.

Before repair:

image
image

After repair:

image

update repairjekyll

@joshgoebel
Member Author

joshgoebel commented Dec 12, 2021

@y377 @voraciousdev

make sure you know what you're doing (regarding sanitization)

sigh This makes me wonder if further changes aren't needed here...

It's one thing to prefix advice with "make sure you know what you're doing [first]" but I highly doubt most people are actually doing that step since just disabling the warning does not fix the problem or the broken HTML. The HTML shown above (posted by @y377) with naked < characters is not correct. Those characters SHOULD be escaped as &lt;. Someone who turns off the warning and then leaves the broken HTML does not understand what they are doing IMHO. All someone would have to do is slip a single Markdown file into your build process and then your entire website/blog is compromised.

The correct solution here would be for Jekyll to sanitize/encode HTML entities properly inside code blocks (or have a mode/setting to do this).

@joshgoebel
Member Author

@y377 Can you expand the console log or include the raw HTML generated so we can see what HTML tags are actually getting generated? The error is only generated when there are child tags inside a code block, which Highlight.js does not support by default. So I'm curious what the child tags are in your case.

The unescaped < is problematic, but not a tag in and of itself.
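As a rough illustration of the check described here (a warning whenever a code block has child elements), the sketch below uses the option names mentioned in this thread, ignoreUnescapedHTML and throwUnescapedHTML. The function `checkForChildElements` and its injectable `log` parameter are hypothetical; this is not the library's actual source.

```javascript
// Sketch only: mirrors the behavior described in this thread, not the library's source.
function checkForChildElements(element, options = {}, log = console.warn) {
  const { ignoreUnescapedHTML = false, throwUnescapedHTML = false } = options;
  // Properly escaped code is plain text, so the element has no child elements.
  if (element.children.length === 0) return true;
  if (!ignoreUnescapedHTML) {
    log("One of your code blocks includes unescaped HTML.", element);
  }
  if (throwUnescapedHTML) {
    throw new Error("One of your code blocks includes unescaped HTML.");
  }
  return false;
}
```

Note that a naked `<` that the browser does not parse as a tag produces no child element, which matches the point above that it is problematic but does not trigger the warning by itself.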

@joshgoebel
Member Author

So far Jekyll seems to do the "right thing" out of the box. See jekyll/jekyll#8903. Those who are seeing this with Jekyll: I'd love to have more information to actually track down the problem... here is looking at you @y377, etc...

@y377

y377 commented Dec 13, 2021

So far Jekyll seems to do the "right thing" out of the box. See jekyll/jekyll#8903. Those who are seeing this with Jekyll: I'd love to have more information to actually track down the problem... here is looking at you @y377, etc...

image

@joshgoebel
Member Author

So as concluded on the other thread, don't use TWO different syntax highlighters. Trying to run Highlight.js on top of Rouge is a no-go as there are already a bunch of highlighting tags there and that is what Highlight.js is complaining about.

@omundy

omundy commented Dec 27, 2021

@joshgoebel I can't upvote your post, but I can thank you here. This was my issue. Specifically I'm using highlightjs via grunt-md2html and the docs weren't clear that if I linked to the script AND set options in the config that it would run highlight twice.

@Victor-Salomon

Victor-Salomon commented Jun 22, 2022

hello @joshgoebel and all,
I'm trying to find a way to use the following snippet with Highlight.js in a Next.js project.

const snippet = `import { createNft } from "nft";
import { generateSeed, getKeyringFromSeed } from "account"
const createMyFirstNFT = async () => {
  try {
    const account = await generateSeed()
    const keyring = await getKeyringFromSeed(account.seed)
    const address = keyring.address
    await createNft(address, "My first NFT", 10, null, false, keyring)
  } catch(e) {
    console.log(e)
  }
}`

<pre>
<code className={`${styles.snippet} javascript`}>
{snippet}
</code>
</pre>
[screenshot]

It works, but I still have some issues with the unescaped HTML.
What's the best way to pass a long JS function to the <pre>{snippet}</pre> tags?
Many thanks in advance,

@joshgoebel
Member Author

You need to properly escape snippet before/as you inline it in the JSX... or perhaps just add it yourself dynamically at runtime using the DOM's textContent property.

You may need to do it on your ClipboardCopy line also, I'm not sure if your framework handles that auto-magically for you or not (it may)...

@Victor-Salomon

Victor-Salomon commented Jun 22, 2022

@joshgoebel thanks so much for your help.
Does that mean I have to escape it like this?

[screenshot]

If I remove the function from the snippet and replace it with "hello word", it works properly without error:
[screenshot]

But if I add the escaped snippet, it still displays the error and does not handle the escaped code:
[screenshot]

@joshgoebel
Member Author

joshgoebel commented Jun 22, 2022

Use the web inspector to expand the <code> element and it'll show you whatever the problem is... ANY other elements under <code> are a problem... just expand it and look.

@Victor-Salomon

Thanks again for your feedback @joshgoebel. I did expand it to see where the unescaped text could be. But the whole function appears, as you can see. Every {} bracket and () looks to be generating the issue. Should I write the snippet in a different way?
[screenshot]

@joshgoebel
Member Author

joshgoebel commented Jun 23, 2022

I did expand it to see where the unescaped text could be. But the whole function appears, as you can see.

Oh you'd need to temporarily hack your HLJS to STOP highlighting so you can see the actual issue... sometimes I forget that JS objects live-update in the log as they continue to change...

https://highlightjs.readthedocs.io/en/latest/api.html#configure

Turn on throwUnescapedHTML and try checking the log.

@inpresif

inpresif commented Mar 20, 2023

Just want to add that in TWIG you can just use this before/after your code

        {% autoescape 'html' %}
            your code
        {% endautoescape %}
