head > meta > content escaping issue #13838

oliviertassinari · 2018-10-12T12:01:36Z

Do you want to request a feature or report a bug?

I'm guessing it's a bug.

What is the current behavior?

The following source code,

<meta property="og:image" content="https://onepixel.imgix.net/60366a63-1ac8-9626-1df8-9d8d5e5e2601_1000.jpg?auto=format&q=80&mark=watermark%2Fcenter-v5.png&markalign=center%2Cmiddle&h=500&w=500&s=60ec785603e5f71fe944f76b4dacef08" />

, is being escaped once server side rendered:

<meta property="og:image" content="https://onepixel.imgix.net/60366a63-1ac8-9626-1df8-9d8d5e5e2601_1000.jpg?auto=format&amp;q=80&amp;mark=watermark%2Fcenter-v5.png&amp;markalign=center%2Cmiddle&amp;h=500&amp;w=500&amp;s=60ec785603e5f71fe944f76b4dacef08"/>

You can reproduce the behavior like this:

const React = require("react");
const ReactDOMServer = require("react-dom/server");
const http = require("http");

const doc = React.createElement("html", {
  children: [
    React.createElement("head", {
      children: React.createElement("meta", {
        property: "og:image",
        content:
          "https://onepixel.imgix.net/60366a63-1ac8-9626-1df8-9d8d5e5e2601_1000.jpg?auto=format&q=80&mark=watermark%2Fcenter-v5.png&markalign=center%2Cmiddle&h=500&w=500&s=60ec785603e5f71fe944f76b4dacef08"
      })
    }),
    React.createElement("body", { children: "og:image" })
  ]
});

//create a server object:
http
  .createServer(function(req, res) {
    res.write("<!DOCTYPE html>" + ReactDOMServer.renderToStaticMarkup(doc)); //write a response to the client
    res.end(); //end the response
  })
  .listen(8080); //the server object listens on port 8080

editor: https://codesandbox.io/s/my299jk7qp
output : https://my299jk7qp.sse.codesandbox.io/

What is the expected behavior?

I would expect the content not to be escaped. It's related to vercel/next.js#2006 (comment).
I'm using the og:image meta element so my pages can have nice previews within Facebook :).

Which versions of React, and which browser / OS are affected by this issue? Did this work in previous versions of React?
16.5.2


gaearon · 2018-11-01T19:34:04Z

Has this ever worked as you intend? Can you send a fix?

oliviertassinari · 2018-11-01T19:55:19Z

We are solving the problem this way:

import Entities from 'html-entities/lib/html5-entities'

const entities = new Entities()
const contentRegExp = /content="([^"]+)"/g
const handleContent = (match, content) => {
  return `content="${entities.decode(content)}"`
}

html = html.replace(contentRegExp, handleContent)

We spend ~1ms per request in the path. It's not too bad. I can give it a look at some point.
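
A dependency-free variant of the same post-processing idea might look like the sketch below. It is an assumption-laden illustration, not the code used above: `decodeMetaContent` and the `example.com` URL are hypothetical, it only touches `content` attributes of `<meta>` tags, and it only reverses `&amp;`, so an encoded `&quot;` stays encoded and cannot break out of the attribute.

```javascript
// Matches the content="..." attribute of a <meta> tag and captures the value.
const META_CONTENT = /(<meta\b[^>]*?\bcontent=")([^"]*)(")/g;

// Hypothetical helper: reverse only the &amp; escaping inside meta content.
function decodeMetaContent(html) {
  return html.replace(
    META_CONTENT,
    (match, before, value, after) => before + value.replace(/&amp;/g, "&") + after
  );
}

const page =
  '<meta property="og:image" content="https://example.com/a.jpg?w=500&amp;h=500"/>' +
  "<p>a &amp; b</p>";
const patched = decodeMetaContent(page);

// The meta URL now contains a bare "&"; the paragraph text is left untouched.
console.log(patched);
```

The resulting markup relies on HTML parsers tolerating a bare `&` in attribute values, which can misfire when the text after the `&` happens to be a named character reference such as `&copy`.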

oliviertassinari · 2019-01-14T14:19:14Z

I have found this related issue: #6873. Digging into the implementation, the behavior comes from

react/packages/react-dom/src/server/DOMMarkupOperations.js

Line 61 in 0005d1e

return attributeName + '=' + quoteAttributeValueForBrowser(value);

⬇️

react/packages/react-dom/src/server/quoteAttributeValueForBrowser.js

Line 17 in b87aabd

return '"' + escapeTextForBrowser(value) + '"';

⬇️

react/packages/react-dom/src/server/escapeTextForBrowser.js

Line 108 in b87aabd

return escapeHtml(text);

Now, all the escaping tests I could find are covering the children use case:

react/packages/react-dom/src/__tests__/escapeTextForBrowser-test.js

Lines 23 to 24 in b87aabd

    
const response = ReactDOMServer.renderToString(<span>{'&'}</span>);
expect(response).toMatch('<span data-reactroot="">&amp;</span>');

I have limited knowledge of web escaping related security issues.
I don't see any harm potential with:

 const response = ReactDOMServer.renderToString(<span data-src={'&'}></span>); 
 expect(response).toMatch('<span data-reactroot="" data-src="&"></span>');
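
To make the call chain traced above concrete, here is a minimal stand-in for what the attribute path does. It is a sketch, not React's source: `markupForAttribute` is a hypothetical name, and the five-character escape table reflects my understanding of what the server renderer escapes.

```javascript
// Illustrative escape table; assumed to match the characters React escapes.
const ESCAPES = {
  '"': "&quot;",
  "&": "&amp;",
  "'": "&#x27;",
  "<": "&lt;",
  ">": "&gt;",
};

// Replace each special character with its character reference.
function escapeHtml(text) {
  return String(text).replace(/["&'<>]/g, (ch) => ESCAPES[ch]);
}

// Hypothetical helper mirroring the attributeName + '=' + quoted value step.
function markupForAttribute(name, value) {
  return name + '="' + escapeHtml(value) + '"';
}

const markup = markupForAttribute("content", "/api/share?one=1&two=2");
console.log(markup); // content="/api/share?one=1&amp;two=2"
```

Because the same escape function serves both text children and attribute values, a test that only covers children still exercises the code that rewrites `&` in `content`.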

andreubotella · 2019-03-06T22:00:08Z

I have the same problem in the content of <style> elements:

const React = require("react");
const ReactDOMServer = require("react-dom/server");

console.log(ReactDOMServer.renderToStaticMarkup(
  <html>
    <head>
      <link
        href="https://fonts.googleapis.com/css?family=Source+Sans+Pro"
        rel="stylesheet"
      />
      <style>{`
        html {
          font-family: "Source Sans Pro", sans-serif;
        }
      `}</style>
    </head>
    <body>
      <p>Test.</p>
    </body>
  </html>
));

This outputs:

<html><head><link href="https://fonts.googleapis.com/css?family=Source+Sans+Pro" rel="stylesheet"/><style>
        html {
          font-family: &quot;Source Sans Pro&quot;, sans-serif;
        }
      </style></head><body><p>Test.</p></body></html>

By the parsing rules in the HTML spec (I'm consulting WHATWG here), the contents of elements style, xmp and iframe (as well as noscript, noframes and noembed when they're not being rendered) are parsed with the RAWTEXT tokenizer state, which treats everything as plaintext until it finds a matching closing tag.

Escaping the contents of style elements is, however, valid (in fact, mandatory for angled brackets) in the XML syntax of HTML; and indeed, adding an xmlns="http://www.w3.org/1999/xhtml" attribute to the <html> element results in valid XML. But if the intention of ReactDOMServer is indeed to render XML syntax, that should be explicitly noted in the documentation, because there are a number of tools (such as Next.js) which serve the output of these functions with content-type text/html.

oliviertassinari · 2019-03-06T22:07:05Z

@andreubotella This is a different problem, you should use dangerouslySetInnerHTML. Can an admin mark the comments as "resolved"?

stale · 2020-01-10T06:53:42Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contribution.

stale · 2020-01-17T06:59:46Z

Closing this issue after a prolonged period of inactivity. If this issue is still present in the latest release, please create a new issue with up-to-date information. Thank you!

jbraithwaite · 2020-08-17T21:16:44Z

This is not a bug in React. Using an entity reference for & (e.g. &amp;) is the correct behavior for XHTML documents:

In both SGML and XML, the ampersand character ("&") declares the beginning of an entity reference (e.g., &reg; for the registered trademark symbol "®"). Unfortunately, many HTML user agents have silently ignored incorrect usage of the ampersand character in HTML documents - treating ampersands that do not look like entity references as literal ampersands. XML-based user agents will not tolerate this incorrect usage, and any document that uses an ampersand incorrectly will not be "valid", and consequently will not conform to this specification. In order to ensure that documents are compatible with historical HTML user agents and XML-based user agents, ampersands used in a document that are to be treated as literal characters must be expressed themselves as an entity reference (e.g. "&amp;"). For example, when the href attribute of the a element refers to a CGI script that takes parameters, it must be expressed as http://my.site.dom/cgi-bin/myscript.pl?class=guest&amp;name=user rather than as http://my.site.dom/cgi-bin/myscript.pl?class=guest&name=user.

In the HTML spec you do not need to use a character reference for & as long as what follows it is not a string that forms a named character reference.

The example they give is:

<a href="?bill&ted">Bill and Ted</a> <!-- &ted is ok, since it's not a named character reference -->
<a href="?art&amp;copy">Art and Copy</a> <!-- the & has to be escaped, since &copy is a named character reference -->

Personally, I feel like React made the right call with escaping & since that works in both XHTML and HTML5.
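
A consumer that decodes the attribute value recovers the original string, which is why the escaped form is safe in both syntaxes. The sketch below shows the idea with a deliberately tiny decoder; `decodeAttribute` is a hypothetical helper that only knows the five references relevant to this thread, whereas a real HTML parser handles the full named character reference table.

```javascript
// Minimal reference table; an assumption for illustration, not a full decoder.
const REFERENCES = {
  "&amp;": "&",
  "&quot;": '"',
  "&#x27;": "'",
  "&lt;": "<",
  "&gt;": ">",
};

function decodeAttribute(raw) {
  // Single pass, so "&amp;lt;" becomes "&lt;" and is not decoded a second time.
  return raw.replace(/&(?:amp|quot|#x27|lt|gt);/g, (ref) => REFERENCES[ref]);
}

console.log(decodeAttribute("?art&amp;copy")); // ?art&copy
console.log(decodeAttribute("auto=format&amp;q=80")); // auto=format&q=80
```

Decoding `?art&amp;copy` yields the literal `?art&copy`, whereas writing `?art&copy` directly in the HTML would let a parser read `&copy` as a named character reference.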

equinusocio · 2020-08-18T05:29:59Z

In meta tags escaped paths don't work... otherwise, this bug would not have been opened.

jbraithwaite · 2020-08-18T05:58:49Z

This is the change needed to get the behavior you expect:

Replace

react/packages/react-dom/src/server/ReactPartialRenderer.js

Line 383 in ee409ea

markup = createMarkupForProperty(propKey, propValue);

with an escape hatch:

if (tagVerbatim === 'meta' && propKey === 'content') {
  markup = 'content="' + propValue + '"';
} else {
  markup = createMarkupForProperty(propKey, propValue);
}

This would explicitly exempt the meta tag's content attribute from being properly escaped which wouldn't help @oliviertassinari's issue of wanting <span data-src={'&'}></span>.

A more generic solution would involve having something like dangerouslySetAttributes

<span
  dangerouslySetAttributes={{__attributes: [{name: 'data-src', value: '&'}]}}
/>

This could easily lead to parsing errors and unexpected results if any value after the & is a named character reference e.g. &copy (without the ;)

Again, the issue was with the HTML parser FB was using for the Sharing Debugger, not React. It is properly parsing the escaped paths now.

parameters in the meta file

It's a known React bug, but the server-side rendering can't handle query parameters in the meta head file; it turns `&`s into `&amp;`s, etc. To bypass that, we've added a custom share image field to the content model, and require that to be under 1MB so we don't have to handle it here. You *can* run an Express server using Next JS that will handle these HTML entities and decode them, but Vercel (the host) won't allow that, and also provides some custom optimization that other hosts wouldn't, out-of-the-box. So, decided to not go that route.

Some open github issues, for reference:
- vercel/next.js#2006
- facebook/react#13838
- An example Express server fix: https://gist.github.com/stefl/1f8c246dd7ca9cb332ae41f68e80088d

Macil · 2021-03-24T23:25:34Z

@equinusocio Anything reading the meta tags from the HTML should handle decoding the HTML-encoded characters that React produces here. It's a bug with anything consuming the HTML if it doesn't decode the &amp;s correctly. If you're trying to test your work and want to read the HTML-decoded value, then in the dev tools element inspector, pick the meta tag element, and in the console run $0.content. That will return the proper value because the browser handles HTML decoding properly, as anything parsing the HTML should.

YoannBuzenet · 2021-05-04T09:56:28Z

Hello, I'm having this problem and can't find any fix. I'm on React 17.0.1. Would anyone have a hint ?

notsidney · 2021-06-11T03:30:28Z

I’ve spent the last little while trying to figure out this issue since the encoded URLs cause LinkedIn to not see Open Graph images.

I’ve come up with a workaround for our Gatsby site that uses ReactDOMServer.renderToStaticMarkup, replacing &amp; with & in the string, then rendering using dangerouslySetInnerHTML:

const headHtml = ReactDOMServer.renderToStaticMarkup(headComponents).replace(
  /&amp;/g,
  '&'
)

return (
  <html>
    <head dangerouslySetInnerHTML={{ __html: headHtml }} />

...

And here’s the full html.js for a Gatsby project: https://github.com/notsidney/gatsby-meta-encoded-url-workaround/blob/main/src/html.js#L6-L25

gaearon · 2021-09-06T22:34:28Z

Do I understand correctly that the issue is primarily due to the content scrapers that don’t consider encoded characters? So the React output is technically correct but some scrapers are too naïve? Or is there a reason why React behavior is technically incorrect?

ihmpavel · 2021-09-06T22:50:23Z

Yes and no. For example https://resoc.io/setup will fail showing you an image if you provide encoded characters (that means this service is unusable for React users). Same behavior is for Next.JS.

Inspired by vercel/next.js#2006 (comment)
<meta property="og:image" content='/api/share?one=1&two=2' /> will become <meta property="og:image" content='/api/share?one=1&amp;two=2' />.

In the api/share.ts, (automatically) parsed request (req.query), will become an object with keys { one: 1, 'amp;two': 2 }. It is incorrectly parsed and unusable. Libraries should handle this fine, but not all of them do. It is a bigger problem, but quite unique.

Unfortunately there is no easy solution (dangerouslySetInnerHTML, decoding escaped HTML... - it does not suit for every problem).
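
The `'amp;two'` key can be reproduced without any framework. The sketch below uses Node's built-in `URL` class and a placeholder `example.com` origin (an assumption for illustration) to compare the request a scraper makes when it copies the attribute text verbatim with the request it makes after HTML-decoding the value.

```javascript
// What a scraper requests if it copies the attribute text without decoding it.
const undecoded = new URL("/api/share?one=1&amp;two=2", "https://example.com");
// What it requests after decoding the attribute value as HTML.
const decodedUrl = new URL("/api/share?one=1&two=2", "https://example.com");

const undecodedKeys = [...undecoded.searchParams.keys()];
const decodedKeys = [...decodedUrl.searchParams.keys()];

console.log(undecodedKeys); // [ 'one', 'amp;two' ]
console.log(decodedKeys); // [ 'one', 'two' ]
```

The query string is split on `&`, so the leftover `amp;` becomes part of the second key. The server parses exactly what it was sent; the stray `amp;` originates in whatever client skipped the decoding step.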

gaearon · 2021-09-06T23:11:04Z

Yes and no

"No" to which part?

For example https://resoc.io/setup will fail showing you an image if you provide encoded characters (that means this service is unusable for React users).

I understand that particular services may not work. What I'm asking is — technically would you agree that the problem is on their end, or is the problem unambiguously on our end? I don't mean to turn this into "not our problem, bye" kind of argument, but I think we need to get clarity on the ideal situation before we can move forward with any plan here. So is React producing technically correct output (and resoc's parser is wrong) or is React producing technically incorrect output? I understand this question may not be important to your use case (it doesn't work either way) but it is important to me.

Same behavior is for Next.JS.

I don't know what you mean by this. Next.JS is using React so it's expected that it would have the same behavior. You mention "same behavior" after resoc.io which makes it sound like you're saying Next.JS is also incorrectly parsing something, but Next.JS is producing, not consuming, and it's producing via React, so I'm not sure if there's any extra meaning I'm missing here. I'd expect Next.JS to not be relevant in this issue because the issue is about React behavior itself, and discussing React alone should be enough to figure out the path forward here. But maybe I missed something!

Inspired by vercel/next.js#2006 (comment) <meta property="og:image" content='/api/share?one=1&two=2' /> will become <meta property="og:image" content='/api/share?one=1&amp;two=2' />.

OK so maybe this is what you mean by "for Next.JS". I think you're saying: "when the URL is encoded, the request received by my API built with Next.JS contains incorrect parameter". However, this assertion is missing an important detail: what exactly is sending a request to your Next.JS API? Is this Facebook? Twitter? LinkedIn? resoc.io? This is important — I need to understand which case exactly you are trying to solve, if there is a concrete one.

In the api/share.ts, (automatically) parsed request (req.query), will become an object with keys { one: 1, 'amp;two': 2 }. It is incorrectly parsed and unusable. Libraries should handle this fine, but not all of them do.

It is unclear which libraries you're referring to here. Next.JS? I believe Next.JS's behavior is correct here — if the request itself has an encoded string, Next.JS isn't supposed to unencode it. I believe unencoding should've been done by whatever thing that parsed your meta tag. This is why I'm asking what is the exact service you're trying to make this work with. And whether the same problem exists for other services now. E.g. if FB and Twitter treat this correctly, I think this could be a strong enough argument that you should report the problem to the faulty service instead, and have them fix it.

depoulo · 2021-09-07T14:01:17Z

In my view it's just naïve scrapers, some of which are run by Facebook, see #6873 (comment)

devanshj · 2021-11-03T01:30:23Z

fwiw afaict this makes you unable to write inline scripts in NextJS.

phawk · 2022-02-23T10:16:42Z

Whilst React's approach might be technically correct, it doesn't work in the real world, because we have naive scrapers, from massive organisations like LinkedIn, Slack and WhatsApp. So you can choose to be technically correct and pedantic or to have your code work, in the real world, right now. I for one choose the latter.

I'm trying to get open graph images with query params to work, now I need to look for a hacky workaround because I can't go and force these massive tech organisations to "fix" their code.

Macil · 2022-02-23T23:49:44Z

I'm wondering if most of the people who believe this is an issue for themselves are mistaken. I first found this issue because I was trying to figure out why Twitter wasn't showing an image from one of my <meta> tags. I viewed the HTML my server was rendering with React, copied the URL from the content="..." part of my meta tag, and found that it didn't work in my browser, so I assumed the reason that didn't work was the same reason Twitter wasn't loading my image. However, the reason that copied URL didn't work in my browser was just because I copied the raw HTML instead of the HTML-decoded string, and it was not the reason that Twitter wasn't loading my image. The actual issue was that I wasn't using the proper set of <meta> tags that Twitter respected. (There were no entries in my request logs from Twitter. If the issue had been that they weren't properly doing HTML decoding, then I would have seen entries in the request logs for URLs containing &amp; where there should have been &.)

I also wonder if some people are improperly HTML-encoding some strings before giving them to React and then thinking the double-encoding is caused by React doing HTML-encoding. People of both of these situations are present in the related thread #6873, so it seems likely that at least a decent portion of people in this thread have the same confusions.

Before suggesting putting workarounds in React for other services' parsers, we should try to be sure the problem is not one of these other dev-side confusions, we should figure out a list of services we're sure have the issue, we should try to report the issue to them first, and only then move onto figuring out workarounds if the list of services with problems is still large.

Several people have suggested that LinkedIn is unable to handle properly HTML-encoded meta tags, but from my tests they seem to handle things fine. I created this test HTML page:

<!DOCTYPE html>
<html>
  <head>
    <title>test page</title>
    <meta property="og:title" content="Title of the article" />
    <meta property="og:image" content="/1234567.png?a=b&amp;c=e" />
    <meta
      property="og:description"
      content="Description that will show in the preview"
    />
  </head>
  <body>
    this is a test page
  </body>
</html>

The HTML for the meta og:image tag has "/1234567.png?a=b&amp;c=e", which should decode to "/1234567.png?a=b&c=e".

I made a directory, put that file as "index.html" in it, put a random image named "1234567.png" in the directory too, I ran npx http-server . to host that directory through a local http server, and then I ran ngrok http 8080 to get a public URL that leads to that HTTP server. I then put the URL into https://www.linkedin.com/post-inspector/.

My local HTTP server's logs immediately showed the following, confirming that it correctly decoded:

[2022-02-23T23:33:31.661Z]  "GET /" "LinkedInBot/1.0 (compatible; Mozilla/5.0; Apache-HttpClient +http://www.linkedin.com)"
[2022-02-23T23:33:32.181Z]  "GET /1234567.png?a=b&c=e" "LinkedInBot/1.0 (compatible; Mozilla/5.0; Apache-HttpClient +http://www.linkedin.com)"

Additionally, the LinkedIn Post Inspector page correctly showed the detected image URL as https://a126-omitted-3fb7.ngrok.io/1234567.png?a=b&c=e.

It seems like LinkedIn is handling things correctly and working with the standard HTML-encoding behavior React is following, and that no change or workaround in React is necessary to work here with them. If there are other parts of LinkedIn that do have an issue, then the issue could be demonstrated with a test like this.

apecollector · 2022-05-24T08:38:36Z

Same issue: I'm using og:image with a URL that has query parameters. Twitter sees these and sends them to my server, which doesn't recognize &amp;query=value in place of &query=value.

talentedandrew · 2022-06-16T09:35:17Z

@apecollector I had a similar experience with the renderToStaticMarkup method, which escapes HTML tags, new lines, etc. I used unescape from lodash to fix it.

_.unescape('&amp;query=value');

outputs

&query=value

sc0ttdav3y · 2022-09-09T07:14:00Z

Hi all, I know this is a 4 year old issue and most of you have probably moved on, but I'm new to it and feel I must be missing something obvious.

I suspect we're all adding content security policies to our React apps by now, and those policies have a content value often with single quotes in them. I'm running Next.JS with static site generation, and their advice is to add it into /pages/_document.tsx. But this happens:

<meta httpEquiv="Content-Security-Policy" content="default-src 'self'" />

becomes

<meta httpEquiv="Content-Security-Policy" content="default-src &amp;self&amp;" />

What's the recommended way to do this in 2022?

devilportez · 2022-09-14T06:33:24Z

Hi, I have the same issue as @sc0ttdav3y has, but in my case single quotes are being escaped as &#x27; (Next.JS 12.3.0, ssg).

Macil · 2022-09-14T07:33:40Z

(I just tried the same thing with NextJS and the exact output I got is <meta http-equiv="Content-Security-Policy" content="default-src &#x27;self&#x27;"/>, consistent with #13838 (comment), so I'm going to assume the different value in #13838 (comment) was a copy-paste error of some kind.)

@sc0ttdav3y @devilportez It should be fine that the value is HTML-encoded like that. The browser will decode HTML entities when reading the HTML and interpret the value of the meta tag the same way as "default-src 'self'". You should see the CSP policy be in effect: if you make the page use something denied by the policy (like inline styles and scripts, which NextJS uses by default) then an error will appear in the console.

BrodaNoel · 2023-03-14T23:55:59Z

Still happening in [email protected] and [email protected].

Kind of killing my SEO because I can't use:

<meta property="og:image" content="url-with-&" />

aaronbarker · 2023-06-02T14:37:31Z

I ran into this escaping issue and tried a hack from another issue I had also run into. It seems to have fixed the escaping for me. You get a few unnecessary <style></style> elements injected, but it may be a better option for some than parsing the final content. YMMV

<style
  dangerouslySetInnerHTML={{
    __html: `</style>
      ${yourCode}
    <style>`,
  }}
></style>
