head > meta > content escaping issue #13838
Comments
Has this ever worked as you intend? Can you send a fix? |
We are solving the problem this way:
import Entities from 'html-entities/lib/html5-entities'
const entities = new Entities()
const contentRegExp = /content="([^"]+)"/g
const handleContent = (match, content) => {
  return `content="${entities.decode(content)}"`
}
html = html.replace(contentRegExp, handleContent)
We spend ~1ms per request in the path. It's not too bad. I can give it a look at some point. |
I have found this related issue: #6873. Digging into the implementation, the behavior comes from
⬇️
⬇️
Now, all the escaping tests I could find are covering the children use case: react/packages/react-dom/src/__tests__/escapeTextForBrowser-test.js Lines 23 to 24 in b87aabd
I have limited knowledge of web escaping related security issues. I don't see any potential for harm with:
const response = ReactDOMServer.renderToString(<span data-src={'&'}></span>);
expect(response).toMatch('<span data-reactroot="" data-src="&"></span>'); |
I have the same problem in the content of
const React = require("react");
const ReactDOMServer = require("react-dom/server");
console.log(ReactDOMServer.renderToStaticMarkup(
<html>
<head>
<link
href="https://fonts.googleapis.com/css?family=Source+Sans+Pro"
rel="stylesheet"
/>
<style>{`
html {
font-family: "Source Sans Pro", sans-serif;
}
`}</style>
</head>
<body>
<p>Test.</p>
</body>
</html>
));
This outputs:
<html><head><link href="https://fonts.googleapis.com/css?family=Source+Sans+Pro" rel="stylesheet"/><style>
html {
font-family: &quot;Source Sans Pro&quot;, sans-serif;
}
</style></head><body><p>Test.</p></body></html>
By the parsing rules in the HTML spec (I'm consulting WHATWG here), the contents of elements Escaping the contents of |
@andreubotella This is a different problem, you should use |
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contribution. |
Closing this issue after a prolonged period of inactivity. If this issue is still present in the latest release, please create a new issue with up-to-date information. Thank you! |
This is not a bug in React. Using an entity reference for
In the HTML spec you do not need to use a character reference for The example they give is:
<a href="?bill&ted">Bill and Ted</a> <!-- &ted is ok, since it's not a named character reference -->
<a href="?art&amp;copy">Art and Copy</a> <!-- the & has to be escaped, since &copy is a named character reference -->
Personally, I feel like React made the right call with escaping |
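To illustrate why escaping every `&` is the safe choice, here is a minimal sketch of attribute escaping. It is a simplified illustration and not React's actual implementation; `ESCAPES` and `escapeAttribute` are hypothetical names. Applied to the two spec examples, both ampersands come out as `&amp;`, which an HTML parser reads back as a literal `&` whether or not the text after it happens to be a named character reference.

```javascript
// Simplified sketch of attribute escaping; NOT React's actual code.
// ESCAPES and escapeAttribute are hypothetical names used for illustration.
const ESCAPES = {
  '&': '&amp;',
  '"': '&quot;',
  "'": '&#x27;',
  '<': '&lt;',
  '>': '&gt;',
};

// Replace each character that is unsafe inside a quoted attribute value.
function escapeAttribute(value) {
  return String(value).replace(/[&"'<>]/g, (ch) => ESCAPES[ch]);
}

// The two hrefs from the spec example, before escaping.
console.log(escapeAttribute('?bill&ted'));
console.log(escapeAttribute('?art&copy'));
```

Escaping unconditionally means the serializer never has to know the list of named character references, so `?art&copy` cannot be misread as `?art©`.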
In meta tags escaped paths don't work... otherwise, this bug would not have been opened. |
This is the change needed to get the behavior you expect: Replace
with an escape hatch:
if (tagVerbatim === 'meta' && propKey === 'content') {
  markup = 'content="' + propValue + '"';
} else {
  markup = createMarkupForProperty(propKey, propValue);
}
This would explicitly exempt the A more generic solution would involve having something like
<span
  dangerouslySetAttributes={{__attributes: [{name: 'data-src', value: '&'}]}}
/>
This could easily lead to parsing errors and unexpected results if any value after the Again, the issue was with the HTML parser FB was using for the Sharing Debugger, not React. It is properly parsing the escaped paths now. |
parameters in the meta file It's a known React bug, but the server-side rendering can't handle query parameters in the meta head file; it turns `&`s into `&amp;`s, etc. To bypass that, we've added a custom share image field to the content model, and require that to be under 1MB so we don't have to handle it here. You *can* run an Express server using Next JS that will handle these HTML entities and decode them, but Vercel (the host) won't allow that, and also provides some custom optimization that other hosts wouldn't, out-of-the-box. So, decided to not go that route. Some open GitHub issues, for reference: - vercel/next.js#2006 - facebook/react#13838 - An example Express server fix: https://gist.github.com/stefl/1f8c246dd7ca9cb332ae41f68e80088d
@equinusocio Anything reading the meta tags from the HTML should handle decoding the HTML encoded characters that React produces here. It's a bug with anything consuming the HTML if it doesn't decode the |
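As a rough sketch of what "decoding on the consumer side" means, the snippet below reads an `og:image` value out of an HTML string and decodes the character references before using it. Everything here is an assumption made for illustration: `REFERENCES`, `decodeReferences`, and `readMetaContent` are hypothetical helpers, only five references are handled, and the regex lookup stands in for the real HTML parser a scraper should use.

```javascript
// Hypothetical scraper-side helpers; a real consumer should use an HTML parser.
// Only the five references commonly emitted by serializers are handled here.
const REFERENCES = {
  '&amp;': '&',
  '&quot;': '"',
  '&#x27;': "'",
  '&lt;': '<',
  '&gt;': '>',
};

// Decode in a single pass, so "&amp;lt;" becomes "&lt;" rather than "<".
function decodeReferences(value) {
  return value.replace(/&(?:amp|quot|#x27|lt|gt);/g, (ref) => REFERENCES[ref]);
}

// Naive lookup: assumes `property` comes before `content`, both attributes are
// double-quoted, and `property` contains no regex metacharacters.
function readMetaContent(html, property) {
  const pattern = new RegExp(`<meta\\s+property="${property}"\\s+content="([^"]*)"`, 'i');
  const match = html.match(pattern);
  return match ? decodeReferences(match[1]) : null;
}

const html = '<head><meta property="og:image" content="/1234567.png?a=b&amp;c=e"/></head>';
console.log(readMetaContent(html, 'og:image'));
```

A consumer that skips the `decodeReferences` step would carry the literal `&amp;` into the URL it requests, which is the failure mode reported in this thread.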
Hello, I'm having this problem and can't find any fix. I'm on React 17.0.1. Would anyone have a hint? |
I’ve spent the last little while trying to figure out this issue since the encoded URLs cause LinkedIn to not see Open Graph images. I’ve come up with a workaround for our Gatsby site that uses
const headHtml = ReactDOMServer.renderToStaticMarkup(headComponents).replace(
/&amp;/g,
'&'
)
return (
<html>
<head dangerouslySetInnerHTML={{ __html: headHtml }} />
... And here’s the full html.js for a Gatsby project: https://github.com/notsidney/gatsby-meta-encoded-url-workaround/blob/main/src/html.js#L6-L25 |
Do I understand correctly that the issue is primarily due to the content scrapers that don’t consider encoded characters? So the React output is technically correct but some scrapers are too naïve? Or is there a reason why React behavior is technically incorrect? |
Yes and no. For example Inspired by vercel/next.js#2006 (comment) In the Unfortunately there is no easy solution ( |
"No" to which part?
I understand that particular services may not work. What I'm asking is — technically would you agree that the problem is on their end, or is the problem unambiguously on our end? I don't mean to turn this into "not our problem, bye" kind of argument, but I think we need to get clarity on the ideal situation before we can move forward with any plan here. So is React producing technically correct output (and resoc's parser is wrong) or is React producing technically incorrect output? I understand this question may not be important to your use case (it doesn't work either way) but it is important to me.
I don't know what you mean by this. Next.JS is using React so it's expected that it would have the same behavior. You mention "same behavior" after resoc.io which makes it sound like you're saying Next.JS is also incorrectly parsing something, but Next.JS is producing, not consuming, and it's producing via React, so I'm not sure if there's any extra meaning I'm missing here. I'd expect Next.JS to not be relevant in this issue because the issue is about React behavior itself, and discussing React alone should be enough to figure out the path forward here. But maybe I missed something!
OK so maybe this is what you mean by "for Next.JS". I think you're saying: "when the URL is encoded, the request received by my API built with Next.JS contains incorrect parameter". However, this assertion is missing an important detail: what exactly is sending a request to your Next.JS API? Is this Facebook? Twitter? LinkedIn? resoc.io? This is important — I need to understand which case exactly you are trying to solve, if there is a concrete one.
It is unclear which libraries you're referring to here. Next.JS? I believe Next.JS's behavior is correct here — if the request itself has an encoded string, Next.JS isn't supposed to unencode it. I believe unencoding should've been done by whatever thing that parsed your meta tag. This is why I'm asking what is the exact service you're trying to make this work with. And whether the same problem exists for other services now. E.g. if FB and Twitter treat this correctly, I think this could be a strong enough argument that you should report the problem to the faulty service instead, and have them fix it. |
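To make the "encoded string in the request" case concrete, here is a small sketch of what a server sees when a consumer requests the URL without decoding `&amp;` first. The `example.com` URLs are made up for illustration; `URL` and `searchParams` are the standard WHATWG URL API available in browsers and Node.

```javascript
// Made-up URLs for illustration. The first is what a correct consumer requests
// after decoding the attribute; the second is what a naive consumer requests.
const decoded = new URL('https://example.com/1234567.png?a=b&c=e');
const undecoded = new URL('https://example.com/1234567.png?a=b&amp;c=e');

// Query strings are split on "&" only, so the leftover "amp;" sticks to the key.
console.log([...decoded.searchParams.keys()]); // logs [ 'a', 'c' ]
console.log([...undecoded.searchParams.keys()]); // logs [ 'a', 'amp;c' ]

// The parameter the API expects is simply absent in the undecoded request.
console.log(undecoded.searchParams.get('c')); // logs null
```

The server receiving the second request has no reliable way to tell a stray `amp;c` key from an intentional one, which is why the decoding belongs in whatever parsed the meta tag.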
In my view it's just naïve scrapers, some of which are run by Facebook, see #6873 (comment) |
fwiw afaict this makes you unable to write inline scripts in NextJS. |
Whilst React's approach might be technically correct, it doesn't work in the real world, because we have naive scrapers, from massive organisations like LinkedIn, Slack and WhatsApp. So you can choose to be technically correct and pedantic or to have your code work, in the real world, right now. I for one choose the latter. I'm trying to get open graph images with query params to work, now I need to look for a hacky workaround because I can't go and force these massive tech organisations to "fix" their code. |
I'm wondering if most of the people who believe this is an issue for themselves are mistaken. I first found this issue because I was trying to figure out why Twitter wasn't showing an image from one of my I also wonder if some people are improperly HTML-encoding some strings before giving them to React and then thinking the double-encoding is caused by React doing HTML-encoding. People in both of these situations are present in the related thread #6873, so it seems likely that at least a decent portion of people in this thread have the same confusions.
Before suggesting putting workarounds in React for other services' parsers, we should try to be sure the problem is not one of these other dev-side confusions, we should figure out a list of services we're sure have the issue, we should try to report the issue to them first, and only then move on to figuring out workarounds if the list of services with problems is still large.
Several people have suggested that LinkedIn is unable to handle properly HTML-encoded meta tags, but from my tests they seem to handle things fine. I created this test HTML page:
<!DOCTYPE html>
<html>
<head>
<title>test page</title>
<meta property="og:title" content="Title of the article" />
<meta property="og:image" content="/1234567.png?a=b&amp;c=e" />
<meta
property="og:description"
content="Description that will show in the preview"
/>
</head>
<body>
this is a test page
</body>
</html> The HTML for the meta og:image tag has I made a directory, put that file as "index.html" in it, put a random image named "1234567.png" in the directory too, I ran My local HTTP server's logs immediately showed the following, confirming that it correctly decoded:
Additionally, the LinkedIn Post Inspector page correctly showed the detected image URL as It seems like LinkedIn is handling things correctly and working with the standard HTML-encoding behavior React is following, and that no change or workaround in React is necessary to work here with them. If there are other parts of LinkedIn that do have an issue, then the issue could be demonstrated with a test like this. |
Same issue, using |
@apecollector I had a similar experience with _.unescape('&amp;query=value'); outputs
|
Hi all, I know this is a 4-year-old issue and most of you have probably moved on, but I'm new to it and feel I must be missing something obvious. I suspect we're all adding content security policies to our React apps by now, and those policies have a
becomes
What's the recommended way to do this in 2022? |
Hi, I have the same issue as @sc0ttdav3y, but in my case single quotes are being escaped with |
(I just tried the same thing with NextJS and the exact output I got is @sc0ttdav3y @devilportez It should be fine that the value is html-encoded like that. The browser will decode html entities when reading the html and interpret the value of the meta tag the same way as |
Still happening in Kind of killing my SEO because I can't use: <meta property="og:image" content="url-with-&" /> |
I ran into this escaping issue and tried a hack from another issue I had also run into. It seemed to have fixed the escaping for me. You get a few unnecessary
|
Wound up here investigating why a link-preview was missing its preview image on Twitter. The escaping turned out to be a red herring; Twitter's link-preview bot handles escaped URLs just fine. The problem was that it wasn't using our own mirror of the image, it was using a user-submitted image URL on |
Running into this issue as well using Next.js and |
Issue with what? The consumer of this meta tag not parsing escaped HTML values correctly? |
Do you want to request a feature or report a bug?
I'm guessing it's a bug.
What is the current behavior?
The following source code,
, is being escaped once server side rendered:
You can reproduce the behavior like this:
editor: https://codesandbox.io/s/my299jk7qp
output : https://my299jk7qp.sse.codesandbox.io/
What is the expected behavior?
I would expect the content not being escaped. It's related to vercel/next.js#2006 (comment).
I'm using the
og:image
meta element so my pages can have nice previews within Facebook :).
Which versions of React, and which browser / OS are affected by this issue? Did this work in previous versions of React?
16.5.2