Maintain tag attribute quote characters #720
Comments
This definitely won't break JSON within browsers. IMHO, this won't be fixed.
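Presumably the point of the comment above is that a double-quoted attribute stays valid as long as any `"` inside the value is written as `&quot;`. The HTML parser decodes the entity before scripts read the attribute, so the JSON arrives intact. A minimal sketch of that round trip, using two hypothetical helpers (`escapeAttr`, `unescapeAttr`) rather than cheerio's own encoder; `unescapeAttr` only mimics what a real parser does for these two entities:

```javascript
// Hypothetical helpers, not cheerio APIs.
// Escape a value for use inside a double-quoted attribute.
function escapeAttr(value) {
  // '&' first, so the '&' in the '&quot;' we add is not escaped again.
  return value.replace(/&/g, "&amp;").replace(/"/g, "&quot;");
}

// Reverse only the two entities produced above (a real HTML parser decodes all of them).
function unescapeAttr(value) {
  // '&amp;' last, so an escaped literal "&quot;" is not decoded twice.
  return value.replace(/&quot;/g, '"').replace(/&amp;/g, "&");
}

const settings = { title: 'Say "hi"', tags: ["a", "b"], url: "/x?a=1&b=2" };
const json = JSON.stringify(settings);
const html = `<div data-settings="${escapeAttr(json)}"></div>`;

// Read the raw attribute text back; safe here because the escaped value has no raw '"'.
const raw = /data-settings="([^"]*)"/.exec(html)[1];
const decoded = JSON.parse(unescapeAttr(raw));

console.log(html);
console.log(decoded.title); // Say "hi"
```

The trade-off is readability of the source HTML, not correctness, which is why several commenters below still want single quotes preserved.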
@fb55, is there any chance of this being fixed? The HTML my app has to consume is very poorly written, and I have no control over fixing it. Not only does it contain raw JSON inside a div tag, but most of the …
+1. I also store JSON in HTML attributes, and I'd second keeping single-quoted attributes, because it makes HTML with embedded JSON much easier to read and manipulate.
+1
Gentlemen, any update on this one? Do you intend to fix this at all? It's more than a year later and the issue is still open.
@fb55, any update on this? The problem is really important: I can't store JSON in meta tags.
Switching to parse5 could fix this (that's another open issue). As I said – Felix
OK @fb55, thank you very much for the feedback.
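I fixed this by changing

```js
else {
  output += key + "='" + (opts.decodeEntities ? entities.encodeXML(value) : value) + "'";
}
```

to

```js
else {
  var encoded = opts.decodeEntities ? entities.encodeXML(value) : value;
  // Use single quotes only when the value contains a double quote.
  // HTML has no backslash escapes, so any '"' (even one preceded by '\'
  // or at the start of the value) needs the single-quote delimiter.
  if (/"/.test(value)) {
    output += key + "='" + encoded + "'";
  } else {
    output += key + '="' + encoded + '"';
  }
}
```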
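The patch above picks the delimiter from the value. A slightly more defensive variant also has to handle values containing both quote characters (no delimiter choice avoids escaping then, so it falls back to double quotes with `&quot;`) and should escape `&` in either branch so existing entity-like text is not decoded by the browser. A minimal sketch, where `formatAttr` is a hypothetical standalone helper rather than cheerio's or dom-serializer's actual function:

```javascript
// Hypothetical helper, not part of cheerio or dom-serializer.
function formatAttr(key, value) {
  // Always escape '&' so text like "&amp;" in the value survives a browser round trip.
  const amp = value.replace(/&/g, "&amp;");
  if (value.includes('"') && !value.includes("'")) {
    // JSON-like values: single quotes let the double quotes stay raw.
    return `${key}='${amp}'`;
  }
  // Default, or a value with both quote kinds: double quotes, escaping '"'.
  return `${key}="${amp.replace(/"/g, "&quot;")}"`;
}

console.log(formatAttr("data-json", '{"a":1}'));   // data-json='{"a":1}'
console.log(formatAttr("title", "plain"));         // title="plain"
console.log(formatAttr("data-mixed", `it's "x"`)); // data-mixed="it's &quot;x&quot;"
```

Note that this still does not *preserve* the original quote character; it only chooses a delimiter that avoids escaping when possible.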
Thanks, @gaecom.
This variant works with …
I also have this issue, and the proposed solutions don't work for my use case. The HTML parsed by cheerio (via Inky) is later parsed by Twig as well, so the quotes need to remain the same. Here is an HTML sample that fails:
We are having this issue while implementing Google's new AMP pages. One of the parameters requires JSON inside a data attribute, like so:
The single quotes get converted to double quotes, which invalidates the HTML. JSON requires double-quoted strings, so swapping double and single quotes doesn't work. Please, cheerio, we need you.
SOS. Does anyone have a fix for this that doesn't involve modifying files in node_modules?
I just had this problem and fixed it by adding …
…ds (#19727)

closes ENG-627

We were using `cheerio` to parse+modify+serialize our rendered HTML to modify links for member attribution. Cheerio's serializer has a [long-standing issue](cheeriojs/cheerio#720) (that we've [had to deal with before](TryGhost/SDK#124)) where it replaces single-quote attributes with double-quote attributes. That was resulting in broken rendering when content used single quotes, such as in HTML cards that have JSON data inside a `data-` attribute, or otherwise used single quotes to avoid escaping double quotes in an attribute value.

- swapped the implementation that uses `cheerio` for one that uses `html5parser` to tokenize the HTML string; from there we can loop over the tokens and replace the href attribute values in the original string without touching any other part of the content. This avoids a full parse+serialize process, which is both more costly and can result in unexpected content changes due to serializer opinions.
- fixes the quote change bug
- uses tokenization directly to avoid the cost of building a full AST
- updated Content API Posts snapshot
- one of our fixtures has a missing closing tag which we're no longer "fixing" with a full parse+serialize step in the link replacer (keeps modified src closer to original and better matches behaviour elsewhere in the app / without member-attribution applied)
- the link replacer no longer converts `attr=""` to `attr` (these are equivalent in the HTML spec, so no change in behaviour other than preserving the original source HTML)
- added a benchmark test file comparing the two implementations because the link replacer runs on render, so it's used in a hot path
- new implementation has a 3x performance improvement
- the separate files with the old/new implementations have been cleaned up, but I've left the benchmark test file in place for future reference

Benchmark results comparing implementations:

```
❯ node test/benchmark.js
LinkReplacer
├─ cheerio: 5.03K /s ±2.20%
├─ html5parser: 16.5K /s ±0.43%

Completed benchmark in 0.9976526670455933s
┌─────────────┬─────────┬────────────┬─────────┬───────┐
│ (index)     │ percent │ iterations │ current │ max   │
├─────────────┼─────────┼────────────┼─────────┼───────┤
│ cheerio     │ ''      │ '5.03K/s'  │ 5037    │ 5037  │
│ html5parser │ ''      │ '16.5K/s'  │ 16534   │ 16534 │
└─────────────┴─────────┴────────────┴─────────┴───────┘
```
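The commit above avoids the quote problem by splicing new `href` values into the original string instead of re-serializing a parsed tree, so every other attribute keeps its original quotes. The sketch below illustrates only that splicing idea. It swaps `html5parser` for a single regular expression (an assumption made to stay dependency-free), so unlike a real tokenizer it is not tag-aware: it would also match `href=` appearing in text, comments, or `<script>` bodies. `replaceHrefs` and the `https://example.com` prefix are hypothetical.

```javascript
// Hypothetical sketch: rewrite href values in place, leaving the rest of the string untouched.
// Regex stand-in for a tokenizer; NOT tag-aware (comments, scripts, and text are not skipped).
function replaceHrefs(html, transform) {
  // Group 1: " href=" prefix; then a double-quoted, single-quoted, or unquoted value.
  const hrefRe = /(\shref\s*=\s*)(?:"([^"]*)"|'([^']*)'|([^\s"'=<>`]+))/gi;
  return html.replace(hrefRe, (match, prefix, dq, sq, bare) => {
    // Reuse the original quote character; transform output is inserted as-is (not escaped).
    if (dq !== undefined) return `${prefix}"${transform(dq)}"`;
    if (sq !== undefined) return `${prefix}'${transform(sq)}'`;
    return `${prefix}${transform(bare)}`;
  });
}

const input = `<a href='/post' data-card='{"k":"v"}'>Post</a>`;
const output = replaceHrefs(input, (url) => "https://example.com" + url);
console.log(output);
// <a href='https://example.com/post' data-card='{"k":"v"}'>Post</a>
```

Because nothing outside the matched `href` values is rewritten, the single-quoted `data-card` JSON passes through byte for byte.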
Cheerio changes attributes that use single quotes into double quotes.

Single quotes are useful for me, as I use JSON in HTML attributes for widget settings. Cheerio outputs the value encoded with HTML entities (possibly breaking the JSON) as:

Setting `decodeEntities: false` instead produces output that breaks the HTML.

Ideally, cheerio would preserve whichever quote character is used. I understand this is an edge case, so I'm reporting it in case others run into it. Similar to #460.
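Preserving the quote character, as requested above, would mean the parser records which delimiter each attribute used and the serializer writes it back. A minimal sketch of that idea, assuming a hand-rolled regex parser over just the attribute part of one start tag (`parseAttrs` and `serializeAttrs` are hypothetical; this is not how cheerio or htmlparser2 stores attributes):

```javascript
// Hypothetical sketch: keep each attribute's original quote character next to its value.
// Values stay in source form (still entity-encoded), so writing them back is exact.
function parseAttrs(source) {
  const attrs = [];
  // name, then optional =value where value is "...", '...', or unquoted.
  const re = /([^\s"'<>\/=]+)(?:\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s"'=<>`]+)))?/g;
  let m;
  while ((m = re.exec(source)) !== null) {
    const [, name, dq, sq, bare] = m;
    if (dq !== undefined) attrs.push({ name, value: dq, quote: '"' });
    else if (sq !== undefined) attrs.push({ name, value: sq, quote: "'" });
    else if (bare !== undefined) attrs.push({ name, value: bare, quote: "" });
    else attrs.push({ name, value: "", quote: null }); // boolean attribute, no "="
  }
  return attrs;
}

// A caller that changes a value must escape it for the recorded quote character.
function serializeAttrs(attrs) {
  return attrs
    .map(({ name, value, quote }) => (quote === null ? name : `${name}=${quote}${value}${quote}`))
    .join(" ");
}

const src = `data-settings='{"theme":"dark"}' class="widget" hidden`;
const attrs = parseAttrs(src);
console.log(serializeAttrs(attrs) === src); // true
```

Whitespace between attributes is normalized to single spaces, so the round trip is exact only for sources already separated that way, as in the example.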