Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Code blocks should be editable after saving the post and reloading the editor #15780

Closed
wants to merge 2 commits into from

Conversation

davilera
Copy link
Contributor

@davilera davilera commented May 22, 2019

Description

See #13218.

How has this been tested?

Created the block described in #13218.

Types of changes

The PR solves the issue by making sure the content of a code block is pulled as HTML during the parsing process and escaping/unescaping tag delimiters (i.e. < and >) properly.

Unfortunately, this seems to be (yet another) breaking change, as it requires < and > to be escaped before saving. Edit: 1ac7cfa prevents this PR from being a breaking change.

Checklist:

  • My code is tested.
  • My code follows the WordPress code style.
  • My code follows the accessibility standards.
  • My code has proper inline documentation.
  • I've included developer documentation if appropriate.

@aduth
Copy link
Member

aduth commented May 22, 2019

Unfortunately, this seems to be (yet another) breaking change, as it requires < and > to be escaped before saving.

Can you elaborate on what specifically is breaking? Will existing content be shown as "Invalid"?

@davilera
Copy link
Contributor Author

Can you elaborate on what specifically is breaking? Will existing content be shown as "Invalid"?

I think it would, yes. However, there's an option to make a block compatible with previous versions, right? If so, I can explore this...

@aduth
Copy link
Member

aduth commented May 22, 2019

It'd be worth testing to see if it is an issue. The block validation logic explicitly includes some leniency where differences are purely on differences in encoding (source), so it might not be a problem.

If it does turn out to be breaking, we can try to use the deprecated feature for transitioning (reference.

…actually escape them (if anybody does, it's Gutenberg itself, implicitly, when saving)
@davilera
Copy link
Contributor Author

OK so... I think I got this (without introducing breaking changes!). The problem was, Gutenberg (or the browser/JavaScript itself, I don't know) escapes "less than" characters (<) when the block is saved in the database. That is, something like #13218:

<a href="http://wordpress.org/">WordPress</a>  &lt;strong&gt;

becomes this (after my original PR #13996 is applied):

&lt;a href="http://wordpress.org/">WordPress&lt;/a>  &amp;lt;strong&amp;gt;

The problem is, when loading that block from the post content, characters like &lt; aren't properly parsed and the problems described in #13218 arise. I tweaked the block so that:

  1. Content is retrieved in HTML, so that the block can work with HTML entities.
  2. The unescape function we use to set the value in the PlainText component also unescapes &lt; and &gt; characters, which might be there because of the save process.

Apparently, this seems to solve issue #13218 and all tests pass. I would recommend a few more testing, though.

@davilera
Copy link
Contributor Author

Were you able to review this, @aduth? Is there anything else I can do to help?

@swissspidy swissspidy added the [Package] Block library /packages/block-library label May 28, 2019
@aduth
Copy link
Member

aduth commented May 28, 2019

2. The unescape function we use to set the value in the PlainText component also unescapes &lt; and &gt; characters, which might be there because of the save process.

Yep, wondering about why we wouldn't need to have a corresponding escaper in the setter, it occurs to me that this is already effectively escaped when the block is serialized. So I think the change is sensible in restoring a balance. It's also better for it to occur during the save than in escape since if we were to escape the value as maintained in the block attributes, it would wrongly be displayed as escaped in the PlainText textarea.

I'd wondered about whether we really need to unescape here, vs. setting the innerHTML of the textarea directly, since this is a common technique for "safely" decoding entities. That said, I'm inclined to avoid dangerouslySetInnerHTML as much as possible for reasons which should be self-evident by its name 😄 My only concern then is if there are other encoded entities which we need to be considering.

This looks good to me. I'll want to give it another pass in the morning with a fresher mind.

@aduth aduth added the [Block] Code Affects the Code Block label May 29, 2019
@aduth
Copy link
Member

aduth commented May 29, 2019

Yep, wondering about why we wouldn't need to have a corresponding escaper in the setter, it occurs to me that this is already effectively escaped when the block is serialized. So I think the change is sensible in restoring a balance.

In further confirming this, I expect it should be effectively the exact opposite of:

export function escapeHTML( value ) {
return escapeLessThan( escapeAmpersand( value ) );
}

There are two observations:

  • We need ampersand unescaping. This already existed prior to the pull request.
  • We technically don't need "greater than" unescaping. I expect it should cause no harm to keep it, however, since this unescaping is only meant to account for escaping we only expect to occur from the above serialization logic anyways.

I'd also wondered about server-side escaping we might need to anticipate, and specifically in cases of unfiltered_html for non-privileged users. In investigating how this is applied, it seems most relevant for tag-stripping and not for character escaping (reference).

I expect we'd used source: 'text' previously as a naive way to achieve this reverse behavior of the serialization, but since innerText will unescape much more than what is escaped by the serializer, it would result in the unexpected breakage described in situations like of #13218.

@aduth
Copy link
Member

aduth commented May 29, 2019

@davilera I have some small concern about trying to keep this in sync with the specific implementation of the serializer, and I'm curious your thoughts on an alternative I have in mind:

The idea would be to change the source to 'html' (like as proposed) and use dangerouslySetInnerHTML in the save implementation to bypass the serializer escaping. To avoid having characters shown as escaped in the textarea, we would need to either use dangerouslySetInnerHTML in the edit implementation (which is a bit less "safe" in my mind than in save), or another way to decode the entities before assigning the value. There is a @wordpress/html-entities package for this purpose (its implementation is admittedly not far from as the alternative). The attribute value would need to be maintained in its escaped form, so as to align with the new save implementation. We could use the textarea's innerHTML for this, which as far as I can tell will take care of this.

var e = document.createElement( 'textarea' ); e.textContent = '&'; console.log( e.innerHTML );
// '&amp;'

It seems this alternative might pose some more risk in trying to assure safety of value escaping, so I may be comfortable as well with the current proposal, even if it needs to account for factors outside its own consideration (the serializer behavior).

@davilera
Copy link
Contributor Author

davilera commented May 30, 2019

To avoid having characters shown as escaped in the textarea, we would need to either use dangerouslySetInnerHTML in the edit implementation (which is a bit less "safe" in my mind than in save), or another way to decode the entities before assigning the value.

To be honest, I don't have a strong opinion on any solution, as none is perfect and each poses its own problems. That said, I feel like using dangerouslySetInnerHTML is slightly worse (I'm not sure we'd be able to control what it does).

Regarding the current proposal, I also don't like using an asymmetric escape and unescape functions in utils.js because, as @aduth said, “it needs to account for factors outside its own consideration (the serializer behavior)”. If we make it symmetric, the block should work (I think), but we'd be introducing a breaking change (as we'd be changing what's saved in the DB).

What do you think, @aduth? Should we introduce a breaking change? I think the risk the current proposal poses is minimal: in principle, our block would only receive a &lt; character in content if somebody else (e.g. the serializer) inserted it. If it was the user who wrote the &lt; entity, this would be saved as &amp;lt; (because of our escape method), which means things should work as expected. But I'd rather have a self-contained solution... Complicated decisions here!

@aduth aduth mentioned this pull request Nov 6, 2019
5 tasks
@davilera
Copy link
Contributor Author

If I'm not mistaken, the issue this PR addressed has been fixed by @ellatrix in #17994, so I'm closing it.

@davilera davilera closed this Dec 17, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
[Block] Code Affects the Code Block [Package] Block library /packages/block-library
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants