Parser: Add new list of HTML fragments to parse output (#11334)
Attempt three at including positional information from the parse to enable isomorphic reconstruction of the source `post_content` after parsing.

See alternate attempts: #11082, #11309
Motivated by: #7247, #8760, Automattic/jetpack#10256
Enables: #10463, #10108

## Abstract

Add a new `innerContent` property to each block in the parser output, indicating where in the `innerHTML` each inner block was found.

## Status

 - will update fixtures after design review indicates this is the desired approach
 - all parsers passing new tests for fragment behavior

## Summary

Inner blocks, or nested blocks, or blocks-within-blocks, can exist in Gutenberg posts. They are serialized in `post_content` in place as normal blocks which exist in between another block's comment delimiters.

```html
<!-- wp:outerBlock -->
Check out my
<!-- wp:voidInnerBlock /-->
and my other
<!-- wp:innerBlock -->
with its own content.
<!-- /wp:innerBlock -->
<!-- /wp:outerBlock -->
```

The way this gets parsed leaves us in a quandary: we cannot reconstruct the original `post_content` after parsing, because the inner blocks are returned only as an array and the location where each one appeared in the surrounding HTML is lost.

```json
{
	"blockName": "core/outerBlock",
	"attrs": {},
	"innerBlocks": [
		{
			"blockName": "core/voidInnerBlock",
			"attrs": {},
			"innerBlocks": [],
			"innerHTML": ""
		},
		{
			"blockName": "core/innerBlock",
			"attrs": {},
			"innerBlocks": [],
			"innerHTML": "\nwith its own content.\n"
		}
	],
	"innerHTML": "\nCheck out my\n\nand my other\n\n"
}
```

At this point we have parsed the blocks and prepared them for attaching into the JavaScript block code that interprets them but we have lost our reverse transformation.

In this PR I'd like to introduce a new mechanism which shouldn't break existing functionality but which will enable us to go back and forth isomorphically between the `post_content` and the first stage of parsing. If we can tear apart a Gutenberg post and reassemble it, then we can do structurally-informed processing of posts without needing to be aware of all the block JavaScript.

The proposed mechanism is a new property, `innerContent`: a **list of HTML fragments with `null` values interspersed between those fragments where the blocks were found**.

```json
{
	"blockName": "core/outerBlock",
	"attrs": {},
	"innerBlocks": [
		{
			"blockName": "core/voidInnerBlock",
			"attrs": {},
			"innerBlocks": [],
			"innerHTML": "",
			"innerContent": []
		},
		{
			"blockName": "core/innerBlock",
			"attrs": {},
			"innerBlocks": [],
			"innerHTML": "\nwith its own content.\n",
			"innerContent": [ "\nwith its own content.\n" ]
		}
	],
	"innerHTML": "\nCheck out my\n\nand my other\n\n",
	"innerContent": [ "\nCheck out my\n", null, "\nand my other\n", null, "\n" ]
}
```

Doing this allows us to replace those `null` values with their associated block (sequentially) from `innerBlocks`.
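
To make the round trip concrete, here is a minimal sketch of that replacement step. `sketch_serialize_block` is a hypothetical helper and not part of this commit. It assumes that the `core/` namespace is implicit in the comment delimiter and that a block with an empty `innerContent` was written in void form; the parser output alone does not confirm either.

```php
<?php
// Hypothetical helper (not part of this commit): rebuild the markup of one
// parsed block from its `innerContent` and `innerBlocks`.
function sketch_serialize_block( array $block ) {
	$body  = '';
	$index = 0;

	// Walk the fragments in order; each null stands for the next inner block.
	foreach ( $block['innerContent'] as $chunk ) {
		$body .= is_string( $chunk )
			? $chunk
			: sketch_serialize_block( $block['innerBlocks'][ $index++ ] );
	}

	// Freeform HTML between blocks has no name and therefore no delimiters.
	if ( null === $block['blockName'] ) {
		return $body;
	}

	// Assumption: the `core/` namespace is left out of the delimiter.
	$name  = preg_replace( '#^core/#', '', $block['blockName'] );
	$attrs = empty( $block['attrs'] ) ? '' : json_encode( $block['attrs'] ) . ' ';

	// Assumption: a block with no inner content was written in void form.
	if ( array() === $block['innerContent'] ) {
		return "<!-- wp:{$name} {$attrs}/-->";
	}

	return "<!-- wp:{$name} {$attrs}-->{$body}<!-- /wp:{$name} -->";
}

// The parsed example from above, with `innerContent` filled in at every level.
$outer = array(
	'blockName'    => 'core/outerBlock',
	'attrs'        => array(),
	'innerBlocks'  => array(
		array(
			'blockName'    => 'core/voidInnerBlock',
			'attrs'        => array(),
			'innerBlocks'  => array(),
			'innerHTML'    => '',
			'innerContent' => array(),
		),
		array(
			'blockName'    => 'core/innerBlock',
			'attrs'        => array(),
			'innerBlocks'  => array(),
			'innerHTML'    => "\nwith its own content.\n",
			'innerContent' => array( "\nwith its own content.\n" ),
		),
	),
	'innerHTML'    => "\nCheck out my\n\nand my other\n\n",
	'innerContent' => array( "\nCheck out my\n", null, "\nand my other\n", null, "\n" ),
);

$rebuilt = sketch_serialize_block( $outer );
echo $rebuilt, "\n";
```

Under those assumptions the output is the serialized example from the Summary. A real serializer would also need a rule for telling a void block apart from an empty block with separate opening and closing delimiters.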

## Questions

 - Why not use a string token instead of an array?
    - See #11309. The fundamental problem with the token is that it could be valid content input from a person and so there's a probability that we would fail to split the content accurately.

 - Why add the `null` instead of leaving basic array splits like `[ 'before', 'after' ]`?
    - By inspection we can see that without an explicit marker we don't know if the block came before or after or between array elements. We could add empty strings `''` and say that blocks exist only _between_ array elements but the parser code would have to be more complicated to make sure we appropriately add those empty strings. The empty strings are a bit odd anyway.

 - Why add a new property?
    - Code already depends on `innerHTML` and `innerBlocks`; I don't want to break any existing behaviors and adding is less risky than changing.
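
The first two answers can be illustrated with a short sketch. The token `[[block]]` is a made-up placeholder for illustration only and is not the token proposed in #11309.

```php
<?php
// 1. Token collision. `[[block]]` is a hypothetical token, not the one
//    proposed in #11309. The author typed it literally once, and one real
//    block at the end of the content was replaced by it.
$token      = '[[block]]';
$inner_html = 'Write ' . $token . ' to mark a block.' . $token;
$pieces     = explode( $token, $inner_html );

// Three pieces imply two blocks, although only one block exists.
var_dump( count( $pieces ) );

// 2. Plain array splits. Dropping the null markers keeps only the strings.
$strip = function ( array $inner_content ) {
	return array_values( array_filter( $inner_content, 'is_string' ) );
};

$block_first = array( null, 'text' ); // block came before the HTML
$block_last  = array( 'text', null ); // block came after the HTML

// Both layouts collapse to the same list, so the position is unrecoverable.
var_dump( $strip( $block_first ) === $strip( $block_last ) );
```

With the explicit `null` entries the two layouts stay distinct, and no content a person types can be mistaken for a marker because `null` is not a string.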
dmsnell authored Nov 7, 2018
1 parent 8cbfb3c commit 1014389
Showing 69 changed files with 1,017 additions and 309 deletions.
260 changes: 205 additions & 55 deletions lib/parser.php
@@ -259,20 +259,22 @@ private function peg_f1($pre, $bs, $post) { return peg_join_blocks( $pre, $bs, $
 private function peg_f2($blockName, $a) { return $a; }
 private function peg_f3($blockName, $attrs) {
     return array(
-        'blockName'   => $blockName,
-        'attrs'       => isset( $attrs ) ? $attrs : array(),
-        'innerBlocks' => array(),
-        'innerHTML'   => '',
+        'blockName'    => $blockName,
+        'attrs'        => isset( $attrs ) ? $attrs : array(),
+        'innerBlocks'  => array(),
+        'innerHTML'    => '',
+        'innerContent' => array(),
     );
 }
 private function peg_f4($s, $children, $e) {
-    list( $innerHTML, $innerBlocks ) = peg_array_partition( $children, 'is_string' );
+    list( $innerHTML, $innerBlocks, $innerContent ) = peg_process_inner_content( $children );

     return array(
         'blockName' => $s['blockName'],
         'attrs' => $s['attrs'],
         'innerBlocks' => $innerBlocks,
-        'innerHTML' => implode( '', $innerHTML ),
+        'innerHTML' => $innerHTML,
+        'innerContent' => $innerContent,
     );
 }
 private function peg_f5($blockName, $attrs) {
@@ -711,36 +713,106 @@ private function peg_parseBlock_Balanced() {
$s3 = $this->peg_parseBlock();
if ($s3 === $this->peg_FAILED) {
$s3 = $this->peg_currPos;
$s4 = $this->peg_currPos;
$s4 = array();
$s5 = $this->peg_currPos;
$s6 = $this->peg_currPos;
$this->peg_silentFails++;
$s6 = $this->peg_parseBlock_End();
$s7 = $this->peg_parseBlock();
$this->peg_silentFails--;
if ($s6 === $this->peg_FAILED) {
$s5 = null;
if ($s7 === $this->peg_FAILED) {
$s6 = null;
} else {
$this->peg_currPos = $s6;
$s6 = $this->peg_FAILED;
}
if ($s6 !== $this->peg_FAILED) {
$s7 = $this->peg_currPos;
$this->peg_silentFails++;
$s8 = $this->peg_parseBlock_End();
$this->peg_silentFails--;
if ($s8 === $this->peg_FAILED) {
$s7 = null;
} else {
$this->peg_currPos = $s7;
$s7 = $this->peg_FAILED;
}
if ($s7 !== $this->peg_FAILED) {
if ($this->input_length > $this->peg_currPos) {
$s8 = $this->input_substr($this->peg_currPos, 1);
$this->peg_currPos++;
} else {
$s8 = $this->peg_FAILED;
if ($this->peg_silentFails === 0) {
$this->peg_fail($this->peg_c0);
}
}
if ($s8 !== $this->peg_FAILED) {
$s6 = array($s6, $s7, $s8);
$s5 = $s6;
} else {
$this->peg_currPos = $s5;
$s5 = $this->peg_FAILED;
}
} else {
$this->peg_currPos = $s5;
$s5 = $this->peg_FAILED;
}
} else {
$this->peg_currPos = $s5;
$s5 = $this->peg_FAILED;
}
if ($s5 !== $this->peg_FAILED) {
if ($this->input_length > $this->peg_currPos) {
$s6 = $this->input_substr($this->peg_currPos, 1);
$this->peg_currPos++;
} else {
$s6 = $this->peg_FAILED;
if ($this->peg_silentFails === 0) {
$this->peg_fail($this->peg_c0);
while ($s5 !== $this->peg_FAILED) {
$s4[] = $s5;
$s5 = $this->peg_currPos;
$s6 = $this->peg_currPos;
$this->peg_silentFails++;
$s7 = $this->peg_parseBlock();
$this->peg_silentFails--;
if ($s7 === $this->peg_FAILED) {
$s6 = null;
} else {
$this->peg_currPos = $s6;
$s6 = $this->peg_FAILED;
}
if ($s6 !== $this->peg_FAILED) {
$s7 = $this->peg_currPos;
$this->peg_silentFails++;
$s8 = $this->peg_parseBlock_End();
$this->peg_silentFails--;
if ($s8 === $this->peg_FAILED) {
$s7 = null;
} else {
$this->peg_currPos = $s7;
$s7 = $this->peg_FAILED;
}
if ($s7 !== $this->peg_FAILED) {
if ($this->input_length > $this->peg_currPos) {
$s8 = $this->input_substr($this->peg_currPos, 1);
$this->peg_currPos++;
} else {
$s8 = $this->peg_FAILED;
if ($this->peg_silentFails === 0) {
$this->peg_fail($this->peg_c0);
}
}
if ($s8 !== $this->peg_FAILED) {
$s6 = array($s6, $s7, $s8);
$s5 = $s6;
} else {
$this->peg_currPos = $s5;
$s5 = $this->peg_FAILED;
}
} else {
$this->peg_currPos = $s5;
$s5 = $this->peg_FAILED;
}
} else {
$this->peg_currPos = $s5;
$s5 = $this->peg_FAILED;
}
}
if ($s6 !== $this->peg_FAILED) {
$s5 = array($s5, $s6);
$s4 = $s5;
} else {
$this->peg_currPos = $s4;
$s4 = $this->peg_FAILED;
}
} else {
$this->peg_currPos = $s4;
$s4 = $this->peg_FAILED;
}
if ($s4 !== $this->peg_FAILED) {
@@ -754,36 +826,106 @@ private function peg_parseBlock_Balanced() {
$s3 = $this->peg_parseBlock();
if ($s3 === $this->peg_FAILED) {
$s3 = $this->peg_currPos;
$s4 = $this->peg_currPos;
$s4 = array();
$s5 = $this->peg_currPos;
$s6 = $this->peg_currPos;
$this->peg_silentFails++;
$s6 = $this->peg_parseBlock_End();
$s7 = $this->peg_parseBlock();
$this->peg_silentFails--;
if ($s6 === $this->peg_FAILED) {
$s5 = null;
if ($s7 === $this->peg_FAILED) {
$s6 = null;
} else {
$this->peg_currPos = $s6;
$s6 = $this->peg_FAILED;
}
if ($s6 !== $this->peg_FAILED) {
$s7 = $this->peg_currPos;
$this->peg_silentFails++;
$s8 = $this->peg_parseBlock_End();
$this->peg_silentFails--;
if ($s8 === $this->peg_FAILED) {
$s7 = null;
} else {
$this->peg_currPos = $s7;
$s7 = $this->peg_FAILED;
}
if ($s7 !== $this->peg_FAILED) {
if ($this->input_length > $this->peg_currPos) {
$s8 = $this->input_substr($this->peg_currPos, 1);
$this->peg_currPos++;
} else {
$s8 = $this->peg_FAILED;
if ($this->peg_silentFails === 0) {
$this->peg_fail($this->peg_c0);
}
}
if ($s8 !== $this->peg_FAILED) {
$s6 = array($s6, $s7, $s8);
$s5 = $s6;
} else {
$this->peg_currPos = $s5;
$s5 = $this->peg_FAILED;
}
} else {
$this->peg_currPos = $s5;
$s5 = $this->peg_FAILED;
}
} else {
$this->peg_currPos = $s5;
$s5 = $this->peg_FAILED;
}
if ($s5 !== $this->peg_FAILED) {
if ($this->input_length > $this->peg_currPos) {
$s6 = $this->input_substr($this->peg_currPos, 1);
$this->peg_currPos++;
} else {
$s6 = $this->peg_FAILED;
if ($this->peg_silentFails === 0) {
$this->peg_fail($this->peg_c0);
while ($s5 !== $this->peg_FAILED) {
$s4[] = $s5;
$s5 = $this->peg_currPos;
$s6 = $this->peg_currPos;
$this->peg_silentFails++;
$s7 = $this->peg_parseBlock();
$this->peg_silentFails--;
if ($s7 === $this->peg_FAILED) {
$s6 = null;
} else {
$this->peg_currPos = $s6;
$s6 = $this->peg_FAILED;
}
if ($s6 !== $this->peg_FAILED) {
$s7 = $this->peg_currPos;
$this->peg_silentFails++;
$s8 = $this->peg_parseBlock_End();
$this->peg_silentFails--;
if ($s8 === $this->peg_FAILED) {
$s7 = null;
} else {
$this->peg_currPos = $s7;
$s7 = $this->peg_FAILED;
}
if ($s7 !== $this->peg_FAILED) {
if ($this->input_length > $this->peg_currPos) {
$s8 = $this->input_substr($this->peg_currPos, 1);
$this->peg_currPos++;
} else {
$s8 = $this->peg_FAILED;
if ($this->peg_silentFails === 0) {
$this->peg_fail($this->peg_c0);
}
}
if ($s8 !== $this->peg_FAILED) {
$s6 = array($s6, $s7, $s8);
$s5 = $s6;
} else {
$this->peg_currPos = $s5;
$s5 = $this->peg_FAILED;
}
} else {
$this->peg_currPos = $s5;
$s5 = $this->peg_FAILED;
}
} else {
$this->peg_currPos = $s5;
$s5 = $this->peg_FAILED;
}
}
if ($s6 !== $this->peg_FAILED) {
$s5 = array($s5, $s6);
$s4 = $s5;
} else {
$this->peg_currPos = $s4;
$s4 = $this->peg_FAILED;
}
} else {
$this->peg_currPos = $s4;
$s4 = $this->peg_FAILED;
}
if ($s4 !== $this->peg_FAILED) {
@@ -1441,18 +1583,23 @@ public function parse($input) {
 // are the same as `json_decode`

 // array arguments are backwards because of PHP
-if ( ! function_exists( 'peg_array_partition' ) ) {
-    function peg_array_partition( $array, $predicate ) {
-        $truthy = array();
-        $falsey = array();
+if ( ! function_exists( 'peg_process_inner_content' ) ) {
+    function peg_process_inner_content( $array ) {
+        $html = '';
+        $blocks = array();
+        $content = array();

         foreach ( $array as $item ) {
-            call_user_func( $predicate, $item )
-                ? $truthy[] = $item
-                : $falsey[] = $item;
+            if ( is_string( $item ) ) {
+                $html .= $item;
+                $content[] = $item;
+            } else {
+                $blocks[] = $item;
+                $content[] = null;
+            }
         }

-        return array( $truthy, $falsey );
+        return array( $html, $blocks, $content );
     }
 }

@@ -1465,7 +1612,8 @@ function peg_join_blocks( $pre, $tokens, $post ) {
     'blockName' => null,
     'attrs' => array(),
     'innerBlocks' => array(),
-    'innerHTML' => $pre
+    'innerHTML' => $pre,
+    'innerContent' => array( $pre ),
 );
 }

@@ -1479,7 +1627,8 @@ function peg_join_blocks( $pre, $tokens, $post ) {
     'blockName' => null,
     'attrs' => array(),
     'innerBlocks' => array(),
-    'innerHTML' => $html
+    'innerHTML' => $html,
+    'innerContent' => array( $html ),
 );
 }
 }
@@ -1489,7 +1638,8 @@ function peg_join_blocks( $pre, $tokens, $post ) {
     'blockName' => null,
     'attrs' => array(),
     'innerBlocks' => array(),
-    'innerHTML' => $post
+    'innerHTML' => $post,
+    'innerContent' => array( $post ),
 );
 }
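
As a rough check of what the new helper returns, the sketch below copies `peg_process_inner_content` from the added side of the diff and feeds it a hand-built children list for the outer block in the Summary. The `$children` array is an assumption about what the grammar passes in (strings for HTML runs, arrays for parsed blocks), not something shown in this commit.

```php
<?php
// Copied from the added lines of the diff above.
function peg_process_inner_content( $array ) {
	$html    = '';
	$blocks  = array();
	$content = array();

	foreach ( $array as $item ) {
		if ( is_string( $item ) ) {
			// HTML runs feed both the flat string and the fragment list.
			$html     .= $item;
			$content[] = $item;
		} else {
			// A parsed block is collected and leaves a null placeholder.
			$blocks[]  = $item;
			$content[] = null;
		}
	}

	return array( $html, $blocks, $content );
}

// Assumed input shape: strings for HTML, arrays for already-parsed blocks.
$void_block  = array( 'blockName' => 'core/voidInnerBlock', 'attrs' => array() );
$inner_block = array( 'blockName' => 'core/innerBlock', 'attrs' => array() );
$children    = array( "\nCheck out my\n", $void_block, "\nand my other\n", $inner_block, "\n" );

list( $html, $blocks, $content ) = peg_process_inner_content( $children );

// Two properties that follow from the loop above:
// the strings in $content join to $html, and each null pairs with one block.
$strings = array_filter( $content, 'is_string' );
$nulls   = array_filter( $content, 'is_null' );
var_dump( implode( '', $strings ) === $html );
var_dump( count( $nulls ) === count( $blocks ) );
```

Both checks hold for any input, which is why `innerHTML` can stay unchanged for existing consumers while `innerContent` carries the extra positional information.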
