feat: wonky rollups (#7189)
### Wonky Rollups

See full write up [here](https://hackmd.io/@aztec-network/ByROkLKIC).
This PR eliminates the need to pad empty transactions in a rollup, so we
can run the fewest rollup circuits required for a block. Main changes
are:

- Implementing a greedy-filled wonky tree in `sol`
(`computeUnbalancedRoot`) and `ts` (`unbalanced_tree.ts`, and
`getTxsEffectHash`) - thanks to @alexghr for the idea and help on this!
- Using this tree to calculate `txsEffectHash`, `outHash`, and gather
membership paths for consuming messages in the `outHash` tree
- `ContentCommitment.txTreeHeight` -> `.numTxs` as we have variable
height (`numTxs` may also not be required, as we gather the number from
the txs effect calculation anyway)
- Merge rollups can now take in one base and one merge rollup as input
- `orchestrator.ts` now forms a wonky tree (`proving-state.ts ->
findMergeLevel`) instead of a balanced tree with padding when
constructing the rollup circuits and enqueuing proofs
- We *only* pad blocks if we have less than 2 transactions, since a root
rollup still needs 2 inputs

---


The tree greedily fills txs from left to right in the rollup structure, e.g. 5 txs
look like:
```
  //                  root
  //             /            \
  //          merge           base
  //      /          \
  //   merge        merge
  //  /     \      /     \
  // base  base  base   base
```
...and 7 txs look like:
```
  //                     root
  //             /                  \
  //         merge3                merge5
  //      /          \             /    \
  //   merge1       merge2      merge4  base
  //  /     \      /     \      /    \
  // base  base  base   base  base  base
```
This eliminates the need for padding txs and circuits. E.g. previously, 5 txs
would have:
```
  //
  //                        root
  //              /                     \
  //          merge                    merge
  //      /          \             /           \
  //   merge        merge        merge        merge
  //  /     \      /     \      /    \       /     \
  // base  base  base   base  base  <pad>  <pad>  <pad>
```
This meant processing 3 extra txs and therefore 3 extra base circuits, plus 3 extra merge circuits.

To recalculate a wonky tree's root from the number of txs in a block, simply
decompose the number into powers of 2 to find the subtree sizes. Calculate the
roots of these subtrees and hash them from right to left. A full example is given
in the write up.
MirandaWood authored Jul 1, 2024
1 parent 3e6d88e commit 1de3746
Showing 48 changed files with 1,543 additions and 468 deletions.
150 changes: 8 additions & 142 deletions docs/docs/protocol-specs/logs/index.md
Expand Up @@ -53,161 +53,27 @@ A function can emit an arbitrary number of logs, provided they don't exceed the

<!-- TODO: there might be length extension attacks with this approach. We might need to encode the length of each log into the accumulated logs hash, rather than sum the lengths separately. Otherwise, there might be a way to pretend a message of length 7 was actually two messages of length 3 and 4 (for example) -->

To minimize the on-chain verification data size, protocol circuits aggregate log hashes. The end result is a single hash within the root rollup proof, encompassing all logs of the same type.
To minimize the on-chain verification data size, protocol circuits aggregate log hashes. The end result is a single hash within the base rollup proof, encompassing all logs of the same type.

Each protocol circuit outputs two values for each log type:

- _`accumulated_logs_hash`_: A hash representing all logs.
- _`accumulated_logs_length`_: The total length of all log preimages.

Both the `accumulated_logs_hash` and `accumulated_logs_length` for each type are included in the base rollup's `txs_effect_hash`. When rolling up to merge and root circuits, the two input proofs' `txs_effect_hash`es are hashed together to form the new value of `txs_effect_hash`.

When publishing a block on L1, the raw logs of each type and their lengths are provided (**Availability**), hashed and accumulated into each respective `accumulated_logs_hash` and `accumulated_logs_length`, then included in the on-chain recalculation of `txs_effect_hash`. If this value doesn't match the one from the rollup circuits, the block will not be valid (**Immutability**).

<!--
In cases where two proofs are combined to form a single proof, the _accumulated_logs_hash_ and _accumulated_logs_length_ from the two child proofs must be merged into one accumulated value:
- _`accumulated_logs_hash = hash(proof_0.accumulated_logs_hash, proof_1.accumulated_logs_hash)`_
- If either hash is zero, the new hash will be _`proof_0.accumulated_logs_hash | proof_1.accumulated_logs_hash`_.
- _`accumulated_logs_length = proof_0.accumulated_logs_length + proof_1.accumulated_logs_length`_
-->

For private and public kernel circuits, beyond aggregating logs from a function call, they ensure that the contract's address emitting the logs is linked to the _logs_hash_. For more details, refer to the "Hashing" sections in [Unencrypted Log](#hashing-1), [Encrypted Log](#hashing-2), and [Encrypted Note Preimage](#hashing-3).

### Encoding

1. The encoded logs data of a transaction is a flattened array of all logs data within the transaction:

_`tx_logs_data = [number_of_logs, ...log_data_0, ...log_data_1, ...]`_

The format of _log_data_ varies based on the log type. For specifics, see the "Encoding" sections in [Unencrypted Log](#encoding-1), [Encrypted Log](#encoding-2), and [Encrypted Note Preimage](#encoding-3).

2. The encoded logs data of a block is a flattened array of a collection of the above _tx_logs_data_, with hints facilitating hashing replay in a binary tree structure:

_`block_logs_data = [number_of_branches, number_of_transactions, ...tx_logs_data_0, ...tx_logs_data_1, ...]`_

- _number_of_transactions_ is the number of leaves in the left-most branch, restricted to either _1_ or _2_.
- _number_of_branches_ is the depth of the parent node of the left-most leaf.

Here is a step-by-step example to construct the _`block_logs_data`_:

1. A rollup, _R01_, merges two transactions: _tx0_ containing _tx_logs_data_0_, and _tx1_ containing _tx_logs_data_1_:

```mermaid
flowchart BT
tx0((tx0))
tx1((tx1))
R01((R01))
tx0 --- R01
tx1 --- R01
```

_block_logs_data_: _`[0, 2, ...tx_logs_data_0, ...tx_logs_data_1]`_

Where _0_ is the depth of the node _R01_, and _2_ is the number of aggregated _tx_logs_data_ of _R01_.

2. Another rollup, _R23_, merges two transactions: _tx3_ containing _tx_logs_data_3_, and _tx2_ without any logs:

```mermaid
flowchart BT
tx2((tx2))
tx3((tx3))
R23((R23))
tx2 -. no logs .- R23
tx3 --- R23
```

_block_logs_data_: _`[0, 1, ...tx_logs_data_3]`_

Here, the number of aggregated _tx_logs_data_ is _1_.

3. A rollup, _RA_, merges the two rollups _R01_ and _R23_:

```mermaid
flowchart BT
tx0((tx0))
tx1((tx1))
R01((R01))
tx0 --- R01
tx1 --- R01
tx2((tx2))
tx3((tx3))
R23((R23))
tx2 -.- R23
tx3 --- R23
RA((RA))
R01 --- RA
R23 --- RA
```

_block_logs_data_: _`[1, 2, ...tx_logs_data_0, ...tx_logs_data_1, 0, 1, ...tx_logs_data_3]`_

The result is the _block_logs_data_ of _R01_ concatenated with the _block_logs_data_ of _R23_, with the _number_of_branches_ of _R01_ incremented by _1_. The updated value of _number_of_branches_ (_0 + 1_) is also the depth of the node _R01_.

4. A rollup, _RB_, merges the above rollup _RA_ and another rollup _R45_:

```mermaid
flowchart BT
tx0((tx0))
tx1((tx1))
R01((R01))
tx0 --- R01
tx1 --- R01
tx2((tx2))
tx3((tx3))
R23((R23))
tx2 -.- R23
tx3 --- R23
RA((RA))
R01 --- RA
R23 --- RA
tx4((tx4))
tx5((tx5))
R45((R45))
tx4 --- R45
tx5 --- R45
RB((RB))
RA --- RB
R45 --- RB
```

_block_logs_data_: _`[2, 2, ...tx_logs_data_0, ...tx_logs_data_1, 0, 1, ...tx_logs_data_3, 0, 2, ...tx_logs_data_4, ...tx_logs_data_5]`_

The result is the concatenation of the _block_logs_data_ from both rollups, with the _number_of_branches_ of the left-side rollup, _RA_, incremented by _1_.

### Verification

Upon receiving a proof and its encoded logs data, the entity can ensure the correctness of the provided _block_logs_data_ by verifying that the _accumulated_logs_hash_ in the proof can be derived from it:

```js
const accumulated_logs_hash = compute_accumulated_logs_hash(block_logs_data);
assert(accumulated_logs_hash == proof.accumulated_logs_hash);
assert(block_logs_data.accumulated_logs_length == proof.accumulated_logs_length);

function compute_accumulated_logs_hash(logs_data) {
  const number_of_branches = logs_data.read_u32();

  const number_of_transactions = logs_data.read_u32();
  let res = hash_tx_logs_data(logs_data);
  if (number_of_transactions == 2) {
    res = hash(res, hash_tx_logs_data(logs_data));
  }

  for (let i = 0; i < number_of_branches; ++i) {
    const res_right = compute_accumulated_logs_hash(logs_data);
    res = hash(res, res_right);
  }

  return res;
}

function hash_tx_logs_data(logs_data) {
  const number_of_logs = logs_data.read_u32();
  let res = hash_log_data(logs_data);
  for (let i = 1; i < number_of_logs; ++i) {
    const log_hash = hash_log_data(logs_data);
    res = hash(res, log_hash);
  }
  return res;
}
```

The _accumulated_logs_length_ in _block_logs_data_ is computed during the processing of each _logs_data_ within _hash_log_data()_. The implementation of _hash_log_data_ varies depending on the type of the logs being processed. Refer to the "Verification" sections in [Unencrypted Log](#verification-1), [Encrypted Log](#verification-2), and [Encrypted Note Preimage](#verification-3) for details.

## Unencrypted Log

Unencrypted logs are used to communicate public information out of smart contracts. They can be emitted from both public and private functions.
10 changes: 6 additions & 4 deletions docs/docs/protocol-specs/rollup-circuits/index.md
Expand Up @@ -19,9 +19,9 @@ Note that we have two different types of "merger" circuits, depending on what th
For transactions we have:

- The `merge` rollup
- Merges two `base` rollup proofs OR two `merge` rollup proofs
- Merges two rollup proofs of either `base` or `merge` and constructs outputs for further proving
- The `root` rollup
- Merges two `merge` rollup proofs
- Merges two rollup proofs of either `base` or `merge` and constructs outputs for L1

And for the message parity we have:

Expand All @@ -30,7 +30,7 @@ And for the message parity we have:
- The `base_parity` circuit
- Merges `N` l1 to l2 messages in a subtree

In the diagram the size of the tree is limited for demonstration purposes, but a larger tree would have more layers of merge rollups proofs.
In the diagram the size of the tree is limited for demonstration purposes, but a larger tree would have more layers of merge rollup proofs. Exactly how many layers and what combination of `base` and/or `merge` circuits are consumed depends on filling a [wonky tree](../state/tree-implementations.md#wonky-merkle-trees) with N transactions.
Circles mark the different types of proofs, while squares mark the different circuit types.

```mermaid
Expand Down Expand Up @@ -465,7 +465,7 @@ Furthermore, the `OutHash` is computed from a subset of the data in `TxsHash`

Since we strive to minimize the compute requirements to prove blocks, we amortize the commitment cost across the full tree.
We can do so by building merkle trees of partial "commitments", whose roots are ultimately computed in the final root rollup circuit.
Below, we outline the `TxsHash` merkle tree that is based on the `TxEffect`s and a `OutHash` which is based on the `l2_to_l1_msgs` (cross-chain messages) for each transaction.
Below, we outline the `TxsHash` merkle tree that is based on the `TxEffect`s and an `OutHash` which is based on the `l2_to_l1_msgs` (cross-chain messages) for each transaction, with four transactions in this rollup.
While the `TxsHash` implicitly includes the `OutHash` we need it separately such that it can be passed to the `Outbox` for consumption by the portals with minimal work.

```mermaid
Expand Down Expand Up @@ -588,6 +588,8 @@ graph BT

While the `TxsHash` merely requires the data to be published and known to L1, the `InHash` and `OutHash` need to be computable on L1 as well.
This requires them to be efficiently computable on L1 while still being tolerable inside a snark, leading us to rely on SHA256.


The L2 to L1 messages from each transaction form a variable-height tree. In the diagram above, transactions 0 and 3 have four messages, so they require a tree with two layers, whereas the others only have two messages and so require a single-layer tree. The base rollup calculates the root of this tree and passes it to the next layer up. Merge rollups simply hash the two child roots together and pass the result up as the `OutHash`.

## Next Steps
7 changes: 3 additions & 4 deletions docs/docs/protocol-specs/rollup-circuits/merge-rollup.md
Expand Up @@ -82,13 +82,12 @@ def MergeRollupCircuit(

assert left.public_inputs.constants == right.public_inputs.constants
assert left.public_inputs.end == right.public_inputs.start
assert left.public_inputs.type == right.public_inputs.type
assert left.public_inputs.height_in_block_tree == right.public_inputs.height_in_block_tree
assert left.public_inputs.num_txs >= right.public_inputs.num_txs

return BaseOrMergeRollupPublicInputs(
type=1,
height_in_block_tree=left.public_inputs.height_in_block_tree + 1,
txs_hash=SHA256(left.public_inputs.txs_hash | right.public_inputs.txs_hash),
num_txs=left.public_inputs.num_txs + right.public_inputs.num_txs,
txs_effect_hash=SHA256(left.public_inputs.txs_effect_hash | right.public_inputs.txs_effect_hash),
out_hash=SHA256(left.public_inputs.out_hash | right.public_inputs.out_hash),
start=left.public_inputs.start,
end=right.public_inputs.end,
7 changes: 3 additions & 4 deletions docs/docs/protocol-specs/rollup-circuits/root-rollup.md
Expand Up @@ -183,8 +183,7 @@ def RootRollupCircuit(

assert left.public_inputs.constants == right.public_inputs.constants
assert left.public_inputs.end == right.public_inputs.start
assert left.public_inputs.type == right.public_inputs.type
assert left.public_inputs.height_in_block_tree == right.public_inputs.height_in_block_tree
assert left.public_inputs.num_txs >= right.public_inputs.num_txs

assert parent.state.partial == left.public_inputs.start

Expand All @@ -208,8 +207,8 @@ def RootRollupCircuit(
header = Header(
last_archive = left.public_inputs.constants.last_archive,
content_commitment: ContentCommitment(
tx_tree_height = left.public_inputs.height_in_block_tree + 1,
txs_hash = SHA256(left.public_inputs.txs_hash | right.public_inputs.txs_hash),
num_txs=left.public_inputs.num_txs + right.public_inputs.num_txs,
txs_effect_hash=SHA256(left.public_inputs.txs_effect_hash | right.public_inputs.txs_effect_hash),
in_hash = l1_to_l2_roots.public_inputs.sha_root,
out_hash = SHA256(left.public_inputs.out_hash | right.public_inputs.out_hash),
),
6 changes: 6 additions & 0 deletions docs/docs/protocol-specs/state/tree-implementations.md
Expand Up @@ -8,6 +8,12 @@ In an append-only Merkle tree, new leaves are inserted in order from left to rig

Append-only trees allow for more efficient syncing than sparse trees, since clients can sync from left to right starting with their last known value. Updates to the tree root, when inserting new leaves, can be computed from the rightmost "frontier" of the tree (i.e., from the sibling path of the rightmost nonzero leaf). Batch insertions can be computed with fewer hashes than in a sparse tree. The historical snapshots of append-only trees also enable efficient membership proofs; as older roots can be computed by completing the merkle path from a past left subtree with an empty right subtree.

### Wonky Merkle Trees

We also use a special type of append-only tree to structure the rollup circuits. Given `n` leaves, we fill from left to right and attempt to pair them to produce the next layer. If `n` is a power of 2, this tree looks exactly like a standard append-only merkle tree. Otherwise, once we reach an odd-sized row we shift the final node up until we reach another odd row to combine them.

This results in an unbalanced tree where there are no empty leaves. For rollups, this means we don't have to pad empty transactions and process them through the rollup circuits. A full explanation is given [here](./wonky-tree.md).

## Indexed Merkle trees

Indexed Merkle trees, introduced [here](https://eprint.iacr.org/2021/1263.pdf), allow for proofs of non-inclusion more efficiently than sparse Merkle trees. Each leaf in the tree is a tuple of: the leaf value, the next-highest value in the tree, and the index of the leaf where that next-highest value is stored. New leaves are inserted from left to right, as in the append-only tree, but existing leaves can be _modified_ to update the next-highest value and next-highest index (a.k.a. the "pointer") if a new leaf with a "closer value" is added to the tree. An indexed Merkle tree behaves as a Merkle tree over a sorted linked list.
