
[consensus] Enable Urkel Tree compaction #669

Merged Jun 3, 2022 (23 commits)

Conversation

@pinheadmz (Member) commented Dec 9, 2021

Closes #660

Adds a new RPC method compacttree, which deletes from disk an enormous amount of historical Urkel Tree data that is only needed if a chain reorganization occurs deeper than ~288 blocks. Note that this same limitation already applies to blockchain-pruning nodes: in the event of such a deep reorg, a pruned node fails into an unrecoverable state. Pruning is available as a configuration setting or can be executed on a running node with the pruneblockchain RPC.

What does compacting the Urkel Tree mean?

Since the tree is append-only, when an existing name is updated (anything from a REVEAL to a REGISTER to an UPDATE) the old data still remains on disk. A new tree node is written and some file pointers are shuffled around to determine the new root hash of the tree (which is committed to in a block header).

If a chain reorg crosses back over a tree interval (36 blocks on mainnet) then an old tree root is pulled from the unzipped chain and used to revert the entire Urkel Tree to its old state, which is possible since all the historical data and pointers are still there on disk.

To compact the Urkel Tree means to take a given tree root and write a fresh Urkel Tree (in flat-file format on disk) containing ONLY the data connected directly to the root, then rename that directory to ~/.hsd/tree, wiping out all the historical data that is no longer part of the current state.
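Conceptually, the operation looks something like the sketch below (illustrative JavaScript only, not hsd's or urkel's actual API; the store here is just a Map from node hash to node):

  // Illustrative sketch of compaction: copy only the nodes reachable from the
  // chosen root into a fresh store, dropping all of the stale historical
  // nodes that the append-only store has accumulated.
  function compact(oldStore, rootHash) {
    const freshStore = new Map();
    const stack = [rootHash];

    while (stack.length > 0) {
      const hash = stack.pop();
      const node = oldStore.get(hash);    // node = { children: [hashes], data }
      freshStore.set(hash, node);         // keep only what the root reaches
      for (const child of node.children)  // internal nodes reference children;
        stack.push(child);                // leaf nodes have an empty list
    }

    return freshStore; // in hsd, the fresh flat files replace ~/.hsd/tree
  }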

Strategy in this PR

Since compacting means the tree can no longer be reverted, a chain reorg only one block deep that crosses a tree interval would permanently annihilate the full node. Therefore, what we must do is (sketched in code after the list):

  1. Use the historical data we still have on disk to rewind the Urkel Tree to an old state (~288 blocks ago / about 7-8 tree intervals ago)
  2. Run the compacting process on that historical tree root, creating an unrecoverable reorg boundary two days in the past
  3. Replay the blockchain back into the Urkel Tree, all ~288 blocks, including all namestate updates, crossing back over those 7-8 tree commitments until the Urkel Tree is back in sync with the chain
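Roughly, the flow looks like this (a sketch only; the method names are illustrative assumptions, not hsd's actual internals):

  // Sketch of the rewind -> compact -> replay strategy (illustrative names).
  async function compactTree(chain) {
    // 1. Rewind: pick the tree root committed ~288 blocks ago
    //    (about 7-8 tree intervals on mainnet).
    const height = chain.height - 288;
    const entry = await chain.getEntryByHeight(height);

    // 2. Compact: rewrite the tree keeping only data reachable from that old
    //    root. Anything older becomes an unrecoverable reorg boundary.
    await chain.tree.compactTo(entry.treeRoot);

    // 3. Replay: reconnect the namestate updates from the last ~288 blocks
    //    so the tree catches back up with the chain tip.
    for (let h = height + 1; h <= chain.height; h++)
      await chain.replayNamestate(h);
  }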

End result

Whether or not a full node is pruning the blockchain, executing this RPC applies the same restriction at the moment it completes: a 288-block reorg is the end of the game. Of course, UNLIKE a pruning node, this compacting process is NOT ongoing, so over time the Urkel Tree will bloat back up again (for better or for worse) until the RPC is called again.

Mainnet testing

I created a second branch of this PR with the WIP rpc dumpzone applied. I already had hsd synced to height 97555 and started it in an isolated mode:

hsd --no-dns --no-wallet --only=127.0.0.1

Then executed hsd-rpc dumpzone hns-97555-full.zone to get a snapshot of the Urkel Tree state.

Then executed hsd-rpc compacttree to run the pruning process.

The size of the directory ~/.hsd/tree dropped from 6.8 GB down to ~500 MB 🥳 and the process took about 2.5 minutes total 🎉

[I:2021-12-09T00:46:06Z] (net) Removed loader peer (127.0.0.1:12038).
[D:2021-12-09T00:46:06Z] (node-rpc) Handling RPC call: compacttree.
[D:2021-12-09T00:46:06Z] (node-rpc) []
[I:2021-12-09T00:46:06Z] (chain) Compacting Urkel Tree...
...
[I:2021-12-09T00:48:13Z] (chain) Synchronizing Tree with block history...
...
[I:2021-12-09T00:48:32Z] (chain) Synchronized Tree Root: 9d8eefb2af6c7ab640826f245040c693738ad7cd629b86f32ba040b3ad7a5f8c.

Finally, I ran hsd-rpc dumpzone hns-97555-compacted.zone and compared the two output zone files which matched exactly 🚀

TODO:

  • add tests
  • consider edge cases like executing too frequently (?)
  • determine best deployment (should it be automatic?)

@coveralls commented Dec 9, 2021

Coverage Status

Coverage increased (+0.2%) to 66.97% when pulling 74cc006 on pinheadmz:prunetree into a53f877 on handshake-org:master.

@pinheadmz (Member, Author)

Maybe include #675

@pinheadmz (Member, Author)

I'm marking this as ready for review now by solving the last problem:

What is the best method for deployment?

There are now two ways to do this:

  1. hsd-rpc compacttree can be run at any time while the node is alive, but this is not recommended because the node is locked for several minutes, which is not great for your peers (who may even time out if you take too long).

  2. hsd --compact-tree: this argument can be added at launch on the command line or left in the config file, and the node will run the compaction process when it launches, before it connects to the network (see the example below).

These options give node runners and integrated apps like Bob Wallet the flexibility to compact the tree either automatically or manually, either way with the node operator knowing when it will happen. I think this is a better method than setting some disk-size threshold, etc., which would run the compactor at unexpected times.
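For example (the config-file syntax below is an assumption about a typical hsd.conf; the flag itself is the one added in this PR):

  # One-off, at launch on the command line:
  hsd --compact-tree

  # Or persistently, in the config file (e.g. ~/.hsd/hsd.conf):
  compact-tree: true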

@brandondees left a comment

LGTM

Didn't read through all the tests extremely carefully but they do seem like good starting coverage and documentation overall.

Might be nice to add a little more log reporting about how well it's working when applied. Might help node operators with health monitoring or to detect efficiency regressions etc.

@brandondees

This looks like a great improvement, and I don't see any issues with the diff as written (take that with a grain of salt due to my unfamiliarity with the overall codebase and code conventions).

Another tangential question came to mind while reviewing: Are tree updates computed incrementally per block or is it an all-at-once tree update process once every interval? Is that something to consider further optimizations on in either direction?

@pinheadmz (Member, Author)

Another tangential question came to mind while reviewing: Are tree updates computed incrementally per block or is it an all-at-once tree update process once every interval? Is that something to consider further optimizations on in either direction?

Changes to the Urkel Tree are stored in memory between tree intervals and written to disk every 36 blocks. This is an optimization and described in the white paper. If you shut down your node between tree intervals the data on disk will actually be behind. That's why we call syncTree() on startup to refill the tree data in memory from the block data on disk.
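In other words (a minimal sketch of the rule described above; the exact condition is an assumption based on this description, not a quote of hsd's code):

  // Namestate changes accumulate in memory between intervals; the tree is
  // only flushed to disk when the height crosses an interval boundary.
  const treeInterval = 36; // mainnet

  function shouldCommitTree(height) {
    return height % treeInterval === 0;
  }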

@brandondees

Gotcha, so there's already not much else to be done there unless you wanted to have hyper-fine configuration of what goes into memory vs disk at what times.

@pinheadmz (Member, Author) commented Jan 12, 2022

Worth pointing out that 0ddd228 makes this PR NOT backwards compatible. After running my own local hsd node with this branch and compacting the tree, I then switched back to the master branch to work on something else and got a Database Inconsistency error. We can somewhat alleviate this by reverting this change:

- this.put(layout.s.encode(), this.tree.rootHash());

so we would still write the treeRoot to the DB even though we no longer use it. I think this would alleviate some cases but we should still consider this a major breaking change.

@pinheadmz added this to the 4.0.0 milestone Mar 31, 2022
@pinheadmz (Member, Author) commented Mar 31, 2022

TODO:

  • remove RPC command

@pinheadmz (Member, Author)

@nodech I removed the RPC compacttree and cleaned up the tests.

@nodech (Contributor) commented Apr 11, 2022

General

The first thing I was thinking about is whether we are still a full node after
this. Do we want to only support this for pruned nodes? We could of course
have two options for this: pruned blocks and pruned chain. Even for a full node
it is possible to unprune the node by reindexing the chain.

So I would say, initially support this for pruned nodes only, and if someone
knows what they are doing they could use --force-tree-compact, which could
be used for full nodes as well?

I think it's better to support --force-tree-compact for full-node
compaction only after we have the ability to reindex the tree.

Atomicity and layout.s

The chaindb and the tree are separate stores and we need to keep them
logically synchronized. The only way to control where we are in the tree-hash
history is the inject. The last tree root hash is not useful information by
itself. It needs to come from the chaindb.

The issues layout.s solves come up when we crash during connect/disconnect.
layout.s is the only way we synchronize right now. Note that it's not useful
to use the tip block hash, because the tree is only committed once every 36
blocks, unless you go to the last commit height and grab that block.
Let's go through the issues that can occur and how layout.s solves them.

Let's consider connect failing. What connect does is: first it commits
data to the tree, and then it atomically adds all changes to the chaindb
and also writes layout.s as the new tree root. If for some reason the node
crashes (or gets shut down) between Tree.commit and chaindb.save,
we just end up with "extra" data in the tree, but there will be
no inconsistency when the node is rerun, because layout.s will properly
inject the previous tree hash instead of the last one (which was
not saved in the chaindb because of the failure) and try to connect the same
block again. Because writes to the chaindb's LevelDB are atomic,
once we have written successfully, we are assured there's no inconsistency
between the tree and the chain. It's also good to note that this would
not work in the chaindb.save -> tree.commit order, because in case of
failure we would not have the tree data, even though the chaindb would think
it's time to move on (on the next rerun).
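A minimal sketch of that ordering (illustrative names, not the actual hsd code):

  // Commit the tree first, then atomically write the chaindb batch that
  // records layout.s. A crash in between only leaves unreferenced tree data;
  // the chaindb (and layout.s) can be behind the tree, but never ahead.
  async function connect(chaindb, tree, block) {
    const root = await tree.commit();  // 1. append new tree data to disk

    const batch = chaindb.batch();     // 2. one atomic LevelDB batch
    batch.put('s', root);              //    layout.s = committed tree root
    batch.put('tip', block.hash);      //    ...plus the rest of the block state
    await batch.write();               //    all-or-nothing

    // On restart, syncTree() injects the root stored under layout.s and
    // reconnects any blocks whose tree data was written but never recorded.
  }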

This problem becomes more pronounced when you consider a reset or reorg.
When a reorg happens, the chaindb will revert its state to the past and then move
forward to the new chain. If the node crashes or gets shut down after it has reverted to
the past, layout.s is the only way to tell the tree that we are actually on the
older tree and the latest tree.rootHash is not relevant.

If you consider someone restarting the node with --compact-tree after reverting
to the past (on a reorg), you can easily see why compact tree should also depend
on layout.s instead of tree.rootHash.

So to sum up: tree.rootHash is never relevant for atomicity/consistency
between the chaindb and the tree. We always trust the chaindb to provide the tree root hash.
The chaindb root hash can be behind the tree root hash, but NEVER ahead.

Another note: the issue with blockstore was similar in a way, but it would
result in the node not starting and failing to sync. This issue, on the other hand,
won't get noticed, because it won't result in a crash. It will just leave wrong
information in the tree.

Atomicity with the compaction.

The issue is related to the above-mentioned chaindb root hash (layout.s).
We need to rely only on the chaindb to give us information about the tree situation.

In order to keep the compacting code working with layout.s, we need to
move the layout.s pointer to the past FIRST and then compact. In this
case we make sure that if compaction fails, syncTree will just redo the same
thing and in the worst case end up with wasted space. We can't compact first,
because if compaction succeeds and then the chaindb write fails, the chaindb will be AHEAD.
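Sketched out, that ordering looks like this (illustrative names only):

  // Move layout.s back to the historical root BEFORE rewriting tree files.
  // If compaction then fails, syncTree() simply replays blocks again; the
  // worst case is wasted disk space, never a chaindb that is AHEAD.
  async function compactSafely(chaindb, tree, oldRoot) {
    await chaindb.put('s', oldRoot);  // 1. rewind the chaindb pointer first
    await tree.compactTo(oldRoot);    // 2. now rewrite the tree from oldRoot
    // 3. syncTree() then replays blocks from that height back up to the tip,
    //    advancing layout.s as each tree interval is committed.
  }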

syncTree needs to use layout.s as well. As mentioned above, the
tree root hash should never be used for anything related to the chaindb.
Because of syncTree's nature, it's okay even if compaction failed
and we restart the node without the compact flag and the chaindb is behind: syncTree
will update the chaindb from saveNames until it's caught up.

We can continue the above-mentioned example to show the issue of not using
layout.s. If the chaindb went into the past on a reorg/reset but then crashed/closed,
syncTree would actually be AHEAD and would continue appending to the wrong
tree hash.

To summarize:

  • The chaindb root hash can be behind the tree root hash, but NEVER ahead.
  • Within compaction we need to move the chain root hash (layout.s) back first, even
    if we then fail to compact and never run it again.
  • syncTree should use the chain root hash (layout.s).

Nits

You can move syncTree out of compactTree and have it like this:

  if (!this.options.spv) {
    if (this.options.compactTree)
      await this.compactTree();
      
    await this.syncTree();
  }


@pinheadmz (Member, Author)

You can move syncTree out of compactTree and have it like this.

This will call syncTree() an unnecessary second time if the option is set. (compactTree() calls syncTree() automatically already)

But I fixed that with an else in d656335
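So the startup logic ends up looking roughly like this (a sketch of the fix, not the exact diff):

  if (!this.options.spv) {
    if (this.options.compactTree)
      await this.compactTree(); // compactTree() already ends with syncTree()
    else
      await this.syncTree();
  }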

@pinheadmz (Member, Author)

@nodech thank you so much for your brilliant review and great catch regarding layout.s. I rebased the branch to revert the layout.s deprecation and added an extra write of that value in compactTree(); that is in commit 6cfd96b

I also added a test in commit 80baf34 to cover the "failed connect()" case as you describe. You were right, the code totally failed that test until layout.s was restored.

@pinheadmz (Member, Author)

Mainnet integration testing is going well, working with pruned and full nodes and combinations of the new options. The Urkel Tree reliably drops from 15GB to ~1GB. I even synced a fresh mainnet chain and restarted every 1000 blocks to keep the tree size low throughout the entire IBD.

The only issue so far is the migration in SPV mode.

@smcki012 commented Jun 1, 2022

Fantastic work on this. Benchmarks are very promising. HSD is getting really lean for node operators!

@nodech force-pushed the prunetree branch 2 times, most recently from 4fe79bc to 3e570fc on June 2, 2022
@nodech added labels on Jun 3, 2022: advanced (review difficulty), blockchain (part of the codebase), dns (part of the codebase), protocol/consensus (part of the codebase), breaking-major (backwards incompatible, release version), feature (adding a feature)
@pinheadmz (Member, Author)

ACK 6e3a113

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

ACK 6e3a113b8cf038a0554aa3cca5e485d196f4e0be
-----BEGIN PGP SIGNATURE-----

iQIzBAEBCAAdFiEE5hdzzW4BBA4vG9eM5+KYS2KJyToFAmKaKMQACgkQ5+KYS2KJ
yTryyg//ax9DOb5unr1CNhwCNLaUDA0hS01tvzMRnrEYEmcWtyOC6eSFlnLnuO/V
/9wUYaoGyeJBBZx6QKW//zjUv8D0Xyc/46wtDdqmG9Oo0N29CUMTSjJKkKsjRLWG
mQNWCG0ai/wilp/x+p32usQYQLeD6p58DkkUhbXugKeJgcSKC98VOXj3o1Z4/r6K
z3glTLRoltihWFsVTVp648EC3+QxjiAxn4pOXJxyIPwOm6MyZSubmF3pVVw4Cybm
G5U2+rtTgXyPY0BFC5t74SYSRLccmur40F/f+fH5xSw8TYYd24dkwLuwMup01z3r
OkHV78XMaQ8IfuIlTjImWQmEHPUCsnJ10nWaATa6HN1kOqp8TzV05S8CNVKOJOW9
7lf9D0MTw3NHSiFDKoilmWm+V4gFWuEx6qHiVL8VOI6QOsDnYYbPBVwPMxyT5GZJ
blTw8CyGWRROyEJ1L69k3UUMZ32u7lLt+G3guc0Vj87xqmc0XLhYG/LI+GxOsath
0fh4EQZxdMaWAVukZgdUaHAea0zRPWVG2Kj1OwDxXgFSVIzcLhMKJEYYsd1zLm88
IJ2DdIoaF3yEFdZedrfSVE1wDsso+QUFITu1KQD6iCiW/Gj533p5vFUFI55xO66z
+BMgLu1wjf6h8Yj1oUX86hNI8lKlfVrIcdlgQ92Y7ZCYlCpp8/E=
=qIZu
-----END PGP SIGNATURE-----

pinheadmz's public key is on keybase

@pinheadmz (Member, Author)

ACK 74cc006

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

ACK 74cc006ef089208a8a715746c9bed470a24a0b70
-----BEGIN PGP SIGNATURE-----

iQIzBAEBCAAdFiEE5hdzzW4BBA4vG9eM5+KYS2KJyToFAmKaLQIACgkQ5+KYS2KJ
yToirxAAypbhBjbCYVgyqCg4y80e46mC/hTueIUOxZHqMYctKRxFhFiXmBn3dk9X
0X0RyzwWQ1IOv1L+CrrPWoqITVBlRaq/pEhvKyzkIvHlbul69ekZsWpVH6iPjl4c
joSK0MeNXjKJJKO+I7JH6fShR/QLY8RNGfcO1jo2Ce70rg7+sw3mbf8ikp4sBfgr
YeATGgLuSysGeluSfrYS+bfn+KwdwCxp58b2/toDZkkvLjtHqd4t/zu9HG+0rSg3
pYM3HFbEb8+O7H8HhHjJEFraCduRFar6SgDRH2UP/dcWMxS739zFOBbK673Cw/6e
vyIUm0lNXYEIERxAhJqwtDn9885rnQwMJTlxem7odCDQYizGtQb/0Ym6Sq2RKpfD
+qAWRAhxh1TcJln/K6J2T87e7u1PZp+dHM0GP3+4Bbld5xQU/DU/tllYWtDM7btK
nNegA4f0NfDVmamL3QdG35cWm9z9vfA3K7c0AeS5aenpV84ezCpBpa7aBlQ1M1V/
y6WO0qa0DyHdF4oau75WeYFB9wpTF7RLpgwGPQVdhWePsicMX8W9AoIGOzvLbnk0
nhBxGR+7a/hayD0ovGZKmM4VXeBFqDGnc+hmgFj/quEAeX+5MoHDSjgGFxeKOsz0
xRzcg0ROxXpfwubkk3jrQW4bm4ZugQ0zUPBuSwvIH98+lp5/Qmw=
=dtxM
-----END PGP SIGNATURE-----

pinheadmz's public key is on keybase

@pinheadmz merged commit ba949f3 into handshake-org:master on Jun 3, 2022
@handshake-org deleted a comment from satoshi-MAM on Jan 10, 2023
@nodech deleted the prunetree branch on June 11, 2024
Linked issue: syncTree() could use one more log message