Checkpoints #99

Merged
merged 13 commits into from
Dec 12, 2022

@pinheadmz pinheadmz commented Sep 30, 2022

Closes #3
Closes #69

Background

hnsd is a light name resolver, a type of light client with NO wallet functions. Since all it does is resolve names, it does NOT require the entire header chain. In fact, to resolve names, all hnsd needs is the treeRoot in the header of the chain tip. Like all light clients, however, hnsd does require SOME historical consensus data to validate new block headers and stay in sync:

  1. The ONE previous block header to ensure new headers are in a chain
  2. The ELEVEN previous block headers to compute median time past
  3. The headers from 144 +/-2 blocks ago to compute difficulty target
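As an illustration of item 2, here is a minimal sketch of a median-time-past calculation over the previous eleven header timestamps. The function name `median_time_past` and the plain-array input are assumptions made for this example and are not taken from hnsd's source.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define MTP_SPAN 11 /* number of previous headers considered */

/* qsort comparator for 64-bit timestamps. */
static int
cmp_u64(const void *a, const void *b) {
  uint64_t x = *(const uint64_t *)a;
  uint64_t y = *(const uint64_t *)b;
  return (x > y) - (x < y);
}

/*
 * Hypothetical helper: `times` holds the timestamps of up to
 * MTP_SPAN most recent headers, in any order. Returns the median.
 */
uint64_t
median_time_past(const uint64_t *times, size_t count) {
  uint64_t sorted[MTP_SPAN];

  assert(count > 0 && count <= MTP_SPAN);

  memcpy(sorted, times, count * sizeof(uint64_t));
  qsort(sorted, count, sizeof(uint64_t), cmp_u64);

  return sorted[count / 2];
}
```

This is why a checkpoint cannot consist of a single header: without the ten headers before the tip, the very next block's timestamp could not be validated.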

With this in mind, we introduce a new object called a checkpoint which provides enough data for hnsd to effectively begin syncing the chain from ANY arbitrary, non-genesis block. A checkpoint contains the following data:

  1. Start height n
  2. The total accumulated work in the blockchain up to and including the block at height n - 1
  3. 150 block headers starting with height n, serialized in wire format
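The three fields above could be laid out roughly as follows. This is only a sketch: the type and macro names are hypothetical, and the 236-byte header size and 32-byte chainwork size are assumptions, not values quoted in this PR. Any magic or version bytes in the real on-disk format are not modeled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CP_HEADERS 150        /* headers per checkpoint (from the list above) */
#define CP_HEADER_SIZE 236    /* ASSUMED wire size of one header, in bytes */
#define CP_CHAINWORK_SIZE 32  /* ASSUMED 256-bit accumulated work */

/* Hypothetical in-memory view of a checkpoint. */
typedef struct checkpoint_s {
  uint32_t height;                              /* start height n */
  uint8_t chainwork[CP_CHAINWORK_SIZE];         /* work through n - 1 */
  uint8_t headers[CP_HEADERS][CP_HEADER_SIZE];  /* n .. n + 149 */
} checkpoint_t;

/* Size of the fields when written back-to-back with no padding. */
size_t
checkpoint_serialized_size(void) {
  return sizeof(uint32_t)
       + CP_CHAINWORK_SIZE
       + (size_t)CP_HEADERS * CP_HEADER_SIZE;
}
```

Under those assumptions the payload comes to 4 + 32 + 150 * 236 = 35,436 bytes, in line with the roughly 36kB figure mentioned in this description.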

With this mechanism in place we can now do some totally badass new things with hnsd:

  1. Always start hnsd sync from a specific height (e.g. 136,000) using a hard-coded checkpoint.
  2. Write our own checkpoint to disk during initial chain sync and normal runtime
  3. Recover sync status from this saved checkpoint

Overall, this will speed up chain sync after restarts and even for brand new installations tremendously. It will reduce memory usage by hnsd. It is the first time hnsd has ever written anything to disk, and the size of the data written to disk will remain constant (at, like, 36kB).

Changes

hnsd daemon now accepts two new options:

-t, --checkpoint (no argument): Begin chain sync at hard-coded checkpoint height of 136,000

-x <directory path>, --prefix <directory path>: Specify the location on disk to read & write checkpoints

prefix will override checkpoint if both are used. Recommended usage will be: hnsd -t -x ~/.hnsd

When prefix is set, hnsd will:

  1. Initialize the chain on boot from the saved checkpoint if it exists
  2. Save a new checkpoint to disk every 2000 blocks:
    a. Every 2000 blocks defines a "checkpoint window"
    b. At the end of every window, a checkpoint is generated from the beginning of that window

Example: At height 8000, hnsd will store a checkpoint including block headers 6000-6149. If hnsd is immediately restarted, it will begin syncing from the network with block 6150. Note that by the time this happens, block 6150 will already have had at least 1,850 confirmations (almost 2 weeks)! This is our safety buffer. A 2000-block reorg would mean the Handshake network is already pretty fucked. If this ever happens, hnsd will "naturally" resync from the genesis block. This is because the genesis block is always included in its chain locator.
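The window arithmetic in that example can be sketched like this. The helper names are made up for illustration; the window size is passed as a parameter so the same code covers both the 2000-block window described here and smaller windows used in tests.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CP_HEADERS 150 /* headers stored per checkpoint */

/* True when `height` closes a checkpoint window. */
bool
checkpoint_due(uint32_t height, uint32_t window) {
  return height >= window && height % window == 0;
}

/* First header stored: the beginning of the window that just ended. */
uint32_t
checkpoint_first(uint32_t height, uint32_t window) {
  assert(checkpoint_due(height, window));
  return height - window;
}

/* Last header stored in the checkpoint. */
uint32_t
checkpoint_last(uint32_t height, uint32_t window) {
  return checkpoint_first(height, window) + CP_HEADERS - 1;
}

/* First header that must be fetched from peers after a restart. */
uint32_t
checkpoint_resume(uint32_t height, uint32_t window) {
  return checkpoint_first(height, window) + CP_HEADERS;
}
```

For height 8000 and a window of 2000 this gives headers 6000 through 6149 and a resume height of 6150, which is 8000 - 6150 = 1,850 blocks behind the tip at the time the checkpoint is written.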

Future work

We can also prune from memory any headers that precede the most recently saved checkpoint. This will limit hnsd memory usage and keep it lean. That pruning is not included in this PR.

@pinheadmz pinheadmz force-pushed the checkpoints branch 2 times, most recently from 9a7e4a0 to a9aa694 Compare October 3, 2022 12:58
@pinheadmz pinheadmz mentioned this pull request Oct 3, 2022
@pinheadmz pinheadmz marked this pull request as ready for review October 3, 2022 14:11
@pinheadmz pinheadmz force-pushed the checkpoints branch 2 times, most recently from bc733ca to f0f1ace Compare October 4, 2022 17:51
@rithvikvibhu
Member

I've been running fingertip with hnsd -t ... for a few days and everything's good so far. No crashes or problems visiting sites, and hnsd syncs in < 10 seconds (this is bliss).

Contributor

@buffrr buffrr left a comment

Great work! This will be a HUGE improvement to hnsd. I will do some more testing if I get the chance but lgtm so far.

// Due to checkpoint initialization
// we may not have any headers from here
// down to genesis
if (!hdr)
Contributor

Just to confirm my understanding for the modified behavior of the chain locator, a locator for a chain with checkpoint height 697216 would look like this (based on #69):

[
  697216, 697215, 697214, 697213,
  697212, 697211, 697210, 697209,
  697208, 697207, 697206, 697205,
  697203, 697199, 697191, 697175,
  697143, 697079, 696951, 696695,
  696183, 695159, 693111, 689015,
  680823, 664439, 631671, 566135,
  435063, 172919,      0
] 

Since we only store 150 block headers up to checkpoint height, the new chain locator would look like this (as it would only include hashes from the last 150 headers)?

[
  697216, 697215, 697214, 697213,
  697212, 697211, 697210, 697209,
  697208, 697207, 697206, 697205,
  697203, 697199, 697191, 697175,
  697143, 697079,  0
] 

Member Author

I added a test that illustrates this now. After syncing 1000 headers, hnsd saves a checkpoint to disk containing headers 800-949. When hnsd restarts, the locator it sends to peers is exactly this:

    assert.deepStrictEqual(
      heights,
      [
        949, // tip
             // 10 prev blocks
        948, 947, 946, 945, 944, 943, 942, 941, 940, 939,
        938, // -1
        936, // -2
        932, // -4
        924, // -8
        908, // -16
        876, // -32
        812, // -64
        0    // hnsd doesn't have any blocks lower than 800, so skip to genesis
      ]
    );
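For readers who want to reproduce that list by hand, here is a small sketch of the locator height selection. It is not hnsd's code: the function name and the `lowest` parameter (the lowest height for which a header is held) are assumptions for this example, whereas the diff under review detects the gap by checking for a missing header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: fills `out` with locator heights for a chain
 * whose tip is `tip` and whose oldest stored header is `lowest`.
 * Returns the number of heights written (at most `cap`).
 */
size_t
locator_heights(uint32_t tip, uint32_t lowest, uint32_t *out, size_t cap) {
  size_t n = 0;
  uint32_t height = tip;
  uint32_t step = 1;

  if (cap == 0)
    return 0;

  out[n++] = height; /* tip */

  while (height > 0 && n < cap) {
    /* No header stored that far back: skip straight to genesis. */
    if (height < step || height - step < lowest)
      height = 0;
    else
      height -= step;

    /* After the tip and the 10 blocks before it, double the step. */
    if (n > 10)
      step *= 2;

    out[n++] = height;
  }

  return n;
}
```

With `tip = 949` and `lowest = 800` this yields the 19 heights asserted in the test above, ending in 812 and then 0. With `lowest = 0` it degrades to the usual full-chain locator.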

Member Author

Ok I also just added some more test coverage demonstrating that hnsd can recover, after a restart, from a reorg that forked before, in the middle of, or after the current checkpoint window. The worst case scenario (reorg fork before checkpoint window) results in sync from genesis.

FILE *file = fopen(tmp, "w");
if (!file)
return false;
size_t written = fwrite(&buf, 1, HSK_STORE_CHECKPOINT_SIZE, file);
Contributor

Future improvement/Nit: it might be worth checking out libuv's uv_fs_* file system functions since it's already a dependency. This way writes won't block the main libuv event loop because they will be performed in a thread pool. Not really an issue at all because we only do this every 2000 blocks.

@pinheadmz pinheadmz added this to the v2.0.0 milestone Oct 24, 2022
@pinheadmz pinheadmz force-pushed the checkpoints branch 5 times, most recently from 73774b7 to 5c925c4 Compare October 26, 2022 17:15
@pinheadmz pinheadmz force-pushed the checkpoints branch 3 times, most recently from 89e176b to aba79e0 Compare November 3, 2022 17:45
@rithvikvibhu
Member

More of an fyi than a problem, the prefix path not existing is fatal (whereas hsd creates the directory). What do you think about creating the folder if it doesn't exist?


-x <directory path>, --prefix <directory path>: Specify the location on disk to read & write checkpoints
...Recommended usage will be: hnsd -t -x ~/.hnsd

If only -x is specified without a value, how about defaulting to ~/.hnsd (like hsd does to ~/.hsd)? But np if no, I understand getting $HOME on all platforms is a pita

@pinheadmz
Member Author

More of an fyi than a problem, the prefix path not existing is fatal (whereas hsd creates the directory). What do you think about creating the folder if it doesn't exist?

@rithvikvibhu I'll add this in a future PR. We'll need to implement our own "create directories recursively if they don't yet exist" that works on all platforms. Because this is C and we don't get anything for free!

@pinheadmz
Member Author

This baby just went through a rigorous rebase. @nodech and @buffrr I explored using libuv for async file i/o and it was too problematic, especially during IBD if we write (and re-write) checkpoint files too quickly. The async becomes a problem because the file writes overlap and need to be atomic, meaning we need a mutex or just... write synchronously. I did clean up a lot of code during that exploration however, and this pull request is now refactored and should be relatively easy to review commit-by-commit.
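One common way to get the "atomic" property with plain synchronous I/O is to write to a temporary file and then rename it over the real one. The sketch below assumes that pattern; it is not the code from this PR, and the function name and `~` suffix are invented for the example. It relies on POSIX `rename()` semantics (replacing an existing file in one step); other platforms may need different handling, and no `fsync` is attempted here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write `len` bytes to `path` via a temp file. */
bool
write_file_atomic(const char *path, const void *data, size_t len) {
  char tmp[1024];
  int n = snprintf(tmp, sizeof(tmp), "%s~", path);

  /* Reject paths that do not fit in the temp-name buffer. */
  if (n < 0 || (size_t)n >= sizeof(tmp))
    return false;

  FILE *file = fopen(tmp, "wb");
  if (!file)
    return false;

  bool ok = fwrite(data, 1, len, file) == len;

  /* fclose flushes buffered data, so its result matters too. */
  if (fclose(file) != 0)
    ok = false;

  /* Only replace the old file once the new one is complete. */
  if (ok && rename(tmp, path) != 0)
    ok = false;

  if (!ok)
    remove(tmp);

  return ok;
}
```

Because the call returns only after the rename, two checkpoint writes can never overlap on the same thread, which sidesteps the mutex problem described above at the cost of briefly blocking the event loop.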

src/daemon.c Outdated
}

// Prefix must have enough room for filename
if (strlen(opt->prefix) + 32 >= HSK_STORE_PATH_MAX) {
Contributor

Might as well use const for 32 as well or hsk_store_filename_len(char *prefix, uint32_t height). Define is fine.

Member Author

added in d42ff69

if (!write_u8(&data, HSK_STORE_VERSION))
goto fail;

uint32_t height = chain->height - HSK_STORE_CHECKPOINT_WINDOW;
Contributor

You could add assert(chain->height % HSK_STORE_CHECKPOINT_WINDOW == 0) here.

Member Author

added in d42ff69

@handshake-enthusiast

  1. Always start hnsd sync from a specific height (e.g. 136,000) using a hard-coded checkpoint.

What's the logic behind this number? Should it be regenerated from time to time?

When it was first committed in 0f5ef70 the difference with that moment's height was less than 11k blocks (around 2.5 months). Now it's more than 45k blocks (roughly 8 months).

@rithvikvibhu
Member

The checkpoints in hnsd match hsd (which gives ~4 months of PoW). It's part of hsd's release process to update checkpoints for every (major) release. hnsd uses the latest checkpoints from hsd; hnsd v2.0.0 was released on Jan 4th this year.

Also fyi, this hard-coded checkpoint only affects initial sync. When using -x, hnsd resumes from the checkpoint saved by the previous run. So restarting hnsd would mean almost instant sync (it might re-fetch the last few headers again).

@handshake-enthusiast

@rithvikvibhu thanks for the answer.

The checkpoints in hnsd match hsd
hnsd uses the latest checkpoints from hsd

130000 != 136000

this hard-coded checkpoint only affects initial sync..

Yes, this is clear. Still, it adds some extra time to the first sync, which would be nice to reduce. A more recent checkpoint would help. A minor issue.

Successfully merging this pull request may close these issues.

Optimize header chain sync using minimal disk space
Persistent Headers