Checkpoints #99
I've been running fingertip with |
Great work! This will be a HUGE improvement to hnsd. I will do some more testing if I get the chance but lgtm so far.
// Due to checkpoint initialization
// we may not have any headers from here
// down to genesis
if (!hdr)
Just to confirm my understanding for the modified behavior of the chain locator, a locator for a chain with checkpoint height 697216 would look like this (based on #69):
[
697216, 697215, 697214, 697213,
697212, 697211, 697210, 697209,
697208, 697207, 697206, 697205,
697203, 697199, 697191, 697175,
697143, 697079, 696951, 696695,
696183, 695159, 693111, 689015,
680823, 664439, 631671, 566135,
435063, 172919, 0
]
Since we only store 150 block headers up to checkpoint height, the new chain locator would look like this (as it would only include hashes from the last 150 headers)?
[
697216, 697215, 697214, 697213,
697212, 697211, 697210, 697209,
697208, 697207, 697206, 697205,
697203, 697199, 697191, 697175,
697143, 697079, 0
]
I added a test that illustrates this now. After syncing 1000 headers, hnsd saves a checkpoint to disk containing headers 800-949. When hnsd restarts, the locator it sends to peers is exactly this:
assert.deepStrictEqual(
heights,
[
949, // tip
// 10 prev blocks
948, 947, 946, 945, 944, 943, 942, 941, 940, 939,
938, // -1
936, // -2
932, // -4
924, // -8
908, // -16
876, // -32
812, // -64
0 // hnsd doesn't have any blocks lower than 800, so skip to genesis
]
);
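The locator behavior discussed in this thread can be sketched in C as follows. This is only an illustration, not the code from this PR: `ex_locator_heights` and its parameters are hypothetical names, and the stepping rule (single steps for the first entries, then doubling) is inferred from the height lists quoted above. When the next height would fall below the lowest stored header, the sketch jumps straight to genesis.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical helper (not part of hnsd): writes the heights a chain
// locator would reference into `out` and returns how many were written.
// `tip` is the current chain height, `lowest` is the lowest height for
// which a header is still stored (0 if the full chain is available).
size_t
ex_locator_heights(uint32_t tip, uint32_t lowest, uint32_t *out, size_t max) {
  assert(out != NULL);

  size_t n = 0;
  uint32_t height = tip;
  uint32_t step = 1;

  while (n < max) {
    out[n++] = height;

    // Genesis is always the last entry.
    if (height == 0)
      break;

    // If the next step would go below zero or below the lowest stored
    // header, skip directly to genesis.
    if (height < step || height - step < lowest)
      height = 0;
    else
      height -= step;

    // After the tip and the 10 previous blocks, double the step each time.
    if (n > 10)
      step *= 2;
  }

  return n;
}
```

With `tip = 949` and `lowest = 800` this yields the same heights as the test above. With `lowest = 0` it yields the full-chain locator quoted earlier in this thread.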
Ok I also just added some more test coverage demonstrating that hnsd can recover, after a restart, from a reorg that forked before, in the middle of, or after the current checkpoint window. The worst case scenario (reorg fork before checkpoint window) results in sync from genesis.
FILE *file = fopen(tmp, "w");
if (!file)
  return false;
size_t written = fwrite(&buf, 1, HSK_STORE_CHECKPOINT_SIZE, file);
Future improvement/Nit: it might be worth checking out libuv's uv_fs_* file system functions since it's already a dependency. This way writes won't block the main libuv event loop because they will be performed in a thread pool. Not really an issue at all because we only do this every 2000 blocks.
More of an fyi than a problem, the prefix path not existing is fatal (whereas hsd creates the directory). What do you think about creating the folder if it doesn't exist?
If only |
@rithvikvibhu I'll add this in a future PR. We'll need to implement our own "create directories recursively if they don't yet exist" that works on all platforms. Because this is C and we don't get anything for free!
This baby just went through a rigorous rebase. @nodech and @buffrr I explored using libuv for async file i/o and it was too problematic, especially during IBD if we write (and re-write) checkpoint files too quickly. Async becomes a problem because the file writes overlap and need to be atomic, meaning we need a mutex or just... write synchronously. I did clean up a lot of code during that exploration, however, and this pull request is now refactored and should be relatively easy to review commit-by-commit.
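For readers following the atomicity point, here is a minimal sketch of a synchronous "write to a temporary file, then rename" pattern, which the `fopen(tmp, "w")` snippet reviewed above suggests. It is not the code from this PR: `ex_write_file_atomic` and the `~` temp suffix are hypothetical, and it assumes POSIX `rename()` semantics, where an existing destination is replaced atomically (this differs on Windows).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

// Hypothetical helper (not part of hnsd): writes `len` bytes to a
// temporary file next to `path`, then renames it over `path` so a reader
// never sees a half-written file.
bool
ex_write_file_atomic(const char *path, const uint8_t *buf, size_t len) {
  assert(path != NULL && buf != NULL);

  // Build the temporary file name; bail out if it would be truncated.
  char tmp[1024];
  int n = snprintf(tmp, sizeof(tmp), "%s~", path);
  if (n < 0 || (size_t)n >= sizeof(tmp))
    return false;

  FILE *file = fopen(tmp, "wb");
  if (!file)
    return false;

  // Write everything, then flush and close before renaming.
  bool ok = fwrite(buf, 1, len, file) == len;
  if (fflush(file) != 0)
    ok = false;
  if (fclose(file) != 0)
    ok = false;

  if (!ok) {
    remove(tmp);
    return false;
  }

  // Replace the destination in one step (assumes POSIX rename()).
  if (rename(tmp, path) != 0) {
    remove(tmp);
    return false;
  }

  return true;
}
```

Because every call runs to completion before the next one starts, two writes can never interleave, which is the property the asynchronous version would have needed a mutex for.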
src/daemon.c
}

// Prefix must have enough room for filename
if (strlen(opt->prefix) + 32 >= HSK_STORE_PATH_MAX) {
Might as well use const for 32 as well or hsk_store_filename_len(char *prefix, uint32_t height). Define is fine.
added in d42ff69
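One way to avoid a magic number in a check like this is to build the full path with `snprintf` and treat truncation as failure. The sketch below is an assumption-laden illustration, not the change made in d42ff69: `EX_STORE_PATH_MAX`, `EX_STORE_FILENAME`, and `ex_store_path` are hypothetical names, and the real file name used by hnsd is not shown in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

// Hypothetical constants (the real values and names in hnsd may differ).
#define EX_STORE_PATH_MAX 1024
#define EX_STORE_FILENAME "checkpoint.dat"

// Hypothetical helper: joins `prefix` and the file name into `out`.
// Returns false if the result does not fit in `out_len` bytes.
bool
ex_store_path(const char *prefix, char *out, size_t out_len) {
  assert(prefix != NULL && out != NULL);

  int n = snprintf(out, out_len, "%s/%s", prefix, EX_STORE_FILENAME);

  // snprintf returns the length it wanted to write; a value >= out_len
  // means the path was truncated.
  return n >= 0 && (size_t)n < out_len;
}
```

A caller could run this once at startup and reject the prefix if it returns false.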
if (!write_u8(&data, HSK_STORE_VERSION))
  goto fail;

uint32_t height = chain->height - HSK_STORE_CHECKPOINT_WINDOW;
You could add assert(chain->height % HSK_STORE_CHECKPOINT_WINDOW) here.
added in d42ff69
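The height arithmetic around this line can be sketched as below. It assumes the values given elsewhere in this PR (a 2000-block window and 150 headers per checkpoint) and assumes the intent of the suggested assertion is that the tip height is an exact multiple of the window, hence the `== 0`. All `ex_`/`EX_` names are hypothetical and not taken from hnsd.

```c
#include <assert.h>
#include <stdint.h>

// Assumed values, taken from the PR description rather than the source.
#define EX_CHECKPOINT_WINDOW 2000u
#define EX_CHECKPOINT_HEADERS 150u

typedef struct {
  uint32_t first;         // height of the first header in the checkpoint
  uint32_t last;          // height of the last header in the checkpoint
  uint32_t resume;        // first height requested from peers after restart
  uint32_t confirmations; // blocks on top of `resume` when it was saved
} ex_window_t;

// Hypothetical helper: describes the checkpoint saved when the chain
// reaches `tip_height`, which must sit exactly on a window boundary.
ex_window_t
ex_checkpoint_window(uint32_t tip_height) {
  assert(tip_height >= EX_CHECKPOINT_WINDOW);
  assert(tip_height % EX_CHECKPOINT_WINDOW == 0);

  ex_window_t w;
  w.first = tip_height - EX_CHECKPOINT_WINDOW;
  w.last = w.first + EX_CHECKPOINT_HEADERS - 1;
  w.resume = w.last + 1;
  w.confirmations = tip_height - w.resume;
  return w;
}
```

For a tip height of 8000 this gives headers 6000-6149, a resume height of 6150, and 1850 confirmations, matching the example in the PR description.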
What's the logic behind this number? Should it be regenerated from time to time? When it was first committed in 0f5ef70 the difference with that moment's height was less than 11k blocks (around 2.5 months). Now it's more than 45k blocks (roughly 8 months).
The checkpoints in hnsd match hsd (which gives ~4 months of PoW). It's part of hsd's release process to update checkpoints for every (major) release. hnsd uses the latest checkpoints from hsd; hnsd v2.0.0 was released on Jan 4th this year. Also fyi, this hard-coded checkpoint only affects initial sync. When using |
@rithvikvibhu thanks for the answer.
Yes, this is clear. Still, it adds some extra time to the first sync, which would be nice to decrease. A more recent checkpoint would help. A minor issue.
Closes #3
Closes #69
Background
hnsd is a light name resolver, which is a type of light client with NO wallet functions. Since all it does is resolve names, it does NOT require the entire headers chain. In fact, to resolve names all hnsd needs is the treeRoot in the header of the chain tip. Like all light clients however, hnsd does require SOME historical consensus data to validate new block headers and stay in sync:
With this in mind, we introduce a new object called a checkpoint which provides enough data for hnsd to effectively begin syncing the chain from ANY arbitrary, non-genesis block. A checkpoint contains the following data:
n
n - 1
n, serialized in wire format
With this mechanism in place we can now do some totally badass new things with hnsd:
checkpoint.
checkpoint to disk during initial chain sync and normal runtime
Overall, this will speed up chain sync after restarts and even for brand new installations tremendously. It will reduce memory usage by hnsd. It is the first time hnsd has ever written anything to disk, and the size of the data written to disk will remain constant (at, like, 36kB).
Changes
hnsd daemon now accepts two new options:
-t, --checkpoint (no argument): Begin chain sync at hard-coded checkpoint height of 136,000
-x <directory path>, --prefix <directory path>: Specify the location on disk to read & write checkpoints
prefix will override checkpoint if both are used. Recommended usage will be:
hnsd -t -x ~/.hnsd
When prefix is set, hnsd will:
a. Every 2000 blocks defines a "checkpoint window"
b. At the end of every window, a checkpoint is generated from the beginning of that window
Example: At height 8000, hnsd will store a checkpoint including block headers 6000-6149. If hnsd is immediately restarted, it will begin syncing from the network with block 6150. Note that by the time this happens, block 6150 will already have had at least 1,850 confirmations (almost 2 weeks)! This is our safety buffer. A 2000-block reorg would mean the Handshake network is already pretty fucked. If this ever happens, hnsd will "naturally" resync from the genesis block. This is because the genesis block is always included in its chain locator.
Future work
We can also prune from memory any headers that precede the most recently saved checkpoint. This will limit hnsd memory usage and keep it lean. This is not included in this PR.