Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Block lineage tracking #142

Open
enobayram opened this issue Apr 11, 2023 · 2 comments
Open

Block lineage tracking #142

enobayram opened this issue Apr 11, 2023 · 2 comments

Comments

@enobayram
Copy link
Contributor

The goal of this issue is to implement an efficient lineage tracking for chainweb-data. Currently, there's no efficient nor straightforward way to deduce lineage information about a block or any data that derives from a block. For example, if one wants to determine whether a block is an orphan or even worse, if one wants to jump from a continuation transaction back to its earlier steps that are in their blockchain history one would need to perform complex and expensive SQL queries.

@enobayram
Copy link
Contributor Author

Here's a suggestion for how we might implement lineage tracking in chainweb-data:

This approach relies on the typical "palm-tree" shape of the blockchain history. I.e. even though the full history of a blockchain has the shape of a tree, the tree is mostly linear with very short branches diverging from it throughout its history. In this approach, we'd have a new lineage table:

CREATE TABLE lineage (
  id BIGSERIAL PRIMARY KEY,
  chain_id INT NOT NULL,
  branch_height BIGINT NOT NULL,
  parent BIGINT
);

We'd also add a new block_lineage table:

CREATE TABLE block_lineage (
  hash TEXT UNIQUE NOT NULL,
  lineage BIGINT NOT NULL REFERENCES lineage(id)
);

In this approach, whenever we detect a branch in the block history we create a new lineage row and populate the block_lineage table with the hashes of any blocks that live on that lineage. Note that the parent column of lineage is nullable, and the reason is that we treat the "mainline" lineage specially as the NULL lineage. But also note that block_lineage's lineage column is not nullable, because we don't intend to store mainline blocks there, keeping the size of that table low, since orphans are infrequent in the blockchain.

With these tables in place, one can efficiently perform lineage queries by checking the lineage of a given hash and then acquiring a list of parent lineages along with their branch_heights. Then any block hash h that satisfies the condition following conditions is in its lineage:

  • lineage(h) is a member of the lineages list
  • The block height is below branch_height for that lineage

Note that when we first encounter a block, we don't know whether it's an orphan or not. That fact may not even have been determined by the network yet in the first place. So that means we'll need to extend chainweb-data with the functionality of continuously keeping the block_lineage table in line with the best estimate of which branch is mainline and which is an orphan. So, whenever chainweb-data detects that a lineage we originally thought to be mainline is actually an orphan and a previous orphan l is now the mainline, it performs the following changes:

  • Delete all block_lineage rows with lineage = l
  • Add all block hashes with height larger than branch_height of l that are not in block_lineage with lineage l.
  • For all lineage rows with parent = l, set the lineage to NULL
  • For all lineage rows with parent IS NULL and branch_height > branch_height_of_l set their parent to l.

@enobayram
Copy link
Contributor Author

Since most orphan branches are very short-lived, we could also consider a simpler approach where we just add an is_mainline BOOLEAN column to blocks and let chainweb-data keep it updated as the mainline/orphan status of branches are revealed.

Due to the short length of orphan branches, analyses similar to my previous suggestion should still be efficient. If one is interested in figuring out whether hash_early is an ancestor of hash_late, one can perform the following logic:

  • If hash_late has is_mainline = TRUE, then hash_early also needs to have is_mainline = TRUE
  • If hash_late has is_mainline = FALSE then gather the list ancestors of hash_late that also have is_mainline = FALSE and check whether:
    • hash_early is in ancestors
    • OR hash_early has is_mainline = TRUE and heightofhash_earlyis less thanancestors`

This should almost certainly be efficient, since we expect ancestors to be fairly short.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant