Remove a few hashes from the block header in favor of merkelizing them #7860

Closed
ValarDragon opened this issue Oct 2, 2018 · 7 comments
Labels
C:core · S:proposal (Status: Proposal) · stale (for use by stalebot) · T:enhancement (Type: Enhancement)

Comments

@ValarDragon
Contributor

Note: we probably no longer have time to do this pre-launch, but (unless I'm missing something) I think it should be done at the next BlockVersion upgrade. (Hopefully soon.)

Currently the BlockHeader contains many hashes. (See https://github.com/tendermint/tendermint/blob/master/types/block.go#L258) It's unclear to me why most of these are in here, as the header gets merkelized anyway. They matter for lite clients only if there is a data availability problem or a knowledge-of-change problem, AFAICT. Otherwise, you're just requesting a single additional internal node in a Merkle proof, and only on the blocks you wish to query. (Saving bandwidth for everyone, including the consensus engine.)

The hashes in the blockheader which I want to comment on are:

	DataHash           cmn.HexBytes `json:"data_hash"`            // transactions
	ValidatorsHash     cmn.HexBytes `json:"validators_hash"`      // validators for the current block
	NextValidatorsHash cmn.HexBytes `json:"next_validators_hash"` // validators for the next block
	ConsensusHash      cmn.HexBytes `json:"consensus_hash"`       // consensus params for current block
	AppHash            cmn.HexBytes `json:"app_hash"`             // state after txs from the previous block
	LastResultsHash    cmn.HexBytes `json:"last_results_hash"`    // root hash of all results from the txs
	EvidenceHash       cmn.HexBytes `json:"evidence_hash"`        // evidence included in the block
	ProposerAddress    Address      `json:"proposer_address"`     // original proposer of the block

Below is what I think it should be:

	BlockHash          cmn.HexBytes `json:"block_hash"`           // Merkle tree of DataHash, EvidenceHash, ConsensusHash, LastResultsHash
	AppHash            cmn.HexBytes `json:"app_hash"`             // state after txs from the previous block
	NextValidatorsHash cmn.HexBytes `json:"next_validators_hash"` // validators for the next block
	ProposerAddress    Address      `json:"proposer_address"`     // original proposer of the block
	NumEvidence        int8                                       // number of evidence items in this block
  • ValidatorsHash is removed because it is just the previous block's NextValidatorsHash. (Ref: App centric interpretation of concepts #2483)
  • Lite clients will most often not be querying for the inclusion of a tx by tx hash, so the DataHash does not need to be in the top layer and is relegated to BlockHash. (We want them querying by tag.)
  • ConsensusHash - almost never changes, and lite clients really aren't concerned with the timeouts for the different voting periods. Hence it is relegated to BlockHash.
  • LastResultsHash - It's unclear to me why this would be queried in most blocks by a significant number of lite clients, or why a data availability problem would exist here. Hence it should be moved into BlockHash.
  • EvidenceHash - There isn't a data availability problem here. The reason being, the NextValidatorsHash would change, and a lite client has to query for the new validator updates then anyway. Anything we want to achieve with the EvidenceHash could be achieved by an int8 in the Header saying "NumEvidence", and if it's non-zero, querying the BlockHash for the evidence. We shouldn't be spending a full hash in the header for this when it will almost always be a nil hash.
  • ProposerAddress - This isn't removed above, because for a lite client to exploit that it is a deterministic function of accums, height, and round, they'd have to be able to derive accums. To derive accums, they'd have to watch every block / know when rounds are incremented, which is too high a synchrony requirement.
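
To make the construction concrete, here is a minimal sketch of how the relegated hashes could be folded into a single BlockHash. This is my own illustration using plain SHA-256 over concatenated nodes; the real thing would go through Tendermint's crypto/merkle package and its exact encoding, so treat the helpers below as hypothetical:

	package main

	import (
		"crypto/sha256"
		"fmt"
	)

	// Simplified leaf/inner hashing for illustration; Tendermint's simple
	// Merkle tree uses its own encoding of the inputs.
	func leafHash(b []byte) []byte {
		h := sha256.Sum256(b)
		return h[:]
	}

	func innerHash(left, right []byte) []byte {
		h := sha256.Sum256(append(append([]byte{}, left...), right...))
		return h[:]
	}

	// merkleRoot folds a non-empty list of byte slices into a single root,
	// promoting an odd leftover node to the next level unchanged.
	func merkleRoot(items [][]byte) []byte {
		level := make([][]byte, len(items))
		for i, item := range items {
			level[i] = leafHash(item)
		}
		for len(level) > 1 {
			var next [][]byte
			for i := 0; i < len(level); i += 2 {
				if i+1 < len(level) {
					next = append(next, innerHash(level[i], level[i+1]))
				} else {
					next = append(next, level[i])
				}
			}
			level = next
		}
		return level[0]
	}

	func main() {
		// Placeholders for the four hashes relegated from the header.
		dataHash := leafHash([]byte("txs"))
		evidenceHash := leafHash([]byte("evidence"))
		consensusHash := leafHash([]byte("consensus params"))
		lastResultsHash := leafHash([]byte("results"))

		blockHash := merkleRoot([][]byte{dataHash, evidenceHash, consensusHash, lastResultsHash})
		fmt.Printf("BlockHash: %X\n", blockHash)
	}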

Thoughts on this?

@ValarDragon ValarDragon changed the title Remove many fields from the block header Remove a few hashes from the block header in favor of merkelizing them Oct 2, 2018
@ebuchman
Contributor

ebuchman commented Oct 3, 2018

Very cool proposal. Thanks for thinking about this.

Note that right now we have parity between the block header and the header sent in BeginBlock. So unless we introduce a new header that contains these hashes, apps wouldn't get this info. Though it's not entirely clear that apps need this info in the first place.

data availability problem

So a hash that was previously in the header is now only a leaf in a new merkle tree with root in the header. Can you clarify how this affects data availability? How is the new data (say, the ResultsHash and the internal nodes that prove it merklizes to BlockHash) different from other data we already don't have (e.g. the txs themselves)?

ConsensusHash - almost never changes

Will they not want to know when it does?

EvidenceHash - There isn't a data availability problem here. The reason being, the NextValidatorsHash would change,

Can't assume this - evidence might not change voting power right away, for instance.

Anything we want to achieve with the EvidenceHash could be achieved by an int8 in the Header saying "NumEvidence", and if it's non-zero, querying the BlockHash for the evidence.

This is neat, so long as the EvidenceHash still goes into the BlockHash. The int makes sure that lite clients know whenever there is evidence in a block.

@ValarDragon
Contributor Author

EvidenceHash - There isn't a data availability problem here. The reason being, the NextValidatorsHash would change,

Can't assume this - evidence might not change voting power right away, for instance.

Oh cool, hadn't realized that. NumEvidence takes care of that then :)

ConsensusHash - almost never changes

Will they not want to know when it does?

I had misread which struct ConsensusHash was a hash of. It turns out it's BlockSize and NumEvidence, so I do agree, they would want to know. So we can pull the same trick as with evidence, use an int8 / bool to specify if there was a change, while still keeping this merkelized.

data availability problem

So a hash that was previously in the header is now only a leaf in a new merkle tree with root in the header. Can you clarify how this affects data availability? How is the new data (say, the ResultsHash and the internal nodes that prove it merklizes to BlockHash) different from other data we already don't have (e.g. the txs themselves)?

The ResultsHash and the internal nodes would be known to all full nodes, and thus they could compare against the BlockHash. You're right, it's not really different from txs in the sense that we're not expecting lite clients to know the actual results, the same way we don't expect them to know the txs. But this is part of the point: they don't need to. It's just that the overall header / block hash needs to be a function which incorporates the results hash / data hash in a way that retains queryability.
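
To make the lite-client side of that concrete, here is a minimal sketch of verifying that a relegated hash (say the results hash) is included under a trusted BlockHash. The proof format and helper names are my own illustration, not Tendermint's actual Merkle proof types:

	package main

	import (
		"bytes"
		"crypto/sha256"
		"fmt"
	)

	// inclusionProof is a hypothetical proof format: the leaf's index plus
	// the sibling hashes from the leaf up to the root.
	type inclusionProof struct {
		Index    int
		Siblings [][]byte
	}

	func hashPair(left, right []byte) []byte {
		h := sha256.Sum256(append(append([]byte{}, left...), right...))
		return h[:]
	}

	// verify recomputes the root from the leaf and its siblings and compares
	// it against the BlockHash the lite client already trusts from the header.
	func verify(leaf []byte, proof inclusionProof, trustedRoot []byte) bool {
		h := sha256.Sum256(leaf)
		cur := h[:]
		idx := proof.Index
		for _, sib := range proof.Siblings {
			if idx%2 == 0 {
				cur = hashPair(cur, sib)
			} else {
				cur = hashPair(sib, cur)
			}
			idx /= 2
		}
		return bytes.Equal(cur, trustedRoot)
	}

	func main() {
		// Four leaves standing in for Data, Evidence, Consensus, LastResults.
		leaves := [][]byte{[]byte("data"), []byte("evidence"), []byte("consensus"), []byte("results")}

		// Build the 4-leaf tree by hand so we can serve a proof for index 3.
		var lh [4][]byte
		for i, l := range leaves {
			h := sha256.Sum256(l)
			lh[i] = h[:]
		}
		left := hashPair(lh[0], lh[1])
		right := hashPair(lh[2], lh[3])
		blockHash := hashPair(left, right)

		// A full node serves two 32-byte siblings; the lite client verifies.
		proof := inclusionProof{Index: 3, Siblings: [][]byte{lh[2], left}}
		fmt.Println("results hash is under BlockHash:", verify(leaves[3], proof, blockHash))
	}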

Upon further thought, I think AppHash should be moved into the BlockHash as well. Not many lite clients will want to query AppHash across a range of blocks (The latest state when they want to query should suffice). However they will want to query a range of blocks for Tags, so it makes sense to make the TagHash remain in the top level. (Unless Results Hash is supposed to be the thing which we query for tags, if so I didn't realize that)

Note that right now we have parity between the block header and the header sent in BeginBlock. So unless we introduce a new header that contains these hashes, apps wouldn't get this info. Though it's not entirely clear that apps need this info in the first place

I don't think apps really need these hashes in the first place, so we can maintain this parity for now.

@ebuchman
Contributor

ebuchman commented Oct 4, 2018

So we can pull the same trick as with evidence, use an int8 / bool to specify if there was a change, while still keeping this merkelized.

Into this.

Upon further thought, I think AppHash should be moved into the BlockHash as well.

Mmm let's not do this. AppHash is a first class citizen in Tendermint design, since it's returned by Commit. Let's keep it in the header.

However they will want to query a range of blocks for Tags, so it makes sense to make the TagHash remain in the top level. (Unless Results Hash is supposed to be the thing which we query for tags, if so I didn't realize that)

This is an open question - whether we split the results and tags hashes. It seems we should, so the results hash can just be for simply querying the results, while the tags hash can support proofs of existence/absence on tags. Also, I think we're expecting tags to be the primary mode of querying, so they should have some first-class status in the header ...

BTW - have you seen the new General Merkle Proof stuff? It would be able to capture these proofs, where we have merkle trees in merkle trees.

@ValarDragon
Contributor Author

Mmm let's not do this. AppHash is a first class citizen in Tendermint design, since it's returned by Commit. Let's keep it in the header.

Sounds good. Might be worth revisiting querying needs once the network is live.

This is an open question - whether we split the results and tags hashes. It seems we should, so the results hash can just be for simply querying the results, while the tags hash can support proofs of existence/absence on tags. Also, I think we're expecting tags to be the primary mode of querying, so they should have some first-class status in the header ...

Agreed on both fronts. I think it should be split from the results hash, and that the TagHash should directly be in the header. (I see now that what I wrote previously was ambiguous, but I was in agreement :)).

BTW - have you seen the new General Merkle Proof stuff? It would be able to capture these proofs, where we have merkle trees in merkle trees.

I actually hadn't looked into it! You're right, it looks like it will be able to capture these proofs. However, if I understand it correctly, I think it requires a fair amount of restructuring #postlaunch for greater efficiency (smaller proof size). I'll read into it more this week and write up a follow-up issue.

I think the block hash should be structured as:

              Block Hash
            /            \
          2                3
        /   \            /    \
     Data    Res      Cons    Evidence

where Res is the LastResultsHash and Cons is the ConsensusHash.
This is because I think the set of use cases where you would want to query for the previous block's results aligns best with the cases where you'd want to query for the current block's data.
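
If I'm reading that rationale right, the pairing only matters when a client queries two of these in the same block: with the fixed four-leaf layout above, any single leaf is proven with exactly two 32-byte siblings, but co-querying a sibling pair shares one of them. Roughly (my notation, H being the inner-node hash):

	proof(Res)        = [ Data, H(Cons, Evidence) ]   // 2 × 32 bytes
	proof(Data + Res) = [ H(Cons, Evidence) ]         // one shared sibling beyond the two leaves

So pairing Data with Res saves a hash exactly when they're fetched together, which matches the use case described above.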

@ValarDragon
Contributor Author

Upon further thought, I still don't think we need NumEvidence at the top of the header. A lite client in our model really only needs two things to be immediately available from within the header: the ability to verify that consensus happened (this happens via querying for signatures and having an in-sync copy of the validator set), and the ability to query for data (AppHash and TagHash cover this). (There are papers / proposals for extending the lite client's role to verifying state transitions and data availability of the actual block, but we don't use either of those in Tendermint atm.)

So the lite client only needs to have an in-sync copy of the validator set. This is handled via NextValidatorsHash and querying for the new validator powers if the hash changes. If you trust that state transitions are executed properly, then evidence will be slashing validators at the correct height. It doesn't really matter that slashing could happen after an arbitrary delay, since that doesn't change how you validate consensus. (I do agree that the evidence hash needs to be in the merkelized tree under BlockHash, just not at the top level for lite clients.)

Also on further thought, I'm not sure that a lite client needs to know when the consensus hash has changed. ConsensusHash covers BlockSize, NumEvidence, and the accepted validator pubkey types. Lite clients don't need to know about BlockSize or NumEvidence: block size is unimportant, since they don't receive blocks; NumEvidence isn't important, since you can tell which nodes in the evidence merkle tree are leaves and which are inner nodes thanks to #2713, and you trust the state machine to execute state transitions correctly if consensus happens. The validator pubkey types (which just got added) are something the lite client does need to know about. However, adding a pubkey type would presumably require a hard upgrade to the lite client, so the operator would have been notified well in advance in order to upgrade. If the chain does upgrade and the operator was unaware, they would have to halt on detecting the addition of a new pubkey type; but the unaware operator would also be notified when the first validator uses the new pubkey type, and they could halt / upgrade then (first confirming that the pubkey type was actually in the new consensus hash, of course). Since a validator could use the new pubkey type one block after it makes it into the consensus hash, this is in principle a delay of only one block. (Unless the app adds an extra delay after the consensus params have changed, but then this only hurts the lite clients that don't pay attention to large hard forks. In that setting, I'd rather see a flag / boolean to notify lite clients of an impending hard fork, or that a hard fork just happened, instead of doing this.)

@ValarDragon
Contributor Author

Hrmm, another thought: what if we also merkelize NextValidatorsHash into the BlockHash, and only put a boolean at the top for validatorSetChanged? You already need a non-negligible number of hashes for the next validators hash, so this would only be adding 1 extra hash in the case of a change, but saving 31 bytes otherwise. (It may actually be 32 due to length-prefixing of []byte; I'd have to double check.) This makes sense to do as long as we expect, on average, more than 1 block with no validator power update per 32 blocks, which I think is an accurate assumption. The new block hash structure could be as follows:

                  Block Hash
                /            \
  nextValidatorsHash           2
                             /    \
                     Evidence      3
                                 /    \
                              Res       4
                                      /    \
                                   Data     Cons

Data and Cons are at the bottom since, if you're querying data, you can afford a few more hashes, and Cons is at the bottom since I don't see why you'd query it as a lite client. Evidence and Res can be switched depending on what we think will be queried more.
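
A rough back-of-envelope on that tradeoff (my numbers, assuming 32-byte hashes and a 1-byte flag, not from the thread): every block's header shrinks by about 31 bytes, while a block that does change the validator set costs one extra 32-byte sibling in the proof tying nextValidatorsHash to BlockHash, so

	\mathbb{E}[\text{net saving per block}] \approx 31 - 32p \ \text{bytes}, \qquad p = \Pr[\text{validator power update in a block}]
	31 - 32p > 0 \iff p < \tfrac{31}{32}

which stays positive as long as roughly one block in every 32 leaves the validator powers untouched, consistent with the estimate above.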

@tessr
Contributor

tessr commented Apr 24, 2020

Removing this from the 0.34 release (slated for mid-to-late May) because it still needs further discussion and design. Please speak up if you disagree!

@tac0turtle tac0turtle transferred this issue from tendermint/tendermint Oct 26, 2020
@tac0turtle tac0turtle added the C:core, S:proposal (Status: Proposal), and T:enhancement (Type: Enhancement) labels Oct 26, 2020
@cmwaters cmwaters transferred this issue from tendermint/spec Feb 18, 2022
@github-actions github-actions bot added the stale label (for use by stalebot) Oct 3, 2023
@github-actions github-actions bot closed this as not planned Oct 8, 2023