Epic: VMs without keccak should be supported by Hyperlane #2399

nambrot · 2023-06-16T19:14:01Z

Context

The EVM is the most popular and widespread VM for blockchains offering smart contract capabilities, as Ethereum led the way. Many alt-L1s and rollups specifically emphasize their EVM compatibility to attract developers, since Solidity/EVM has the best resources for developers building their applications. Additionally, existing applications and code for Ethereum can mostly be reused out of the box on other EVM chains. Hyperlane's architecture was conceived to support many diverse environments, and the in-progress Sealevel and Fuel implementations show that the surface area is manageable.

However, one of the big assumptions that the Hyperlane protocol makes is the use of keccak as the hashing function to:

construct message IDs (by hashing the message metadata + body),
build merkle trees on-chain, and
compute the digest that validators sign.

The main use of keccak is to support a merkle tree that commits to all dispatched messages, for both censorship resistance and fraud-proof purposes. On the destination chain, validator signatures over the root of the tree, together with a merkle proof, can be used to prove inclusion of a message. During processing, the message ID is used for replay protection. For fraud proofs, signatures over invalid checkpoints can be submitted as input and proven invalid against the origin's on-chain tree.
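To make the merkle-proof path concrete, here is a minimal Python sketch that builds a fixed-depth tree over message IDs and checks an inclusion proof against the root. It is parameterized over the hash function, which is exactly the piece an alt-VM would swap out. Assumptions: blake2b-256 stands in for keccak256 (which Python's standard library does not provide), the depth is 3 for readability (Hyperlane's on-chain tree uses depth 32), and all helper names are illustrative rather than Hyperlane APIs.

```python
import hashlib
from typing import Callable, List

Hasher = Callable[[bytes], bytes]

def blake2b_256(data: bytes) -> bytes:
    # Stand-in for keccak256, which Python's standard library does not provide.
    return hashlib.blake2b(data, digest_size=32).digest()

def build_levels(leaves: List[bytes], depth: int, h: Hasher) -> List[List[bytes]]:
    """All levels of a fixed-depth tree, bottom first; empty slots are zero leaves."""
    size = 1 << depth
    assert len(leaves) <= size, "tree is full"
    level = leaves + [b"\x00" * 32] * (size - len(leaves))
    levels = [level]
    for _ in range(depth):
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels

def prove(levels: List[List[bytes]], index: int) -> List[bytes]:
    """Sibling hashes from the leaf up to (not including) the root."""
    proof = []
    for level in levels[:-1]:
        proof.append(level[index ^ 1])
        index >>= 1
    return proof

def root_from_proof(leaf: bytes, proof: List[bytes], index: int, h: Hasher) -> bytes:
    """What the destination recomputes and compares with the signed root."""
    node = leaf
    for sibling in proof:
        # An odd index means the current node is a right child.
        node = h(sibling + node) if index & 1 else h(node + sibling)
        index >>= 1
    return node

# Usage: five message IDs in a depth-3 tree, proving inclusion of the third one.
ids = [blake2b_256(f"message {i}".encode()) for i in range(5)]
levels = build_levels(ids, depth=3, h=blake2b_256)
signed_root = levels[-1][0]
assert root_from_proof(ids[2], prove(levels, 2), 2, blake2b_256) == signed_root
```

Nothing in the tree logic depends on keccak specifically; only `h` changes between VMs.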

This ticket tries to capture what supporting VMs without keccak (referred to as alt-VMs in this ticket) requires or could look like.

Option 1: Status-quo, i.e. no/minimal changes to the protocol itself

Contracts

Alt-VM Mailboxes could just use the hashing function that is available in the VM for the message ID, while keeping the message body encoding the same. This would mean that a message can have different IDs depending on which VM it is processed in.
ISMs verifying a message do not actually need to know about the construction of message IDs.
MessageIdISM would need to be able to verify with the hashing function of the destination chain (which may be different from the origin chain)
- Probably easiest for now to create a new ISM type for this; in the future it could become another view call/constant on the multisig ISM interface
Mailbox replay protection would be done with the hashing function of the destination chain. (which may be different from the origin chain)
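A minimal sketch of what Option 1 means for message IDs and replay protection: the encoded message bytes stay identical, but each VM derives the ID with its own hash. `AltVmMailbox` is a hypothetical class, not a Hyperlane contract, and SHA-256 merely plays the role of "another VM's hash" because keccak256 is not in Python's standard library.

```python
import hashlib
from typing import Callable, Set

Hasher = Callable[[bytes], bytes]

def blake2b_256(data: bytes) -> bytes:
    return hashlib.blake2b(data, digest_size=32).digest()

def sha256(data: bytes) -> bytes:
    # Plays the role of "the other VM's hash" (keccak256 is not in the stdlib).
    return hashlib.sha256(data).digest()

class AltVmMailbox:
    """Hypothetical mailbox: the encoded message bytes are unchanged, but the
    ID used for ISM checks and replay protection is this VM's own hash."""

    def __init__(self, hash_fn: Hasher) -> None:
        self.hash_fn = hash_fn
        self.delivered: Set[bytes] = set()

    def message_id(self, message: bytes) -> bytes:
        return self.hash_fn(message)

    def process(self, message: bytes) -> bytes:
        mid = self.message_id(message)
        if mid in self.delivered:
            raise ValueError("message already delivered")
        # A MessageIdISM on this VM would verify signatures over `mid` here.
        self.delivered.add(mid)
        return mid
```

The same bytes processed by `AltVmMailbox(blake2b_256)` and `AltVmMailbox(sha256)` get different IDs, which is the "different IDs depending on which VM" trade-off described above; replay protection on each VM is still internally consistent.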

Agents

Messages and other related data are currently stored by their "keccak-ID" in the hyperlane DB
Validators can sign messages in multiple ways, e.g. the current way (merkle keccak-tree root, message keccak-ID, index), but also something like (message blake-ID, index). That way, consumers of these signatures can pick whichever one they need to process on a particular ISM/VM
Validator currently uses the keccak-based merkle tree to assess whether it has a correct read on the state and this check would either not be possible for other VMs or would have to be made modular to support other hashing functions
CheckpointWithMessageId probably has to be changed to CheckpointWithMessage to include the full message
Relayers pull the relevant signature for processing on the MessageIdISM
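The validator-side changes above can be sketched as one checkpoint carrying the full message, from which each signing payload is derived. `CheckpointWithMessage` here is a hypothetical shape, not the agents' actual struct; SHA-256 stands in for keccak256, the index is assumed to be a big-endian uint32, and signing itself is left out because ECDSA/EdDSA are not in Python's standard library.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable

Hasher = Callable[[bytes], bytes]

def blake2b_256(data: bytes) -> bytes:
    return hashlib.blake2b(data, digest_size=32).digest()

def sha256(data: bytes) -> bytes:
    # Stand-in for keccak256, which is not in Python's standard library.
    return hashlib.sha256(data).digest()

@dataclass(frozen=True)
class CheckpointWithMessage:
    """Hypothetical checkpoint that carries the full message, so each consumer
    can hash it with its own VM's function instead of trusting a precomputed ID."""
    merkle_root: bytes  # root of the origin chain's (keccak) tree
    index: int          # leaf index of the message
    message: bytes      # full encoded message

    def evm_payload(self, h: Hasher = sha256) -> bytes:
        # Current style: (merkle root, origin-hash message ID, index)
        return self.merkle_root + h(self.message) + self.index.to_bytes(4, "big")

    def alt_vm_payload(self, h: Hasher = blake2b_256) -> bytes:
        # Alt-VM style: (destination-hash message ID, index), no root
        return h(self.message) + self.index.to_bytes(4, "big")
```

A validator would sign both payloads, and a relayer would pull whichever signature matches the destination's ISM.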

Pros:

Minimal changes, upstream can be used without needing to have different forks of the agents
Validators can be transparently upgraded to allow messages to be processed in other execution environments with the same instance

Cons:

Does not support processing with merkle proofs for censorship resistance
- Fundamentally not possible unless an accumulator without a hashing function is used
Validators cannot be slashed cross-VM

Open Questions:

Does the signature pre-image need a different domain hash when other hashing functions are used, so that validators do not expose themselves to slashing?
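One possible answer to the question above, sketched under assumptions: bind a tag naming the message-ID hash into the domain hash, so a signature over a blake2b-ID checkpoint cannot be reinterpreted as a signature over a keccak-ID checkpoint with the same bytes (and therefore cannot be used as "evidence" in a keccak fraud proof). The tag scheme, the uint32 encodings, and blake2b as the outer hash are all hypothetical choices for illustration.

```python
import hashlib

def blake2b_256(data: bytes) -> bytes:
    return hashlib.blake2b(data, digest_size=32).digest()

def domain_hash(origin: int, mailbox: bytes, hash_tag: bytes) -> bytes:
    """Hypothetical domain separator: origin | mailbox | "HYPERLANE" | hash tag.
    The tag records which function produced the message ID being signed."""
    return blake2b_256(origin.to_bytes(4, "big") + mailbox + b"HYPERLANE" + hash_tag)

def signing_digest(origin: int, mailbox: bytes, hash_tag: bytes,
                   message_id: bytes, index: int) -> bytes:
    # Identical (message_id, index) bytes yield unrelated digests per hash tag.
    return blake2b_256(domain_hash(origin, mailbox, hash_tag)
                       + message_id + index.to_bytes(4, "big"))
```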

Option/Step 2

Have mailboxes support multiple merkle trees (in environments that support multiple hashing functions)?
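Step 2 could look like an outbox that keeps one incremental merkle tree per available hash function, each fed with message IDs computed by that same function. The tree below follows the well-known incremental-tree technique (a `branch` array plus precomputed zero hashes, as in the Ethereum deposit contract); `MultiTreeOutbox` is hypothetical and SHA-256/blake2b are only example hashes.

```python
import hashlib
from typing import Callable, Dict

Hasher = Callable[[bytes], bytes]
DEPTH = 32  # depth of Hyperlane's EVM tree

class IncrementalMerkleTree:
    def __init__(self, h: Hasher, depth: int = DEPTH) -> None:
        self.h, self.depth, self.count = h, depth, 0
        self.branch = [b"\x00" * 32] * depth
        # zeros[i] is the root of an empty subtree of height i.
        self.zeros = [b"\x00" * 32]
        for _ in range(depth - 1):
            self.zeros.append(h(self.zeros[-1] + self.zeros[-1]))

    def insert(self, leaf: bytes) -> None:
        assert self.count < (1 << self.depth) - 1, "tree is full"
        self.count += 1
        size, node = self.count, leaf
        for i in range(self.depth):
            if size & 1:
                self.branch[i] = node  # left sibling stored for later
                return
            node = self.h(self.branch[i] + node)
            size >>= 1

    def root(self) -> bytes:
        node, size = b"\x00" * 32, self.count
        for i in range(self.depth):
            if size & 1:
                node = self.h(self.branch[i] + node)
            else:
                node = self.h(node + self.zeros[i])
            size >>= 1
        return node

class MultiTreeOutbox:
    """Hypothetical outbox with one tree per hash; leaves use the same hash."""
    def __init__(self, hashers: Dict[str, Hasher], depth: int = DEPTH) -> None:
        self.hashers = hashers
        self.trees = {n: IncrementalMerkleTree(h, depth) for n, h in hashers.items()}

    def dispatch(self, message: bytes) -> Dict[str, bytes]:
        ids = {n: h(message) for n, h in self.hashers.items()}
        for n, mid in ids.items():
            self.trees[n].insert(mid)
        return ids

    def roots(self) -> Dict[str, bytes]:
        return {n: t.root() for n, t in self.trees.items()}
```

Each extra tree costs one more hash chain per dispatch, which is why this only makes sense on VMs where several hash functions are cheaply available.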

Examples

Cardano
- blake2b
Ink/Polkadot
- blake2b

Related Issues

#2388
#2258


nambrot · 2023-06-16T19:20:21Z

@serejke @HariSeldon23 would love your read on this and see if that makes sense for y'all

HariSeldon23 · 2023-06-19T00:44:32Z

Thanks for the detailed response. The way I normally approach these types of issues is to determine my non-negotiables and then work from there.

For me, it would be:

Consistent, matching message IDs across all connected chains. Without this, I believe we'd see data fragmentation, data inconsistencies and a higher cognitive load for new developers, to name just a few issues.
Agents will have to be able to confirm Merkle trees across different hashing algos.

With that being said from my interpretation of the above solution there are the following issues:

Message IDs aren't consistent across VMs
Not quite sure how you'd achieve adequate replay protection without consistent Message ID's.
Agents, from my limited understanding of this proposal, lose the ability to confirm Merkle trees across keccak vs non keccak supported chains. That's the biggest non negotiable imo.

So then it comes down to the age old question of a short term fix for today vs planning years in advance. The Bridge Wars are a competitive space and while market share is important today, it's going to be a battle that occurs over, at a minimum, the next decade. Keccak will eventually be replaced. Quantum computing will arrive. Lock/Burn and Mint bridge use cases will eventually be viewed as quaint. So a short term solution is just kicking the can down the road. Which tbf is what most competitors will do.

My preferred approach would be:

Consistent Message ID. This could be something like origin domain identifier - block.timestamp - msg.sender which would look something like 43114-1687135018-0xC8a47dfC6288C902197b11c9bDE0093C07687238
Then in RocksDB we could add extra fields for the MessageIDKeccakHash, MessageIDBlake2Hash, MessageIDSHA256Hash et al
There would need to be a middleware layer that converts the Merkle tree between different hashing algorithms. This middleware layer would need caching to future-proof it against a growing number of messages and to keep latency in check. It would also be used to confirm that Merkle trees built with different hashing algorithms commit to the same messages. At a certain level of maturity, there will be Merkle trees that need to match between blake2 and sha256.

There are quite a few short term negatives to this approach:

All contracts would need to be upgraded to take into account the new canonical message ID
The Middleware layer would need to be built
No incentive model for validators to run the middleware layer. Although tbf, there is no incentive model for validators as of today either

nambrot · 2023-06-19T13:50:06Z

Thanks for your input @HariSeldon23! Your non-negotiable unfortunately runs into another non-negotiable: the message ID needs to commit to the contents of the message. If you derive it only from the metadata, you lose a whole host of benefits from that commitment unless you write the whole message to storage. If you didn't need the commitment, you could just use the nonce of the message for identification purposes, i.e. something like ${mailboxAddress}-${originDomain}-${nonce}. For replay protection you'd probably still want a commitment to the message, but maybe we just don't call that the ID.
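The difference between an identifier and a commitment can be shown in a few lines. This is an illustrative sketch: `nonce_identifier` follows the `${mailboxAddress}-${originDomain}-${nonce}` shape from the comment above, `content_commitment` is a hypothetical hash-based ID using blake2b (the EVM side uses keccak256), and `"0xMailbox"` is a placeholder address.

```python
import hashlib

def nonce_identifier(mailbox: str, origin: int, nonce: int) -> str:
    # Identifies *where* a message sits, not *what* it says.
    return f"{mailbox}-{origin}-{nonce}"

def content_commitment(origin: int, nonce: int, body: bytes) -> bytes:
    # Also binds the contents (blake2b here; the EVM side uses keccak256).
    data = origin.to_bytes(4, "big") + nonce.to_bytes(4, "big") + body
    return hashlib.blake2b(data, digest_size=32).digest()

# Two different bodies claimed for the same slot: nonce_identifier cannot tell
# them apart (the body is not an input), while the commitments differ.
honest, forged = b"transfer 10", b"transfer 10000"
slot_id = nonce_identifier("0xMailbox", 1, 42)
assert content_commitment(1, 42, honest) != content_commitment(1, 42, forged)
```

A validator signature over `slot_id` alone would therefore say nothing about which body is authentic, which is the benefit lost without the commitment.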

It's not clear to me that processing merkle trees is a non-negotiable, though, especially since it feels impossible to support. Censorship resistance is a very nice property, but I don't think it's a must-have (and it can always be added later).

HariSeldon23 · 2023-06-20T01:11:44Z

Ok, that's fair enough. I realize I made quite a few bad assumptions in my original reply; now that I've done a deeper dive, I believe I have a better understanding.

Let's talk about a connection between Ethereum (Domain = 1) and BlakeChain (Domain = 987). We will use the multisigISM.

We want to send a message to BlakeChain which should be used to execute a contract Counter (Address = 12345) to increment a variable counter. On BlakeChain we have a modified Mailbox contract which approximates the Mailbox contract on Ethereum.

So let's break down into how that will be achieved:

Ethereum.Mailbox.Dispatch(987, 12345, calldata)
Duplicate check occurs on MessageID
ISM Verification occurs
The messageID is inserted into the Merkle tree as a new leaf node.
Ethereum.Mailbox.Root is updated with Merkle tree
Message is delivered to BlakeChain's Mailbox contract with the calldata
BlakeChain's Mailbox contract passes the calldata on to the Counter smart contract, which then successfully increments the variable.

So this covers sending a message from Ethereum to BlakeChain.

Now let's say after the successful sending of the message referenced above, we now want to send a message from BlakeChain to Ethereum. We just want to send a simple "Hello World" string.

BlakeChain.Mailbox.Dispatch(1, , "Hello World")
Duplicate check occurs on MessageID - This should be fine as there will be a unique MessageID generated on BlakeChain
ISM Verification occurs - Validators would need to be adjusted to sign BlakeChain messages
The messageID is inserted into the Merkle tree as a new leaf node. - This is fine as validators are configured to only support a hub chain. Will this always be the case?
BlakeChain.Mailbox.Root is updated with the Merkle tree
Message is delivered to Ethereum's Mailbox contract with the string "Hello World"

Have I understood this correctly? If so, then I agree with your assessment and this provides a good place to start.

serejke · 2023-06-20T12:33:15Z

Thanks @nambrot for putting the ideas together. I can confirm that the proposed solution for supporting non-EVM chains should work.

`keccak` usages

First, I want to suggest a different angle of thinking about keccak and Hyperlane in general. We need to clearly separate different usages of keccak:

`MerkleTree`

The Mailbox (outbox) builds a MerkleTree with leaf = message ID, using keccak256(left, right) to hash non-leaf nodes.

Note: the hashing function of the message ID may differ from the hashing function of the MerkleTree. The MerkleTree treats message IDs as opaque bytes32.

Message ID

Calculated as

ID = keccak256([
    VERSION,
    count(),
    localDomain,
    sender,
    destinationDomain,
    recipientAddress,
    messageBody
])
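The encoding above can be sketched in Python to show that only the final hash is VM-specific. Assumptions: the packed layout uint8 version, uint32 nonce (the `count()` above), uint32 origin, bytes32 sender, uint32 destination, bytes32 recipient, then the raw body; blake2b stands in for the hash because keccak256 is not in Python's standard library. Note that `hashlib.sha3_256` is not a drop-in for keccak256: standardized SHA-3 uses different padding and produces different digests.

```python
import hashlib
import struct
from typing import Callable

def format_message(version: int, nonce: int, origin: int, sender: bytes,
                   destination: int, recipient: bytes, body: bytes) -> bytes:
    """Packed layout assumed here: uint8 version | uint32 nonce | uint32 origin |
    bytes32 sender | uint32 destination | bytes32 recipient | body."""
    if len(sender) != 32 or len(recipient) != 32:
        raise ValueError("sender and recipient must be 32-byte addresses")
    # ">" selects big-endian with no padding, like Solidity's abi.encodePacked.
    return (struct.pack(">BII", version, nonce, origin) + sender
            + struct.pack(">I", destination) + recipient + body)

def message_id(message: bytes, hash_fn: Callable[[bytes], bytes]) -> bytes:
    # Only this step depends on the VM: keccak256 on EVM, e.g. blake2b elsewhere.
    return hash_fn(message)

def blake2b_256(data: bytes) -> bytes:
    return hashlib.blake2b(data, digest_size=32).digest()
```

The 77-byte header plus body is the same on every chain; an alt-VM would call `message_id(msg, blake2b_256)` where the EVM uses keccak256.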

ECDSA-signed digest of the validators' checkpoints

Validators produce an ECDSA signature over a keccak256 digest, calculated as:

hash = keccak256(keccak256(domainId, mailbox_address, "HYPERLANE") + root + nonce + message_id)

messageHash = keccak256("\x19Ethereum Signed Message:\n" + hash.length + hash)
sig = ecdsa_secp256k1_sign(messageHash)
rsig = recoverable(sig) { with v = 0 | 1 }
ethsig = rsig_to_ethsig(rsig) { with v = 27 | 28 }
signature = ethsigEip155 { with v = 35 + chainId * 2 + 0 | 1 }

Note: validators can sign checkpoints with any algorithm (ECDSA or EdDSA), depending on the destination chain's needs. Hashing down to a 32-byte digest is only a requirement of ECDSA, which signs a fixed-size digest; keccak256 is simply the hash Ethereum uses for that step. EdDSA accepts input of arbitrary length.
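The digest construction above, up to the point where ECDSA takes over, can be sketched as follows. Assumptions: domain IDs and the checkpoint index are big-endian uint32, the mailbox address is 32 bytes, and blake2b stands in for keccak256 (not in Python's standard library), so the outputs will not match real Hyperlane digests. The EIP-191 prefix writes the payload length as decimal ASCII, so for a 32-byte digest it is literally `"\x19Ethereum Signed Message:\n32"`.

```python
import hashlib
from typing import Callable

Hasher = Callable[[bytes], bytes]

def h256(data: bytes) -> bytes:
    # Stand-in for keccak256; hashlib.sha3_256 is NOT keccak256 either.
    return hashlib.blake2b(data, digest_size=32).digest()

def checkpoint_digest(domain_id: int, mailbox: bytes, root: bytes, index: int,
                      message_id: bytes, h: Hasher = h256) -> bytes:
    # hash = H(H(domainId, mailbox_address, "HYPERLANE") + root + index + message_id)
    domain_hash = h(domain_id.to_bytes(4, "big") + mailbox + b"HYPERLANE")
    return h(domain_hash + root + index.to_bytes(4, "big") + message_id)

def eth_signed_message_hash(digest: bytes, h: Hasher = h256) -> bytes:
    # EIP-191 "personal_sign" framing: prefix + decimal length + payload.
    return h(b"\x19Ethereum Signed Message:\n" + str(len(digest)).encode() + digest)
```

The value returned by `eth_signed_message_hash` is the fixed-size input an ECDSA signer consumes; an EdDSA destination could sign the pre-image of `checkpoint_digest` directly.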

`Mailbox`: `Inbox` and `Outbox`

Secondly, I prefer to think that Mailbox consists of 2 independent parts: Inbox and Outbox, which do not share a common state.

`Inbox`

Inbox is a stateless handler and router of incoming messages. It does not necessarily need to be a standalone contract.

In Solidity, the Inbox's only state of its own is `delivered`, used to guarantee at-most-once processing.

In non-EVM chains, there may be a different way to achieve that:

Solana: if an account with a specific address exists, that means the message has already been processed.
Cardano: if a UTXO with specific content exists, the message has already been processed.

`Outbox`

Outbox is the stateful contract building MerkleTrees.

Replay protection

Thirdly, I might be missing the meaning of "replay protection". I understand it as a guarantee that the same message will not be processed on two different chains. I believe this is not a problem: validator signatures depend on both the origin and destination domain IDs, and the receiving Inbox checks self.domainId === destinationDomainId.
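The two guarantees discussed here are different checks, which a minimal model makes visible. `Inbox` below is a hypothetical sketch, not Hyperlane's contract: the destination check rejects messages meant for other chains, but only the `delivered` set stops the same message from being processed twice on the correct chain (at-most-once processing). The destination is passed in directly instead of being parsed from the message, and blake2b is used as the ID hash, purely for brevity.

```python
import hashlib
from typing import Set

class Inbox:
    """Hypothetical inbox model showing two independent checks."""

    def __init__(self, local_domain: int) -> None:
        self.local_domain = local_domain
        self.delivered: Set[bytes] = set()

    def process(self, destination: int, message: bytes) -> None:
        # Check 1: the message is addressed to this chain.
        if destination != self.local_domain:
            raise ValueError("wrong destination")
        # Check 2 (at-most-once): this chain has not processed it before.
        mid = hashlib.blake2b(message, digest_size=32).digest()
        if mid in self.delivered:
            raise ValueError("already delivered")
        self.delivered.add(mid)
```

Submitting the same message twice to `Inbox(987)` passes check 1 both times and is stopped only by check 2.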

Rehashing on off-chain agents

I think @HariSeldon23 had a great idea with the middleware. Off-chain agents can actually re-hash both the message ID and the MerkleTree for the destination chain's needs.
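The re-hashing idea can be sketched as a pure off-chain function: from the same ordered list of dispatched messages, recompute IDs and the tree root with the destination's hash. `rehash_for_destination` is a hypothetical helper, blake2b and SHA-256 are example hashes, and the root is computed sparsely (odd levels padded with that level's zero hash) so a depth-32 tree does not require materializing 2**32 leaves.

```python
import hashlib
from typing import Callable, List, Tuple

Hasher = Callable[[bytes], bytes]

def blake2b_256(d: bytes) -> bytes:
    return hashlib.blake2b(d, digest_size=32).digest()

def sha256(d: bytes) -> bytes:
    return hashlib.sha256(d).digest()

def sparse_root(leaves: List[bytes], depth: int, h: Hasher) -> bytes:
    """Root of a depth-`depth` tree whose unused leaves are zero."""
    assert len(leaves) <= 1 << depth, "too many leaves"
    zero = b"\x00" * 32
    level = list(leaves)
    for _ in range(depth):
        if len(level) % 2:
            level.append(zero)  # pad with the empty-subtree root of this height
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        zero = h(zero + zero)
    return level[0] if level else zero

def rehash_for_destination(messages: List[bytes], h: Hasher,
                           depth: int = 32) -> Tuple[List[bytes], bytes]:
    """Recompute message IDs and the tree root with the destination's hash."""
    ids = [h(m) for m in messages]
    return ids, sparse_root(ids, depth, h)
```

Because the input is the ordered message list, not the origin's IDs, any agent can reproduce the result deterministically; what it cannot do is prove on the origin chain that it did so, which is the slashing limitation raised in the thread.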

Considering `EVM -> altVM` and `altVM -> EVM`

Now let's consider these two use cases separately

`EVM -> altVM`

EVM
- calculate message ID using keccak
- Mailbox.dispatch ingests the message ID to keccak-MerkleTree
Validators
- may calculate message ID using blake2b and off-chain MerkleTree using blake2b
- sign checkpoints with ECDSA or EdDSA, depending on what's supported by altVM
Relayers
- wait for quorums on the checkpoints and process the message
altVM
- calculate message ID using blake2b
- receive blake2b-MerkleTree from the checkpoint, and reconstruct the blake2b-MerkleTree
- execute MultisigIsm N/M signature (ECDSA or EdDSA) verification: both MerkleTree and Message ID types are supported
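The altVM verification step in the list above can be sketched for the message-ID path only (no merkle proof). Assumptions: signature checking (ECDSA or EdDSA) has already happened and produced `VerifiedCheckpoint` values with a known signer; the type and function names are hypothetical. The key point is that the altVM recomputes the blake2b ID itself, so a relayer cannot swap the message body.

```python
import hashlib
from dataclasses import dataclass
from typing import Iterable, Set

def blake2b_256(data: bytes) -> bytes:
    return hashlib.blake2b(data, digest_size=32).digest()

@dataclass(frozen=True)
class VerifiedCheckpoint:
    """Hypothetical: a checkpoint whose signature was already verified;
    `validator` is the recovered/verified signer."""
    message_id: bytes
    index: int
    validator: str

def multisig_accepts(message: bytes, index: int,
                     checkpoints: Iterable[VerifiedCheckpoint],
                     validators: Set[str], threshold: int) -> bool:
    # Recompute the ID locally instead of trusting the relayer's claim.
    mid = blake2b_256(message)
    signers = {c.validator for c in checkpoints
               if c.message_id == mid and c.index == index
               and c.validator in validators}
    # Counting distinct signers stops one validator from being counted twice.
    return len(signers) >= threshold
```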

`altVM -> EVM`

altVM
- calculate message ID using blake2b
- Outbox.dispatch ingests the message ID to blake2b-MerkleTree
Validators
- may calculate message ID using keccak and off-chain keccak-MerkleTree
- sign checkpoints: get EVM-specific digest using keccak then ECDSA
Relayers and EVM
- no differences; no need for hash-specific or origin-specific MultisigISMs

Challenges

AWS KMS does not support non-ECDSA signatures. For non-ECDSA chains, validator/relayer agents will need to be configured with several wallets.

serejke · 2023-06-20T12:50:54Z

@HariSeldon23 I think the ordering of actions in the Ethereum -> BlakeChain and BlakeChain -> Ethereum flows is inaccurate.

For example, Ethereum -> BlakeChain goes as (extended):

Ethereum
- Ethereum.Mailbox.Dispatch(987, 12345, calldata)
- The messageID is inserted into the Merkle tree as a new leaf node. Ethereum.Mailbox.Root is updated with Merkle tree
Validators
- Listen to Mailbox.Dispatch messages
- wait for chain finalization
- build an off-chain MerkleTree
- sign checkpoints and save them to S3
Relayers
- Listen to Mailbox.Dispatch messages
- Call BlakeChain.ISM(...) to get N validators that should achieve consensus
- Wait for M/N validators to achieve consensus by periodically downloading their S3 checkpoint jsons
- Try to deliver the message by calling Mailbox.process
BlakeChain
- Duplicate check occurs on MessageID
- MultisigISM is called with the provided signatures
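The relayer's "wait for M/N validators" step from the list above can be sketched as a polling helper. `fetch(validator, index)` is a hypothetical callable standing in for downloading that validator's signed checkpoint JSON from S3 (returning `None` if it has not signed yet); the validator list and threshold are assumed to come from the `BlakeChain.ISM(...)` call.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical: returns the validator's checkpoint for `index`, or None.
Fetch = Callable[[str, int], Optional[Dict]]

def try_collect_quorum(fetch: Fetch, validators: List[str], threshold: int,
                       index: int) -> Optional[List[Dict]]:
    """Return `threshold` checkpoints for `index`, or None if quorum is not met yet."""
    collected: List[Dict] = []
    for v in validators:
        cp = fetch(v, index)
        if cp is not None and cp.get("index") == index:
            collected.append(cp)
            if len(collected) == threshold:
                return collected
    return None
```

A relayer would call this periodically and, once it returns a list, submit those signatures with the message via `Mailbox.process`.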

The below diagram should help:

nambrot · 2023-06-21T00:19:03Z

> Thirdly, I might be missing the meaning of "replay protection". I get it as a guarantee that the same message will not be processed on two different chains. I believe this is not a problem. Validator signatures depend on both origin and destination domain IDs, and the receiving Inbox checks self.domainId === destinationDomainId

Replay protection we refer to as the guarantee that a message that was already processed, can't be replayed again, i.e. processed again (i.e. what you call at-most-once processing)

> I think off-chain agents can actually re-hash both the message ID and MerkleTree for the destination chain's needs.

I'm not sure I would call this "re-hashing", but hashing a message for its ID differently depending on the destination chain is what I believe I suggested. My main point is that you could start without actually building the alt-VM merkle tree, by only verifying signatures over the message ID (instead of in addition to the merkle tree proof). Building the alt-VM merkle tree is only a nice-to-have IMO (since you can't slash cross-VM for misbehavior on the origin chain).

HariSeldon23 · 2023-06-21T02:59:34Z

> @HariSeldon23 I think the ordering of actions in the Ethereum -> BlakeChain and BlakeChain -> Ethereum are inaccurate.

This is fantastic. Thank you. This whole thread has been incredibly insightful. I'm going to work through a few small PoCs and will report back here if anything comes up that hasn't been addressed here.

github-project-automation bot added this to Hyperlane Tasks Jun 16, 2023

nambrot added the alt-VM non-EVM alternative execution environment cosmos solana move cardano polkadot label Jun 16, 2023

nambrot mentioned this issue Jun 16, 2023

Allow Rust implementations of MerkleTree / Proof / SparseMerkleTree to use non-keccak256 hash functions #2388

Closed

serejke mentioned this issue Jun 19, 2023

Cardano x Hyperlane integration tvl-labs/hyperlane-cardano#7

Closed

serejke mentioned this issue Jul 4, 2023

CIP-0101 | Integration of keccak256 into Plutus cardano-foundation/CIPs#524

Merged

avious00 changed the title ~~VMs without keccak should be supported by Hyperlane~~ Epic: VMs without keccak should be supported by Hyperlane Aug 10, 2023

avious00 added polkadot labels Aug 10, 2023

avious00 moved this to Backlog in Hyperlane Tasks Aug 10, 2023

HariSeldon23 mentioned this issue Aug 30, 2023

Ortege application w3f/Grants-Program#1913

Closed

avious00 added epic and removed cardano labels Mar 2, 2024
