This repository has been archived by the owner on Aug 18, 2022. It is now read-only.

EVM <> FVM mapping #39

Merged
merged 6 commits into from
Jan 7, 2022
66 changes: 65 additions & 1 deletion 04-evm-mapping.md
@@ -42,7 +42,71 @@ However, keep in mind that execution is ultimately controlled by FVM gas and not

## Account model

## Address semantics
## Addressing scheme

Ethereum uses 160-bit (20-byte) addresses. Addresses are the keccak-256 hash of the public key of an account, truncated to preserve the 20 rightmost bytes. Solidity and the [Contract ABI spec](https://docs.soliditylang.org/en/v0.5.3/abi-spec.html) represent addresses with the `address` type, equivalent to `uint160`.

Member Author

Yep, for contract addresses, it's the RLP-encoding of sender + nonce. I'll add that for reference.


There's an active, yet informal proposal to [increase the address width to 32 bytes](https://ethereum-magicians.org/t/increasing-address-size-from-20-to-32-bytes/5485).

In Filecoin, addresses are multi-class, and there are currently four recognized classes. Sidenote: they're actually called _protocols_ in the spec, but we'll refrain from using that term here because it's hopelessly overloaded.

The address byte representation is as follows:

```
class (1 byte) || payload (n bytes)
```

Thus, the total length of the address varies depending on the address class.

- Class `0` (ID addresses): payload is [multiformats-style uvarint](https://github.com/multiformats/unsigned-varint). Maximum 9 bytes.
- Class `1` (Secp256k1 key): payload is a blake2b-160 hash of the secp256k1 pubkey. Fixed 20 bytes.
- Class `2` (actor addresses): payload is a blake2b-160 hash of some payload generated by the init actor. Fixed 20 bytes.
- Class `3` (BLS key): payload is an inlined BLS public key. Fixed 48 bytes.
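
As an illustration of the `class || payload` layout above, here is a minimal Python sketch that assembles the raw bytes for each class. The function names are hypothetical, and the exact public key serialization fed to the hash is an assumption; treat it as a sketch rather than the reference implementation.

```python
import hashlib

MAX_ID_PAYLOAD = 9  # maximum uvarint length stated in the list above


def encode_uvarint(n: int) -> bytes:
    """Encode a non-negative integer as a multiformats-style unsigned varint."""
    if n < 0:
        raise ValueError("uvarint cannot encode negative numbers")
    out = bytearray()
    while True:
        low7 = n & 0x7F
        n >>= 7
        if n:
            out.append(low7 | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def blake2b_160(data: bytes) -> bytes:
    """blake2b with a 20-byte (160-bit) digest."""
    return hashlib.blake2b(data, digest_size=20).digest()


def id_address(actor_id: int) -> bytes:
    """Class 0: uvarint-encoded actor ID."""
    payload = encode_uvarint(actor_id)
    if len(payload) > MAX_ID_PAYLOAD:
        raise ValueError("actor ID does not fit in 9 uvarint bytes")
    return bytes([0]) + payload


def secp256k1_address(pubkey: bytes) -> bytes:
    """Class 1: hash of the secp256k1 public key (serialization is an assumption)."""
    return bytes([1]) + blake2b_160(pubkey)


def actor_address(preimage: bytes) -> bytes:
    """Class 2: hash of a preimage produced by the init actor."""
    return bytes([2]) + blake2b_160(preimage)


def bls_address(pubkey: bytes) -> bytes:
    """Class 3: the 48-byte BLS public key, inlined."""
    if len(pubkey) != 48:
        raise ValueError("BLS public keys are 48 bytes")
    return bytes([3]) + pubkey
```

Only class 0 has a variable length; the other three are fixed at 21, 21 and 49 bytes once the class byte is included.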

In conclusion, the maximum address length in Filecoin is 49 bytes or 392 bits (class 3 address). This creates two problems:

1. The worst-case scenario is larger than the width of the Ethereum address type. Even if BLS addresses were prohibited in combination with EVM actors, class 1 and class 2 addresses still exceed the 20-byte limit by 1 byte (due to the class prefix).
2. It exceeds the EVM's 256-bit word size.
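
The arithmetic behind both problems can be checked with a short Python sketch. The constant and function names are hypothetical; the payload sizes are the ones listed for each address class.

```python
ETH_ADDRESS_BYTES = 20   # width of the Solidity `address` type (uint160)
EVM_WORD_BYTES = 32      # 256-bit EVM word
CLASS_PREFIX_BYTES = 1   # leading class byte

# Maximum payload size per Filecoin address class.
MAX_PAYLOAD_BYTES = {0: 9, 1: 20, 2: 20, 3: 48}


def max_address_bytes(address_class: int) -> int:
    """Worst-case encoded length of an address of the given class."""
    return CLASS_PREFIX_BYTES + MAX_PAYLOAD_BYTES[address_class]


def fits_eth_address(address_class: int) -> bool:
    """Problem 1: does the address fit in a 20-byte Ethereum address?"""
    return max_address_bytes(address_class) <= ETH_ADDRESS_BYTES


def fits_evm_word(address_class: int) -> bool:
    """Problem 2: does the address fit in a single 32-byte EVM word?"""
    return max_address_bytes(address_class) <= EVM_WORD_BYTES


if __name__ == "__main__":
    for c in sorted(MAX_PAYLOAD_BYTES):
        print(c, max_address_bytes(c), fits_eth_address(c), fits_evm_word(c))
```

Under these figures only class 0 fits the Ethereum address type, while classes 0 to 2 fit a single EVM word and class 3 does not.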

Problem 1 renders Solidity smart contracts instantly incompatible with the Filecoin addressing scheme, as well as EVM opcodes that take or return addresses, e.g. CALLER, CALL, CALLCODE, DELEGATECALL, COINBASE, etc. This problem is hard to work around. It would require either forking the EVM to make existing opcodes semantically aware of Filecoin addresses (which is really hard to get right), or introducing a Filecoin-specific opcode family to deal with Filecoin addresses (e.g. FCALL, FCALLCODE, etc.). The latter would break as-is deployability of existing smart contracts.

Member

Note: CALLER and COINBASE (and likely others) won't have this issue. All runtime APIs (in the current VM) return ID addresses, but accept (and resolve) other address types.


Problem 2 can be worked around by spilling an address over several stack words and recombining them, through Filecoin-specific Solidity libraries.
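
To make the spill-and-recombine idea concrete, here is a rough Python sketch that models stack words as 256-bit integers. The `spill`/`combine` names and the choice to carry the byte length separately are assumptions for illustration, not a proposed library API.

```python
from typing import List

WORD_BYTES = 32  # one 256-bit EVM word


def spill(addr: bytes) -> List[int]:
    """Split an address into 256-bit words, zero-padding the last one."""
    padded = addr + bytes(-len(addr) % WORD_BYTES)
    return [
        int.from_bytes(padded[i:i + WORD_BYTES], "big")
        for i in range(0, len(padded), WORD_BYTES)
    ]


def combine(words: List[int], length: int) -> bytes:
    """Rebuild the original address bytes from its words and byte length."""
    raw = b"".join(w.to_bytes(WORD_BYTES, "big") for w in words)
    if length > len(raw):
        raise ValueError("length exceeds the data held in the words")
    return raw[:length]
```

A 49-byte class 3 address occupies two words under this scheme, whereas a 21-byte class 1 or class 2 address still fits in one.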

**Solution A: using ID addresses**

However, there's a simpler solution: use Filecoin ID addresses (max. 10 bytes, including the class byte) everywhere inside EVM execution. This comes with drawbacks:

1. EVM smart contracts can't send to nonexistent, stable account addresses and rely on account actor auto-creation, as those addresses can't be used with EVM opcodes (see problem 1). Potential solution: have the caller create the account on chain prior to invoking the EVM smart contract.
2. ID addresses are vulnerable to reorg within the current finality window, so submitting EVM transactions involving actors created recently (900 epochs; 7.5 hours) would be unsafe. Potential solution: have the runtime detect and fail calls involving recently-created actors.
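
The "detect and fail" mitigation for the second drawback could look roughly like the following Python sketch. All names are hypothetical, the 30-second epoch is the value implied by the 900 epochs / 7.5 hours figure, and treating an actor created exactly 900 epochs ago as final is an assumption.

```python
from typing import Dict, Iterable

FINALITY_EPOCHS = 900  # finality window quoted above
EPOCH_SECONDS = 30     # implied by 900 epochs = 7.5 hours


class FreshActorError(Exception):
    """Raised when a call involves an actor created within the finality window."""


def is_fresh(creation_epoch: int, current_epoch: int) -> bool:
    """True if the actor's ID address could still change due to a reorg."""
    return current_epoch - creation_epoch < FINALITY_EPOCHS


def check_call(
    creation_epochs: Dict[int, int],  # hypothetical lookup: actor ID -> creation epoch
    involved_ids: Iterable[int],
    current_epoch: int,
) -> None:
    """Fail the call if any involved actor is still inside the finality window."""
    for actor_id in involved_ids:
        if is_fresh(creation_epochs[actor_id], current_epoch):
            raise FreshActorError(f"actor {actor_id} was created too recently")
```

The runtime would need some way to learn each actor's creation epoch for this check to work; the dictionary here merely stands in for that lookup.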

Member

So, it would be possible to assign every actor a stable address (including account actors) and use that everywhere. That would mean addresses would be 20 bytes and unambiguous.

However, we'd have to make a few changes:

  1. Account actors currently don't have stable addresses (just key-based addresses).
  2. There's currently no reverse map from ID addresses to stable addresses. We'd likely need to add a "stable address" field to every actor.

But this should be doable.

But there's a whole other can of worms...

  1. In the EVM, it's possible to send funds to any address.
  2. If that address turns out to be the hash of a public key, it's possible to then use that address to send messages (an account).
  3. There's a CREATE2 instruction that allows an actor to create another actor in some "owned" address space (effectively using an actor specific KDF with an actor controlled salt).
  4. If that address turns out to be part of the address space "owned" by another actor, code can later be deployed to this address (and it gets to keep the existing funds).

Basically:

  1. We do need to be able to send to a "public key" address from an EVM contract, because there likely exist contracts that do that.
  2. We also need to be able to seed arbitrary addresses with funds, before even knowing the type of actor that will live there.
  3. We likely need to support 3 because this feature was a bit of a big deal (it enables some things with payment channels, apparently).

All of this is leading me to believe that we're going to need a bit of an indirection layer. Possibly a registry mapping "EVM" addresses to the rest of the FVM address space.

Member

This is all a bit confusing so I'll try to explain more in the standup. Unfortunately, documentation is scattered and almost universally of the "here's how to take your first steps in Ethereum" form not the "this is how this thing actually works" form.

Member Author (@raulk, Oct 27, 2021)

  1. Account actors currently don't have stable addresses (just key-based addresses).

Aren't pubkey addresses stable addresses? What's the nuance here?

Related: account actors are also bound to an ID at creation, so every actor is guaranteed to have an ID address, which is volatile during the current finality window.

Member Author (@raulk, Oct 27, 2021)

Notes:

  • CREATE calculates the address by hashing (keccak-256) the RLP encoding of a struct containing the address of the sender (an Externally Owned Account, EOA) and their nonce. Such addresses can be trivially computed ahead of time.
  • CREATE2 hashes the concatenation of 0xff, sender_addr, user_provided_salt and the hash of the init bytecode (plain concatenation rather than RLP, per EIP-1014). These can be precomputed by knowing the inputs ahead of time, and it's the basis for "counterfactual deployments" -- use cases in which we interact with the contract ahead of time.
  • Sending ETH to an address doesn't turn it into an EOA; it can still be the target of code deployment. This property is also the basis for some counterfactual interactions.
  • It is not possible to conduct an appropriation attack by exploiting the knowledge of a future contract address ahead of time.
    • First, you'd need to defeat hash collision resistance to find a private key whose Eth address matches the target contract address (computational expense is estimated to be 2**80 hashes, as per various sources, including this).
    • Second, as of EIP-684, the protocol aborts CREATE or CREATE2 instructions that generate an address with a non-zero nonce.
    • In conclusion, even if you found a colliding key, you can do one of two things: (a) not use it, in which case when the contract account is created, you'd be locked out of that address because non-EOA addresses can't perform transactions (I think); or (b) use it, in which case it would be marked as an EOA, but its nonce would be non-zero, so CREATE/CREATE2 would abort.

Relevant references (in addition to the yellow paper).

Member Author (@raulk, Oct 27, 2021)

  1. We do need to be able to send to a "public key" address from an EVM contract, because there likely exist contracts that do that.

If the pubkey address exists ahead of time, the contract can use a reorg-stable ID address (I'll post a proposal shortly).

If the address doesn't exist ahead of time, this becomes harder because the CALL opcode consumes a single word for the recipient address (and probably truncates it to 160 bits), yet our pubkey addresses can span up to 2 Ethereum words.

  2. We also need to be able to seed arbitrary addresses with funds, before even knowing the type of actor that will live there.

Yes, 100% agreed.

  3. We likely need to support 3 because this feature was a bit of a big deal (it enables some things with payment channels, apparently).

Yes, but this should be straightforward IMO; we'd generate an f2 address using the user-provided inputs to assemble the preimage passed to address.NewActorAddress(preimage). The output can be a reorg-stable ID address (which I'm defining in a subsequent PR).

Member

Aren't pubkey addresses stable addresses? What's the nuance here?

Sorry, f2 address. You're right, "stable" just means "not f0".

**Solution B: using address handles**

If these tradeoffs are unacceptable, we can consider using _address references/handles_ in the FVM EVM calling convention. Input parameters would be enveloped in a tuple:

```
(1) ABI-encoded parameters (using uint160 addr handles) || (2) { addr handle: actual addr }
```

Where:

1. The ABI-encoded parameters, with each address position replaced by an indexed uint160 handle.
2. A mapping from those handles to the real Filecoin addresses.

On an incoming call, the EVM <> FVM shim would unpack the call and pass only (1) as input parameters to the smart contract. It would use (2) to resolve the address whenever the smart contract called a relevant opcode. When returning, the EVM <> FVM shim would perform the inverse operation.
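
A rough Python sketch of the shim's bookkeeping follows. The `Envelope` and `HandleTable` types are hypothetical, and parameters are modeled as a plain list of integers instead of real ABI-encoded bytes to keep the example short.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Envelope:
    abi_params: List[int]     # part (1): parameters, address positions hold handles
    handles: Dict[int, bytes]  # part (2): handle -> real Filecoin address


class HandleTable:
    """Per-call mapping between uint160 handles and real addresses."""

    def __init__(self, handles: Dict[int, bytes]) -> None:
        self._by_handle = dict(handles)
        self._by_addr = {addr: h for h, addr in handles.items()}

    def resolve(self, handle: int) -> bytes:
        """Used when the contract executes an address-taking opcode."""
        return self._by_handle[handle]  # KeyError for unknown handles

    def handle_for(self, addr: bytes) -> int:
        """Inverse operation, used when packing return values."""
        if addr not in self._by_addr:
            handle = max(self._by_handle, default=0) + 1
            self._by_handle[handle] = addr
            self._by_addr[addr] = handle
        return self._by_addr[addr]


def unpack(envelope: Envelope) -> Tuple[List[int], HandleTable]:
    """Pass only part (1) to the contract; keep part (2) inside the shim."""
    return list(envelope.abi_params), HandleTable(envelope.handles)
```

The table only lives for the duration of one call, so a handle has no meaning once the call returns.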

However, address-returning opcodes are still unsolved (e.g. CREATE, CREATE2, COINBASE, CALLER). The contract may want to persist these addresses, so making them return address handles is not an option, as they aren't safe to persist.

Member Author

Actually, this applies more generally. If I pass a handle 0x0000..01 to a smart contract as an input parameter, and it decides to persist it immediately, I would have no opportunity to resolve that handle into the actual address. This solution doesn't work at all.

Member

Yeah, this solution seems brittle.


Finally, this approach alters the calling convention, which in turn breaks compatibility with existing Ethereum tooling like wallets (e.g. MetaMask).

**Solution C: using address guards**

Another alternative consists of adopting ID addresses (as proposed in Solution A), but when those addresses are "fresh" (i.e. created within the finality window), allowing the caller to pack a stable address guard/assertion in a data structure similar to that of Solution B.

The EVM <> FVM shim would apply assertions prior to invoking the contract.
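
One way the guard check could work is sketched below in Python. The guard format (ID mapped to the stable address it is expected to belong to) and the `lookup_id` callback are assumptions made for illustration.

```python
from typing import Callable, Dict, Optional


class GuardViolation(Exception):
    """Raised when an ID address no longer maps to the expected stable address."""


def apply_guards(
    guards: Dict[int, bytes],                        # actor ID -> expected stable address
    lookup_id: Callable[[bytes], Optional[int]],     # hypothetical state lookup
) -> None:
    """Check every guard before the contract is invoked; abort on any mismatch."""
    for actor_id, stable_addr in guards.items():
        actual_id = lookup_id(stable_addr)
        if actual_id != actor_id:
            raise GuardViolation(
                f"ID {actor_id} expected for guarded address, found {actual_id}"
            )
```

If a reorg reassigns an ID after the caller built the message, the lookup returns a different ID (or none) and the call fails before any EVM code runs.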

This solution imposes extra complexity on the caller (so as to determine address freshness). It may require extending the InitActor's state object to inline the creation epoch for ease of query.

This solution also suffers from the ecosystem tooling compatibility drawbacks, just like Solution B.

## Gas accounting and execution halt semantics
