FIP-0031: Atomic switch to non-programmable FVM #294

Merged: 8 commits, Feb 24, 2022
396 changes: 396 additions & 0 deletions FIPS/fip-0031.md
@@ -0,0 +1,396 @@
---
fip: "0031"
title: Atomic switch to non-programmable FVM
authors: Raúl Kripalani (@raulk), Steven Allen (@stebalien)
discussions-to: https://github.com/filecoin-project/FIPs/discussions/296
status: Draft
type: Technical Core
category: Core
created: 2022-02-03
spec-sections:
- TBD
requires:
- FIP-0030 (Introducing the Filecoin Virtual Machine)
replaces: N/A
---

# Atomic switch to non-programmable FVM

<!-- START doctoc generated TOC please keep comment here to allow auto update -->
<!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE -->


- [Simple Summary](#simple-summary)
- [Abstract](#abstract)
- [Change Motivation](#change-motivation)
- [Specification](#specification)
  - [WASM bytecode management](#wasm-bytecode-management)
  - [Warming up the module cache](#warming-up-the-module-cache)
  - [Atomic switch of execution layer](#atomic-switch-of-execution-layer)
  - [Migration procedure](#migration-procedure)
  - [Optional removal of intrinsic VM from Filecoin client codebases](#optional-removal-of-intrinsic-vm-from-filecoin-client-codebases)
  - [Actor changes](#actor-changes)
    - [Init actor](#init-actor)
    - [Power actor](#power-actor)
- [Design Rationale](#design-rationale)
- [Backwards Compatibility](#backwards-compatibility)
  - [CodeCIDs](#codecids)
  - [Non-versioned changes and state tree migrations](#non-versioned-changes-and-state-tree-migrations)
- [Test Plan](#test-plan)
- [Coordinated testnets](#coordinated-testnets)
  - [Per-implementation private testnets](#per-implementation-private-testnets)
  - [Cross-implementation shared testnets](#cross-implementation-shared-testnets)
- [Security Considerations](#security-considerations)
- [Incentive Considerations](#incentive-considerations)
- [Product Considerations](#product-considerations)
- [Implementation](#implementation)
- [Copyright](#copyright)

<!-- END doctoc generated TOC please keep comment here to allow auto update -->

## Simple Summary

This FIP proposes an **atomic switch** from the current intrinsic VM to a
**non-programmable version of the Filecoin Virtual Machine**, at a chain epoch
TBD.

A baseline specification of the Filecoin Virtual Machine is provided in
[filecoin-project/FIPs#288](https://github.com/filecoin-project/FIPs/pull/288).
Only the canonical built-in actors will be supported after the atomic switch.

This switch won't introduce new user-facing features or capabilities, but it's
an important step in the trajectory towards full on-chain user-programmability.
It also improves network security by sandboxing execution and making it
deterministic.

## Abstract

At chain height TBD, the network's execution layer will transition from the
intrinsic VM to a non-programmable version of the FVM. By _non-programmable_ we
mean that users won't yet be able to deploy custom code.

The network will continue running the logic of the prevailing built-in actor
version, estimated to be v7 (assuming this FIP is scheduled to go live right
after nv15: OhSnap).

However, the code and deployment form of built-in actors will now be Wasm
bytecode. Actor code will begin to be identified through real
content-addressing. The current _synthetic_ actor CodeCIDs will migrate to CIDs
that hash over their actual Wasm bytecode.

## Change Motivation

Swapping out core components in the execution layer of a blockchain is a complex
operation. It's analogous to replacing the engine of an airplane while airborne.

For this reason, it's advisable to manage risk carefully. Monolithic, big-bang
deployments are risky. It's appropriate to break up changes into smaller units
that can be deployed in an orderly, incremental fashion, eventually leading up
to the final goal: on-chain user-programmability.

Even though this FIP doesn't introduce user-facing features, it represents a
fundamental technological transition in the network, where a new technology
comes alive and the previous technology is decommissioned, all in a single atomic
step.

Subsequent FVM-related FIPs will introduce changes atop the foundations
installed in the network by this FIP.

## Specification

### WASM bytecode management

The WASM bytecode for built-in actors will be available within CAR files
generated by the build scripts at repo:
https://github.com/filecoin-project/canonical-actors (TBD).

These CAR files will contain one IPLD block per built-in actor. Each IPLD block
will contain the raw bytecode in full (no chunking into IPLD DAGs).

The node must load the content of these CAR files into the node's state
blockstore. These blocks will remain orphaned until the migration is run at the
upgrade epoch. Thus, any form of state garbage collection (e.g. splitstore) must
explicitly protect these blocks from deletion. At migration, all actors
will be linked to their respective bytecode in the state tree.

The node must verify that exactly N blocks were loaded, with CIDs ...
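
As a rough illustration of this check, here is a minimal Python sketch. It
assumes the blocks have already been read out of the CAR file into a mapping
from identifier to raw bytes (CAR parsing is omitted), and it uses a hex-encoded
BLAKE2b-256 digest as a stand-in for a real CID. The names `content_id` and
`verify_bundle`, and the sample data, are hypothetical and not part of any
client.

```python
import hashlib


def content_id(data: bytes) -> str:
    """Stand-in for a CID: hex BLAKE2b-256 digest of the raw block (assumption)."""
    return hashlib.blake2b(data, digest_size=32).hexdigest()


def verify_bundle(blocks: dict, expected_ids: set) -> None:
    """Raise ValueError unless exactly the expected blocks were loaded intact."""
    loaded = set(blocks)
    if loaded != expected_ids:
        missing = sorted(expected_ids - loaded)
        extra = sorted(loaded - expected_ids)
        raise ValueError(f"bundle mismatch: missing={missing}, unexpected={extra}")
    for block_id, data in blocks.items():
        # Each block must hash to the identifier it was stored under.
        if content_id(data) != block_id:
            raise ValueError(f"block {block_id} does not hash to its identifier")


# Hypothetical bundle with two actors; real bytecode would come from the CAR file.
sample = [b"\x00asm fake init actor", b"\x00asm fake power actor"]
bundle = {content_id(code): code for code in sample}
verify_bundle(bundle, expected_ids=set(bundle))  # passes silently
```

Checking the set of identifiers first catches both missing and unexpected
blocks; the per-block hash check then catches corrupted content.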

### Warming up the module cache

Prior to the upgrade epoch, it is advisable for the Filecoin client to warm up
the Wasm module cache by pre-compiling the built-in actors into machine code.
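
A minimal sketch of such a warm-up, in Python. The `ModuleCache` class and the
`fake_compile` function are hypothetical placeholders; a real client would call
into its Wasm engine where `fake_compile` is used.

```python
from typing import Callable, Dict


class ModuleCache:
    """Caches compiled modules keyed by actor code identifier."""

    def __init__(self, compile_fn: Callable[[bytes], object]) -> None:
        self._compile = compile_fn
        self._modules: Dict[str, object] = {}
        self.compilations = 0  # counts actual compilations, for inspection

    def get(self, code_id: str, bytecode: bytes) -> object:
        """Return the compiled module, compiling at most once per code_id."""
        if code_id not in self._modules:
            self._modules[code_id] = self._compile(bytecode)
            self.compilations += 1
        return self._modules[code_id]

    def warm_up(self, bundle: Dict[str, bytes]) -> None:
        """Pre-compile every actor in the bundle ahead of the upgrade epoch."""
        for code_id, bytecode in bundle.items():
            self.get(code_id, bytecode)


def fake_compile(bytecode: bytes) -> object:
    # Placeholder compiler (assumption): stands in for Wasm-to-machine-code.
    return ("compiled", len(bytecode))


cache = ModuleCache(fake_compile)
cache.warm_up({"code-a": b"\x00asm...a", "code-b": b"\x00asm...b"})
cache.get("code-a", b"\x00asm...a")  # served from the cache, no recompilation
```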

### Atomic switch of execution layer

Filecoin clients must simultaneously support both execution layers:

1. The current, intrinsic VM (the source).
2. The non-programmable FVM (the target).

At upgrade epoch TBD, blocks assembled by block producers in epoch TBD-1 with
the source VM will be compiled into a tipset and be executed by all validators
**with the target VM**.

From that moment onwards, block producers will construct blocks using the target
VM. Validators will also validate tipsets with the target VM. In other words,
both block production and validation will operate with the target VM.

Following one finality (900 epochs), the deprecation of the source VM will be
considered final.
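
The switch rule can be sketched as follows. This is an illustrative reading of
the text above, not client code: the function names and the `UPGRADE_EPOCH`
value are hypothetical, null rounds are ignored, and "following one finality"
is interpreted as 900 epochs after the upgrade epoch.

```python
from enum import Enum

FINALITY_EPOCHS = 900  # one finality, as stated above


class VM(Enum):
    SOURCE = "intrinsic VM"
    TARGET = "non-programmable FVM"


def vm_at(epoch: int, upgrade_epoch: int) -> VM:
    """VM used for block production and validation at `epoch`."""
    return VM.TARGET if epoch >= upgrade_epoch else VM.SOURCE


def vm_executing_tipset(tipset_epoch: int, upgrade_epoch: int) -> VM:
    """VM that executes a tipset assembled at `tipset_epoch`.

    Assumes no null rounds, so the tipset is executed at the next epoch.
    """
    return vm_at(tipset_epoch + 1, upgrade_epoch)


def source_vm_deprecation_final(epoch: int, upgrade_epoch: int) -> bool:
    """True once one finality has elapsed since the upgrade epoch."""
    return epoch >= upgrade_epoch + FINALITY_EPOCHS


UPGRADE_EPOCH = 2_000_000  # hypothetical; the real epoch is TBD

# Blocks assembled at UPGRADE_EPOCH - 1 are produced with the source VM...
produced_with = vm_at(UPGRADE_EPOCH - 1, UPGRADE_EPOCH)
# ...but the resulting tipset is executed with the target VM.
executed_with = vm_executing_tipset(UPGRADE_EPOCH - 1, UPGRADE_EPOCH)
```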

> Notes:
> Should we run sanity check procedures in the epochs preceding TBD? Validating
> tipsets in parallel with both VMs; and aborting the upgrade on mismatch.

### Migration procedure

At epoch TBD, prior to executing the tipset assembled at epoch TBD-1, Filecoin
clients will run a migration over the state tree to replace the CodeCID in the
ActorState struct of every actor.

The replacement will be performed according to this replacement table:

| Old value (synthetic CID) | New value (content-addressed CID) |
| --------------------------------- | ----------------------------------------------------- |
| `fil/7/system` | TBD |
| `fil/7/init` | TBD |
| `fil/7/cron` | TBD |
| `fil/7/account` | TBD |
| `fil/7/storagepower` | TBD |
| `fil/7/storageminer` | TBD |
| `fil/7/storagemarket` | TBD |
| `fil/7/paymentchannel` | TBD |
| `fil/7/multisig` | TBD |
| `fil/7/reward` | TBD |
| `fil/7/verifiedregistry` | TBD |
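
A sketch of the CodeCID replacement, in Python. The state tree is modelled as a
plain dictionary and `ActorState` is simplified; the new CID values are
placeholders because the real ones are TBD, and only a few table rows are shown.
All names here are illustrative assumptions.

```python
from dataclasses import dataclass, replace
from typing import Dict

# Placeholder values: the real content-addressed CIDs are TBD in this FIP.
REPLACEMENTS: Dict[str, str] = {
    "fil/7/system": "<new-cid-system>",
    "fil/7/init": "<new-cid-init>",
    "fil/7/account": "<new-cid-account>",
    # Remaining rows of the table omitted for brevity.
}


@dataclass(frozen=True)
class ActorState:
    code: str     # CodeCID (synthetic before the migration)
    head: str     # CID of the actor's state root
    nonce: int    # call sequence number
    balance: int  # token balance


def migrate(state_tree: Dict[str, ActorState]) -> Dict[str, ActorState]:
    """Return a new tree with every actor's CodeCID replaced per the table."""
    migrated = {}
    for address, actor in state_tree.items():
        if actor.code not in REPLACEMENTS:
            # Fail loudly rather than leave an actor with a stale CodeCID.
            raise KeyError(f"no replacement for {actor.code} at {address}")
        migrated[address] = replace(actor, code=REPLACEMENTS[actor.code])
    return migrated


tree = {
    "f00": ActorState("fil/7/system", "<head-0>", 0, 0),
    "f01": ActorState("fil/7/init", "<head-1>", 0, 0),
}
new_tree = migrate(tree)
```

Only the `code` field changes; the actor's state root, nonce, and balance are
carried over untouched.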

> Notes:
> What would a premigration look like here?

### Optional removal of intrinsic VM from Filecoin client codebases

After the upgrade epoch has passed and one finality has elapsed, Filecoin clients
may choose to completely erase the current VM from their respective codebases.

Because the canonical actors codebase supports only actors v6+, in doing so they
will lose the ability to sync past portions of the chain. This may be an
undesirable loss of functionality depending on the desired UX of each client,
which is why this step is entirely optional.

Alternatively, Filecoin clients may provide historical chain support by
preserving the intrinsic VM in their codebases but activating it only through
optional compilation (e.g. using Go build flags, or Rust Cargo features).

### Actor changes

#### Init actor

The init actor acts like the constructor for actor types that can be
instantiated by user messages, namely:

1. Miner actor, via the power actor.
2. Multisig actor.
3. Payment channel actor.

While this FIP introduces no changes in this behaviour, nor to the allow list
itself, the filtering logic will need to use the new content-addressed CodeCIDs
for these actors.
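
The filtering logic could look roughly like this Python sketch. The CodeCID
constants are placeholders (the real content-addressed values are TBD), and
`can_exec` is an illustrative name rather than a reference to a specific
implementation.

```python
# Placeholder CodeCIDs: the real content-addressed values are TBD in this FIP.
ACCOUNT = "<cid-account>"
POWER = "<cid-storagepower>"
MINER = "<cid-storageminer>"
MULTISIG = "<cid-multisig>"
PAYCH = "<cid-paymentchannel>"


def can_exec(caller_code: str, exec_code: str) -> bool:
    """Decide whether the init actor may instantiate `exec_code` for the caller."""
    if exec_code == MINER:
        # Miner actors may be created only via the power actor.
        return caller_code == POWER
    # Multisig and payment channel actors may be created by any caller.
    return exec_code in (MULTISIG, PAYCH)
```

The allow list itself is unchanged; only the constants it compares against move
from synthetic to content-addressed CodeCIDs.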

#### Power actor

The `CreateMiner` method (method number 2) calls the Init actor to construct a
new Miner actor. It will need to pass the new content-addressed CodeCID for the
latter.

## Design Rationale

WIP.

## Backwards Compatibility

### CodeCIDs

Changing the CodeCID of built-in actors can have visible consequences for users:

1. When constructing multisig or payment channel actors, they will need to use a
content-addressed CodeCID (unpredictable) instead of a synthetic CID
(predictable).
2. When querying an actor in the state tree (e.g. via the `StateGetActor`
JSON-RPC API in Lotus), the `Code` field will no longer follow a structured
name. This may break applications that depend on CodeCID parsing (e.g.
statediff).

A possible solution is to implement a lookup table that maps synthetic CodeCIDs
to content-addressed CodeCIDs, and vice versa, for JSON-RPC handlers to use.
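
A minimal sketch of such a bidirectional lookup table, in Python. The CID values
are placeholders, and the function names and the `legacy` flag are hypothetical;
they do not correspond to an existing Lotus API.

```python
from typing import Dict

# Placeholder pairs: the real content-addressed CIDs are TBD in this FIP.
SYNTHETIC_TO_CONTENT: Dict[str, str] = {
    "fil/7/multisig": "<cid-multisig>",
    "fil/7/paymentchannel": "<cid-paymentchannel>",
}
# Reverse direction, derived once so both tables always agree.
CONTENT_TO_SYNTHETIC: Dict[str, str] = {
    content: synthetic for synthetic, content in SYNTHETIC_TO_CONTENT.items()
}


def resolve_code_cid(code: str) -> str:
    """Accept either form on input and normalise to the content-addressed CID."""
    return SYNTHETIC_TO_CONTENT.get(code, code)


def render_code_cid(code: str, legacy: bool = False) -> str:
    """Format a CodeCID for a response, converting to the synthetic form on demand."""
    if legacy:
        # Unknown CIDs fall through unchanged rather than raising.
        return CONTENT_TO_SYNTHETIC.get(code, code)
    return code
```

For example, `resolve_code_cid("fil/7/multisig")` yields the placeholder
content-addressed value, and `render_code_cid(..., legacy=True)` maps it back.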

When CodeCIDs are to be returned, JSON-RPC operations can be extended with a
"legacy CodeCIDs" option for the user to demand a conversion prior to returning.

### Non-versioned changes and state tree migrations

Today, actor logic can change without its version being affected and, therefore,
without the associated CodeCIDs changing in the state tree. This is because the
actor version (e.g. actors v6) represents a version of the ABI, not of the
actor's logic.

However, with the adoption of a canonical Wasm actors codebase across the
network, _any_ and _every_ change in actor logic will result in different
bytecode. Because actor code is now truly content-addressed over the bytecode,
the CodeCIDs will change, therefore requiring a migration of all relevant actors
in the state tree where, in the past, such a migration would not have been
necessary.
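
The effect can be illustrated with a few lines of Python. A hex BLAKE2b-256
digest stands in for a real CID here, and the byte strings are made-up
placeholders rather than actual actor bytecode.

```python
import hashlib


def content_id(bytecode: bytes) -> str:
    """Stand-in for a content-addressed CodeCID (assumption: BLAKE2b-256 hex)."""
    return hashlib.blake2b(bytecode, digest_size=32).hexdigest()


# Two builds of the same actor version that differ only by a logic fix.
build_1 = b"\x00asm" + b"multisig logic"
build_2 = b"\x00asm" + b"multisig logic, patched"

# Under synthetic addressing both builds share one identifier...
synthetic_1 = synthetic_2 = "fil/7/multisig"
# ...whereas content addressing yields a new identifier for every change.
needs_migration = content_id(build_1) != content_id(build_2)
```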

There is no action to take in this FIP, but it is worth noting the difference in
change management dynamics that this FIP will bring on.

## Test Plan

This section offers a comprehensive test plan for this FIP. It harnesses
multiple testing techniques to attain high confidence in this FIP across various
facets. Some test efforts actually target the prerequisite [FIP-0030 -
Introducing the Filecoin Virtual
Machine](https://github.com/filecoin-project/FIPs/blob/master/FIPS/fip-0030.md)
(which this FIP activates), rather than direct aspects of this FIP.

- Test vectors for mainnet equivalence up until nv15, inclusive
  - FVM-level conformance: FVM implementations should directly pass the test
    vector corpus in
    [filecoin-project/fvm-test-vectors](https://github.com/filecoin-project/fvm-test-vectors),
    when vectors are fed at the Machine level.
  - Client-level conformance: Clients integrating the FVM should pass the test
    vector corpus in
    [filecoin-project/fvm-test-vectors](https://github.com/filecoin-project/fvm-test-vectors),
    when vectors are fed at the client level.

- Back-testing mainnet
  - Client implementations adopting the FVM should be capable of syncing the
    following mainnet chain range:
    - Start: epoch `1231620` (Chocolate upgrade, with activation of nv14 and
      actors v6)
    - End: most recent epoch following the OhSnap upgrade (nv15 / actors v7).
  - _Implementers’ note:_ This test can be performed by:
    1. Obtaining the minimal snapshot for epoch `1233360`, which includes the
       state trees and chain objects from 1802 epochs back (this number is the
       equivalent of 2 finalities + 2 epochs for buffer), thus covering the
       Chocolate upgrade at `1231620`. [Link to
       snapshot.](https://fil-chain-snapshots-fallback.s3.amazonaws.com/mainnet/minimal_finality_stateroots_1233360_2021-10-27_04-00-00.car)
    2. Loading it into the client.
    3. Rewinding the client’s chain head to the cited start epoch.

- Boosting Rust actors unit/integration test coverage
  - Compared to the outgoing canonical actors (Go specs-actors), Rust actors
    have poorer test coverage.
  - To avoid degrading software quality and to enable the rapid development of
    changes, we must strengthen the test coverage of Rust actors to be at least
    on par with Go actors, prior to appointing them as the canonical actors.

- Live syncing mainnet as validators (mainnet shadow tests)
  - Client implementations adopting the FVM should have no problem keeping up
    with the live chain as it advances. Validation times should be within
    pre-FVM orders of magnitude.
  - _Implementers’ note:_ this test can only be performed with a version of the
    FVM that does not implement this FIP, and therefore is equivalent to
    mainnet.

- Servicing block producers
  - Client implementations adopting the FVM should have no problems servicing
    block producers. This entails serving the _correct_ mining base upon winning
    a round.
  - _Implementers’ note:_ Verification may be performed in mainnet shadow tests,
    and/or in testnets (e.g. Calibrationnet).

- Mainnet upgrade drills
  - Client implementations should perform the upgrade against live mainnet
    state. The migration must not disrupt block production at the upgrade epoch.
  - _Implementers’ note:_ The
    [filecoin-project/ent](https://github.com/filecoin-project/ent) tool may
    come in handy.

- Coordinated testnets
  - Client implementations wishing to join the network upgrade deploying this
    FIP should join the coordinated testnets plan outlined below.
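
The snapshot arithmetic in the back-testing note can be checked with a short
Python calculation. The epoch numbers come from the text above; the variable
names are illustrative.

```python
FINALITY_EPOCHS = 900
SNAPSHOT_EPOCH = 1233360   # epoch of the minimal snapshot
CHOCOLATE_EPOCH = 1231620  # Chocolate upgrade (nv14 / actors v6)

# 2 finalities + 2 epochs of buffer
lookback = 2 * FINALITY_EPOCHS + 2
# Earliest epoch for which the snapshot still holds state and chain objects
earliest_epoch = SNAPSHOT_EPOCH - lookback
# The upgrade epoch must fall inside the window covered by the snapshot
covers_upgrade = earliest_epoch <= CHOCOLATE_EPOCH <= SNAPSHOT_EPOCH
# Epochs of margin between the start of the window and the upgrade
margin = CHOCOLATE_EPOCH - earliest_epoch
```

Here `lookback` is 1802 and `earliest_epoch` is 1231558, so the Chocolate
upgrade at `1231620` sits 62 epochs inside the covered range.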


## Coordinated testnets

Testnets are a critical instrument to test and verify the behavior of
implementations under different network conditions, some of which mimic mainnet.

Below is a tentative testnet deployment plan proposed by the maintainers of the
reference implementation (Lotus). We encourage all implementations who have
confirmed their ability to join the mainnet upgrade that activates this FIP to
also join the testnet efforts.

![FIP-0031 testnet plan](../resources/fip-0031/testnet-plan.jpg)

### Per-implementation private testnets

All implementations should carry out private testing prior to joining the shared
testnets below.

For illustrative purposes, here is the test plan of the reference implementation
(Lotus + ref-fvm). Implementations may adopt a similar blueprint. This plan
relies on the creation of a private network for rapid iteration named
"Caterpillarnet", born with Lotus + mainnet-compatible FVM.

**Test Phase 1 (runtime: 5 days) => Caterpillarnet with mainnet-compatible FVM:**
- Objective: rapid continuous sampling of FVM behavior in a brand new network.
- Network parameters:
  - Block time: 5s.
  - Sector sizes: 512MiB, 32GiB, 64GiB.
  - Minimum number of reference implementation (Lotus) block producers/storage
    providers: 6.
  - Consensus: Expected Consensus with fake winning PoSt (in order to speed up
    block production).
- Butterflynet keeps running, accumulating state in preparation for the Test
Phase 2 migration.

**Test Phase 2 (runtime: 2 days) => Caterpillarnet with FIP-0031 (this FIP):**
- Objective: validation of FIP-0031 upgrade procedure and basic migration.
- Caterpillarnet goes through the FIP-0031 upgrade (switching to canonical
actors and adopting content-addressed CodeCIDs).

### Cross-implementation shared testnets

From here onwards, the test plan specifies collaborative testing phases.

**Test Phase 3 (runtime: 1 week) => Butterflynet with FIP-0031 (this FIP):**
- Objective: validation of FIP-0031 upgrade procedure, collecting observations
such as time taken, physical state growth, IO workload, and more.
- Butterflynet goes through the FIP-0031 upgrade (switching to canonical actors
and adopting content-addressed CodeCIDs).
- Community members are encouraged to join Butterflynet prior to the network
upgrade happening.

**Test Phase 4 (runtime: 1 week) => Butterflynet with FIP-0031 and FIP-nnnn (gas
remodel):**
- Objective: validate gas model changes.
- Caterpillarnet and Butterflynet are reset and warped into the cited state.

**Test Phase 5 (runtime: 2 weeks) => Release candidate stabilization.**
- Butterflynet drills as we stabilize release candidates and release the final
RC (1.16.0-rcN) for deployment on Calibrationnet.

**Test Phase 6 (runtime: 3 weeks) => Calibrationnet upgraded to final RC.**
- Ongoing testing and monitoring.

## Security Considerations

WIP.

## Incentive Considerations

N/A.

## Product Considerations

WIP.

## Implementation

WIP.

## Copyright

Copyright and related rights waived via [CC0](https://creativecommons.org/publicdomain/zero/1.0/).
Binary file added resources/fip-0031/testnet-plan.jpg