Proposed beacon state test format #21

djrtwo · 2019-02-27T00:24:55Z

Test suite name

beacon_state

Test case format

config: <key/value pairs of phase 0 constants>
verify_signatures: <bool>
initial_state: <key/value pairs of fields of BeaconState>
blocks: <list of blocks>
expected_state: <key/value pairs of subset of BeaconState fields to state>
expected_state_root: <tree hash>

Format field notes

config [key/value]
- all fields of phase 0 constants.
- any missing config constant defaults to value found in phase 0 spec
verify_signatures (optional) [bool]
- default to False
- Any node that can produce a block should already have a flag enabling state transitions without verifying signatures. In most state tests containing valid transitions, this should be disabled.
- Generally only enabled in state tests that fail due to invalid signatures.
initial_state [key/value]
- all fields of BeaconState
blocks [list]
- A list of blocks to be processed sequentially on top of the initial state
expected_state [key/value]
- a subset of fields of BeaconState containing the expected values of the resulting state
expected_state_root (optional) [32-byte hex string]
- hash_tree_root(state) after processing to the latest block in blocks
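
A minimal sketch of how a test runner might apply the defaults described in the field notes above. The function name normalize_test_case, the spec_constants argument, and the choice to treat initial_state, blocks, and expected_state as required are assumptions for illustration, not part of the proposed format.

```python
from copy import deepcopy

# Assumption: fields not marked "(optional)" in the notes are treated as required.
REQUIRED_FIELDS = ("initial_state", "blocks", "expected_state")


def normalize_test_case(raw_case, spec_constants):
    """Return a copy of raw_case with the documented defaults filled in.

    spec_constants is a hypothetical dict holding the phase 0 spec values.
    """
    missing = [name for name in REQUIRED_FIELDS if name not in raw_case]
    if missing:
        raise ValueError(f"test case is missing fields: {missing}")

    case = deepcopy(raw_case)
    # Any constant absent from `config` falls back to the spec value.
    config = dict(spec_constants)
    config.update(raw_case.get("config", {}))
    case["config"] = config
    # verify_signatures defaults to False; expected_state_root is optional.
    case.setdefault("verify_signatures", False)
    case.setdefault("expected_state_root", None)
    return case
```

A runner could call this once per entry in test_cases, so the rest of the code never has to check whether an optional field was present.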

Example

title: Sample Slot Update State Test
summary: Tests slot updates
test_suite: beacon_state
fork: tchaikovsky
version: 1.0

test_cases:
- config:
    SHARD_COUNT: 16
    TARGET_COMMITTEE_SIZE: 16
    GENESIS_SLOT: 50
    GENESIS_EPOCH: 10
    MIN_ATTESTATION_INCLUSION_DELAY: 1
    SLOTS_PER_EPOCH: 5
    LATEST_RANDAO_MIXES_LENGTH: 20
    LATEST_BLOCK_ROOTS_LENGTH: 20
    LATEST_ACTIVE_INDEX_ROOTS_LENGTH: 20
    LATEST_SLASHED_EXIT_LENGTH: 20
  initial_state:
    slot: 50
    genesis_time: 0
    fork:
      previous_version: 0
      current_version: 0
      epoch: 10
    validator_registry:
      - pubkey: '0x424242424242424242424242424242424242424242424242424242424242424242424242424242424242424242424242'
        withdrawal_credentials: '0x0000000000000000000000000000000000000000000000000000000000000000'
        activation_epoch: 10
        exit_epoch: 100000000
        withdrawable_epoch: 10000000000
        initiated_exit: False
        slashed: False
      - pubkey: '0x606060606060606060606060606060606060606060606060606060606060606060606060606060606060606060606060'
        withdrawal_credentials: '0x0000000000000000000000000000000000000000000000000000000000000001'
        activation_epoch: 10
        exit_epoch: 100000000
        withdrawable_epoch: 10000000000
        initiated_exit: False
        slashed: False
    validator_balances: [32000000000, 32000000000]
    validator_registry_update_epoch: 10
    latest_randao_mixes: ['0x0000000000000000000000000000000000000000000000000000000000000000', '0x0000000000000000000000000000000000000000000000000000000000000000', ..., '0x0000000000000000000000000000000000000000000000000000000000000000']
    previous_shuffling_start_shard: 0
    current_shuffling_start_shard: 0
    previous_shuffling_epoch: 10
    current_shuffling_epoch: 10
    previous_shuffling_seed: '0x0000000000000000000000000000000000000000000000000000000000000000'
    current_shuffling_seed: '0x0000000000000000000000000000000000000000000000000000000000000000'
    previous_justified_epoch: 10
    justified_epoch: 10
    justification_bitfield: 0
    finalized_epoch: 10
    latest_crosslinks:
      - epoch: 10
        crosslink_data_root: '0x0000000000000000000000000000000000000000000000000000000000000000'
      - epoch: 10
        crosslink_data_root: '0x0000000000000000000000000000000000000000000000000000000000000000'
      ...
      - epoch: 10
        crosslink_data_root: '0x0000000000000000000000000000000000000000000000000000000000000000'
    batched_block_roots: []
    latest_eth1_data:
      deposit_root: '0x0000000000000000000000000000000000000000000000000000000000000000'
      block_hash: '0x0000000000000000000000000000000000000000000000000000000000000000'
    deposit_index: 0
  verify_signatures: False
  blocks:
    - slot: 51
      parent_root: '0x0000000000000000000000000000000000000000000000000000000000000000'
      randao_reveal: '0x0000000000000000000000000000000000000000000000000000000000000000'
      eth1_data:
        deposit_root: '0x0000000000000000000000000000000000000000000000000000000000000000'
        block_hash: '0x0000000000000000000000000000000000000000000000000000000000000000'
      body:
        proposer_slashings: []
        attester_slashings: []
        attestations: []
        deposits: []
        voluntary_exits: []
        transfers: []
      signature: '0x000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000'
  expected_state:
    slot: 51

Notes

Because expected_state_root is optional and expected_state may specify just a subset of fields, we can use this format for very narrowly targeted tests (like the slot update test above, or maybe a deposit processing test) as well as for full-blown state tests that exercise very complex transitions.
The above format assumes that the block parent_root will not be assessed for validity. We could just leave it out and handle it locally in each client as makes sense.
The above format assumes that proposer, attestation, and randao reveal signatures are not to be verified when verify_signatures == False.
We might specify "defaults" for fields across most objects to reduce the verbosity of tests. For example, Validator epoch fields should all just default to FAR_FUTURE_EPOCH unless explicitly specified otherwise. Block body arrays might just default to empty unless specified. etc.
Should probably be able to specify an invalid state transition. Maybe an is_invalid: True field.
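
To illustrate the subset semantics of expected_state, here is a rough sketch of a comparison that checks only the fields a test lists and ignores everything else in the resulting state. The helper name find_mismatches and the dotted path notation are assumptions. Lists are compared as whole values, which is one possible choice rather than something the format prescribes.

```python
def find_mismatches(actual, expected, path=""):
    """Return (path, expected, actual) tuples for each expected field that differs.

    Only keys present in `expected` are inspected, so a test may list
    any subset of the state's fields.
    """
    mismatches = []
    if isinstance(expected, dict):
        if not isinstance(actual, dict):
            return [(path, expected, actual)]
        for key, expected_value in expected.items():
            child_path = f"{path}.{key}" if path else key
            if key not in actual:
                # A missing field is reported with None as the actual value.
                mismatches.append((child_path, expected_value, None))
            else:
                mismatches.extend(
                    find_mismatches(actual[key], expected_value, child_path)
                )
    elif actual != expected:
        mismatches.append((path, expected, actual))
    return mismatches
```

An empty result means the test passes. For the sample test above, the expected_state of slot: 51 would be checked against the post-state without looking at any other field.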

jannikluhn · 2019-02-27T09:21:42Z

What's a bit unclear to me is if this type of test is for block processing or for state transitions in general. I think it would be useful to have tests specific for slot and epoch processing that don't contain any blocks, and to have tests for block processing without slot or epoch processing. In addition, full tests make sense as well, to check if slots/blocks/epochs are processed in the right order, but we probably need much fewer of those as most complexity lies in block processing. But for full tests we should specify first and last processed slot and epoch as well. This could be implicitly defined (e.g. we just say the test runner has to process all empty slots between the initial state slot and the block slots, and do potential epoch transitions up until the one directly following the last block).

One concern I have is that some of the tests could get unnecessarily verbose. Defaults for block and state fields as you mentioned would help a lot already. In addition to that, Justin proposed reduced constant sets for testing which could shorten the config section. Maybe we can replace config with something like config_base and config_changes: config_base would be an identifier of the constant set (e.g. main, simple, testnet_abc, ...) and config_changes would be the (optional and usually unnecessary) dictionary of changes to the base.
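
A small sketch of how config_base and config_changes could be resolved into one set of constants. The CONFIG_BASES table, its values, and the function name resolve_config are placeholders for illustration only. Rejecting unknown names in config_changes is an added assumption, meant to catch misspelled constants early.

```python
# Hypothetical constant sets; the names follow the examples in the comment
# above, and the values are placeholders rather than agreed presets.
CONFIG_BASES = {
    "main": {"SHARD_COUNT": 1024, "SLOTS_PER_EPOCH": 64},
    "simple": {"SHARD_COUNT": 8, "SLOTS_PER_EPOCH": 8},
}


def resolve_config(config_base, config_changes=None):
    """Return the base constant set with the optional changes applied."""
    if config_base not in CONFIG_BASES:
        raise KeyError(f"unknown config_base: {config_base!r}")
    config = dict(CONFIG_BASES[config_base])  # copy so the base stays intact
    for name, value in (config_changes or {}).items():
        if name not in config:
            raise KeyError(f"unknown constant in config_changes: {name!r}")
        config[name] = value
    return config
```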

Only in state tests that fail due to invalid signatures should this be enabled.

Just a minor side note, but I think we should have at least a few full tests with valid signatures that are checked. Otherwise we might get false positives (or false negatives? I can never tell), i.e. tests pass even though the client's signature verification is broken.

mratsim · 2019-02-27T10:16:17Z

Only in state tests that fail due to invalid signatures should this be enabled.

Just a minor side note, but I think we should have at least a few full tests with valid signatures that are checked. Otherwise we might get false positives (or false negatives? I can never tell), i.e. tests pass even though the client's signature verification is broken.

I think we should distinguish unit tests for logical components and integration tests for the whole machinery as debugging the whole machinery is quite complex and it helps a lot to target small pieces at a time.

Also one thing I'm afraid of is the timing with state tests:

64 slots per epoch is quite long for testing; in Nimbus we use 8 slots per epoch during development
debug mode: right now a 6-second slot is barely enough to process the state in debug mode in Nim (which is a fast language), so we compile with optimizations enabled (but still with stack traces, range checks, ...)

djrtwo · 2019-03-13T21:32:26Z

is if this type of test is for block processing or for state transitions in general

A state transition is (pre_state, block) -> post_state where multiple things happen under the hood. In that respect, I'd like these state tests to be block-by-block based. I see some value in pulling out components of the state transition (epoch-transition, slot-transition, etc.), but the more granular we go here, the more testing machinery we require of each client and the more specific an interface within the state transition we require as well.
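
As a rough sketch of the block-by-block shape, a runner only needs a function with the (pre_state, block) -> post_state signature and a loop over blocks. The names run_blocks and toy_state_transition are hypothetical, and the toy transition only advances the slot; a real client would plug in its own transition function.

```python
from copy import deepcopy


def toy_state_transition(pre_state, block):
    """Stand-in for a client's real transition: it only advances the slot."""
    if block["slot"] <= pre_state["slot"]:
        raise ValueError("block slot must be greater than the state slot")
    post_state = deepcopy(pre_state)
    post_state["slot"] = block["slot"]
    return post_state


def run_blocks(initial_state, blocks, state_transition=toy_state_transition):
    """Apply blocks one at a time; each post_state becomes the next pre_state."""
    state = initial_state
    for block in blocks:
        state = state_transition(state, block)
    return state
```

Keeping the transition function as a parameter means the test machinery needs only this one entry point from each client.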

I have a series of state tests using the above format ready to release with the next version of the spec (v0.5.0). The tests are quite verbose, but adding defaults for constants or constant sets isn't going to save us much space. It's the pre_state (specifically the cache arrays) that really gets us.

I think we should distinguish unit tests for logical components and integration tests for the whole machinery

The tests I currently have do so by specifying just a subset of the expected_state to test against (for example, testing that processing a block at the next slot increments the slot, rather than specifying the entirety of the state).

Do you have something more granular in mind @mratsim ?

Just a minor side note, but I think we should have at least a few full tests with valid signatures that are checked

agreed

I'll share the tests I've generated from the executable python spec soon

paulhauner · 2019-03-14T14:41:17Z

It's pretty minor, but I'd like it if all the "config" values were in lower case.

All of these variables are lower-case in my structs and as far as my YAML parser is concerned, SHARD_COUNT != shard_count.

I suspect this will be the case for other clients too, considering that it'll need to be a variable if it's set from YAML and variables are generally lower-case.

No big deal though, I can devise a work-around.

Thanks for your efforts on it :)

djrtwo · 2019-03-14T21:39:49Z

ah, interesting. standard in python is that constants are capitalized so my config expects caps :)

The vectors I'm releasing right now are capitalized. I'm down to switch to whichever is preferred by most languages.

paulhauner · 2019-03-15T00:50:24Z

I'm just calling lower_string on the entire YAML for now!

Case-insensitive blockchains are the future, IMO.
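
A narrower alternative to lower-casing the whole YAML text, sketched under the assumption that the file has already been parsed into plain dicts and lists: lower-case only the mapping keys and leave every value as it is. This is a different technique from the lower_string workaround mentioned above, and the helper name lower_keys is made up for this example.

```python
def lower_keys(node):
    """Recursively lower-case mapping keys, leaving all values untouched."""
    if isinstance(node, dict):
        return {str(key).lower(): lower_keys(value) for key, value in node.items()}
    if isinstance(node, list):
        return [lower_keys(item) for item in node]
    return node
```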

protolambda · 2019-03-15T10:36:52Z

For the Go executable spec it is a little bit more complicated to make this config design work:

The goals of the Go version are:

compile time guarantees, no need to deal with unnecessary edge cases during run-time.
define arrays by their size, not as slices that are actively maintained
sizes are part of types
clean optimization: if the compiler can work with it, it's better/faster

Now I've implemented a spec-test runner, but the current implementation, using ldflags to inject compile time variable changes from the config, doesn't work. It doesn't override constants, only variables, which don't offer these features. What I can do however is to implement it as a build constraint: simply refer to a different definition of the constants. But this is not something you would want to do for every single test case.

And I imagine that many others would prefer to just make it part of the build, instead of dealing with all the extra complexity (ensure lengths everywhere, initialize things, unclear types)

Given that the "config" section is only really used to make tests more efficient, can't we just agree on a few build presets?

E.g.:
phase0.yaml:

SHARD_COUNT: 1024
TARGET_COMMITTEE_SIZE: 128
...

minimal.yaml:

SHARD_COUNT: 8
SLOTS_PER_EPOCH: 5
LATEST_RANDAO_MIXES_LENGTH: 20
LATEST_BLOCK_ROOTS_LENGTH: 20
LATEST_ACTIVE_INDEX_ROOTS_LENGTH: 20
LATEST_SLASHED_EXIT_LENGTH: 20
...

giant.yaml:

SHARD_COUNT: 32768
SLOTS_PER_EPOCH: 128
...

And a few more.

And then simply reference these in all the test suite cases, instead of duplicating the config contents:

- config: minimal
  initial_state:
...
  expected_state:
    slot: 51

And then we can have the best of both worlds (compile time guarantees and testing) :)

Edit: alternatively, we could also list these configs by their name in a special configs.yaml

paulhauner · 2019-03-16T07:50:22Z

And I imagine that many others would prefer to just make it part of the build, instead of dealing with all the extra complexity (ensure lengths everywhere, initialize things, unclear types)

This has been raised several times in Lighthouse: do we make the constants constant or keep them as variables? Currently we keep all the spec "constants" as variables in a ChainSpec struct.

We prefer variables for two main reasons:

Flexibility during the development process.
Future support for testnets without recompiling.

We've come to the decision that constants don't give us significantly more safety (we can quite reasonably assume that ChainSpec is sane and consistent throughout the program). We acknowledge that passing the ChainSpec around is annoying, but not annoying enough to sacrifice flexibility or UX.

Personally, I'd vote to keep all constants variable (lol) because defining scenarios is (a) more work and (b) less flexible. @protolambda I understand that you're in a different situation though, you need to prioritize readability above (almost) all else -- I wouldn't complain if defining scenarios was implemented, just sharing my POV :)

protolambda · 2019-03-16T15:33:02Z

The two options are not exclusive.

The config scenarios idea boils down to the following:

are defined separately from the test cases (either in separate files, or a separate configs.yaml)
- no duplicate configs
- easier to maintain, smaller changes
are referenced by the test cases
- state test cases are leaner
can be converted to build-time constraints
- each config can be converted to a config_<name>.<cpp/go/etc> file and used as a substitute at compile time. This makes the minimal spec implementations easier: readability, compile-time guarantees, and no runtime config management.
can optionally be interpreted during run-time/test-time to change settings, if you opt for "variable constants" in your implementation.
- tests can still work with them; you even have to do less parsing for many small tests
- possibly useful during run-time changes (i.e. forks): you can use one config for slot < X and change to the next for slot >= X
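
One way the "converted to build-time constraints" point could look in practice is a small generator that turns a named config into Go source. This is only a sketch: the function name render_go_constants, the package name, and the preset_<name> build tag are assumptions, and the //go:build line uses the syntax introduced in Go 1.17.

```python
def render_go_constants(preset_name, constants, package="config"):
    """Render a config as Go source guarded by a build tag.

    The tag name preset_<name> and the package name are hypothetical.
    """
    lines = [
        f"//go:build preset_{preset_name}",
        "",
        f"package {package}",
        "",
        "const (",
    ]
    # Sort names so the generated file is stable across runs.
    for name in sorted(constants):
        value = constants[name]
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            raise TypeError(f"{name} must be an integer, got {value!r}")
        lines.append(f"\t{name} = {value}")
    lines.append(")")
    return "\n".join(lines) + "\n"
```

The output could be written to config_minimal.go and selected with go build -tags preset_minimal, while clients that keep "variable constants" read the same config at run time instead.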

protolambda · 2019-03-16T16:06:06Z

Also, I would like the state test format to support the definition of a test-case for a sub-state-transition. Each of these transitions already supports a generalized in/output, the same as the big block/epoch transition itself:

(state)->state for epoch transitions, and parts of it:

Eth1
Justification
Crosslinks
Rewards and Penalties
Ejections
Validator Registry
Slashings
Exit Queue
Finish

(state,block)->state for block transitions, and parts of it:

Header
Randao
Eth1
Proposer Slashings
Attester Slashings
Attestations
Deposits
Voluntary Exits
Transfers

Or alternatively, we create a separate format or two for this, since state sub-transitions don't have block inputs, and block transitions only have 1 block.
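
A minimal sketch of how a runner could dispatch to sub-transitions with the two signatures listed above. The registries, the "epoch"/"block" kind labels, and the toy steps are all hypothetical; the toy functions do not implement any spec logic and exist only so the example runs.

```python
# Registries keyed by sub-transition name.
EPOCH_STEPS = {}  # name -> function(state) -> state
BLOCK_STEPS = {}  # name -> function(state, block) -> state


def apply_sub_transition(kind, name, state, block=None):
    """Run one named sub-transition of the given kind."""
    if kind == "epoch":
        if block is not None:
            raise ValueError("epoch sub-transitions take no block")
        return EPOCH_STEPS[name](state)
    if kind == "block":
        if block is None:
            raise ValueError("block sub-transitions need exactly one block")
        return BLOCK_STEPS[name](state, block)
    raise ValueError(f"unknown kind: {kind!r}")


# Toy placeholders so the sketch runs; they are not spec behaviour.
def toy_epoch_step(state):
    return {**state, "epoch_steps_run": state.get("epoch_steps_run", 0) + 1}


def toy_block_step(state, block):
    return {**state, "last_block_slot": block["slot"]}


EPOCH_STEPS["toy_epoch_step"] = toy_epoch_step
BLOCK_STEPS["toy_block_step"] = toy_block_step
```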

djrtwo added the format-proposal label Feb 27, 2019

djrtwo changed the title ~~Proposed state test format~~ Proposed beacon state test format Feb 27, 2019

djrtwo mentioned this issue Mar 14, 2019

add basic sanity tests for v0.5.0 ethereum/eth2.0-tests#23

Merged

protolambda mentioned this issue Mar 16, 2019

Proposal for suite structure, test types, and configs #29

Open
