Shuffling tests #1

mratsim · 2018-07-23T14:36:47Z

The cutoffs function description is the followinf:

An algorithm that we use to split up validators into groups at the start of every epoch, determining at what height they can make attestations and what shard they are making crosslinks for.

The schema:

A full Python implementation:

SHARD_COUNT = 1024
ETH_SUPPLY_CAP = 2**27
DEPOSIT_SIZE = 32
MAX_VALIDATOR_COUNT = ETH_SUPPLY_CAP // DEPOSIT_SIZE
EPOCH_LENGTH = 64
SLOT_DURATION = 8
MIN_COMMITTEE_SIZE = 128
END_EPOCH_GRACE_PERIOD = 8

def get_cutoffs(validator_count):
    height_cutoffs = [0]
    # EPOCH_LENGTH // phi
    cofactor = 39
    STANDARD_COMMITTEE_SIZE = MAX_VALIDATOR_COUNT // SHARD_COUNT
    # If there are not enough validators to fill a minimally
    # sized committee at every height, skip some heights
    if validator_count < EPOCH_LENGTH * MIN_COMMITTEE_SIZE:
        height_count = validator_count // MIN_COMMITTEE_SIZE or 1
        heights = [(i * cofactor) % EPOCH_LENGTH
                   for i in range(height_count)]
    # If there are enough validators, fill all the heights
    else:
        height_count = EPOCH_LENGTH
        heights = list(range(EPOCH_LENGTH))

    filled = 0
    for i in range(EPOCH_LENGTH - 1):
        if not i in heights:
            height_cutoffs.append(height_cutoffs[-1])
        else:
            filled += 1
            height_cutoffs.append(filled * validator_count // height_count)
    height_cutoffs.append(validator_count)

    # For the validators assigned to each height, split them up
    # into committees for different shards. Do not assign the
    # last END_EPOCH_GRACE_PERIOD heights in an epoch to any shards.
    shard_cutoffs = [0]
    for i in range(EPOCH_LENGTH - END_EPOCH_GRACE_PERIOD):
        size = height_cutoffs[i+1] - height_cutoffs[i]
        shards = (size + STANDARD_COMMITTEE_SIZE - 1) // STANDARD_COMMITTEE_SIZE
        pre = shard_cutoffs[-1]
        for j in range(1, shards+1):
            shard_cutoffs.append(pre + size * j // shards)

    return height_cutoffs, shard_cutoffs

(h, s) = get_cutoffs(16)

# Note: Python 2 because Apple/Mac ...
print height # [0, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16]
print len(height) # 65
print shard # [0, 16]

The Python implementation does not split like in the example
The example may be a simplification that shows the spirit of the spec but no sharding occurs because 16 validators is too small.

Nim implementation:
https://github.com/status-im/nim-beacon-chain/blob/585072ae154e3d10276b51397e707628a401c096/beacon_chain/private/helpers.nim#L35-L79

# to run from ./build
import ../beacon_chain/private/helpers
import ../beacon_chain/datatypes

let a = 16
var seed: Blake2_256_Digest

echo getShuffling(seed, a) # @[14, 9, 13, 10, 3, 6, 12, 1, 2, 0, 5, 15, 8, 7, 11, 4]
let (height, shard) = getCutoffs(a)

echo height # @[0, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16]
echo height.len # 65
echo shard # @[0, 16]

mratsim · 2018-11-26T14:45:09Z

Renaming to shuffling tests as the specs changed significantly and cutoffs are now just shuffling.

Specs

`shuffle`

def shuffle(values: List[Any], seed: Hash32) -> List[Any]:
    """
    Returns the shuffled ``values`` with ``seed`` as entropy.
    """
    values_count = len(values)

    # Entropy is consumed from the seed in 3-byte (24 bit) chunks.
    rand_bytes = 3
    # The highest possible result of the RNG.
    rand_max = 2 ** (rand_bytes * 8) - 1

    # The range of the RNG places an upper-bound on the size of the list that
    # may be shuffled. It is a logic error to supply an oversized list.
    assert values_count < rand_max

    output = [x for x in values]
    source = seed
    index = 0
    while index < values_count - 1:
        # Re-hash the `source` to obtain a new pattern of bytes.
        source = hash(source)
        # Iterate through the `source` bytes in 3-byte chunks.
        for position in range(0, 32 - (32 % rand_bytes), rand_bytes):
            # Determine the number of indices remaining in `values` and exit
            # once the last index is reached.
            remaining = values_count - index
            if remaining == 1:
                break

            # Read 3-bytes of `source` as a 24-bit big-endian integer.
            sample_from_source = int.from_bytes(source[position:position + rand_bytes], 'big')

            # Sample values greater than or equal to `sample_max` will cause
            # modulo bias when mapped into the `remaining` range.
            sample_max = rand_max - rand_max % remaining

            # Perform a swap if the consumed entropy will not cause modulo bias.
            if sample_from_source < sample_max:
                # Select a replacement index for the current index.
                replacement_position = (sample_from_source % remaining) + index
                # Swap the current index with the replacement index.
                output[index], output[replacement_position] = output[replacement_position], output[index]
                index += 1
            else:
                # The sample causes modulo bias. A new sample should be read.
                pass

    return output

`split`

def split(values: List[Any], split_count: int) -> List[Any]:
    """
    Splits ``values`` into ``split_count`` pieces.
    """
    list_length = len(values)
    return [
        values[(list_length * i // split_count): (list_length * (i + 1) // split_count)]
        for i in range(split_count)
    ]

`clamp`

def clamp(minval: int, maxval: int, x: int) -> int:
    """
    Clamps ``x`` between ``minval`` and ``maxval``.
    """
    if x <= minval:
        return minval
    elif x >= maxval:
        return maxval
    else:
        return x

`get_new_shuffling`

def get_new_shuffling(seed: Hash32,
                      validators: List[ValidatorRecord],
                      crosslinking_start_shard: int) -> List[List[ShardAndCommittee]]:
    """
    Shuffles ``validators`` into shard committees using ``seed`` as entropy.
    """
    active_validator_indices = get_active_validator_indices(validators)

    committees_per_slot = clamp(
        1,
        SHARD_COUNT // EPOCH_LENGTH,
        len(active_validator_indices) // EPOCH_LENGTH // TARGET_COMMITTEE_SIZE,
    )

    # Shuffle with seed
    shuffled_active_validator_indices = shuffle(active_validator_indices, seed)

    # Split the shuffled list into epoch_length pieces
    validators_per_slot = split(shuffled_active_validator_indices, EPOCH_LENGTH)

    output = []
    for slot, slot_indices in enumerate(validators_per_slot):
        # Split the shuffled list into committees_per_slot pieces
        shard_indices = split(slot_indices, committees_per_slot)

        shard_id_start = crosslinking_start_shard + slot * committees_per_slot

        shards_and_committees_for_slot = [
            ShardAndCommittee(
                shard=(shard_id_start + shard_position) % SHARD_COUNT,
                committee=indices,
                total_validator_count=len(active_validator_indices),
            )
            for shard_position, indices in enumerate(shard_indices)
        ]
        output.append(shards_and_committees_for_slot)

    return output

Testing

Is a bit tricky due to the random seed, the test vectors should be produced by a literal copy of the python specs with the random seed.

Our implementation

https://github.com/status-im/nim-beacon-chain/blob/0d775eefcb17e528429d123e999c47d567eaafaf/beacon_chain/spec/helpers.nim#L12-L66

https://github.com/status-im/nim-beacon-chain/blob/0d775eefcb17e528429d123e999c47d567eaafaf/beacon_chain/spec/validator.nim#L80-L111

Ethereum/beacon_chain repo

It is still using the old dynasty, and seems like development is now down in Py-EVM.

def shuffle(lst: List[Any],
            seed: Hash32,
            config: Dict[str, Any]=DEFAULT_CONFIG) -> List[Any]:
    lst_count = len(lst)
    assert lst_count <= 16777216
    o = [x for x in lst]
    source = seed
    i = 0
    while i < lst_count:
        source = blake(source)
        for pos in range(0, 30, 3):
            m = int.from_bytes(source[pos:pos+3], 'big')
            remaining = lst_count - i
            if remaining == 0:
                break
            rand_max = 16777216 - 16777216 % remaining
            if m < rand_max:
                replacement_pos = (m % remaining) + i
                o[i], o[replacement_pos] = o[replacement_pos], o[i]
                i += 1
    return o


def split(lst: List[Any], N: int) -> List[Any]:
    list_length = len(lst)
    return [
        lst[(list_length * i // N): (list_length * (i+1) // N)] for i in range(N)
    ]


def get_new_shuffling(seed: Hash32,
                      validators: List['ValidatorRecord'],
                      dynasty: int,
                      crosslinking_start_shard: ShardId,
                      config: Dict[str, Any]=DEFAULT_CONFIG) -> List[List[ShardAndCommittee]]:
    cycle_length = config['cycle_length']
    min_committee_size = config['min_committee_size']
    shard_count = config['shard_count']
    avs = get_active_validator_indices(dynasty, validators)
    if len(avs) >= cycle_length * min_committee_size:
        committees_per_slot = len(avs) // cycle_length // (min_committee_size * 2) + 1
        slots_per_committee = 1
    else:
        committees_per_slot = 1
        slots_per_committee = 1
        while (len(avs) * slots_per_committee < cycle_length * min_committee_size and
               slots_per_committee < cycle_length):
            slots_per_committee *= 2
    o = []

    shuffled_active_validator_indices = shuffle(avs, seed, config)
    validators_per_slot = split(shuffled_active_validator_indices, cycle_length)
    for slot, slot_indices in enumerate(validators_per_slot):
        shard_indices = split(slot_indices, committees_per_slot)
        shard_id_start = crosslinking_start_shard + (
            slot * committees_per_slot // slots_per_committee
        )
        o.append([ShardAndCommittee(
            shard_id=(shard_id_start + j) % shard_count,
            committee=indices
        ) for j, indices in enumerate(shard_indices)])
    return o

Py-EVM

also uses dynasty in get_new_shuffling

https://github.com/ethereum/py-evm/blob/f2d0d5d187400ba46a6b8f5b1f1c9997dc7fbb5a/eth/beacon/utils/random.py#L27-L86

def shuffle(values: Sequence[Any],
            seed: Hash32) -> Iterable[Any]:
    """
    Returns the shuffled ``values`` with seed as entropy.
    Mainly for shuffling active validators in-protocol.
    Spec: https://github.com/ethereum/eth2.0-specs/blob/0941d592de7546a428066c0473fd1000a7e3e3af/specs/beacon-chain.md#helper-functions  # noqa: E501
    """
    values_count = len(values)

    if values_count > SAMPLE_RANGE:
        raise ValueError(
            "values_count (%s) should less than or equal to SAMPLE_RANGE (%s)." %
            (values_count, SAMPLE_RANGE)
        )

    output = [x for x in values]
    source = seed
    index = 0
    while index < values_count:
        # Re-hash the source
        source = blake(source)
        for position in range(0, 30, 3):  # gets indices 3 bytes at a time
            # Select a 3-byte sampled int
            sample_from_source = int.from_bytes(source[position:position + 3], 'big')
            # `remaining` is the size of remaining indices of this round
            remaining = values_count - index
            if remaining == 0:
                break

            # Set a random maximum bound of sample_from_source
            rand_max = SAMPLE_RANGE - SAMPLE_RANGE % remaining

            # Select `replacement_position` with the given `sample_from_source` and `remaining`
            if sample_from_source < rand_max:
                # Use random number to get `replacement_position`, where it's not `index`
                replacement_position = (sample_from_source % remaining) + index
                # Swap the index-th and replacement_position-th elements
                (output[index], output[replacement_position]) = (
                    output[replacement_position],
                    output[index]
                )
                index += 1
            else:
                pass

    return output


@to_tuple
def split(seq: Sequence[TItem], pieces: int) -> Iterable[Any]:
    """
    Returns the split ``seq`` in ``pieces`` pieces in protocol.
    Spec: https://github.com/ethereum/eth2.0-specs/blob/0941d592de7546a428066c0473fd1000a7e3e3af/specs/beacon-chain.md#helper-functions  # noqa: E501
    """
    list_length = len(seq)
    return [
        seq[(list_length * i // pieces): (list_length * (i + 1) // pieces)]
        for i in range(pieces)
    ]

https://github.com/ethereum/py-evm/blob/f2d0d5d187400ba46a6b8f5b1f1c9997dc7fbb5a/eth/beacon/helpers.py#L272-L344

@to_tuple
def get_new_shuffling(*,
                      seed: Hash32,
                      validators: Sequence['ValidatorRecord'],
                      dynasty: int,
                      crosslinking_start_shard: int,
                      cycle_length: int,
                      min_committee_size: int,
                      shard_count: int) -> Iterable[Iterable[ShardAndCommittee]]:
    """
    Return shuffled ``shard_and_committee_for_slots`` (``[[ShardAndCommittee]]``) of
    the given active ``validators``.
    Two-dimensional:
    The first layer is ``slot`` number
        ``shard_and_committee_for_slots[slot] -> [ShardAndCommittee]``
    The second layer is ``shard_indices`` number
        ``shard_and_committee_for_slots[slot][shard_indices] -> ShardAndCommittee``
    Example:
        validators:
            [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
        After shuffling:
            [6, 0, 2, 12, 14, 8, 10, 4, 9, 1, 5, 13, 15, 7, 3, 11]
        Split by slot:
            [
                [6, 0, 2, 12, 14], [8, 10, 4, 9, 1], [5, 13, 15, 7, 3, 11]
            ]
        Split by shard:
            [
                [6, 0], [2, 12, 14], [8, 10], [4, 9, 1], [5, 13, 15] ,[7, 3, 11]
            ]
        Fill to output:
            [
                # slot 0
                [
                    ShardAndCommittee(shard_id=0, committee=[6, 0]),
                    ShardAndCommittee(shard_id=1, committee=[2, 12, 14]),
                ],
                # slot 1
                [
                    ShardAndCommittee(shard_id=2, committee=[8, 10]),
                    ShardAndCommittee(shard_id=3, committee=[4, 9, 1]),
                ],
                # slot 2
                [
                    ShardAndCommittee(shard_id=4, committee=[5, 13, 15]),
                    ShardAndCommittee(shard_id=5, committee=[7, 3, 11]),
                ],
            ]
    NOTE: The spec might be updated to output an array rather than an array of arrays.
    """
    active_validators = get_active_validator_indices(dynasty, validators)
    active_validators_size = len(active_validators)
    committees_per_slot = clamp(
        1,
        shard_count // cycle_length,
        active_validators_size // cycle_length // (min_committee_size * 2) + 1,
    )
    shuffled_active_validator_indices = shuffle(active_validators, seed)

    # Split the shuffled list into cycle_length pieces
    validators_per_slot = split(shuffled_active_validator_indices, cycle_length)
    for index, slot_indices in enumerate(validators_per_slot):
        # Split the shuffled list into committees_per_slot pieces
        shard_indices = split(slot_indices, committees_per_slot)
        shard_id_start = crosslinking_start_shard + index * committees_per_slot
        yield _get_shards_and_committees_for_shard_indices(
            shard_indices,
            shard_id_start,
            shard_count,
        )

Sigp

Sigp is still using the old min_committe_size instead of target committee size

https://github.com/sigp/lighthouse/blob/ba548e49a52687a655c61b443b6835d79c6d4236/beacon_chain/validator_shuffling/src/shuffle.rs

/// Given the validator list, delegates the validators into slots and comittees for a given cycle.
fn generate_cycle(
    validator_indices: &[usize],
    shard_indices: &[usize],
    crosslinking_shard_start: usize,
    cycle_length: usize,
    min_committee_size: usize,
) -> Result<DelegatedCycle, ValidatorAssignmentError> {
    let validator_count = validator_indices.len();
    let shard_count = shard_indices.len();

    if shard_count / cycle_length == 0 {
        return Err(ValidatorAssignmentError::TooFewShards);
    }

    let (committees_per_slot, slots_per_committee) = {
        if validator_count >= cycle_length * min_committee_size {
            let committees_per_slot = min(
                validator_count / cycle_length / (min_committee_size * 2) + 1,
                shard_count / cycle_length,
            );
            let slots_per_committee = 1;
            (committees_per_slot, slots_per_committee)
        } else {
            let committees_per_slot = 1;
            let mut slots_per_committee = 1;
            while (validator_count * slots_per_committee < cycle_length * min_committee_size)
                & (slots_per_committee < cycle_length)
            {
                slots_per_committee *= 2;
            }
            (committees_per_slot, slots_per_committee)
        }
    };

    let cycle = validator_indices
        .honey_badger_split(cycle_length)
        .enumerate()
        .map(|(i, slot_indices)| {
            let shard_start =
                crosslinking_shard_start + i * committees_per_slot / slots_per_committee;
            slot_indices
                .honey_badger_split(committees_per_slot)
                .enumerate()
                .map(|(j, shard_indices)| ShardAndCommittee {
                    shard: ((shard_start + j) % shard_count) as u16,
                    committee: shard_indices.to_vec(),
                }).collect()
        }).collect();
    Ok(cycle)
}

tersec · 2019-02-22T19:04:00Z

The new shuffling approach should probably, though tests remain always useful, live in a new issue. Closing.

* Fix firstSuccess() template missing timeouts. * Fix validator race condition. Fix logs to be compatible with beacon_node logs. Add CatchableError handlers to avoid crashes. Move some logs from Notice to Debug level. Fix some [unused] warnings. * Fix block proposal issue for slots in the past and from the future. * Change sent to published. * Address review comments #1.

mratsim changed the title ~~Cutoff split function spec or example issue~~ Shuffling tests Nov 26, 2018

mratsim mentioned this issue Nov 26, 2018

spec updates #20

Merged

mratsim mentioned this issue Dec 3, 2018

Shuffling test vectors generator #26

Merged

tersec closed this as completed Feb 22, 2019

mratsim mentioned this issue Mar 26, 2019

[Simulation] Broken Pipe after disconnecting a peer - TcpError or ClientQuitting #208

Closed

stefantalpalaru added a commit that referenced this issue Oct 31, 2019

debug #1

82258eb

xachen mentioned this issue Nov 24, 2020

Nimbus only wants to talk with Geth for ETH1 #2072

Closed

hylin911 mentioned this issue Feb 18, 2021

Mainnet crashed with the message SIGSEGV: Illegal storage access. #2134

Closed

cheatfate added a commit that referenced this issue Jul 19, 2021

Address review comments #1.

86c2a9e

etan-status pushed a commit that referenced this issue May 12, 2022

Attempts to fix #1

570467a

etan-status pushed a commit that referenced this issue May 12, 2022

Investigation of Linux freezes #1.

bb176ba

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Shuffling tests #1

Shuffling tests #1

mratsim commented Jul 23, 2018

mratsim commented Nov 26, 2018

mratsim commented Nov 30, 2018

mratsim commented Dec 3, 2018 •

edited

Loading

tersec commented Feb 22, 2019

Shuffling tests #1

Shuffling tests #1

Comments

mratsim commented Jul 23, 2018

mratsim commented Nov 26, 2018

mratsim commented Nov 30, 2018

mratsim commented Dec 3, 2018 • edited Loading

Specs

shuffle

split

clamp

get_new_shuffling

Testing

Our implementation

Ethereum/beacon_chain repo

Py-EVM

Sigp

tersec commented Feb 22, 2019

mratsim commented Dec 3, 2018 •

edited

Loading

`shuffle`

`split`

`clamp`

`get_new_shuffling`