Tests should run Besu with zero-second block time #105

Closed
peterbroadhurst opened this issue Aug 29, 2024 · 6 comments · Fixed by #146
Labels
good first issue Good for newcomers

Comments

@peterbroadhurst
Contributor

peterbroadhurst commented Aug 29, 2024

The code here https://github.com/kaleido-io/paladin/tree/main/testinfra/besu_bootstrap starts up Hyperledger Besu as part of startTestInfra, using the production-grade, strategic QBFT consensus algorithm.

However, the shortest block time available for that algorithm (currently) is 1 second, even in a development environment.

This makes tests that do actual blockchain transactions really slow, because they need to wait up to 2 seconds for their transactions to make it into a block and be confirmed, and sometimes multiple sequential transactions are required.
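To put rough numbers on this, here is a back-of-the-envelope sketch. It assumes the worst case quoted above (up to two block periods per transaction); the function name, the five-transaction test, and the 100 ms comparison value are hypothetical and not taken from the Paladin codebase.

```python
def worst_case_wait_seconds(block_period_s: float, sequential_txs: int,
                            periods_per_tx: int = 2) -> float:
    """Upper bound on the time one test spends waiting for blocks.

    Assumes each transaction may need up to `periods_per_tx` block periods
    to be mined and confirmed, and that transactions run one after another.
    """
    return block_period_s * periods_per_tx * sequential_txs


if __name__ == "__main__":
    # Hypothetical test that submits five transactions sequentially.
    for period in (1.0, 0.1):
        wait = worst_case_wait_seconds(period, sequential_txs=5)
        print(f"{period}s block period: up to {wait:.1f}s spent waiting")
```

The point is only that the wait scales linearly with both the block period and the number of sequential transactions, so a shorter block period helps every test that touches the chain.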

So, even though it's deprecated, it would be great to have an alternative set of code in besu_bootstrap that tries configuring Besu using Clique with blockperiodseconds and hyperledger/besu#6082:
https://besu.hyperledger.org/23.4.0/private-networks/how-to/configure/consensus/clique

Note: I'm not 100% sure personally that Hyperledger Besu works the way go-ethereum always has for Clique, where a zero-second block time means "mine on demand" - but the easiest way to find out is to try it!

@peterbroadhurst peterbroadhurst added the good first issue Good for newcomers label Aug 29, 2024
@peterbroadhurst peterbroadhurst changed the title Investigate running Besu in Clique consensus with zero-second block time Investigate tests running Besu in Clique with zero-second block time Aug 29, 2024
@dwertent dwertent self-assigned this Sep 3, 2024
@dwertent
Contributor

dwertent commented Sep 5, 2024

To address this, I attempted to switch to the Clique consensus algorithm, which allows for a configurable block period. The idea was to try and set blockperiodseconds to 0, effectively achieving "mine on demand" behavior similar to go-ethereum.

Here’s how I set the config:

{
  "nonce": "0x0",
  "timestamp": "0x66d9b88c",
  "gasLimit": "0x1c9c380",
  "difficulty": "0x1",
  "mixHash": "0x0d6f339e8c801e48f07bbbecc66a52a199196c2c8e4cd3faa87a0ef61c09b1c1",
  "coinbase": "0x0000000000000000000000000000000000000000",
  "alloc": {
    "0x4e05906d15e380ee769803d3ab50af9f298a6ca5": {
      "balance": "0x33b2e3c9fd0803ce8000000"
    }
  },
  "config": {
    "chainId": 1337,
    "cancunTime": 0,
    "zeroBaseFee": true,
    "clique": {
      "blockperiodseconds": 0,
      "epochlength": 30000,
      "createemptyblocks": false
    }
  },
  "extraData": "0x70616c6164696e000000000000000000000000000000000000000000000000004e05906d15e380ee769803d3ab50af9f298a6ca50000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000"
}
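The extraData value above appears to follow the Clique genesis layout from EIP-225: 32 bytes of vanity data (here the ASCII string "paladin", zero-padded), the signer addresses, then 65 zero bytes reserved for the seal. A minimal sketch of assembling such a value, assuming that layout; the function name is hypothetical and this is not the code in besu_bootstrap:

```python
from typing import Sequence


def clique_extra_data(vanity: bytes, signers: Sequence[str]) -> str:
    """Build a Clique genesis extraData hex string (EIP-225 layout)."""
    if len(vanity) > 32:
        raise ValueError("vanity data must be at most 32 bytes")
    # 32 bytes of vanity, right-padded with zero bytes.
    parts = [vanity.ljust(32, b"\x00").hex()]
    for signer in signers:
        addr = signer[2:] if signer.lower().startswith("0x") else signer
        if len(bytes.fromhex(addr)) != 20:
            raise ValueError(f"signer address must be 20 bytes: {signer}")
        parts.append(addr.lower())
    # 65 zero bytes where the block seal (signature) would go.
    parts.append("00" * 65)
    return "0x" + "".join(parts)


if __name__ == "__main__":
    print(clique_extra_data(
        b"paladin", ["0x4e05906d15e380ee769803d3ab50af9f298a6ca5"]))
```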

Unfortunately, as expected, setting blockperiodseconds to 0 is not supported in Besu (error message below), which requires a minimum block period of 1 second:

Invalid property value, blockperiodseconds should be a positive integer: 0
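A test harness could catch this before starting the container. The sketch below is a hypothetical pre-flight check, not Besu's own validation code: it only mirrors the rule stated in the error message above, and it assumes the consensus section sits under one of the `clique`, `qbft`, or `ibft2` keys in the genesis `config`.

```python
from typing import Optional

# Genesis config sections that can carry blockperiodseconds (assumed list).
CONSENSUS_KEYS = ("clique", "qbft", "ibft2")


def check_block_period(genesis: dict) -> Optional[int]:
    """Return the configured block period, or None if it is not set.

    Raises ValueError when the value is not a positive integer.
    """
    config = genesis.get("config", {})
    for key in CONSENSUS_KEYS:
        if key not in config:
            continue
        period = config[key].get("blockperiodseconds")
        if period is None:
            return None  # nothing to check; the node's default applies
        if isinstance(period, bool) or not isinstance(period, int) or period < 1:
            raise ValueError(
                "Invalid property value, blockperiodseconds should be "
                f"a positive integer: {period}")
        return period
    return None


if __name__ == "__main__":
    genesis = {"config": {"clique": {"blockperiodseconds": 0}}}
    try:
        check_block_period(genesis)
    except ValueError as err:
        print(err)
```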

Further Findings

Even when setting blockperiodseconds to 1, I didn’t notice a significant reduction in CPU usage of the Besu container, which indicates that switching to the Clique consensus algorithm alone might not be a sufficient solution.

Next Steps

I’ve opened a draft PR (#125) with the code changes to switch to Clique consensus for review. This is just for reference, and we will close the PR if we decide not to pursue this solution further.

cc @matthew1001

@matthew1001
Contributor

matthew1001 commented Sep 5, 2024

I'm not aware of Besu having any option to run with blockperiodseconds<1, regardless of whether QBFT or clique is the consensus protocol. In fact I think it's one of the common set of "mining parameters" that are validated the same for all consensus types. A delve into the code shows that it's specifically expected to be >0:

[image: screenshot of the relevant Besu source code]

It would be interesting (and maybe not a huge amount of work?) to add an option to Besu to allow sub-second blocks purely for development purposes. I think Ethereum blocks have a timestamp that's only at second granularity so maybe there would be some weird side effects?

Getting an OSS PR merged with anything that's too hacky isn't going to be easy but a patch on our fork with a hard-coded 100ms block time or something might be doable.

@dwertent
Contributor

dwertent commented Sep 5, 2024

Yes, I believe this is beyond the scope of this issue.

cc @peterbroadhurst

@peterbroadhurst
Contributor Author

😢 - I do think we want to be testing with Besu (rather than Geth/Quorum) in this repo... but practically it's going to be an absolute nightmare if we can't write full blockchain tests.

@matthew1001 going to ask for your input here from the Besu maintainer perspective

@matthew1001
Contributor

I've just opened draft PR hyperledger/besu#7588 which checks for env var BESU_X_DEV_BFT_PERIOD_MS and honours it for block periods.

E.g. for export BESU_X_DEV_BFT_PERIOD_MS=300 the logs look like:

2024-09-09 13:04:15.215+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBesuControllerBuilder | Produced #22 / 0 tx / 0 pending / 0 (0.0%) gas / (0x3a8554d0a1be313bbb376afa0e1d371c314136fd77c529d62806c0020d3273a2)
2024-09-09 13:04:15.529+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBesuControllerBuilder | Produced #23 / 0 tx / 0 pending / 0 (0.0%) gas / (0xd954fa411730e7665aaa00b61521a76dccd46fa292f67981a971c51ed936199f)
2024-09-09 13:04:15.840+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBesuControllerBuilder | Produced #24 / 0 tx / 0 pending / 0 (0.0%) gas / (0x4089f123be3fbeba47f5e9bb1d0ab7d9ec636c5d843a53939dd133a787a89d5f)
2024-09-09 13:04:16.147+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBesuControllerBuilder | Produced #25 / 0 tx / 0 pending / 0 (0.0%) gas / (0x2a2228f72cba1dc525d9b955be0726bdb8f2078597e4e5c113af090f29c8a0f9)
2024-09-09 13:04:16.485+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBesuControllerBuilder | Produced #26 / 1 tx / 0 pending / 1,905,458 (0.0%) gas / (0x0568e721640bfa1119ee9720dce6fc3f5bb7f8daa1cbaca5caf4d786b256f51c)
2024-09-09 13:04:16.795+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBesuControllerBuilder | Produced #27 / 0 tx / 0 pending / 0 (0.0%) gas / (0x3401a2adf1dfa0dacf37a8a501fcaa5956f1ce9e12a029a7fcebc18a7fbdbc7b)
2024-09-09 13:04:17.107+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBesuControllerBuilder | Produced #28 / 0 tx / 0 pending / 0 (0.0%) gas / (0x11e9fd506432634c84247ff6c12cce820931dd571f237eb130905cbb96e458cb)
2024-09-09 13:04:17.417+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBesuControllerBuilder | Produced #29 / 0 tx / 0 pending / 0 (0.0%) gas / (0xd1c7a486b6548ac0e03038a7dbfa8c1029aaf889acd15735c9c1b9fecfbef199)
2024-09-09 13:04:17.728+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBesuControllerBuilder | Produced #30 / 0 tx / 0 pending / 0 (0.0%) gas / (0x3937c9d8c3e4b82547a25ec558c7720bda09b399494e046a722326cf600628c2)
2024-09-09 13:04:18.040+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBesuControllerBuilder | Produced #31 / 0 tx / 0 pending / 0 (0.0%) gas / (0x14f5541e56e72a644f06abbe810c79a545b786f62680efeab1dcc12d85ab3214)
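One way to check the effective block period from logs like these is to diff the timestamps of consecutive "Produced" lines. A minimal sketch, assuming the log line format shown above stays stable; the function name is hypothetical and the sample is the first three lines of the excerpt:

```python
import re
from datetime import datetime, timedelta

# Timestamp at the start of the line, block number after "Produced #".
LINE_RE = re.compile(r"^(\S+ \S+) \|.*Produced #(\d+) ")

SAMPLE = """\
2024-09-09 13:04:15.215+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBesuControllerBuilder | Produced #22 / 0 tx / 0 pending / 0 (0.0%) gas / (0x3a8554d0a1be313bbb376afa0e1d371c314136fd77c529d62806c0020d3273a2)
2024-09-09 13:04:15.529+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBesuControllerBuilder | Produced #23 / 0 tx / 0 pending / 0 (0.0%) gas / (0xd954fa411730e7665aaa00b61521a76dccd46fa292f67981a971c51ed936199f)
2024-09-09 13:04:15.840+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBesuControllerBuilder | Produced #24 / 0 tx / 0 pending / 0 (0.0%) gas / (0x4089f123be3fbeba47f5e9bb1d0ab7d9ec636c5d843a53939dd133a787a89d5f)
"""


def block_intervals_ms(log_text: str) -> list:
    """Return (block_number, ms_since_previous_block) for each produced block."""
    produced = []
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S.%f%z")
            produced.append((int(match.group(2)), stamp))
    return [
        (number, (stamp - prev_stamp) // timedelta(milliseconds=1))
        for (_, prev_stamp), (number, stamp) in zip(produced, produced[1:])
    ]


if __name__ == "__main__":
    for number, gap in block_intervals_ms(SAMPLE):
        print(f"block #{number}: {gap} ms after the previous block")
```

For the three sample lines this gives gaps of 314 ms and 311 ms, slightly above the configured 300 ms.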

I'll start a discussion on discord about how amenable the community are to this sort of dev/ci-cd oriented feature.

I did have to hack the block-header timestamp verifier because it assumes a block is never produced less than 1 second after its parent.
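A small sketch of why sub-second periods collide with that kind of check. It assumes header timestamps are truncated to whole seconds and that the verifier requires the child to be at least `min_gap_s` seconds later than its parent; both functions are hypothetical illustrations, not Besu's code.

```python
from typing import List, Optional


def header_timestamps(start_s: int, period_ms: int, count: int) -> List[int]:
    """Whole-second header timestamps for blocks produced every period_ms."""
    return [start_s + (i * period_ms) // 1000 for i in range(count)]


def first_violation(timestamps: List[int], min_gap_s: int = 1) -> Optional[int]:
    """Index of the first block that is too close to its parent, if any."""
    for i in range(1, len(timestamps)):
        if timestamps[i] - timestamps[i - 1] < min_gap_s:
            return i
    return None


if __name__ == "__main__":
    for period_ms in (1000, 300):
        stamps = header_timestamps(1_725_883_455, period_ms, 5)
        print(period_ms, "ms:", stamps, "first violation:", first_violation(stamps))
```

With a 300 ms period several consecutive blocks share the same second, so the second block already fails a one-second minimum gap, while a 1000 ms period never does.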

@dwertent
Contributor

dwertent commented Sep 9, 2024

TL;DR

Creating an empty block every 300ms is highly demanding in terms of both memory and CPU resources.

Overview

I built an image from the fork and tested it with both consensus algorithms.

Clique Configuration

Here’s how I set up the Clique configuration:

https://github.com/kaleido-io/paladin/blob/cd2028931fb20840329d14967c85efa3629daec9/testinfra/besu_bootstrap/genesis.go#L97-L101

Here’s the docker-compose file for Clique:

  besu:
    container_name: besu
    image: hyperledger/besu:latest
    user: 0:0
    volumes:
      - besu_data:/var/besu:rw
    ports:
      - 8545:8545
      - 8546:8546
    depends_on:
      besu_bootstrap:
        condition: service_completed_successfully
    healthcheck:
      test: ["CMD-SHELL", "timeout 10s bash -c ':> /dev/tcp/localhost/8545'"]
      interval: 5s
      timeout: 5s
      retries: 10
    environment:
      BESU_X_DEV_BFT_PERIOD_MS: 300
    command:
      - --logging=DEBUG
      - --rpc-http-enabled
      - --rpc-http-api=ETH,CLIQUE,WEB3,DEBUG
      - --rpc-ws-enabled
      - --rpc-ws-api=ETH,CLIQUE,WEB3,DEBUG
      - --tx-pool=SEQUENCED
      - --tx-pool-limit-by-account-percentage=1.0
      - --tx-pool-max-size=1000000
      - --target-gas-limit=30000000
      - --genesis-file=/var/besu/genesis.json
      - --data-path=/var/besu/data
      - --node-private-key-file=/var/besu/key
      - --revert-reason-enabled
      - --host-allowlist=localhost,besu
volumes:
  besu_data:
    driver: local

QBFT Configuration

Here is how I configured QBFT:
https://github.com/kaleido-io/paladin/blob/cd2028931fb20840329d14967c85efa3629daec9/testinfra/besu_bootstrap/genesis.go#L166-L170

Here’s the docker-compose file for QBFT:

services:
  postgres:
    image: postgres
    ports:
      - 5432:5432
    environment:
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: my-secret
  besu_bootstrap:
    image: paladin/besu_bootstrap
    user: 0:0
    volumes:
      - besu_data:/var/besu:rw
    command:
      - --dir=/var/besu 
      - --algorithm=QBFT
  besu:
    container_name: besu
    image: hyperledger/besu:24.9-develop-dc6324c
    user: 0:0
    volumes:
      - besu_data:/var/besu:rw
    ports:
      - 8545:8545
      - 8546:8546
    depends_on:
      besu_bootstrap:
        condition: service_completed_successfully
    healthcheck:
      test: ["CMD-SHELL", "timeout 10s bash -c ':> /dev/tcp/localhost/8545'"]
      interval: 5s
      timeout: 5s
      retries: 10
    command:
      - --logging=DEBUG
      - --rpc-http-enabled
      - --rpc-http-api=ETH,QBFT,WEB3,DEBUG
      - --rpc-ws-enabled
      - --rpc-ws-api=ETH,QBFT,WEB3,DEBUG
      - --tx-pool=SEQUENCED
      - --tx-pool-limit-by-account-percentage=1.0
      - --tx-pool-max-size=1000000
      - --target-gas-limit=30000000
      - --genesis-file=/var/besu/genesis.json
      - --data-path=/var/besu/data
      - --node-private-key-file=/var/besu/key
      - --revert-reason-enabled
      - --host-allowlist=localhost,besu
volumes:
  besu_data:
    driver: local
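The healthcheck in both compose files just tries to open a TCP connection to port 8545, retrying every 5 seconds up to 10 times. A test harness that is not driven by docker-compose could do something similar before sending transactions. This is a minimal sketch with hypothetical helper names; the default retry count and interval are copied from the healthcheck above.

```python
import socket
import time


def port_is_open(host: str, port: int, timeout_s: float = 1.0) -> bool:
    """True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        return False


def wait_for_port(host: str, port: int, retries: int = 10,
                  interval_s: float = 5.0) -> bool:
    """Poll host:port until it accepts connections or retries run out."""
    for attempt in range(retries):
        if port_is_open(host, port):
            return True
        if attempt < retries - 1:
            time.sleep(interval_s)
    return False


if __name__ == "__main__":
    ready = wait_for_port("localhost", 8545)
    print("Besu JSON-RPC port reachable" if ready else "Besu did not come up")
```

Note that an open port only shows the JSON-RPC listener is up, which is all the compose healthcheck verifies as well.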

Results:

The CPU usage clearly indicates that the consumption is far higher with the current setup, making it less efficient.
[image: CPU usage chart]

[image: memory usage chart]

@dwertent dwertent changed the title Investigate tests running Besu in Clique with zero-second block time Tests should run Besu with zero-second block time Sep 23, 2024
@dwertent dwertent linked a pull request Sep 24, 2024 that will close this issue
3 participants