Deploy two hosts for benchmarking nimbus-eth1 import #194

Open
jakubgs opened this issue Aug 20, 2024 · 7 comments

@jakubgs
Member

jakubgs commented Aug 20, 2024

The development of nimbus-eth1 is ramping up, so we will need to benchmark the process of importing the network state data, with validation, from ERA files. This process is not optimized yet, as it is in its early stages of development, which means a full import of mainnet would probably take more than a week. Despite that, we need to start measuring results in order to track progress in optimizing the import process.

This benchmarking will require two kinds of tests on two hosts:

  • A long running test that lasts a week.
  • A "short" running test that lasts 24 hours.

Neither test will finish, so both will have to be aborted, but the number of blocks they are able to sync will be the measure of performance. These performance reports will have to be archived in some way; the simplest way would be to commit them to a dedicated repository. In addition to the reports gained this way, the import process will expose a /metrics endpoint which we can scrape with Prometheus.

The two hosts can be purchased from Hetzner, as the hosts will not be using external connections. The storage will need to be at least 2x the size of the Mainnet ERA and ERA1 files, which is currently ~1 TB, so an additional 2 TB NVMe would suffice. Aside from that, more than 16 GB of RAM and 4 cores is enough.

Update as of 28 Oct

The short test must begin with a template DB that already contains the chain up to block 20M, since measuring the import process from these blocks onward is what matters to the Nimbus team. Jacek is to provide this template DB.

The long test will begin with no template DB and is also an import-only test. It usually takes around a week.

The goal is to measure time taken to complete import in both cases.

@siddarthkay
Contributor

How urgently do we need the 2 hosts?
I could either get a cheaper host from auction via Hetzner

Screenshot 2024-08-22 at 5 27 30 PM

OR

I could get a dedicated host which would be comparatively more expensive
example : https://www.hetzner.com/dedicated-rootserver/ax52/

Screenshot 2024-08-22 at 5 29 36 PM

My assumption is that the host from Auction might take longer to get compared to the dedicated one.

@siddarthkay
Contributor

After discussing with @jakubgs, I went ahead with the following:

2 x Dedicated Server AX42
* Location: Finland, HEL1
* For Finland, HEL1, support is only available in English.
* Rescue system (English)
* 1 x Primary IPv4
* 1 x 2 TB NVMe SSD
* 8 Core CPU
* 64 GB DDR5 ECC RAM

Order Details :

Screenshot_2024-08-22_at_5 55 10_PM

Possible wait times :

Screenshot_2024-08-22_at_5 54 55_PM

@siddarthkay
Contributor

These 2 AX42 hosts have been activated by Hetzner and currently boot into the rescue system.
I'll bootstrap these hosts and add them to our inventory.

siddarthkay added a commit that referenced this issue Sep 4, 2024
This commit adds 2 hetzner AX42 hosts for eth1 benchmarking to our network.

related issue: #194
siddarthkay added a commit that referenced this issue Sep 5, 2024
This commit adds 2 hetzner AX42 hosts for eth1 benchmarking to our network.

related issue: #194
@siddarthkay
Contributor

siddarthkay commented Oct 23, 2024

I will use bench-01 for the short 24-hour test and bench-02 for the long-running 1-week test.

Next steps are as follows :

  • Get ERA1 files from nel-01.ih-eu-mda1.nimbus.mainnet and move them to the bench hosts.
 [email protected]:~ % sudo du -hsc /docker/era1
 428G    /docker/era1
 428G    total
  • Get ERA files from nel-01.ih-eu-mda1.nimbus.mainnet and move them to the bench hosts.
  • Get nimbus-eth1 running on bench-01.he-eu-hel1.nimbus.eth1 and bench-02.he-eu-hel1.nimbus.eth1, make sure the node is running as expected, and check that nimbus_eth1_network is set to mainnet.
  • On bench-01, set up the template DB which contains 20M blocks.
  • On bench-01, set up a systemd timer to trigger an import of the state by passing --era1-dir and --era-dir, and log the time taken for the short sync to complete. This process should be terminated if it runs for more than 24 hours, and when that happens we have to log the progress percentage of the import.
  • The terminating script on bench-01 should also replace the existing DB with the "template DB", so that when the short test is run again it starts from the state at 20M blocks.
  • On bench-02, set up a systemd timer to trigger an "import" of the state by passing --era1-dir and --era-dir, and log the time taken for the long sync to complete. This process should be terminated if it runs for more than 1 week, and when that happens we have to log the progress percentage of the import.
  • The terminating script on bench-02 should also clean up the existing DB, so that when the import is run again it starts from a clean slate.
  • Discuss the first few results with Jacek and the eth1 team in Discord.
  • Implement a process to auto-publish these results by committing to GitHub.
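
A rough sketch of the "terminate after a deadline and log progress" part of the steps above, in Python. This is only an illustration under assumptions: the helper names `run_with_deadline` and `progress_percent` are made up here, and the target block used for the percentage is whatever we decide counts as "done" (not something defined in this issue).

```python
import subprocess
import sys
import time


def run_with_deadline(cmd, deadline_s, grace_s=30):
    """Run cmd and stop it once deadline_s seconds have passed.

    Returns (finished, elapsed_seconds). finished is False when the
    process had to be stopped because it hit the deadline.
    """
    start = time.monotonic()
    proc = subprocess.Popen(cmd)
    try:
        proc.wait(timeout=deadline_s)
        finished = True
    except subprocess.TimeoutExpired:
        proc.terminate()  # ask politely first so the DB can be closed cleanly
        try:
            proc.wait(timeout=grace_s)
        except subprocess.TimeoutExpired:
            proc.kill()  # still running after the grace period
            proc.wait()
        finished = False
    return finished, time.monotonic() - start


def progress_percent(imported_block, start_block, target_block):
    """Share of the start..target block range that was imported, in percent."""
    if target_block <= start_block:
        raise ValueError("target_block must be greater than start_block")
    return 100.0 * (imported_block - start_block) / (target_block - start_block)


if __name__ == "__main__":
    # Stand-in command for demonstration. On the bench hosts this would be the
    # nimbus-eth1 import invocation with --era1-dir and --era-dir (the exact
    # binary name and arguments are not spelled out in this issue).
    demo_cmd = [sys.executable, "-c", "import time; time.sleep(5)"]
    finished, elapsed = run_with_deadline(demo_cmd, deadline_s=2)
    print(f"finished={finished} elapsed={elapsed:.1f}s")
```

For the short test the deadline would be `24 * 3600` seconds and for the long test `7 * 24 * 3600`. The same result could also be had from systemd itself (for example with a runtime limit on the unit), so treat this as one possible shape rather than the chosen approach.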

@jakubgs
Member Author

jakubgs commented Oct 23, 2024

Sounds correct. Remember that the timer will have to do several things:

  1. Measure and save progress and time it took.
  2. Stop the nimbus-eth1 service.
  3. Purge already synced data.
  4. Restart the nimbus-eth1 service.
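
The four steps above could look roughly like this in Python. It is a sketch under assumptions: the function name `run_cycle`, the service name, and the paths passed to it are placeholders, and restoring from a template directory is only relevant for the short test on bench-01. The `run` parameter exists so the `systemctl` calls can be swapped out when trying this without systemd.

```python
import shutil
import subprocess
from pathlib import Path


def run_cycle(service, db_dir, report_file, report_line,
              template_dir=None, run=subprocess.run):
    """One benchmark cycle: save the result, stop, purge, restart."""
    db_dir = Path(db_dir)

    # 1. Measure and save progress and the time it took.
    #    report_line is whatever summary was collected for this run.
    with open(report_file, "a", encoding="utf-8") as fh:
        fh.write(report_line.rstrip("\n") + "\n")

    # 2. Stop the service (service name is a placeholder supplied by the caller).
    run(["systemctl", "stop", service], check=True)

    # 3. Purge already synced data. With a template directory the DB is
    #    replaced by a fresh copy of it, otherwise it is left empty.
    if db_dir.exists():
        shutil.rmtree(db_dir)
    if template_dir is not None:
        shutil.copytree(template_dir, db_dir)
    else:
        db_dir.mkdir(parents=True)

    # 4. Restart the service.
    run(["systemctl", "start", service], check=True)
```

Saving the report before stopping the service keeps the measurement even if a later step fails, which matches the order of the list.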

@siddarthkay
Contributor

as per @arnetheduck :

syncing = creating a state from blocks, usually sourced from the network
import = a method of syncing that reads era files instead of sourcing the blocks from the network - it's the same blocks, just in a file instead of requesting from nodes on the network

what we want to measure is the performance of turning blocks into a state - using import for this purpose eliminates the networking aspect of it focusing on the block processing component

@siddarthkay
Contributor

The short benchmark was run and it completed in ~12 hours:

Nov  4 23:32:38 bench-01.he-eu-hel1.nimbus.eth1 nimbus-eth1-mainnet-short-benchmark[147056]: 
INF 2024-11-04 23:32:38.837+00:00 Imported blocks                            
blockNumber=21005282 blocks=1005281 importedSlot=10223616 txs=160021843 mgas=15219979.109 
bps=24.889 tps=4328.405 mgps=376.486 avgBps=23.451 avgTps=3732.972 avgMGps=355.050 elapsed=11h54m27s

However, this was run on a RAID 0 setup across 3 drives.
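
For the reports, the key=value fields of the "Imported blocks" log line above can be pulled apart with a few lines of Python. This is a sketch: the helper names are made up, and it assumes `blocks` counts every block imported in this run, so `blockNumber - blocks` gives the block the template DB ended at.

```python
import re

# Fields copied from the "Imported blocks" log line of the short benchmark.
LINE = (
    "blockNumber=21005282 blocks=1005281 importedSlot=10223616 "
    "txs=160021843 mgas=15219979.109 bps=24.889 tps=4328.405 "
    "mgps=376.486 avgBps=23.451 avgTps=3732.972 avgMGps=355.050 "
    "elapsed=11h54m27s"
)


def parse_fields(line):
    """Return the key=value pairs of a log line as a dict of strings."""
    return dict(re.findall(r"(\w+)=(\S+)", line))


def parse_elapsed(text):
    """Convert a duration such as '11h54m27s' to whole seconds."""
    units = {"h": 3600, "m": 60, "s": 1}
    return sum(int(n) * units[u] for n, u in re.findall(r"(\d+)([hms])", text))


fields = parse_fields(LINE)
seconds = parse_elapsed(fields["elapsed"])           # 42867
avg_bps = int(fields["blocks"]) / seconds            # about 23.451, matches avgBps
base_block = int(fields["blockNumber"]) - int(fields["blocks"])  # 20000001

print(f"elapsed={seconds}s avg_bps={avg_bps:.3f} base_block={base_block}")
```

Recomputing blocks per second from `blocks` and `elapsed` agrees with the logged `avgBps`, which is a cheap sanity check to run before a result is committed to the reports repository.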

Next steps:

  • Re-bootstrap the host with two 512 GB drives in a RAID 1 setup and another 2 TB drive without RAID.
  • Re-run the short benchmark.
    .
    .
    .
