Provide configuration guidelines for BufferedStorageBackend settings #5390

Closed
urvisavla opened this issue Jul 15, 2024 · 5 comments

@urvisavla
Contributor

What problem does your feature solve?

In #4911, we added an option for Horizon to reingest historical ledgers from the datastore using BufferedStorageBackend.
BufferedStorageBackend has two relevant settings: BufferSize, the number of ledgers to hold in memory, and NumWorkers, the number of parallel download workers. We need to run tests and benchmarks to identify the optimal configuration values.
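
To make the two settings concrete, here is a minimal sketch of a bounded, ordered prefetcher. It is not the BufferedStorageBackend implementation: `fetchLedger`, `task`, and `prefetch` are hypothetical names, and the download is simulated. It only illustrates the idea that `bufferSize` caps how many ledgers are held ahead of the consumer while `numWorkers` goroutines download in parallel.

```go
package main

import (
	"fmt"
	"sync"
)

// fetchLedger is a hypothetical stand-in for downloading one ledger
// from the datastore.
func fetchLedger(seq uint32) string {
	return fmt.Sprintf("ledger-%d", seq)
}

// task pairs a ledger sequence with the channel its result is delivered on.
type task struct {
	seq    uint32
	result chan string
}

// prefetch downloads ledgers from..to with numWorkers goroutines and
// returns them in sequence order. The pending queue has capacity
// bufferSize, so the producer stalls once roughly that many ledgers
// are queued ahead of the consumer.
func prefetch(from, to uint32, bufferSize, numWorkers int) []string {
	tasks := make(chan task)
	pending := make(chan chan string, bufferSize)

	var wg sync.WaitGroup
	for i := 0; i < numWorkers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for t := range tasks {
				t.result <- fetchLedger(t.seq)
			}
		}()
	}

	// Producer: reserve a slot in the ordered queue, then hand the
	// download to whichever worker is free.
	go func() {
		for seq := from; seq <= to; seq++ {
			res := make(chan string, 1)
			pending <- res
			tasks <- task{seq: seq, result: res}
		}
		close(pending)
		close(tasks)
	}()

	// Consumer: read slots in order and wait for each result.
	var out []string
	for res := range pending {
		out = append(out, <-res)
	}
	wg.Wait()
	return out
}

func main() {
	fmt.Println(prefetch(1, 5, 2, 2))
}
```

In this sketch a larger `bufferSize` only helps if the workers can get ahead of the consumer, which is one plausible reason the benchmark results flatten out at larger values.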

What would you like to see?

Configuration guidelines for BufferedStorageBackend.

What alternatives are there?

Let users find the best settings through experimentation.

@urvisavla
Contributor Author

Reingestion times for 10,000 ledgers with various configurations of LedgersPerFile, Buffer Size, and Num Workers:

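| LedgersPerFile | Buffer Size | Num Workers | Time Taken |
|---|---|---|---|
| 1 | 1 | 1 | 34m58s |
| 1 | 5 | 1 | 30m4s |
| 1 | 10 | 1 | 27m50s |
| 1 | 100 | 1 | 20m41s |
| 1 | 500 | 1 | 19m42s |
| 1 | 5 | 5 | 17m20s |
| 1 | 10 | 5 | 17m20s |
| 1 | 100 | 5 | 17m1s |
| 1 | 500 | 5 | 16m55s |
| 1 | 10 | 10 | 18m26s |
| 1 | 100 | 10 | 22m2s |
| 1 | 500 | 10 | 19m40s |
| 1 | 100 | 20 | 18m10s |
| 1 | 500 | 20 | 19m24s |
| 100 | 1 | 1 | 18m49s |
| 100 | 5 | 1 | 17m29s |
| 100 | 10 | 1 | 17m10s |
| 100 | 100 | 1 | 18m24s |
| 100 | 500 | 1 | 17m25s |
| 100 | 5 | 5 | 16m56s |
| 100 | 10 | 5 | 16m54s |
| 100 | 100 | 5 | 17m53s |
| 100 | 500 | 5 | 18m3s |
| 100 | 10 | 10 | 19m43s |
| 100 | 100 | 10 | 19m9s |
| 100 | 500 | 10 | 20m0s |
| 100 | 100 | 20 | 18m51s |
| 100 | 500 | 20 | 19m46s |
| 1000 | 1 | 1 | 23m13s |
| 1000 | 5 | 1 | 22m51s |
| 1000 | 10 | 1 | 22m24s |
| 1000 | 100 | 1 | 22m52s |
| 1000 | 500 | 1 | 22m54s |
| 1000 | 5 | 5 | 22m47s |
| 1000 | 10 | 5 | 22m54s |
| 1000 | 100 | 5 | 22m57s |
| 1000 | 500 | 5 | 24m6s |
| 1000 | 10 | 10 | 26m0s |
| 1000 | 100 | 10 | 23m32s |
| 1000 | 500 | 10 | 23m47s |
| 1000 | 20 | 10 | 23m39s |
| 1000 | 500 | 20 | 24m15s |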

Summary

  • LedgersPerFile: 1:

    • Best performance: Buffer Size: 500, Num Workers: 5
    • Time taken: 16m55s
  • LedgersPerFile: 100:

    • Best performance: Buffer Size: 10, Num Workers: 5
    • Time taken: 16m54s
  • LedgersPerFile: 1000:

    • Best performance: Buffer Size: 10, Num Workers: 1
    • Time taken: 22m24s
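
The summary can be reproduced mechanically from the table. The sketch below copies the measurements as reported and picks the fastest run per LedgersPerFile value; the type and function names (`run`, `best`, `mustParse`) are made up for illustration and are not part of Horizon.

```go
package main

import (
	"fmt"
	"time"
)

// run is one row of the benchmark table.
type run struct {
	LedgersPerFile int
	BufferSize     int
	NumWorkers     int
	Taken          string // as reported, e.g. "16m55s"
}

// runs holds the measurements exactly as posted in the table.
var runs = []run{
	{1, 1, 1, "34m58s"}, {1, 5, 1, "30m4s"}, {1, 10, 1, "27m50s"},
	{1, 100, 1, "20m41s"}, {1, 500, 1, "19m42s"}, {1, 5, 5, "17m20s"},
	{1, 10, 5, "17m20s"}, {1, 100, 5, "17m1s"}, {1, 500, 5, "16m55s"},
	{1, 10, 10, "18m26s"}, {1, 100, 10, "22m2s"}, {1, 500, 10, "19m40s"},
	{1, 100, 20, "18m10s"}, {1, 500, 20, "19m24s"},
	{100, 1, 1, "18m49s"}, {100, 5, 1, "17m29s"}, {100, 10, 1, "17m10s"},
	{100, 100, 1, "18m24s"}, {100, 500, 1, "17m25s"}, {100, 5, 5, "16m56s"},
	{100, 10, 5, "16m54s"}, {100, 100, 5, "17m53s"}, {100, 500, 5, "18m3s"},
	{100, 10, 10, "19m43s"}, {100, 100, 10, "19m9s"}, {100, 500, 10, "20m0s"},
	{100, 100, 20, "18m51s"}, {100, 500, 20, "19m46s"},
	{1000, 1, 1, "23m13s"}, {1000, 5, 1, "22m51s"}, {1000, 10, 1, "22m24s"},
	{1000, 100, 1, "22m52s"}, {1000, 500, 1, "22m54s"}, {1000, 5, 5, "22m47s"},
	{1000, 10, 5, "22m54s"}, {1000, 100, 5, "22m57s"}, {1000, 500, 5, "24m6s"},
	{1000, 10, 10, "26m0s"}, {1000, 100, 10, "23m32s"}, {1000, 500, 10, "23m47s"},
	{1000, 20, 10, "23m39s"}, {1000, 500, 20, "24m15s"},
}

// mustParse converts a reported time such as "30m4s" into a Duration.
func mustParse(s string) time.Duration {
	d, err := time.ParseDuration(s)
	if err != nil {
		panic(err)
	}
	return d
}

// best returns the fastest run for each LedgersPerFile value.
// On a tie, the run listed first is kept.
func best(rs []run) map[int]run {
	out := make(map[int]run)
	for _, r := range rs {
		cur, seen := out[r.LedgersPerFile]
		if !seen || mustParse(r.Taken) < mustParse(cur.Taken) {
			out[r.LedgersPerFile] = r
		}
	}
	return out
}

func main() {
	b := best(runs)
	for _, lpf := range []int{1, 100, 1000} {
		r := b[lpf]
		fmt.Printf("LedgersPerFile=%d: BufferSize=%d NumWorkers=%d (%s)\n",
			lpf, r.BufferSize, r.NumWorkers, r.Taken)
	}
}
```

Note that several runs within each group differ by only a few seconds, so the "best" row is sensitive to run-to-run noise.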

Recommendations

  1. For a small LedgersPerFile value (1):

    • Use Buffer Size: 500 and Num Workers: 5 for the fastest processing time.
  2. For a medium LedgersPerFile value (100):

    • Use Buffer Size: 10 and Num Workers: 5 for the fastest processing time.
  3. For a large LedgersPerFile value (1000):

    • Use Buffer Size: 10 and Num Workers: 1 for the fastest processing time.
    • Note: Logically, for larger LedgersPerFile values, 1 worker and a buffer size of 2 should be sufficient. However, I tested with a buffer size of 1 and then jumped directly to 10. I will run a test with 1 worker and a buffer size of 2 and update the results.
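
As an illustration only, the recommendations above could be encoded as a lookup keyed by the three LedgersPerFile values that were actually benchmarked. The names `settings`, `fastest`, and `recommend` are hypothetical; the sketch deliberately returns `false` for any other value rather than guessing, since nothing in between was measured.

```go
package main

import "fmt"

// settings is a hypothetical pair of BufferedStorageBackend knobs.
type settings struct {
	BufferSize int
	NumWorkers int
}

// fastest maps each benchmarked LedgersPerFile value to the settings
// that gave the shortest reingestion time in the local tests.
var fastest = map[int]settings{
	1:    {BufferSize: 500, NumWorkers: 5},
	100:  {BufferSize: 10, NumWorkers: 5},
	1000: {BufferSize: 10, NumWorkers: 1},
}

// recommend returns the measured-fastest settings, or false when the
// given LedgersPerFile value was not part of the benchmark.
func recommend(ledgersPerFile int) (settings, bool) {
	s, ok := fastest[ledgersPerFile]
	return s, ok
}

func main() {
	for _, lpf := range []int{1, 64, 100, 1000} {
		if s, ok := recommend(lpf); ok {
			fmt.Printf("LedgersPerFile=%d: BufferSize=%d NumWorkers=%d\n",
				lpf, s.BufferSize, s.NumWorkers)
		} else {
			fmt.Printf("LedgersPerFile=%d: not benchmarked\n", lpf)
		}
	}
}
```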

@urvisavla
Contributor Author

> Note: Logically, for larger LedgersPerFile values, 1 worker and a buffer size of 2 should be sufficient. However, I tested with a buffer size of 1 and then directly with 10. I will run a test with 1 worker and a buffer size of 2 and update the results.

Additional tests for LedgersPerFile: 1000 with a smaller buffer size did not result in better performance.

  • Buffer Size: 2, Num Workers: 1, Time taken: 23m58s
  • Buffer Size: 2, Num Workers: 2, Time taken: 23m52s

@urvisavla
Contributor Author

Following Tamir's advice, I moved the tests to a dev EC2 instance. The performance was much worse: a test that takes 30 minutes locally for 10,000 ledgers took over 7 hours on EC2. We found that I/O was the bottleneck, so at Ops' suggestion we upgraded to a larger gp3 volume and I moved PostgreSQL onto this new volume. That improved the time to about 1 hour for smaller buffer sizes, but larger buffers led to OOM errors.

I then modified the tests to use the BufferedStorageBackend directly to download and extract ledgers with varying buffer sizes and parallel workers, without reingesting through Horizon (and thus without PostgreSQL). This cut the processing time to about 20 minutes for 10,000 ledgers, but that is still much slower than the 3 minutes it takes locally. This suggests the dev EC2 instance (t2.medium) is too small for these tests.
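
A timing sweep of this kind can be sketched as below. This is not the test code used here: `sweep` and `timing` are hypothetical names, and the `job` callback stands in for "download and extract a ledger range with the given settings", which in the real tests would call BufferedStorageBackend.

```go
package main

import (
	"fmt"
	"time"
)

// timing records how long one configuration took.
type timing struct {
	BufferSize int
	NumWorkers int
	Elapsed    time.Duration
}

// sweep runs job once for every (numWorkers, bufferSize) combination
// and records the wall-clock time of each run.
func sweep(bufferSizes, workerCounts []int, job func(bufferSize, numWorkers int)) []timing {
	var out []timing
	for _, w := range workerCounts {
		for _, b := range bufferSizes {
			start := time.Now()
			job(b, w)
			out = append(out, timing{BufferSize: b, NumWorkers: w, Elapsed: time.Since(start)})
		}
	}
	return out
}

func main() {
	// Hypothetical job: sleeps instead of downloading ledgers.
	job := func(bufferSize, numWorkers int) {
		time.Sleep(time.Duration(10/numWorkers) * time.Millisecond)
	}
	for _, t := range sweep([]int{10, 100}, []int{1, 5}, job) {
		fmt.Printf("BufferSize=%d NumWorkers=%d took %s\n",
			t.BufferSize, t.NumWorkers, t.Elapsed)
	}
}
```

Running each configuration only once, as this sketch does, leaves the results exposed to noise; repeating runs and comparing medians would give steadier numbers on a small instance.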

I’ve also run the same BufferedStorageBackend tests locally with different settings. The results match those from the previous Horizon-based tests. I’ll review all the data and provide a summary with configuration recommendations.

@urvisavla
Contributor Author

We've added the recommended config values for BufferedStorageBackend in this PR https://github.com/stellar/go/pull/5462/files#diff-052e43536000d43e8d99a272431cf8b323816b05bf59b7501392f2313276c763R29.

@sreuland
Contributor

sreuland commented Oct 1, 2024

The guidelines can be doc'd as part of this request for new dev docs related to CDP - stellar/stellar-docs#1012
