Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds --no-skip-initial-accounts-db-clean *hidden* CLI flag #33664

Merged
merged 2 commits into from
Oct 12, 2023

Conversation

brooksprumo
Copy link
Contributor

@brooksprumo brooksprumo commented Oct 11, 2023

Problem

Some nodes with 128 GB of RAM are OOMing when upgrading to v1.16.

The issue happens at startup, while the first clean runs. Since the first clean takes a while on these nodes, flush cannot run, and thus the accounts write cache balloons until OOM. The accounts write cache fills up because we are processing transactions.

In older version of the validator, we used to always clean before processing transactions. If we had that behavior again, then the OOM could be avoided.

Discord debug channel for more info: https://discord.com/channels/428295358100013066/1156647974563233963

Summary of Changes

Add a hidden CLI flag that forces clean to run before we begin processing transactions.

@brooksprumo brooksprumo self-assigned this Oct 11, 2023
@brooksprumo
Copy link
Contributor Author

@t-nelson Do you want to weigh in here on the CLI parts? Naming/etc. If there are desired changes, it's nice to iterate on those quickly, so not to waste a bunch of time waiting on CI. I'm planning on backporting this to v1.17 and v1.16.

@brooksprumo brooksprumo marked this pull request as ready for review October 11, 2023 19:19
@codecov
Copy link

codecov bot commented Oct 11, 2023

Codecov Report

Merging #33664 (6e20193) into master (fa21a3d) will decrease coverage by 0.1%.
Report is 8 commits behind head on master.
The diff coverage is 44.4%.

@@            Coverage Diff            @@
##           master   #33664     +/-   ##
=========================================
- Coverage    81.8%    81.8%   -0.1%     
=========================================
  Files         806      806             
  Lines      217477   217484      +7     
=========================================
- Hits       178026   177995     -31     
- Misses      39451    39489     +38     

@t-nelson
Copy link
Contributor

i kinda like that nearly all of our current "suspect" flags match --no-*, so something like --no-skip-initial-accounts-db-clean would fit that scheme better. ideally we aren't really supporting this long term

@brooksprumo
Copy link
Contributor Author

something like --no-skip-initial-accounts-db-clean

Please no double negatives 🥺

@t-nelson
Copy link
Contributor

dunno that i'd call "skip" a negative. it's omitting work, ie optimization, ie faster, ie better, ie positive, if anything!

i mainly want a "oh one of these operators" signal in the args list. great short circuit when doing support

@brooksprumo
Copy link
Contributor Author

i mainly want a "oh one of these operators" signal in the args list. great short circuit when doing support

Ok, sounds good. Done in 6e20193.

@brooksprumo brooksprumo changed the title Adds --accounts-db-force-initial-clean *hidden* CLI flag Adds --no-skip-initial-accounts-db-clean *hidden* CLI flag Oct 12, 2023
@brooksprumo brooksprumo added v1.16 PRs that should be backported to v1.16 v1.17 PRs that should be backported to v1.17 labels Oct 12, 2023
Copy link
Contributor

@jeffwashington jeffwashington left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@brooksprumo
Copy link
Contributor Author

@t-nelson Do you want to re-review before I merge?

@brooksprumo brooksprumo merged commit 452fd5d into solana-labs:master Oct 12, 2023
31 checks passed
@brooksprumo brooksprumo deleted the force-initial-clean branch October 12, 2023 17:32
mergify bot pushed a commit that referenced this pull request Oct 12, 2023
(cherry picked from commit 452fd5d)

# Conflicts:
#	runtime/src/bank/serde_snapshot.rs
#	runtime/src/snapshot_bank_utils.rs
mergify bot pushed a commit that referenced this pull request Oct 12, 2023
@t-nelson
Copy link
Contributor

thanks!

brooksprumo added a commit that referenced this pull request Oct 12, 2023
…backport of #33664) (#33676)

* Adds `--no-skip-initial-accounts-db-clean` *hidden* CLI flag (#33664)

(cherry picked from commit 452fd5d)

# Conflicts:
#	runtime/src/bank/serde_snapshot.rs
#	runtime/src/snapshot_bank_utils.rs

* fix backport conflicts/issues

---------

Co-authored-by: Brooks <[email protected]>
brooksprumo added a commit that referenced this pull request Oct 12, 2023
…backport of #33664) (#33677)

Adds `--no-skip-initial-accounts-db-clean` *hidden* CLI flag (#33664)

(cherry picked from commit 452fd5d)

Co-authored-by: Brooks <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
v1.16 PRs that should be backported to v1.16 v1.17 PRs that should be backported to v1.17
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants