recovery scenario #1024
cmd/algorand-indexer/daemon.go
@@ -133,6 +136,19 @@ var daemonCmd = &cobra.Command{
		fmt.Fprint(os.Stderr, "missing indexer data directory")
		os.Exit(1)
	}

	// sync local ledger
Not required now, but I think we'll need to figure out a way to put this in an Init function that's part of the block processor.
There is also an edge case we could check to fail fast: if the catchpoint is ahead of nextDBRound we should avoid fast catchup, and probably inform the user that they might want to finish catchup with an earlier version of Indexer.
Codecov Report
@@ Coverage Diff @@
## localledger/integration #1024 +/- ##
==========================================================
Coverage ? 57.98%
==========================================================
Files ? 48
Lines ? 8906
Branches ? 0
==========================================================
Hits ? 5164
Misses ? 3255
Partials ? 487
initState, err := util.CreateInitState(genesis, genesisBlock)
if err != nil {
	return nil, fmt.Errorf("MakeProcessor() err: %w", err)
}
l, err := ledger.OpenLedger(logging.NewLogger(), filepath.Join(path.Dir(datadir), "ledger"), false, initState, algodConfig.GetDefaultLocal())
if dbRound != 0 && !ledgerExists(datadir, prefix) {
	msg := fmt.Sprintf("%s\n%s\n%s\n%s\n",
Is the added friction of causing an error here worth it? Why not just create the files in sequential migration mode and print a warning to the user saying that there are faster alternatives?
"ledger.block.sqlite-wal",
"ledger.tracker.sqlite",
"ledger.tracker.sqlite-shm",
"ledger.tracker.sqlite-wal",
The shm and wal files are also needed; otherwise openLedger in the block processor runs into a disk I/O error.
Are you still seeing an I/O error? I wasn't able to reproduce this
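To make the file check under discussion concrete, here is a minimal Go sketch of a helper that reports which ledger files are missing from a directory. It is not the PR's `ledgerExists`: the function names, the `withSidecars` flag, and the `data` default directory are hypothetical, and the two base names (`<prefix>.block.sqlite`, `<prefix>.tracker.sqlite`) are inferred from the file list in the diff above. Whether the `-shm` and `-wal` files must be present is the open question in this thread, so the sketch leaves that choice to the caller.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// ledgerFileNames lists the SQLite files expected for a ledger with the given
// prefix. The base names are inferred from the diff; withSidecars is a
// hypothetical switch for also requiring the -shm and -wal files.
func ledgerFileNames(prefix string, withSidecars bool) []string {
	var names []string
	for _, db := range []string{"block", "tracker"} {
		base := prefix + "." + db + ".sqlite"
		names = append(names, base)
		if withSidecars {
			names = append(names, base+"-shm", base+"-wal")
		}
	}
	return names
}

// missingFiles returns the names that cannot be stat'ed inside dir.
func missingFiles(dir string, names []string) []string {
	var missing []string
	for _, name := range names {
		if _, err := os.Stat(filepath.Join(dir, name)); err != nil {
			missing = append(missing, name)
		}
	}
	return missing
}

func main() {
	dir := "data" // hypothetical data directory; pass another as the first argument
	if len(os.Args) > 1 {
		dir = os.Args[1]
	}
	missing := missingFiles(dir, ledgerFileNames("ledger", false))
	if len(missing) > 0 {
		fmt.Printf("ledger incomplete in %s, missing: %v\n", dir, missing)
		return
	}
	fmt.Printf("ledger files present in %s\n", dir)
}
```

Reporting the missing names, rather than returning a bare boolean, would also let the caller choose between failing and printing a warning, which is the trade-off raised in the earlier comment.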
nextDBRound, err := db.GetNextRoundToAccount()
maybeFail(err, "Error getting DB round")
if nextDBRound > 0 {
	if catchpoint != "" {
There's another edge case where the catchpoint is > nextDBRound
We have a check for this in the processor:
indexer/processor/blockprocessor/block_processor.go
Lines 52 to 54 in d057e1c
if uint64(l.Latest()) > dbRound {
	return nil, fmt.Errorf("MakeProcessor() err: the ledger cache is ahead of the required round and must be re-initialized")
}
This is the opposite check. We need to avoid catching up if the catchpoint is ahead of the desired round (i.e., if you're starting a new node and provide a catchpoint, we shouldn't initialize anything).
I think we can merge this in and have a followup handling the catchpoint > next-round case
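For the follow-up, the catchpoint > next-round case could be checked before any catchup starts. The Go sketch below is one possible shape, not the PR's code: `catchpointRound` and `checkCatchpoint` are hypothetical names, and the sketch assumes a catchpoint label has the form `<round>#<hash>`. It fails when the catchpoint round is ahead of `nextDBRound`, which is the inverse of the existing `l.Latest() > dbRound` check in the processor.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// catchpointRound extracts the round from a catchpoint label.
// Assumption: labels look like "<round>#<hash>".
func catchpointRound(label string) (uint64, error) {
	parts := strings.SplitN(label, "#", 2)
	if len(parts) != 2 || parts[1] == "" {
		return 0, fmt.Errorf("invalid catchpoint label %q", label)
	}
	round, err := strconv.ParseUint(parts[0], 10, 64)
	if err != nil {
		return 0, fmt.Errorf("invalid catchpoint round in %q: %w", label, err)
	}
	return round, nil
}

// checkCatchpoint fails fast when the catchpoint is ahead of the next round
// the database needs. An empty label means no fast catchup was requested.
func checkCatchpoint(label string, nextDBRound uint64) error {
	if label == "" {
		return nil
	}
	round, err := catchpointRound(label)
	if err != nil {
		return err
	}
	if round > nextDBRound {
		return fmt.Errorf("catchpoint round %d is ahead of next DB round %d; skipping fast catchup", round, nextDBRound)
	}
	return nil
}

func main() {
	// Hypothetical label and round, for illustration only.
	if err := checkCatchpoint("100#EXAMPLEHASH", 50); err != nil {
		fmt.Println("error:", err)
	}
}
```

Whether a catchpoint exactly equal to `nextDBRound` should be accepted depends on how the ledger's latest round relates to the database round, so the comparison here should be treated as a placeholder until that is settled.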
* Local Ledger (#1011)
* integrate block processor
* Local Ledger Deployment (#1013)
* add simple local ledger migration
* add deleted opts
* fast catchup (#1023)
* add fast catchup
* Localledger merge (#1036)
* return empty lists from fetchApplications and fetchAppLocalStates (#1010)
* Update model to converge with algod (#1005)
* New Feature: Adds Data Directory Support (#1012)
- Updates the go-algorand submodule hash to point to rel/beta
- Moves the cpu profiling file, pid file and indexer configuration file to be options of only the daemon sub-command
- Changes os.Exit() to be a panic with a special handler. This is so that defer's are handled instead of being ignored.
- Detects auto-loading configuration files in the data directory and issues errors if equivalent command line arguments are supplied.
- Updates the README with instructions on how to use the auto-loading configuration files and the data directory.
* Update mockery version
Co-authored-by: erer1243 <[email protected]>
Co-authored-by: AlgoStephenAkiki <[email protected]>
* recovery scenario (#1024)
* handle ledger recovery scenario
* refactor create genesis block (#1026)
* refactor create genesis block
* Adds Local Ledger Readme (#1035)
* Adds Local Ledger Readme Resolves #4109 Starts Readme docs
* Update docs/LocalLedger.md
Co-authored-by: Will Winder <[email protected]>
* Update docs/LocalLedger.md
Co-authored-by: Will Winder <[email protected]>
* Update docs/LocalLedger.md
Co-authored-by: Will Winder <[email protected]>
* Removed troubleshooting section
Co-authored-by: Will Winder <[email protected]>
* update ledger file path and migration (#1042)
* LocalLedger Refactoring + Catchpoint Service (#1049)
Part 1
cleanup genesis file access.
put node catchup into a function that can be swapped out with the catchup service.
pass the indexer logger into the block processor.
move open ledger into a util function, and move the initial state util function into a new ledger util file.
add initial catchupservice implementation.
move ledger init from daemon.go to constructor.
Merge multiple read genesis functions.
Part 2
Merge local_ledger migration package into blockprocessor.
Rename Migration to Initialize
Use logger in catchup service catchup
Part 3
Update submodule and use NewWrappedLogger.
Make util.CreateInitState private
* build: merge develop into localledger/integration (#1062)
* Ledger init status (#1058)
* Generate an error if the catchpoint is not valid for initialization. (#1075)
* Use main logger in handler and fetcher. (#1077)
* Switch from fullNode catchup to catchpoint catchup service. (#1076)
* Refactor daemon, add more tests (#1039)
Refactors daemon cmd into separate, testable pieces.
* Merge develop into localledger/integration (#1083)
* Misc Local Ledger cleanup (#1086)
* Update processor/blockprocessor/initialize.go
Co-authored-by: Zeph Grunschlag <[email protected]>
* commit
* fix function call args
* RFC-0001: Rfc 0001 impl (#1069)
Adds an Exporter interface and a noop exporter implementation with factory methods for construction
* Fix test errors
* Add/fix tests
* Add postgresql_exporter tests
* Update config loading
* Change BlockExportData to pointers
* Move and rename ExportData
* Add Empty func to BlockData
* Add comment
Co-authored-by: shiqizng <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: erer1243 <[email protected]>
Co-authored-by: AlgoStephenAkiki <[email protected]>
Co-authored-by: Will Winder <[email protected]>
Co-authored-by: Zeph Grunschlag <[email protected]>
Summary
This PR implements a recovery mechanism for the local ledger. If the local disk fails and the data directory is lost, the indexer reinitializes the ledger.
Test Plan
Run the daemon manually.