Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Algod Importer: Longer Timeout #133

Merged
merged 16 commits into from
Aug 11, 2023

Conversation

tzaffi
Copy link
Contributor

@tzaffi tzaffi commented Aug 9, 2023

Change waitForRoundTimeout to 30 from 5

  1. changing the amount of time to wait for the next round
  2. adding more context to the SyncError type

Explanation for the timeout change

During a recent EC2 import test, I noticed that after catching up, around 1% of imports are reporting an error/warning:

{
"__type":"importer",
"_name":"algod",
"level":"warning",
"msg":"Sync error detected, attempting to set the sync round to recover the node: wrong round returned from status for round: 0 != 31117945",
"time":"2023-08-07T15:31:16.166545875Z"
}
{
"__type":"Conduit",
"_name":"main",
"level":"error",
"msg":"wrong round returned from status for round: 0 != 31117945",
"time":"2023-08-07T15:31:16.166783576Z"
}

The pipelining branch, which added some more information to the SyncError errored similarly with the warning msg:

importer algod.GetBlock() sync error detected, attempting to set the sync round to recover the node: wrong round returned from status for round: 31112472 != 31112473: status2.LastRound mismatch: context deadline exceeded

So we can see that we're getting a timeout error. This is to be expected on occasion as when calling algod's StatusAfterBlock(), a context is provided with a 5 sec timeout.

Having carried out some statistical analysis on the last 1 million rounds, I got the following results:

mean_duration stdev_duration min_duration max_duration 0-4 5-9 10-14 15-19 20-24
3.400925599 0.507927656 0 22 999803 148 0 0 50

So if we re-set waitForRoundTimeout = 30, we're likely to encounter 0 such timeouts, or at most a small number.

On the other hand, algod's own timeout is 60 secs so we are steering quite far from the upper bound.

@codecov
Copy link

codecov bot commented Aug 9, 2023

Codecov Report

Merging #133 (b9fd399) into master (442791a) will increase coverage by 2.86%.
Report is 48 commits behind head on master.
The diff coverage is 77.77%.

@@            Coverage Diff             @@
##           master     #133      +/-   ##
==========================================
+ Coverage   67.66%   70.53%   +2.86%     
==========================================
  Files          32       36       +4     
  Lines        1976     2545     +569     
==========================================
+ Hits         1337     1795     +458     
- Misses        570      653      +83     
- Partials       69       97      +28     
Files Changed Coverage Δ
conduit/data/block_export_data.go 100.00% <ø> (+92.30%) ⬆️
conduit/metrics/metrics.go 100.00% <ø> (ø)
conduit/pipeline/metadata.go 69.11% <ø> (ø)
conduit/plugins/config.go 100.00% <ø> (ø)
...duit/plugins/exporters/filewriter/file_exporter.go 81.63% <ø> (-1.06%) ⬇️
conduit/plugins/importers/algod/metrics.go 100.00% <ø> (ø)
...gins/processors/filterprocessor/fields/searcher.go 77.50% <ø> (ø)
...ins/processors/filterprocessor/filter_processor.go 83.82% <ø> (+3.54%) ⬆️
...plugins/processors/filterprocessor/gen/generate.go 34.28% <ø> (ø)
conduit/plugins/processors/noop/noop_processor.go 64.70% <ø> (+6.81%) ⬆️
... and 19 more

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@tzaffi tzaffi marked this pull request as ready for review August 9, 2023 19:52
}

func (e *SyncError) Error() string {
return fmt.Sprintf("wrong round returned from status for round: %d != %d", e.rnd, e.expected)
return fmt.Sprintf("wrong round returned from status for round: retrieved(%d) != expected(%d): %v", e.retrievedRound, e.expectedRound, e.err)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shiqizng - as you pointed out yesterday, at face value it's hard to tell which round was which so I've added "retrieved" for the value gotten back from the endpoint vs. "expected" which is the importer's expectation.

@tzaffi tzaffi requested a review from a team August 9, 2023 19:57
Copy link
Contributor

@winder winder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, just had a couple of nits.

}
noError = noError && assert.True(t, found, "(%s) Expected log was not found: '%s'", tc.name, log)
if !noError {
fmt.Printf(">>>>>WE HAVE A PROBLEM<<<<<<\n")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make it a little easier to find the failing needle in the printouts haystack since we aren't using require assertions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you should use !found as the condition, in tests that provide multiple log entries would print WE HAVE A PROBLEM for all checks after the first one.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tzaffi tzaffi mentioned this pull request Aug 10, 2023
10 tasks
Copy link
Contributor

@winder winder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just one suggestion for the WE HAVE A PROBLEM condition.

@tzaffi tzaffi merged commit 7ef6106 into algorand:master Aug 11, 2023
3 checks passed
@tzaffi tzaffi deleted the algod-importer-longer-timeout branch August 11, 2023 18:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants