testing: fix unit test TestAsyncTelemetryHook_QueueDepth #2685

tsachiherman · 2021-08-04T13:59:01Z

Summary

Fix the unit test TestAsyncTelemetryHook_QueueDepth:

	// the anonymous goroutine in createAsyncHookLevels might pull an entry off the pending list before
	// writing it off to the underlying hook. when that happens, the total number of sent entries could
	// be one higher then the maxDepth.

Test Plan

This is a test.

winder · 2021-08-04T14:21:44Z

logging/telemetryhook_test.go

+	// the anonymous goroutine in createAsyncHookLevels might pull an entry off the pending list before
+	// writing it off to the underlying hook. when that happens, the total number of sent entries could
+	// be one higher then the maxDepth.
+	require.LessOrEqual(t, hookEntries, maxDepth+1)


I think that moving hook.ready = true after the Fire calls would remove the race condition. After close(filling) would work too but I don't think it's required.

Actually, the filling channel doesn't seem to be needed at all.

This logging code is pretty hard to follow, but I think this is the main loop:

for !exit { exit = !hook.waitForEventAndReady() hasEvents := true for hasEvents { select { case entry := <-hook.entries: hook.appendEntry(entry) default: hook.Lock() var entry *logrus.Entry ----> if len(hook.pending) > 0 && hook.ready { // entry is always nil when ready = false entry = hook.pending[0] hook.pending = hook.pending[1:] } hook.Unlock() if entry != nil { ----> err := hook.wrappedHook.Fire(entry) // therefore the mock will not be triggered if err != nil { Base().Warnf("Unable to write event %#v to telemetry : %v", entry, err) } hook.wg.Done() } else { hasEvents = false } } }

I agree that the code is a bit spaghetti like; but I don't believe that changing hook.ready would help:
The variable is not being blocked upon. waitForEventAndReady doesn't wait for the ready flag, but rather just return the ready flag state.

The filling variable helps to ensure we block the err := hook.wrappedHook.Fire(entry) call, so that we can fill up the hook.entries channel. Then, once the channel is full, we unblock the Fire call.

sorry - I misread your response. I believe that depending on the scenario we want to execute, there are more than one way to fix the test. My plan was to change the expectation, as the test current doing the following:

write the first entry to the hook.

the Fire is writing to the entries; entries is read and calling the inner Fire. The inner Fire is being block.

the rest of the entries are being written, leaving 10 entries in the pending array.

The inner Fire is getting unblocked.

The anonymous method finish iterating over the remaining entries.

The total number of entries is either 10 or 11, depending on whether the #2 was executed before all the entries were written.

The test name is TestAsyncTelemetryHook_QueueDepth, so I believe moving the ready location forces the surplus events to be discarded according to the QueueDepth. So it seems like the right way to go.

TestAsyncTelemetryHook_CloseDrop also uses the ready flag, I don't think it tests "CloseDrop" at all.

The complexity of this code is pretty absurd, maybe we can revive the "Externalized Telemetry" project and replace it entirely with a better design.

codecov-commenter · 2021-09-20T18:18:02Z

Codecov Report

Merging #2685 (3a1085f) into master (dbab85f) will increase coverage by 0.01%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master    #2685      +/-   ##
==========================================
+ Coverage   47.29%   47.30%   +0.01%     
==========================================
  Files         356      356              
  Lines       57240    57240              
==========================================
+ Hits        27071    27078       +7     
+ Misses      27103    27100       -3     
+ Partials     3066     3062       -4

Impacted Files	Coverage Δ
cmd/algoh/blockWatcher.go	`77.77% <0.00%> (-3.18%)`	⬇️
ledger/blockqueue.go	`82.18% <0.00%> (-2.88%)`	⬇️
catchup/service.go	`68.57% <0.00%> (-1.56%)`	⬇️
network/wsNetwork.go	`60.90% <0.00%> (-0.19%)`	⬇️
data/transactions/verify/txn.go	`44.29% <0.00%> (ø)`
cmd/tealdbg/debugger.go	`73.86% <0.00%> (+1.00%)`	⬆️
crypto/merkletrie/node.go	`93.48% <0.00%> (+1.86%)`	⬆️
agreement/proposalManager.go	`98.03% <0.00%> (+1.96%)`	⬆️
crypto/merkletrie/trie.go	`68.61% <0.00%> (+2.18%)`	⬆️
agreement/cryptoVerifier.go	`77.94% <0.00%> (+2.20%)`	⬆️
... and 1 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update dbab85f...3a1085f. Read the comment docs.

winder

Stephen found a data race related to modifying the telemetry hook while entries were queued. Since there are plans to redo telemetry at some point, I don't think we should spend time investigating that race.

tsachiherman · 2022-01-07T15:50:39Z

We shouldn't have any data races.. but I don't mind holding off with fixing the data race if it doesn't happen on production.

* testing: Fix unit test TestAsyncTelemetryHook_QueueDepth (#2685) Fix the unit test TestAsyncTelemetryHook_QueueDepth * Deprecate `FastPartitionRecovery` from `ConsensusParams` (#3386) ## Summary This PR removes `FastPartitionRecovery` option from consensus parameters. The code now acts as if this value is set to true. Closes https://github.com/algorand/go-algorand-internal/issues/1830. ## Test Plan None. * Remaking a PR for CI (#3398) * Allow setting manager, reserve, freeze, and clawback at goal asset create * Add e2e tests * Add more tests for goal asset create flags Co-authored-by: Fionna <[email protected]> * [Other] CircleCI pipeline change for binary uploads (#3381) For nightly builds ("rel/nightly"), we want to have deadlock enabled. For rel/beta and rel/stable, we want to make sure we can build and upload a binary with deadlock disabled so that it can be used for release testing and validation purposes. * signer.KeyDilution need not depend on config package (#3265) crypto package need not depend on config. There is an unnecessary dependency on config. signer.KeyDilution takes the `config.ConsensusParams` as argument to pick the DefaultKeyDilution from it. This introduces dependency from the crypto package to config package. Instead, only the DefaultKeyDilution value can be passed to signer.KeyDilution. * algodump is a tcpdump-like tool for algod's network protocol (#3166) This PR introduces algodump, a tcpdump-like tool for monitoring algod network messages. * Removing C/crypto dependencies from `data/abi` package (#3375) * Feature Networks pipeline related changes (#3393) Added support for not having certain files in signing script Co-authored-by: Tsachi Herman <[email protected]> Co-authored-by: Tolik Zinovyev <[email protected]> Co-authored-by: Jack <[email protected]> Co-authored-by: Fionna <[email protected]> Co-authored-by: algobarb <[email protected]> Co-authored-by: Shant Karakashian <[email protected]> Co-authored-by: Nickolai Zeldovich <[email protected]> Co-authored-by: Hang Su <[email protected]>

* Three new globals for to help contract-to-contract usability * detritis * Check error * doc comments * Impose limits on the entire "tree" of inner calls. This also increases the realism of testing of multiple app calls in a group by creating the EvalParams with the real constructor, thus getting the pooling stuff tested here without playing games manipulating the ep after construction. * Move appID tracking into EvalContext, out of LedgerForLogic This change increases the seperation between AVM execution and the ledger being used to lookup resources. Previously, the ledger kept track of the appID being executed, to offer a narrower interface to those resources. But now, with app-to-app calls, the appID being executed must change, and the AVM needs to maintain the current appID. * Stupid linter * Fix unit tests error messages * Allow access to resources created in the same transaction group The method will be reworked, but the tests are correct and want to get them visible to team. * Access to apps created in group Also adds some tests that are currently skipped for testing - access to addresses of newly created apps - use of gaid in inner transactions Both require some work to implement the thing being tested. * Remove tracked created mechanism in favor of examining applydata. * Allow v6 AVM code to use in-group created asas, apps (& their accts) One exception - apps can not mutate (put or del) keys from the app accounts, because EvalDelta cannot encode such changes. * lint docs * typo * The review dog needs obedience training. * Use one EvalParams for logic evals, another for apps in dry run We used to use one ep per transaction, shared between sig and and app. But the new model of ep usage is to keep using one while evaluating an entire group. The app ep is now built logic.NewAppEvalParams which, hopefully, will prevent some bugs when we change something in the EvalParams and don't reflect it in what was a "raw" EvalParams construction in debugger and dry run. * Use logic.NewAppEvalParams to decrease copying and bugs in debugger * Simplify use of NewEvalParams. No more nil return when no apps. This way, NewEvalParams can be used for all creations of EvalParams, whether they are intended for logicsig or app use, greatly simplifying the way we make them for use by dry run or debugger (where they serve double duty). * Remove explicit PastSideEffects handling in tealdbg * Always create EvalParams to evaluate a transaction group. We used to have an optimization to avoid creating EvalParams unless there was an app call in the transaction group. But the interface to allow transaction processing to communicate changes into the EvalParams is complicated by that (we must only do it if there is one!) This also allows us to use the same construction function for eps created for app and logic evaluation, simplifying dry-run and debugger. The optimization is less needed now anyway: 1) The ep is now shared for the whole group, so it's only one. 2) The ep is smaller now, as we only store nil pointers instead of larger scratch space objects for non-app calls. * Correct mistaken commit * Spec improvments * More spec improvments, including resource "availability" * Recursively return inner transaction tree * Lint * No need for ConfirmedRound, so don't deref a nil pointer! * license check * Shut up, dawg. * testing: Fix unit test TestAsyncTelemetryHook_QueueDepth (#2685) Fix the unit test TestAsyncTelemetryHook_QueueDepth * Deprecate `FastPartitionRecovery` from `ConsensusParams` (#3386) ## Summary This PR removes `FastPartitionRecovery` option from consensus parameters. The code now acts as if this value is set to true. Closes algorand/go-algorand-internal#1830. ## Test Plan None. * base64 merge cleanup * Remaking a PR for CI (#3398) * Allow setting manager, reserve, freeze, and clawback at goal asset create * Add e2e tests * Add more tests for goal asset create flags Co-authored-by: Fionna <[email protected]> * Remove the extraneous field type arrays. * bsqrt * acct_holding_get, a unified opcode for account field access * Thanks, dawg * [Other] CircleCI pipeline change for binary uploads (#3381) For nightly builds ("rel/nightly"), we want to have deadlock enabled. For rel/beta and rel/stable, we want to make sure we can build and upload a binary with deadlock disabled so that it can be used for release testing and validation purposes. * signer.KeyDilution need not depend on config package (#3265) crypto package need not depend on config. There is an unnecessary dependency on config. signer.KeyDilution takes the `config.ConsensusParams` as argument to pick the DefaultKeyDilution from it. This introduces dependency from the crypto package to config package. Instead, only the DefaultKeyDilution value can be passed to signer.KeyDilution. * CR and more spec simplification * algodump is a tcpdump-like tool for algod's network protocol (#3166) This PR introduces algodump, a tcpdump-like tool for monitoring algod network messages. * Removing C/crypto dependencies from `data/abi` package (#3375) * Feature Networks pipeline related changes (#3393) Added support for not having certain files in signing script * e2e test for inner transaction appls * testing: Add slightly more coverage to TestAcctUpdatesLookupRetry (#3384) Add slightly more coverage to TestAcctUpdatesLookupRetry * add context to (most) agreement logged writes (#3411) Current agreement code only writes a `context : agreement` to a subset of the logged messages. This change extends the said entry, which would make it easier to pre-process logs entries by their corresponding component. The change in this PR is focused on: 1. make sure that the "root" agreement logger always injects the `context : agreement` argument. 2. change the various locations in the agreement code to use the root agreement logger instead of referring to the application-global instance (`logging.Base()`). * network: faster node shutdown (#3416) During the node shutdown, all the current outgoing connections are being disconnected. Since these connections are web sockets, they require a close connection message to be sent. However, sending this message can take a while, and in situations where the other party has already shut down, we might never get a response. That, in turn, would lead the node waiting until the deadline is reached. The current deadline was 5 seconds. This PR changes the deadline during shutdown to be 50ms. * Give max group size * 16 inner txns, regardless of apps present * Adjust test for allowing 256 inners Co-authored-by: Tsachi Herman <[email protected]> Co-authored-by: Tolik Zinovyev <[email protected]> Co-authored-by: Jack <[email protected]> Co-authored-by: Fionna <[email protected]> Co-authored-by: algobarb <[email protected]> Co-authored-by: Shant Karakashian <[email protected]> Co-authored-by: Nickolai Zeldovich <[email protected]> Co-authored-by: Hang Su <[email protected]> Co-authored-by: chris erway <[email protected]>

Fix unit test.

ad94331

tsachiherman added the Team Carbon-11 label Aug 4, 2021

tsachiherman requested a review from winder August 4, 2021 13:59

tsachiherman self-assigned this Aug 4, 2021

winder reviewed Aug 4, 2021

View reviewed changes

Merge branch 'master' into tsachi/fixqueuedepthtest

3a1085f

tsachizehub assigned AlgoStephenAkiki and unassigned tsachiherman Jan 6, 2022

algorandskiy mentioned this pull request Jan 6, 2022

Fix Flaky Async Telemetry Test #3382

Closed

winder approved these changes Jan 7, 2022

View reviewed changes

tsachiherman merged commit 46784c7 into algorand:master Jan 7, 2022

tsachiherman deleted the tsachi/fixqueuedepthtest branch January 7, 2022 15:51

algobarb mentioned this pull request Jan 18, 2022

go-algorand 3.3.0-beta #3423

Merged

Algo-devops-service mentioned this pull request Jan 20, 2022

Update reference to arm32 build for 3.3.0 release #3461

Merged

onetechnical mentioned this pull request Mar 1, 2022

go-algorand 3.4.2-stable #3700

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

testing: fix unit test TestAsyncTelemetryHook_QueueDepth #2685

testing: fix unit test TestAsyncTelemetryHook_QueueDepth #2685

tsachiherman commented Aug 4, 2021

winder Aug 4, 2021 •

edited

Loading

tsachiherman Aug 4, 2021

tsachiherman Aug 4, 2021

winder Aug 4, 2021

codecov-commenter commented Sep 20, 2021

winder left a comment

tsachiherman commented Jan 7, 2022

testing: fix unit test TestAsyncTelemetryHook_QueueDepth #2685

testing: fix unit test TestAsyncTelemetryHook_QueueDepth #2685

Conversation

tsachiherman commented Aug 4, 2021

Summary

Test Plan

winder Aug 4, 2021 • edited Loading

Choose a reason for hiding this comment

tsachiherman Aug 4, 2021

Choose a reason for hiding this comment

tsachiherman Aug 4, 2021

Choose a reason for hiding this comment

winder Aug 4, 2021

Choose a reason for hiding this comment

codecov-commenter commented Sep 20, 2021

Codecov Report

winder left a comment

Choose a reason for hiding this comment

tsachiherman commented Jan 7, 2022

winder Aug 4, 2021 •

edited

Loading