Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

testing: fix unit test TestAsyncTelemetryHook_QueueDepth #2685

Merged
merged 2 commits into from
Jan 7, 2022

Conversation

tsachiherman
Copy link
Contributor

Summary

Fix the unit test TestAsyncTelemetryHook_QueueDepth:

	// the anonymous goroutine in createAsyncHookLevels might pull an entry off the pending list before
	// writing it off to the underlying hook. when that happens, the total number of sent entries could
	// be one higher then the maxDepth.

Test Plan

This is a test.

// the anonymous goroutine in createAsyncHookLevels might pull an entry off the pending list before
// writing it off to the underlying hook. when that happens, the total number of sent entries could
// be one higher then the maxDepth.
require.LessOrEqual(t, hookEntries, maxDepth+1)
Copy link
Contributor

@winder winder Aug 4, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that moving hook.ready = true after the Fire calls would remove the race condition. After close(filling) would work too but I don't think it's required.

Actually, the filling channel doesn't seem to be needed at all.

This logging code is pretty hard to follow, but I think this is the main loop:

		for !exit {
			exit = !hook.waitForEventAndReady()

			hasEvents := true
			for hasEvents {
				select {
				case entry := <-hook.entries:
					hook.appendEntry(entry)
				default:
					hook.Lock()
					var entry *logrus.Entry
---->					if len(hook.pending) > 0 && hook.ready { // entry is always nil when ready = false
						entry = hook.pending[0]
						hook.pending = hook.pending[1:]
					}
					hook.Unlock()
					if entry != nil {
---->						err := hook.wrappedHook.Fire(entry) // therefore the mock will not be triggered
						if err != nil {
							Base().Warnf("Unable to write event %#v to telemetry : %v", entry, err)
						}
						hook.wg.Done()
					} else {
						hasEvents = false
					}
				}
			}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that the code is a bit spaghetti like; but I don't believe that changing hook.ready would help:
The variable is not being blocked upon. waitForEventAndReady doesn't wait for the ready flag, but rather just return the ready flag state.

The filling variable helps to ensure we block the err := hook.wrappedHook.Fire(entry) call, so that we can fill up the hook.entries channel. Then, once the channel is full, we unblock the Fire call.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry - I misread your response. I believe that depending on the scenario we want to execute, there are more than one way to fix the test. My plan was to change the expectation, as the test current doing the following:

  1. write the first entry to the hook.
  2. the Fire is writing to the entries; entries is read and calling the inner Fire. The inner Fire is being block.
  3. the rest of the entries are being written, leaving 10 entries in the pending array.
  4. The inner Fire is getting unblocked.
  5. The anonymous method finish iterating over the remaining entries.

The total number of entries is either 10 or 11, depending on whether the #2 was executed before all the entries were written.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test name is TestAsyncTelemetryHook_QueueDepth, so I believe moving the ready location forces the surplus events to be discarded according to the QueueDepth. So it seems like the right way to go.

TestAsyncTelemetryHook_CloseDrop also uses the ready flag, I don't think it tests "CloseDrop" at all.

The complexity of this code is pretty absurd, maybe we can revive the "Externalized Telemetry" project and replace it entirely with a better design.

@codecov-commenter
Copy link

Codecov Report

Merging #2685 (3a1085f) into master (dbab85f) will increase coverage by 0.01%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #2685      +/-   ##
==========================================
+ Coverage   47.29%   47.30%   +0.01%     
==========================================
  Files         356      356              
  Lines       57240    57240              
==========================================
+ Hits        27071    27078       +7     
+ Misses      27103    27100       -3     
+ Partials     3066     3062       -4     
Impacted Files Coverage Δ
cmd/algoh/blockWatcher.go 77.77% <0.00%> (-3.18%) ⬇️
ledger/blockqueue.go 82.18% <0.00%> (-2.88%) ⬇️
catchup/service.go 68.57% <0.00%> (-1.56%) ⬇️
network/wsNetwork.go 60.90% <0.00%> (-0.19%) ⬇️
data/transactions/verify/txn.go 44.29% <0.00%> (ø)
cmd/tealdbg/debugger.go 73.86% <0.00%> (+1.00%) ⬆️
crypto/merkletrie/node.go 93.48% <0.00%> (+1.86%) ⬆️
agreement/proposalManager.go 98.03% <0.00%> (+1.96%) ⬆️
crypto/merkletrie/trie.go 68.61% <0.00%> (+2.18%) ⬆️
agreement/cryptoVerifier.go 77.94% <0.00%> (+2.20%) ⬆️
... and 1 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update dbab85f...3a1085f. Read the comment docs.

Copy link
Contributor

@winder winder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stephen found a data race related to modifying the telemetry hook while entries were queued. Since there are plans to redo telemetry at some point, I don't think we should spend time investigating that race.

@tsachiherman
Copy link
Contributor Author

We shouldn't have any data races.. but I don't mind holding off with fixing the data race if it doesn't happen on production.

@tsachiherman tsachiherman merged commit 46784c7 into algorand:master Jan 7, 2022
@tsachiherman tsachiherman deleted the tsachi/fixqueuedepthtest branch January 7, 2022 15:51
jannotti pushed a commit that referenced this pull request Jan 13, 2022
* testing: Fix unit test TestAsyncTelemetryHook_QueueDepth (#2685)

Fix the unit test TestAsyncTelemetryHook_QueueDepth

* Deprecate `FastPartitionRecovery` from `ConsensusParams` (#3386)

## Summary

This PR removes `FastPartitionRecovery` option from consensus parameters. The code now acts as if this value is set to true.

Closes https://github.com/algorand/go-algorand-internal/issues/1830.

## Test Plan

None.

* Remaking a PR for CI (#3398)

* Allow setting manager, reserve, freeze, and clawback at goal asset create

* Add e2e tests

* Add more tests for goal asset create flags

Co-authored-by: Fionna <[email protected]>

* [Other] CircleCI pipeline change for binary uploads (#3381)

For nightly builds ("rel/nightly"), we want to have deadlock enabled.

For rel/beta and rel/stable, we want to make sure we can build and upload a binary with deadlock disabled so that it can be used for release testing and validation purposes.

* signer.KeyDilution need not depend on config package (#3265)

crypto package need not depend on config.
There is an unnecessary dependency on config.

signer.KeyDilution takes the `config.ConsensusParams` as argument to pick the DefaultKeyDilution from it. 
This introduces dependency from the crypto package to config package. 
Instead, only the DefaultKeyDilution value can be passed to signer.KeyDilution.

* algodump is a tcpdump-like tool for algod's network protocol (#3166)

This PR introduces algodump, a tcpdump-like tool for monitoring algod network messages.

* Removing C/crypto dependencies from `data/abi` package (#3375)

* Feature Networks pipeline related changes (#3393)

Added support for not having certain files in signing script

Co-authored-by: Tsachi Herman <[email protected]>
Co-authored-by: Tolik Zinovyev <[email protected]>
Co-authored-by: Jack <[email protected]>
Co-authored-by: Fionna <[email protected]>
Co-authored-by: algobarb <[email protected]>
Co-authored-by: Shant Karakashian <[email protected]>
Co-authored-by: Nickolai Zeldovich <[email protected]>
Co-authored-by: Hang Su <[email protected]>
jannotti added a commit that referenced this pull request Jan 18, 2022
* Three new globals for to help contract-to-contract usability

* detritis

* Check error

* doc comments

* Impose limits on the entire "tree" of inner calls.

This also increases the realism of testing of multiple app calls in a
group by creating the EvalParams with the real constructor, thus
getting the pooling stuff tested here without playing games
manipulating the ep after construction.

* Move appID tracking into EvalContext, out of LedgerForLogic

This change increases the seperation between AVM execution and the
ledger being used to lookup resources.  Previously, the ledger kept
track of the appID being executed, to offer a narrower interface to
those resources. But now, with app-to-app calls, the appID being
executed must change, and the AVM needs to maintain the current appID.

* Stupid linter

* Fix unit tests error messages

* Allow access to resources created in the same transaction group

The method will be reworked, but the tests are correct and want to get
them visible to team.

* Access to apps created in group

Also adds some tests that are currently skipped for testing
- access to addresses of newly created apps
- use of gaid in inner transactions

Both require some work to implement the thing being tested.

* Remove tracked created mechanism in favor of examining applydata.

* Allow v6 AVM code to use in-group created asas, apps (& their accts)

One exception - apps can not mutate (put or del) keys from the app
accounts, because EvalDelta cannot encode such changes.

* lint docs

* typo

* The review dog needs obedience training.

* Use one EvalParams for logic evals, another for apps in dry run

We used to use one ep per transaction, shared between sig and and
app. But the new model of ep usage is to keep using one while
evaluating an entire group.

The app ep is now built logic.NewAppEvalParams which, hopefully, will
prevent some bugs when we change something in the EvalParams and don't
reflect it in what was a "raw" EvalParams construction in debugger and
dry run.

* Use logic.NewAppEvalParams to decrease copying and bugs in debugger

* Simplify use of NewEvalParams. No more nil return when no apps.

This way, NewEvalParams can be used for all creations of EvalParams,
whether they are intended for logicsig or app use, greatly simplifying
the way we make them for use by dry run or debugger (where they serve
double duty).

* Remove explicit PastSideEffects handling in tealdbg

* Always create EvalParams to evaluate a transaction group.

We used to have an optimization to avoid creating EvalParams unless
there was an app call in the transaction group.  But the interface to
allow transaction processing to communicate changes into the
EvalParams is complicated by that (we must only do it if there is
one!)

This also allows us to use the same construction function for eps
created for app and logic evaluation, simplifying dry-run and
debugger.

The optimization is less needed now anyway:
1) The ep is now shared for the whole group, so it's only one.
2) The ep is smaller now, as we only store nil pointers instead of
larger scratch space objects for non-app calls.

* Correct mistaken commit

* Spec improvments

* More spec improvments, including resource "availability"

* Recursively return inner transaction tree

* Lint

* No need for ConfirmedRound, so don't deref a nil pointer!

* license check

* Shut up, dawg.

* testing: Fix unit test TestAsyncTelemetryHook_QueueDepth (#2685)

Fix the unit test TestAsyncTelemetryHook_QueueDepth

* Deprecate `FastPartitionRecovery` from `ConsensusParams` (#3386)

## Summary

This PR removes `FastPartitionRecovery` option from consensus parameters. The code now acts as if this value is set to true.

Closes algorand/go-algorand-internal#1830.

## Test Plan

None.

* base64 merge cleanup

* Remaking a PR for CI (#3398)

* Allow setting manager, reserve, freeze, and clawback at goal asset create

* Add e2e tests

* Add more tests for goal asset create flags

Co-authored-by: Fionna <[email protected]>

* Remove the extraneous field type arrays.

* bsqrt

* acct_holding_get, a unified opcode for account field access

* Thanks, dawg

* [Other] CircleCI pipeline change for binary uploads (#3381)

For nightly builds ("rel/nightly"), we want to have deadlock enabled.

For rel/beta and rel/stable, we want to make sure we can build and upload a binary with deadlock disabled so that it can be used for release testing and validation purposes.

* signer.KeyDilution need not depend on config package (#3265)

crypto package need not depend on config.
There is an unnecessary dependency on config.

signer.KeyDilution takes the `config.ConsensusParams` as argument to pick the DefaultKeyDilution from it. 
This introduces dependency from the crypto package to config package. 
Instead, only the DefaultKeyDilution value can be passed to signer.KeyDilution.

* CR and more spec simplification

* algodump is a tcpdump-like tool for algod's network protocol (#3166)

This PR introduces algodump, a tcpdump-like tool for monitoring algod network messages.

* Removing C/crypto dependencies from `data/abi` package (#3375)

* Feature Networks pipeline related changes (#3393)

Added support for not having certain files in signing script

* e2e test for inner transaction appls

* testing: Add slightly more coverage to TestAcctUpdatesLookupRetry (#3384)

Add slightly more coverage to TestAcctUpdatesLookupRetry

* add context to (most) agreement logged writes (#3411)

Current agreement code only writes a `context : agreement` to a subset of the logged messages.
This change extends the said entry, which would make it easier to pre-process logs entries by their corresponding component. The change in this PR is focused on:
1. make sure that the "root" agreement logger always injects the `context : agreement` argument.
2. change the various locations in the agreement code to use the root agreement logger instead of referring to the application-global instance (`logging.Base()`).

* network: faster node shutdown (#3416)

During the node shutdown, all the current outgoing connections are being disconnected.
Since these connections are web sockets, they require a close connection message to be sent.
However, sending this message can take a while, and in situations where the other party has already shut down, we might never get a response. That, in turn, would lead the node waiting until the deadline is reached.

The current deadline was 5 seconds. This PR changes the deadline during shutdown to be 50ms.

* Give max group size * 16 inner txns, regardless of apps present

* Adjust test for allowing 256 inners

Co-authored-by: Tsachi Herman <[email protected]>
Co-authored-by: Tolik Zinovyev <[email protected]>
Co-authored-by: Jack <[email protected]>
Co-authored-by: Fionna <[email protected]>
Co-authored-by: algobarb <[email protected]>
Co-authored-by: Shant Karakashian <[email protected]>
Co-authored-by: Nickolai Zeldovich <[email protected]>
Co-authored-by: Hang Su <[email protected]>
Co-authored-by: chris erway <[email protected]>
@algobarb algobarb mentioned this pull request Jan 18, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants