Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

testing: lint against require.Eventually (or assert.Eventually) #100796

Open
knz opened this issue Apr 6, 2023 · 7 comments
Open

testing: lint against require.Eventually (or assert.Eventually) #100796

knz opened this issue Apr 6, 2023 · 7 comments
Labels
A-testing Testing tools and infrastructure C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. quality-friday A good issue to work on on Quality Friday T-testeng TestEng Team

Comments

@knz
Copy link
Contributor

knz commented Apr 6, 2023

Describe the problem

There are two design problems in the require.Eventually / assert.Eventually code.

  1. the function calls its condition in a separate goroutine. So if the condition function tries to call t.Fail (via t.Error etc) this violates the API of *testing.T.
  2. it does not wait for the goroutine to complete before it returns. So if the condition doesn't return before the Eventually function gives up (e.g. because it gets stuck, or because it takes a longer time) and the Eventually logic fails, there will be a dangling goroutine.
  3. Then also if the condition function eventually runs t.Fail after the Eventually call completes and also the surrounding test, the t.Fail call will occur after the test has terminated which is also an API violation.

To Reproduce

Example code that demonstrates the problems:

Problem 1:

require.Eventually(t, func() bool {
    require.True(t, false) // invalid use of t.Fail in separate goroutine
}, 3*time.Second, time.Second)

Problem 2:

require.Eventually(t, func() bool {
    time.Sleep(5 *time.Second) // will cause goroutine to dangle and potentially trigger leaktest
}, 3*time.Second, time.Second)

Problem 3:

require.Eventually(t, func() bool {
    time.Sleep(5 *time.Second)
    require.True(t, false) // fail after the test terminates
}, 3*time.Second, time.Second)

Expected behavior

We should disallow any use of require.Eventually outright until the bug is fixed in the require/assert library.

testutils.SucceedsSoon doesn't have this problem.

Jira issue: CRDB-26629

@knz knz added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. A-testing Testing tools and infrastructure T-testeng TestEng Team labels Apr 6, 2023
@blathers-crl
Copy link

blathers-crl bot commented Apr 6, 2023

cc @cockroachdb/test-eng

@srosenberg
Copy link
Member

I wrote a quick & dirty semgrep rule to find the violations. It gets the majority but not all. There are a few more that are nested inside closures, etc. I'll try ruleguard to get an exhaustive list.

NO_COLOR=true semgrep --verbose --config ~/.semgrep/rules/100796.yml pkg >~/100796_findings.txt

100796_findings.txt

knz added a commit to knz/cockroach that referenced this issue Apr 8, 2023
The test is flaky; sadly, the error does not reproduce locally and the
precise cause of the flakiness is obscured by cockroachdb#100796.

This commit re-enables the test with the fix for cockroachdb#100796, so that the
next CI failure can reveal more details about what went wrong.

Release note: None
@srosenberg
Copy link
Member

srosenberg commented Apr 9, 2023

Here is another quick pass, this time using revive. The custom linter just looks for any occurrence of (require|assert).XXX inside the anonymous closure passed into Eventually.

The run below also requires a patch in [1].

./revive -config ./test.toml $GOPATH/src/github.com/cockroachdb/cockroach/pkg/...  |sed -E "s|/Users/srosenberg/workspace/go/src/github.com/cockroachdb/cockroach/||g"

This should be pretty close to an exhaustive list for auditing purposes,

potentially unsafe use of require.Equal inside Eventually at pkg/kv/db_test.go:708:5
potentially unsafe use of require.NoError inside Eventually at pkg/ccl/sqlproxyccl/proxy_handler_test.go:1083:4
potentially unsafe use of require.NoError inside Eventually at pkg/kv/kvprober/kvprober_integration_test.go:277:4
potentially unsafe use of require.NoError inside Eventually at pkg/cmd/roachtest/tests/mixed_version_change_replicas.go:216:5
potentially unsafe use of require.Equal inside Eventually at pkg/server/server_test.go:1066:3
potentially unsafe use of require.Equal inside Eventually at pkg/server/admin_test.go:3172:5
potentially unsafe use of require.NoError inside Eventually at pkg/kv/kvserver/intent_resolver_integration_test.go:396:5
potentially unsafe use of require.NotNil inside Eventually at pkg/kv/kvserver/replica_rangefeed_test.go:1070:4
potentially unsafe use of require.NoError inside Eventually at pkg/kv/kvserver/replicate_queue_test.go:1666:4
potentially unsafe use of require.Equal inside Eventually at pkg/kv/kvserver/replica_lease_renewal_test.go:104:5

[1] mgechev/revive#812

@knz
Copy link
Contributor Author

knz commented Apr 9, 2023

Thanks for the linter.

NB: we should disallow Eventually regardless because of reason (2) - it leaks goroutines even if you don't use the assert/require functions inside the closure.

@srosenberg
Copy link
Member

NB: we should disallow Eventually regardless because of reason (2) - it leaks goroutines even if you don't use the assert/require functions inside the closure.

Agreed. It's odd that Eventually executes the condition (closure) inside a new goroutine; definitely an unwanted side-effect.

knz added a commit to knz/cockroach that referenced this issue Apr 10, 2023
The test is flaky; sadly, the error does not reproduce locally and the
precise cause of the flakiness is obscured by cockroachdb#100796.

This commit re-enables the test with the fix for cockroachdb#100796, so that the
next CI failure can reveal more details about what went wrong.

Release note: None
@erikgrinaker
Copy link
Contributor

erikgrinaker commented Apr 10, 2023

Disallowing Eventually seems pretty heavy-handed here. Using t.Fatal in a goroutine is problematic in general, not just with Eventually, but using assert is fine afaik. A linter would be helpful to catch these cases. I much prefer Eventually over SucceedsSoon because I have better control over the timeout and frequency -- I often find myself having to manually reduce the SucceedsSoon timeout to something reasonable when iterating on tests locally.

As for the dangling goroutine, this can be handled by propagating a context to the closure that's cancelled when the test ends. Again, this is good practice in general, even outside of Eventually -- for example, many tests will simply use a bare channel read instead of a cancellable select, and if a test race/issue prevents the channel send then the test will hang indefinitely until the timeout triggers which is a failure mode that's hard to debug.

@knz
Copy link
Contributor Author

knz commented Apr 10, 2023

We have testutils.SucceedsWithin which has a customizable timeout.

craig bot pushed a commit that referenced this issue Apr 10, 2023
100925: sql: deflake TestErrorDuringExtendedProtocolCommit r=ecwal a=rafiss

This uses the traceID to make sure that the error is injected for the correct auto-commit. Previously, internal queries could race with the query we want to error out on.

fixes #75549
backport fixes #100480
backport fixes #82010

Release note: None

100993: server: try re-enable TestAdminDecommissionedOperations r=abarganier a=knz

The test is flaky; sadly, the error does not reproduce locally and the precise cause of the flakiness is obscured by #100796.

This commit re-enables the test with the fix for #100796, so that the next CI failure can reveal more details about what went wrong.

Release note: None
Epic: None

Informs #100791 and #100789.

Co-authored-by: Rafi Shamim <[email protected]>
Co-authored-by: Raphael 'kena' Poss <[email protected]>
blathers-crl bot pushed a commit that referenced this issue Apr 10, 2023
The test is flaky; sadly, the error does not reproduce locally and the
precise cause of the flakiness is obscured by #100796.

This commit re-enables the test with the fix for #100796, so that the
next CI failure can reveal more details about what went wrong.

Release note: None
@tbg tbg changed the title testing: not safe to use require.Eventually (or assert.Eventually) with other require or assert call inside closure testing: lint against require.Eventually (or assert.Eventually) Jul 20, 2023
@tbg tbg added the quality-friday A good issue to work on on Quality Friday label Jul 20, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-testing Testing tools and infrastructure C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. quality-friday A good issue to work on on Quality Friday T-testeng TestEng Team
Projects
None yet
Development

No branches or pull requests

4 participants