swarm: fix flaky TestBasicDialSync #1486

Closed
wants to merge 2 commits into from

Conversation

marten-seemann
Contributor

Fixes #1430.

@marten-seemann marten-seemann requested a review from vyzo May 15, 2022 18:19
@marten-seemann
Contributor Author

Looks like this introduced some new flakiness, but I can't figure out why:

 === RUN   TestBasicDialSync
      dial_sync_test.go:74: 
          	Error Trace:	dial_sync_test.go:74
          	Error:      	Condition never satisfied
          	Test:       	TestBasicDialSync
          	Messages:   	dial functions should have returned
  --- FAIL: TestBasicDialSync (0.16s)

@MarcoPolo MarcoPolo self-requested a review May 16, 2022 19:15
@MarcoPolo
Collaborator

> Looks like this introduced some new flakiness, but I can't figure out why:
>
>  === RUN   TestBasicDialSync
>       dial_sync_test.go:74: 
>           	Error Trace:	dial_sync_test.go:74
>           	Error:      	Condition never satisfied
>           	Test:       	TestBasicDialSync
>           	Messages:   	dial functions should have returned
>   --- FAIL: TestBasicDialSync (0.16s)

My guess is that the small sleep between retries (1ms) is enough to stall progress on other goroutines. This example: https://gist.github.com/MarcoPolo/1bd2f7dcb103f2582043d00e342c1cb9 runs in 800ms on my machine when I use a 1ms sleep, but times out (>2s) when I use the smallest possible sleep (1ns). So maybe 1ms is too quick on CI, just like 1ns is too quick on my machine?
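To make the retry loop under discussion concrete, here is a minimal sketch of a polling check built only on the standard library. The helper `eventually` and the constants `waitFor` and `tick` are hypothetical names chosen for illustration; this is not testify's implementation, only an approximation of the "check once per tick until a deadline" pattern, with the 100ms and 1ms values taken from the test line quoted in this thread.

```go
package main

import (
	"fmt"
	"sync/atomic"
	"time"
)

// Values mirror the ones in the test under review; the names are hypothetical.
const (
	waitFor = 100 * time.Millisecond // overall deadline
	tick    = time.Millisecond       // pause between two checks
)

// eventually checks cond once per interval until it returns true or the
// timeout elapses. It reports whether cond became true in time.
func eventually(cond func() bool, timeout, interval time.Duration) bool {
	deadline := time.NewTimer(timeout)
	defer deadline.Stop()
	ticker := time.NewTicker(interval)
	defer ticker.Stop()

	for {
		select {
		case <-deadline.C:
			return false
		case <-ticker.C:
			if cond() {
				return true
			}
		}
	}
}

func main() {
	var finished int32

	// Two goroutines stand in for the dial functions in the real test.
	for i := 0; i < 2; i++ {
		go func() {
			time.Sleep(5 * time.Millisecond)
			atomic.AddInt32(&finished, 1)
		}()
	}

	ok := eventually(func() bool { return atomic.LoadInt32(&finished) == 2 }, waitFor, tick)
	fmt.Println("dial functions returned:", ok)
}
```

With this shape, the outcome depends on how the scheduler interleaves the polling loop with the goroutines being observed, so a deadline that is comfortable on one machine can be too tight on another.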

// make the dials return
close(done)
// make sure the Dial functions return
require.Eventually(t, func() bool { return len(finished) == 2 }, 100*time.Millisecond, time.Millisecond, "dial functions should have returned")
Collaborator

It does seem like the receive from the channel avoids the notion of time which is nice.
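A rough sketch of what waiting on a channel receive could look like, as opposed to polling on a timer. The helper `waitForN` and the constant `safetyTimeout` are hypothetical names, not code from this PR, and the timeout is included only as a safety net so that a broken test fails instead of hanging.

```go
package main

import (
	"fmt"
	"time"
)

// safetyTimeout is a hypothetical upper bound; it is not a polling interval.
const safetyTimeout = time.Second

// waitForN blocks until n values have been received from finished.
// It returns false only if the timeout expires first.
func waitForN(finished <-chan struct{}, n int, timeout time.Duration) bool {
	deadline := time.After(timeout)
	for i := 0; i < n; i++ {
		select {
		case <-finished:
			// One more goroutine reported completion.
		case <-deadline:
			return false
		}
	}
	return true
}

func main() {
	done := make(chan struct{})
	finished := make(chan struct{}, 2)

	// Two goroutines stand in for the blocked dial functions.
	for i := 0; i < 2; i++ {
		go func() {
			<-done // blocks until the test releases the dials
			finished <- struct{}{}
		}()
	}

	// Make the dials return, then wait for both completion signals.
	close(done)
	fmt.Println("dial functions returned:", waitForN(finished, 2, safetyTimeout))
}
```

Because each receive wakes up exactly when a goroutine signals completion, the wait does not depend on choosing a retry interval that suits both a developer machine and CI.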

@marten-seemann marten-seemann force-pushed the fix-flaky-basic-dial-sync branch 4 times, most recently from 95d74b5 to 1b61f0e Compare May 18, 2022 09:21
@marten-seemann marten-seemann marked this pull request as draft May 18, 2022 12:43
@marten-seemann marten-seemann force-pushed the fix-flaky-basic-dial-sync branch 2 times, most recently from 9c49349 to 017a59a Compare May 18, 2022 15:39
Contributor

@vyzo vyzo left a comment

LGTM modulo a nit.

p := peer.ID("testpeer")
var counter int32
Contributor

make this a pointer

@marten-seemann
Contributor Author

Haven't seen this test flaking in a while. I assume that @sukunrt's smart dialing changes have resolved the problem.

Successfully merging this pull request may close these issues.

swarm: flaky TestBasicDialSync
3 participants