Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

xds/xdsclient: Fix flaky test TestLRSClient #7559

Merged
merged 1 commit into from
Aug 26, 2024

Conversation

arjan-bal
Copy link
Contributor

@arjan-bal arjan-bal commented Aug 24, 2024

Fixes: #7241

Background

The test is sometimes getting a second load report before the client closes the stream. The test is setting the load report interval to 50 ms here:

Resp: &v3lrspb.LoadStatsResponse{
SendAllClusters: true,
LoadReportingInterval: &durationpb.Duration{Nanos: 50000000},
},

In the final assertion of the test, the test is checking that the very next request present in the fake server's request channel is the client cancellation:

if u, err := fs2.LRSRequestChan.Receive(ctx); err != nil || status.Code(u.(*fakeserver.Request).Err) != codes.Canceled {
t.Errorf("unexpected LRS request: %v, %v, want error canceled", u, err)
}

When the the test takes more than 50 ms b/w sending the first LRS response with the load report interval and this final assertion, another load report can be sent by the client here:

func (t *Transport) sendLoads(ctx context.Context, stream lrsStream, clusterNames []string, interval time.Duration) {
tick := time.NewTicker(interval)
defer tick.Stop()
for {
select {
case <-tick.C:
case <-ctx.Done():
return
}
if err := t.sendLoadStatsRequest(stream, t.lrsStore.Stats(clusterNames)); err != nil {
t.logger.Warningf("Writing to LRS stream failed: %v", err)
return
}
}
}

Repro

To make the test fail with a similar error message (loadreport_test.go:139: unexpected LRS request: &{ <nil>}, <nil>, want error canceled) add a time.Sleep(100 * time.Millisecond) just after receiving the first LRS request on the server here:

u, err := fs2.LRSRequestChan.Receive(ctx)
if err != nil {
t.Fatalf("unexpected LRS request: %v, %v, want error canceled", u, err)
}

Fix

This PR keeps reading from fs2.LRSRequestChan.Receive until we get an error due to context expiration or see the expected client context cancellation response.

RELEASE NOTES: None

@arjan-bal arjan-bal added this to the 1.67 Release milestone Aug 24, 2024
Copy link

codecov bot commented Aug 24, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 81.83%. Comparing base (b45fc41) to head (7c0505d).
Report is 13 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #7559      +/-   ##
==========================================
- Coverage   82.07%   81.83%   -0.25%     
==========================================
  Files         360      361       +1     
  Lines       27533    27816     +283     
==========================================
+ Hits        22599    22764     +165     
- Misses       3759     3855      +96     
- Partials     1175     1197      +22     

see 47 files with indirect coverage changes

@arvindbr8 arvindbr8 assigned arjan-bal and unassigned easwars Aug 26, 2024
@arjan-bal arjan-bal merged commit 6d97688 into grpc:master Aug 26, 2024
14 checks passed
@arjan-bal arjan-bal deleted the grpc-go-flaky-test branch August 26, 2024 16:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Flaky test: 6/100K: Test/LRSClient
3 participants