cli: prevent double initialization in cases where an error was mistakenly retried #1404
Conversation
Force-pushed from a059264 to 54e6683
Currently it certainly can change to READY multiple times. This is something we do want to prevent in the future by disallowing further init retries once we see that the connection became ready once.
d.spinner.Stop()
d.spinner.Start("Initializing cluster ", false)
I think we should either prevent this spinner change from happening more than once in this func or prevent multiple init calls altogether (see my other comment).
@daniel-weisse @3u13r any thoughts on preventing multiple init calls after the atls conn was established at least once?
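For illustration, a minimal sketch of the guard being discussed here, assuming a hypothetical connectedOnce field and handleStateChange method that are not taken from the actual code: once the connection has reached READY, a second READY transition is turned into a non-retriable error instead of stopping and restarting the spinner.

package main

import (
	"errors"

	"google.golang.org/grpc/connectivity"
)

// Sketch only: track whether the connection already reached READY once.
type initDoer struct {
	connectedOnce bool // hypothetical field, not in the real initDoer
}

// handleStateChange turns a second READY transition into a non-retriable
// error instead of restarting the spinner and re-running init.
func (d *initDoer) handleStateChange(state connectivity.State) error {
	if state != connectivity.Ready {
		return nil
	}
	if d.connectedOnce {
		return errors.New("connection became ready twice; refusing to re-run init")
	}
	d.connectedOnce = true
	return nil
}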
Dropping from an established connection to connecting should result in a failure imo.
We need to refactor the whole init logic anyway. Currently, a failed attestation will result in a warning about this being a non-recoverable error (which is not true).
Should we make a double connectivity.Ready a failure for now and discuss a larger refactor to init in an upcoming dev sync?
I would also +1 a refactor because of #1403, since proper fixes for that issue might also involve a refactor of init (on the bootstrapper side) to remove or better mitigate the pesky timeout logic.
2023-03-14T08:39:56+01:00 DEBUG cmd/init.go:246 Created protoClient
2023-03-14T08:39:56+01:00 DEBUG cmd/init.go:259 Connection state started as CONNECTING
2023-03-14T08:39:56+01:00 DEBUG cloudcmd/validators.go:170 Validating attestation document
2023-03-14T08:39:57+01:00 DEBUG cloudcmd/validators.go:170 Successfully validated attestation document
2023-03-14T08:39:57+01:00 DEBUG cmd/init.go:261 Connection state changed to CONNECTING
2023-03-14T08:39:57+01:00 DEBUG cmd/init.go:264 Connection ready
// Init actually starts
The connection changes to CONNECTING twice, so we should definitely not cancel here. But aborting with an unretriable error after "Connection ready" seems OK to me.
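For reference, a rough sketch (not the implementation in this PR) of how the state transitions in the log above can be observed with grpc-go's ClientConn API, treating a drop after the first READY as a non-retriable condition:

package main

import (
	"context"
	"errors"

	"google.golang.org/grpc"
	"google.golang.org/grpc/connectivity"
)

// watchConn returns an error once an established connection drops again.
// Sketch only; the real init code would wire this into its retry logic.
func watchConn(ctx context.Context, conn *grpc.ClientConn) error {
	wasReady := false
	for {
		state := conn.GetState()
		if state == connectivity.Ready {
			wasReady = true
		} else if wasReady {
			// READY -> CONNECTING (or worse): abort instead of retrying init.
			return errors.New("connection dropped after it was established")
		}
		if !conn.WaitForStateChange(ctx, state) {
			return ctx.Err() // context cancelled or deadline exceeded
		}
	}
}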
Alright, I gave it a try with channels in the new commit. Check it out; this definitely makes things more complicated, though 😄
Note that I couldn't really test this yet, since in my current attempts (yanking the cable, temporarily disabling WiFi) the connection survived. It seems to be a bit more resilient than I thought (or macOS keeps WiFi on for a bit after disabling it).
Force-pushed from 1e4d7cf to 0f85bf5
Overall implementation idea looks good to me, though I would like this to be extensively tested before merging.
Is it possible to write a unit test for this?
I think so far we don't have a test for initCall. I can try writing one that at least tests the goroutine logic here.
Okay, now the unit test actually complains about data races caused by the use of the logger in the goroutine. Will investigate.
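(For context: these reports come from Go's race detector; assuming the package path of the file touched in this PR, the race can be reproduced locally with something like the following.)

go test -race ./cli/internal/cmd/...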
Force-pushed from 17c2bae to b521e19
Added a test and removed that ugly […]. Now there are still two issues:
This cancelling on multiple events is way more painful than I thought it would be... It might actually be easier to move this down to […].
Force-pushed from 121b5dc to 1b2f76a
Bye channels and bye data races; this should be easier if everything is just in the same function 😅
cli/internal/cmd/init.go
Outdated
func (d *initDoer) handleGRPCStateChanges(ctx context.Context, wg *sync.WaitGroup, conn *grpc.ClientConn) {
	go func() {
Why does this function start a new goroutine instead of letting the caller handle this? @malt3
This is almost in line with how the rest of our code handles goroutines. CC @katexochen
The general pattern we currently follow:
func foo(ctx context.Context) {
	var wg sync.WaitGroup
	defer wg.Wait()
	usesAGoRoutine(ctx, &wg)
	// do something ....
}

func usesAGoRoutine(ctx context.Context, wg *sync.WaitGroup) {
	wg.Add(1)
	go func() {
		defer wg.Done()
		// do something in the background
	}()
}
Actually, thanks to the linter warning I see that this doesn't work as expected, given that the goroutine is shut down when the state changes to READY. Will still need to fix this.
After discussing with @malt3, it seems like I misunderstood the requirement. If I understood correctly, we don't want to abort the current retry attempt when the connection changes to READY twice, but rather cancel any upcoming retry attempts once the first connection is dropped. In that case I believe the new commit achieves this in a much simpler way.
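A minimal sketch of that simpler shape, using hypothetical names (connDropped, doAttempt) that are not taken from the actual commit: the first dropped connection sets a flag, and every later retry attempt is rejected up front.

package main

import (
	"context"
	"errors"
	"sync/atomic"
)

// Sketch only: stop scheduling new retries once the first established
// connection has been dropped.
type initDoer struct {
	connDropped atomic.Bool // hypothetical flag, set by the state-change handler
}

func (d *initDoer) doAttempt(ctx context.Context, attempt func(context.Context) error) error {
	for {
		if d.connDropped.Load() {
			return errors.New("aborting further retries: established connection was dropped")
		}
		if err := attempt(ctx); err == nil {
			return nil
		}
		if ctx.Err() != nil {
			return ctx.Err()
		}
		// in the real code, only retriable errors would loop here
	}
}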
Force-pushed from bdffd2c to afb7b1c
Force-pushed from 697bd16 to 1c3cd3c
Force-pushed from 1c3cd3c to 6afa9d3
Proposed change(s)
This is based on @malt3's gRPC debug logging functions.
I figured we can also just use them for the normal user output :)
I'm not sure this is the best place to put it, though, since I don't know whether the state can theoretically change to READY multiple times.
I hope I can get some feedback on that? I'm not super familiar with gRPC connection states and how this works in our init.
Otherwise it might make sense to make this a channel and let the caller handle this.
Second commit also adds a note that it can take a few minutes.
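As an illustration of reusing the state information for user output (a sketch only; the helper name describeState and the exact wording are not from the PR):

package main

import (
	"fmt"

	"google.golang.org/grpc/connectivity"
)

// describeState maps a gRPC connectivity state to a short user-facing message,
// including the hint that initialization can take a few minutes.
func describeState(s connectivity.State) string {
	switch s {
	case connectivity.Connecting:
		return "Connecting to cluster (this can take a few minutes)"
	case connectivity.Ready:
		return "Connection established, initializing cluster"
	default:
		return fmt.Sprintf("Connection state: %s", s)
	}
}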
Checklist