
cli: prevent double initialization in cases where an error was mistakenly retried #1404

Merged
2 commits merged into main on Mar 20, 2023

Conversation

@Nirusu (Contributor) commented Mar 11, 2023

Proposed change(s)

  • Add "Connecting" spinner state for "constellation init"
  • Show "Initializing cluster" only when our gRPC connection was successful

This is based on @malt3's gRPC debug logging functions.
I figured we can also just use them for the normal user output :)
I'm not sure this is the best place to put it, though, since I don't know whether the state can theoretically change to READY multiple times.
I hope I can get some feedback on that; I'm not super familiar with gRPC connection states and how they behave in our init.
Otherwise it might make sense to turn this into a channel and let the caller handle it.

Second commit also adds a note that it can take a few minutes.
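
A rough sketch of the intended flow, assuming the CLI already has the dialed *grpc.ClientConn; the spinner interface, function name, and package name below are illustrative stand-ins, not the actual Constellation code:

package initflow // illustrative package name

import (
  "context"

  "google.golang.org/grpc"
  "google.golang.org/grpc/connectivity"
)

// spinner is a stand-in for the CLI's spinner type.
type spinner interface {
  Start(text string, showDots bool)
  Stop()
}

// waitUntilReady shows "Connecting" until the gRPC connection reaches READY,
// then switches the spinner to "Initializing cluster".
func waitUntilReady(ctx context.Context, conn *grpc.ClientConn, s spinner) {
  s.Start("Connecting ", false)
  state := conn.GetState()
  for state != connectivity.Ready {
    // WaitForStateChange returns false if ctx is done before the state changes.
    if !conn.WaitForStateChange(ctx, state) {
      return
    }
    state = conn.GetState()
  }
  s.Stop()
  s.Start("Initializing cluster ", false)
}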

Checklist

  • Update docs
  • Add labels (e.g., for changelog category)
  • Link to Milestone

@Nirusu Nirusu requested a review from katexochen as a code owner March 11, 2023 08:52
@Nirusu Nirusu added the "feature" label (This introduces new functionality) Mar 11, 2023
@Nirusu Nirusu requested a review from malt3 March 11, 2023 08:54
@edgelesssys edgelesssys deleted a comment from netlify bot Mar 11, 2023
@malt3 (Contributor) commented Mar 13, 2023

I am not sure if the state can theoretically change multiple times to READY.

Currently it certainly can change to READY multiple times. This is something we do want to prevent in the future by disallowing further init retries once we see that the connection became ready once.

Comment on lines +268 to +278
d.spinner.Stop()
d.spinner.Start("Initializing cluster ", false)
Contributor
I think we should either prevent this spinner change from happening more than once in this func, or prevent multiple init calls altogether (see my other comment).

@daniel-weisse @3u13r any thoughts on preventing multiple init calls after the aTLS connection was established at least once?
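
One way to guarantee the spinner switch happens at most once would be to guard it with sync.Once; a minimal sketch (field and method names are made up, not the PR's code):

package initflow

import "sync"

// spinner is a stand-in for the CLI's spinner type.
type spinner interface {
  Start(text string, showDots bool)
  Stop()
}

type initDoer struct {
  spinner     spinner
  spinnerOnce sync.Once
}

// onConnectionReady switches the spinner text exactly once, even if the
// connection state reaches READY multiple times.
func (d *initDoer) onConnectionReady() {
  d.spinnerOnce.Do(func() {
    d.spinner.Stop()
    d.spinner.Start("Initializing cluster ", false)
  })
}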

Member
Dropping from an established connection back to CONNECTING should result in a failure, imo.
We need to refactor the whole init logic anyway. Currently, a failed attestation results in a warning that claims the error is non-recoverable (which is not true).

Contributor (author)
Should we make a double connectivity.Ready a failure for now and discuss a larger refactor to init in an upcoming dev sync?

I would also +1 a refactor, because proper fixes for #1403 might also require refactoring init (on the bootstrapper side) to remove or better mitigate the pesky timeout logic.

@daniel-weisse (Member) commented Mar 14, 2023

2023-03-14T08:39:56+01:00       DEBUG   cmd/init.go:246 Created protoClient
2023-03-14T08:39:56+01:00       DEBUG   cmd/init.go:259 Connection state started as CONNECTING
2023-03-14T08:39:56+01:00       DEBUG   cloudcmd/validators.go:170      Validating attestation document
2023-03-14T08:39:57+01:00       DEBUG   cloudcmd/validators.go:170      Successfully validated attestation document
2023-03-14T08:39:57+01:00       DEBUG   cmd/init.go:261 Connection state changed to CONNECTING
2023-03-14T08:39:57+01:00       DEBUG   cmd/init.go:264 Connection ready
// Init actually starts

Connection changes to CONNECTING twice, so we should definitely not cancel here. But aborting with an unretriable error after "Connection ready" seems OK to me.
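
A hedged sketch of the "abort after Connection ready" idea: keep watching the connection state and return a non-retriable error once an established connection drops. The error value, function name, and package name are assumptions, not the merged code:

package initflow

import (
  "context"
  "errors"

  "google.golang.org/grpc"
  "google.golang.org/grpc/connectivity"
)

var errConnectionLost = errors.New("lost connection to cluster after it was established")

// watchConnection returns errConnectionLost if the connection was READY once
// and then left that state; the caller should treat that as non-retriable.
func watchConnection(ctx context.Context, conn *grpc.ClientConn) error {
  wasReady := false
  state := conn.GetState()
  for {
    if state == connectivity.Ready {
      wasReady = true
    } else if wasReady {
      // The connection was established once and then dropped: don't retry.
      return errConnectionLost
    }
    if !conn.WaitForStateChange(ctx, state) {
      return ctx.Err()
    }
    state = conn.GetState()
  }
}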

@Nirusu (Contributor, author) commented Mar 14, 2023

Alright, I gave it a try with channels in the new commit. Check it out; this definitely makes things more complicated, though 😄

Note that I couldn't really test this yet, since in my current attempts (yanking the cable, temporarily disabling WiFi) the connection survived. Seems it's a bit more resilient than I thought (or macOS keeps WiFi on for a bit after disabling it).

@Nirusu Nirusu force-pushed the feat/spinner-connecting branch 2 times, most recently from 1e4d7cf to 0f85bf5 Compare March 14, 2023 17:09
@katexochen katexochen removed their request for review March 14, 2023 17:21
@daniel-weisse (Member) left a comment

Overall implementation idea looks good to me, though I would like this to be extensively tested before merging.
Is it possible to write a unit test for this?

Review thread on cli/internal/cmd/init.go (outdated, resolved)
@Nirusu (Contributor, author) commented Mar 15, 2023

I think so far we don't have a test for initCall. I can try writing one that at least tests the goroutine logic here.

@Nirusu (Contributor, author) commented Mar 15, 2023

Okay, now the unit test actually complains about data races with the use of the logger in the goroutine. Will investigate.

Review thread on cli/internal/cmd/init.go (outdated, resolved)
@Nirusu (Contributor, author) commented Mar 15, 2023

Added a test and removed that ugly connectedTwice variable by just using context.WithCancelCause.

There are still a few open issues:

  1. The data race with the logger still exists (effectively only reachable in the test, I guess). I don't know how to fix this yet.
  2. The test is still slightly racy without a time.Sleep to let the context cancellation propagate after the sender goroutine has fired off the event over the channel and the test checks the context status. This should be fixable somehow, though.
  3. Channel sends block by default, so in a very unfortunate case logGRPCStateChanges could block trying to send on the channel if the state events change rapidly. I don't think that's really likely to happen, though.

This cancelling on multiple events is way more painful than I thought it would be...

It might actually be easier to move this down to logGRPCStateChanges rather than using channels and dealing with this mess of synchronization. What do you think about this?
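
For reference, a minimal sketch of the context.WithCancelCause pattern mentioned above (the channel shape, counter, error value, and names are illustrative, not the PR's actual code):

package initflow

import (
  "context"
  "errors"
)

var errDoubleReady = errors.New("connection became ready a second time")

// run cancels the shared context with a dedicated cause when a second READY
// event arrives, so the caller can distinguish it from a normal cancellation.
func run(parent context.Context, readyEvents <-chan struct{}) error {
  ctx, cancel := context.WithCancelCause(parent)
  defer cancel(nil)

  // Watcher: cancel with a cause once the second READY event is seen.
  go func() {
    readyCount := 0
    for {
      select {
      case <-ctx.Done():
        return
      case <-readyEvents:
        readyCount++
        if readyCount > 1 {
          cancel(errDoubleReady)
          return
        }
      }
    }
  }()

  // ... run the init attempt with ctx; it returns when the work finishes or ctx is cancelled ...

  if errors.Is(context.Cause(ctx), errDoubleReady) {
    return errDoubleReady // non-retriable
  }
  return ctx.Err()
}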

@Nirusu (Contributor, author) commented Mar 16, 2023

Bye channels and bye data races, this should be easier if everything is just in the same function 😅

Comment on lines 263 to 267
func (d *initDoer) handleGRPCStateChanges(ctx context.Context, wg *sync.WaitGroup, conn *grpc.ClientConn) {
go func() {
Member

Why does this function start a new goroutine instead of letting the caller handle this? @malt3

Contributor

This is almost in line with how the rest of our code handles goroutines. CC @katexochen

The general pattern we currently follow:

func foo(ctx context.Context) {
  var wg sync.WaitGroup
  defer wg.Wait()

  // pass the WaitGroup by pointer so the callee's Add/Done affect this wg
  usesAGoRoutine(ctx, &wg)
  // do something ....
}

func usesAGoRoutine(ctx context.Context, wg *sync.WaitGroup) {
  wg.Add(1)
  go func() {
    defer wg.Done()
    // do something in the background
  }()
}

@Nirusu (Contributor, author) commented Mar 17, 2023

Actually, thanks to the linter warning I see that this doesn't work as expected, given that the goroutine is shut down when the state changes to READY. Will still need to fix this.

@Nirusu (Contributor, author) commented Mar 17, 2023

After discussing with @malt3, it seems I misunderstood the requirement. If I understood correctly, we don't want to abort the current attempt when the connection changes to READY twice, but rather cancel any upcoming retry attempts once the first connection is dropped.

In this case I believe the new commit achieves this in a much simpler way.
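
A hedged sketch of that simpler approach: remember that a connection was established once, and have the retry check refuse any further attempts after that point. Field and method names are guesses, not the merged code:

package initflow

import "sync/atomic"

type initDoer struct {
  connectedOnce atomic.Bool
}

// markConnected is called once the aTLS/gRPC connection has been established.
func (d *initDoer) markConnected() {
  d.connectedOnce.Store(true)
}

// retriable reports whether another init attempt may be started after err.
func (d *initDoer) retriable(err error) bool {
  if d.connectedOnce.Load() {
    // A connection was established before: retrying could initialize the
    // cluster a second time, so give up instead of retrying.
    return false
  }
  return isTransient(err) // placeholder for the real per-error classification
}

func isTransient(err error) bool { return false } // illustrative stub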

Review thread on cli/internal/cmd/init.go (outdated, resolved)
@Nirusu Nirusu changed the title cli: split spinner into "Connecting" and "Initializing cluster" state for init cli: split spinner into "Connecting" and "Initializing cluster" state for init (and prevent double initialization cases when an error is mistakenly retried) Mar 20, 2023
@Nirusu Nirusu added the "bug fix" label (Fixing a bug) Mar 20, 2023
@Nirusu Nirusu merged commit 822d782 into main Mar 20, 2023
@Nirusu Nirusu deleted the feat/spinner-connecting branch March 20, 2023 12:33
@derpsteb derpsteb added the "needs backport" label (This PR needs to be backported to a previous release) and removed the "feature" label Mar 24, 2023
@katexochen katexochen changed the title cli: split spinner into "Connecting" and "Initializing cluster" state for init (and prevent double initialization cases when an error is mistakenly retried) cli: prevent double initialization in cases where an error was mistakenly retried Apr 4, 2023
Labels
bug fix (Fixing a bug), needs backport (This PR needs to be backported to a previous release)

5 participants