Unify await logic for deletes #3133
Conversation
This change is part of a stack of changes managed by git-spice.
Does the PR have any schema changes? Looking good! No breaking changes found.
Codecov Report
Attention: Patch coverage is

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #3133      +/-   ##
==========================================
+ Coverage   36.59%   37.93%   +1.33%
==========================================
  Files          70       76       +6
  Lines        9264     9335      +71
==========================================
+ Hits         3390     3541     +151
+ Misses       5541     5452      -89
- Partials      333      342       +9

☔ View full report in Codecov by Sentry.
force-pushed from 7affd4c to 5c06e56
force-pushed from 5c06e56 to 49579dd
r, _ := status.Compute(uns)
if r.Message != "" {
	dc.logger.LogMessage(checkerlog.StatusMessage(r.Message))
}
We will still emit messages related to the object's status, if possible.
force-pushed from 49579dd to 1d398f2
	return
}
// Make sure Observers are all done.
wg.Wait()
In the event of the context deadline being exceeded, wouldn't this mean that the awaiter still continues waiting/observing? Shouldn't we just skip waiting for the wait group to be done?
The context is plumbed all the way down and respected by the observers, so if it dies, all of our informers will shut down and this will resolve.
Full disclosure: I'm very skeptical we need any of this "Hail Mary" logic. There's a comment in the code, but I think it largely stems from issues we had handling watch errors. I kept it as-is since we've got tests around it, but I expect we could get rid of it without issue.
I've simplified this a bit -- we don't need to worry about context cancellation here since the observer's already shut down when that happens.
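For what it's worth, a minimal sketch of why that's safe (the names here are illustrative, not the PR's actual types): as long as every observer goroutine returns once the context is done, the wait group always drains and wg.Wait() cannot hang.

package await

import (
	"context"
	"sync"
)

// runObservers starts one goroutine per observer and blocks until all of them
// finish. Because each observer returns when ctx is canceled, Wait() always
// resolves even when the context deadline is exceeded.
func runObservers(ctx context.Context, observers []func(context.Context)) {
	var wg sync.WaitGroup
	for _, observe := range observers {
		wg.Add(1)
		go func(observe func(context.Context)) {
			defer wg.Done()
			observe(ctx) // must return once ctx.Done() is closed
		}(observe)
	}
	wg.Wait()
}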
force-pushed from 5322331 to b1d2cfb
force-pushed from b1d2cfb to 6383aa6
	return dc, nil
}

// Range confirms the object exists before establishing an Informer.
How is this not a race between checking the existence and starting the informer? Or does the informer throw an error if the object doesn't exist (in which case, why call Get at all?).
The comment wasn't accurate -- there is a race, so we check if the object was deleted after establishing the informer. (Informers are fine if the object doesn't exist, so you can subscribe to creations.)
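To make the ordering concrete, a simplified sketch (the helper signatures are assumptions, not the PR's API): subscribe first, then check whether the object is already gone, so a delete that lands between the two steps can't be missed.

package await

import (
	"context"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
)

// awaitDeletion subscribes to delete events before checking the current state.
// Informers tolerate a missing object, so starting them first is safe.
func awaitDeletion(
	ctx context.Context,
	startInformer func(context.Context) (<-chan struct{}, error),
	getObject func(context.Context) error,
) error {
	deleted, err := startInformer(ctx) // hypothetical: channel closes on delete
	if err != nil {
		return err
	}
	// If the object is already gone we're done; otherwise the informer will
	// observe the delete event whenever it happens.
	err = getObject(ctx)
	if apierrors.IsNotFound(err) {
		return nil
	}
	if err != nil {
		return err
	}
	select {
	case <-deleted:
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}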
dc.logger.LogMessage(checkerlog.WarningMessage(
	fmt.Sprintf("finalizers might be preventing deletion (%s)", strings.Join(finalizers, ", ")),
))
This is an ephemeral status message? The whole await procedure is ending at this point, right? I wonder how anyone would see this.
Ah, I've been looking at a lot of non-interactive output and didn't realize these aren't already persisted. I suggest we do what we do in go-provider/docker-build and treat any warnings/errors as non-status so they get shown in the final interactive output.
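For reference, a rough sketch of that split using the Pulumi SDK's HostClient (the helper itself is hypothetical): warnings and errors go through Log so they persist in the final output, while transient progress uses LogStatus.

package logging

import (
	"context"

	"github.com/pulumi/pulumi/sdk/v3/go/common/diag"
	"github.com/pulumi/pulumi/sdk/v3/go/common/resource"
	"github.com/pulumi/pulumi/sdk/v3/go/common/resource/provider"
)

// logDiag persists warnings/errors and keeps informational progress ephemeral.
func logDiag(ctx context.Context, host *provider.HostClient, urn resource.URN, sev diag.Severity, msg string) error {
	if sev == diag.Warning || sev == diag.Error {
		return host.Log(ctx, sev, urn, msg) // shown in the final output
	}
	return host.LogStatus(ctx, sev, urn, msg) // ephemeral status line
}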
I notice that the cli-utils library has a watcher package; it seems similar to the core logic in this PR. Thoughts on it?
https://github.com/kubernetes-sigs/cli-utils/blob/master/pkg/kstatus/watcher/doc.go
@EronWright yes, I saw that as well (more specifically the polling portion). We're already using Informers for our other awaiters and they work well enough, so the primary goal is to use them for deletes as well.
force-pushed from 379cd8f to 93206a4
force-pushed from 93206a4 to 73e92c9
force-pushed from 73e92c9 to 27dbecd
// Our context might be closed, but we still want to issue this request
// even if we're shutting down.
ctx := context.WithoutCancel(dc.ctx)
Nit: I've never seen this before; it seems a bit awkward. If the context is indeed canceled, does the result still matter? Maybe you could check err == context.Canceled?
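For what it's worth, the pattern can also be bounded so the detached request can't hang during shutdown -- a sketch under assumed names (getObject is hypothetical):

package await

import (
	"context"
	"time"

	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
)

// finalLookup detaches from the parent's cancellation so the last GET still
// goes out while shutting down, but gives it its own timeout so it cannot
// block forever.
func finalLookup(
	parent context.Context,
	getObject func(context.Context) (*unstructured.Unstructured, error),
) (*unstructured.Unstructured, error) {
	ctx, cancel := context.WithTimeout(context.WithoutCancel(parent), 10*time.Second)
	defer cancel()
	return getObject(ctx)
}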
	dc.observer.Range(yield)
}()

dc.getClusterState()
This looks like a typo, but it exists to cause a side effect; it would make more sense if the function were called refreshClusterState.
dc.logger.LogStatus(diag.Warning,
	"unexpected error while checking cluster state: "+err.Error(),
Shouldn't an abnormal error cause the waiter to quit?
// Attempt one last lookup if the object still exists. (This is legacy
// behavior that might be unnecessary since we're using Informers instead of
// Watches now.)
Seems unnecessary to me, like it is second-guessing the informer. But I suppose you need to fetch the object anyway to see whether any finalizers exist.
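For illustration, a hypothetical helper showing why that final GET is still useful -- the live object carries the finalizer list:

package await

import (
	"fmt"
	"strings"

	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
)

// finalizerWarning returns a warning if the object still has finalizers that
// could be blocking deletion, or "" if there are none.
func finalizerWarning(uns *unstructured.Unstructured) string {
	finalizers := uns.GetFinalizers()
	if len(finalizers) == 0 {
		return ""
	}
	return fmt.Sprintf("finalizers might be preventing deletion (%s)", strings.Join(finalizers, ", "))
}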
@@ -320,20 +320,26 @@ func TestAwaitDaemonSetDelete(t *testing.T) {
	}

	for _, tt := range tests {
		tt := tt
BTW, I learned that loop variables are now copied per iteration as of Go 1.22, so you don't need this line anymore.
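A small self-contained example of the change (the test and its names are made up):

package example

import "testing"

// TestLoopVar: as of Go 1.22 each iteration of a range loop gets its own copy
// of tt, so the explicit `tt := tt` shadow before parallel subtests is redundant.
func TestLoopVar(t *testing.T) {
	tests := []struct{ name, want string }{
		{name: "a", want: "a"},
		{name: "b", want: "b"},
	}
	for _, tt := range tests {
		// tt := tt // required before Go 1.22; safe to drop now
		t.Run(tt.name, func(t *testing.T) {
			t.Parallel()
			if tt.name != tt.want {
				t.Fatalf("got %q, want %q", tt.name, tt.want)
			}
		})
	}
}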
force-pushed from 27dbecd to 229a385
NB: This is a larger change that's easier to review as separate commits. The
first commit introduces some new types and interfaces; the second one hooks up
those types and removes a lot of existing code.
We already have "generic" await logic for deletion -- if we don't have an
explicit delete-awaiter defined for a particular GVK, then we always run some
logic that waits for the resource to 404.
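As a rough illustration of what "wait for the resource to 404" means, here's a simplified polling sketch -- not the provider's actual informer-based implementation:

package await

import (
	"context"
	"time"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/wait"
	"k8s.io/client-go/dynamic"
)

// waitForDeletion polls until a GET for the object returns NotFound.
func waitForDeletion(ctx context.Context, client dynamic.ResourceInterface, name string) error {
	return wait.PollUntilContextCancel(ctx, 2*time.Second, true, func(ctx context.Context) (bool, error) {
		_, err := client.Get(ctx, name, metav1.GetOptions{})
		if apierrors.IsNotFound(err) {
			return true, nil // object is gone; deletion complete
		}
		if err != nil {
			return false, err // unexpected error; stop waiting
		}
		return false, nil // still exists; keep polling
	})
}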
As it turns out, all of our custom delete-awaiters do essentially the same 404
check, modulo differences in the messages we log. This makes the deletion code
flow a good starting place to introduce more generic/unified await logic.
If you look at the code deleted in the second commit you get a good sense of
the current issues with our awaiters: each custom awaiter is responsible for
establishing its own watchers; determining its default timeout; performing the
same 404 check; etc. There's a lot of duplication, and subtle differences in behavior lead to issues like #1232.
As part of this change we start decomposing our await logic into more
composable pieces. Note that I'm replacing our deletion code path with these
new pieces because I don't think we lose much by changing the logged messages,
but for the create/update code paths we'll need some glue to preserve existing
await behavior.
The relevant interfaces are:
At a high level:

- The provider decides on a condition (a Satisfier) to wait for during deletion. There is always a condition, even if it's a no-op. Deletion is simple because there are only two possibilities -- "skip" and "wait for 404" -- but with create/update and user-defined conditions it will get more interesting.
- The Awaiter takes a Satisfier and can combine it with arbitrary Observers. This lets us do things like log additional information while we're waiting, e.g. Emit event logs during await #3135.
- The underlying machinery responsible for handling timeouts, informers, etc. is all hidden behind the Source. Implementing new await logic is essentially just a matter of defining a new Satisfier which understands how to evaluate an unstructured resource (roughly sketched below).
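To make the shape of those pieces concrete, here's a rough sketch -- the names and signatures below are my assumptions, not the PR's actual definitions:

package await

import "k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"

// Satisfier decides whether the await condition holds for an object.
type Satisfier interface {
	Satisfied(obj *unstructured.Unstructured) (bool, error)
}

// Observer is notified of events seen while awaiting, e.g. to log progress or
// related cluster events.
type Observer interface {
	Observe(obj *unstructured.Unstructured)
}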
A number of unit tests are included as well as an E2E regression test to ensure we respect the skipAwait annotation. The existing delete-await tests are mostly unchanged except for tweaks to inject a Condition instead of an awaitSpec. Some watcher-specific tests were no longer relevant and were removed; however, the functionality is still implemented/tested as part of Awaiter.

Fixes #3157.
Fixes #1418.
Refs #2824.