
Unify await logic for deletes #3133

Merged (15 commits) Aug 21, 2024
Conversation

blampe (Contributor) commented Jul 26, 2024

NB: This is a larger change that's easier to review as separate commits. The
first commit introduces some new types and interfaces; the second one hooks up
those types and removes a lot of existing code.

We already have "generic" await logic for deletion -- if we don't have an
explicit delete-awaiter defined for a particular GVK, then we always run some
logic that waits for the resource to 404.

As it turns out, all of our custom delete-awaiters do essentially the same 404
check, modulo differences in the messages we log. This makes the deletion code
flow a good starting place to introduce more generic/unified await logic.

If you look at the code deleted in the second commit you get a good sense of
the current issues with our awaiters: each custom awaiter is responsible for
establishing its own watchers; determining its default timeout; performing the
same 404 check; etc. There's a lot of duplication, and subtle differences in
behavior lead to issues like
#1232.

As part of this change we start decomposing our await logic into more
composable pieces. Note that I'm replacing our deletion code path with these
new pieces because I don't think we lose much by changing the logged messages,
but for the create/update code paths we'll need some glue to preserve existing
await behavior.

The relevant interfaces are:

// Observer acts on a watch.Event Source. Range is responsible for filtering
// events to only those relevant to the Observer, and Observe optionally
// updates the Observer's state.
type Observer interface {
	// Range iterates over all events visible to the Observer. The caller is
	// responsible for invoking Observe as part of the provided callback. Range
	// can be used to customize setup and teardown behavior if the Observer
	// wraps another Observer.
	Range(func(watch.Event) bool)

	// Observe handles events and can optionally update the Observer's state.
	// This should be invoked by the caller and not during Range.
	Observe(watch.Event) error
}

// Satisfier is an Observer which evaluates the observed object against some
// criteria.
type Satisfier interface {
	Observer

	// Satisfied returns true when the criteria is met.
	Satisfied() (bool, error)

	// Object returns the last-known state of the object being observed.
	Object() *unstructured.Unstructured
}

// Source encapsulates logic responsible for establishing
// watch.Event channels.
type Source interface {
	Start(context.Context, schema.GroupVersionKind) (<-chan watch.Event, error)
}

At a high level:

  1. We determine what condition (Satisfier) to wait for during deletion. There is
    always a condition even if it's a no-op. Deletion is simple because there
    are only two possibilities -- "skip" and "wait for 404" -- but with
    create/update and user-defined conditions it will get more interesting.
  2. We wait for the condition Satisfier and can combine it with arbitrary
    Observers. This lets us do things like log additional information while
    we're waiting, e.g. Emit event logs during await #3135.

The underlying machinery responsible for handling timeouts, informers, etc. is
all hidden behind the Source. Implementing new await logic is essentially
just a matter of defining a new Satisfier which understands how to evaluate
an unstructured resource.
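To make the contract between these pieces concrete, here is a minimal, self-contained analog of the Satisfier flow. It deliberately uses a plain `Event` type instead of `watch.Event` and an in-memory slice instead of an informer-backed channel, so all names below are illustrative, not the PR's actual types:

```go
package main

import "fmt"

// Event is a stand-in for watch.Event in this self-contained sketch.
type Event struct {
	Type string // "Added", "Modified", "Deleted"
	Name string
}

// deletedCondition is a toy Satisfier: it is satisfied once it observes
// a Deleted event for the object it is watching.
type deletedCondition struct {
	name    string
	events  []Event // stand-in event source; the real code ranges over a channel
	deleted bool
}

// Range iterates over events visible to the observer, filtering to the
// relevant object and invoking the caller-provided callback for each one.
func (c *deletedCondition) Range(yield func(Event) bool) {
	for _, e := range c.events {
		if e.Name != c.name {
			continue // not the object we care about
		}
		if !yield(e) {
			return
		}
	}
}

// Observe updates the condition's state from a single event.
func (c *deletedCondition) Observe(e Event) error {
	if e.Type == "Deleted" {
		c.deleted = true
	}
	return nil
}

// Satisfied reports whether the awaited condition has been met.
func (c *deletedCondition) Satisfied() (bool, error) { return c.deleted, nil }

// await drives the condition the way the description above outlines: the
// caller's callback invokes Observe, then checks Satisfied to decide
// whether to keep ranging.
func await(c *deletedCondition) (bool, error) {
	var err error
	c.Range(func(e Event) bool {
		if err = c.Observe(e); err != nil {
			return false
		}
		done, serr := c.Satisfied()
		if serr != nil {
			err = serr
			return false
		}
		return !done // stop ranging once satisfied
	})
	if err != nil {
		return false, err
	}
	return c.Satisfied()
}

func main() {
	c := &deletedCondition{
		name: "my-pod",
		events: []Event{
			{Type: "Modified", Name: "my-pod"},
			{Type: "Deleted", Name: "other-pod"}, // filtered out by Range
			{Type: "Deleted", Name: "my-pod"},
		},
	}
	ok, err := await(c)
	fmt.Println(ok, err) // true <nil>
}
```

The split mirrors the interfaces above: filtering lives in Range, state updates in Observe, and the success test in Satisfied, so a new await condition only has to supply the last two.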

A number of unit tests are included, as well as an E2E regression test to ensure we respect the skipAwait annotation. The existing delete-await tests are mostly unchanged except for tweaks to inject a Condition instead of an awaitSpec. Some watcher-specific tests were no longer relevant and were removed; however, the functionality is still implemented and tested as part of Awaiter.

Fixes #3157.
Fixes #1418.
Refs #2824.

blampe commented Jul 26, 2024

Does the PR have any schema changes?

Looking good! No breaking changes found.
No new resources/functions.

codecov bot commented Jul 26, 2024

Codecov Report

Attention: Patch coverage is 87.58621% with 36 lines in your changes missing coverage. Please review.

Project coverage is 37.93%. Comparing base (2ec7a1a) to head (229a385).
Report is 1 commit behind head on master.

Files Patch % Lines
provider/pkg/await/internal/awaiter.go 78.18% 9 Missing and 3 partials ⚠️
provider/pkg/await/condition/immediate.go 83.72% 7 Missing ⚠️
provider/pkg/await/condition/source.go 85.00% 3 Missing and 3 partials ⚠️
provider/pkg/await/condition/observer.go 91.66% 3 Missing and 2 partials ⚠️
provider/pkg/await/await.go 82.60% 2 Missing and 2 partials ⚠️
provider/pkg/metadata/overrides.go 83.33% 1 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #3133      +/-   ##
==========================================
+ Coverage   36.59%   37.93%   +1.33%     
==========================================
  Files          70       76       +6     
  Lines        9264     9335      +71     
==========================================
+ Hits         3390     3541     +151     
+ Misses       5541     5452      -89     
- Partials      333      342       +9     


This was referenced Jul 26, 2024
@blampe blampe force-pushed the blampe/2996-await-delete branch 2 times, most recently from 7affd4c to 5c06e56 Compare July 26, 2024 20:28
@blampe blampe requested a review from rquitales July 26, 2024 20:28
@blampe blampe marked this pull request as ready for review July 26, 2024 20:28
Comment on lines 114 to 125
r, _ := status.Compute(uns)
if r.Message != "" {
dc.logger.LogMessage(checkerlog.StatusMessage(r.Message))
}
blampe (author):
We will still emit messages related to the object's status, if possible.

return
}
// Make sure Observers are all done.
wg.Wait()
Reviewer (Member):
In the event of context deadline being exceeded, wouldn't this mean that the awaiter still continues waiting/observing? Shouldn't we just skip waiting for the wait group to be done?

blampe (author):
The context is plumbed all the way down and respected by the observers, so if it dies all of our informers will shut down and this will resolve. Namely these guys:

https://github.com/pulumi/pulumi-kubernetes/pull/3133/files#diff-c68cee828d9c5172eef833ba32b6185741c858ff507bd2f4d6df8c5a6fb275dbR55-R59

https://github.com/pulumi/pulumi-kubernetes/pull/3133/files#diff-b52d2594c41dd1ff41784e1c6101fcd3c86f51c128ea8b8de43ddc6607977a15R148

Full disclosure I'm very skeptical we need any of this "Hail Mary" logic. There's a comment in the code but I think it largely stems from issues we had handling watch errors. I kept it as-is since we've got tests around it, but I expect we could get rid of it without issue.

blampe (author):
I've simplified this a bit -- we don't need to worry about context cancellation here since the observer's already shut down when that happens.

return dc, nil
}

// Range confirms the object exists before establishing an Informer.
Reviewer (Contributor):
How is this not a race between checking the existence and starting the informer? Or does the informer throw an error if the object doesn't exist (in which case, why call Get at all?).

blampe (author):
The comment wasn't accurate -- there is a race, so we check if the object was deleted after establishing the informer. (Informers are fine if the object doesn't exist, so you can subscribe to creations.)
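The race-free ordering described here — establish the watch first, then check current state — can be sketched with a self-contained toy (a map-backed "cluster" and a channel instead of an informer; all names are illustrative). Any deletion then either shows up in the existence check or is delivered on the already-open channel:

```go
package main

import (
	"fmt"
	"sync"
)

// store is a toy stand-in for the cluster: it holds object state and
// notifies subscribers of deletions.
type store struct {
	mu      sync.Mutex
	objects map[string]bool
	subs    []chan string // each receives names of deleted objects
}

func (s *store) subscribe() chan string {
	s.mu.Lock()
	defer s.mu.Unlock()
	ch := make(chan string, 8)
	s.subs = append(s.subs, ch)
	return ch
}

func (s *store) exists(name string) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.objects[name]
}

func (s *store) delete(name string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	delete(s.objects, name)
	for _, ch := range s.subs {
		ch <- name
	}
}

// awaitDeleted avoids the check-then-watch race by subscribing first and
// only then consulting current state: a deletion that lands between the
// two steps is either seen by the existence check or buffered on the channel.
func awaitDeleted(s *store, name string) {
	events := s.subscribe() // 1. establish the "informer" first
	if !s.exists(name) {    // 2. then check: already gone?
		return
	}
	for deleted := range events { // 3. otherwise wait for the event
		if deleted == name {
			return
		}
	}
}

func main() {
	s := &store{objects: map[string]bool{"cm": true}}
	go s.delete("cm") // deletion races with the await
	awaitDeleted(s, "cm")
	fmt.Println("observed deletion")
}
```

Reversing steps 1 and 2 reintroduces the race: a deletion landing between the check and the subscribe would be lost, and the awaiter would wait forever.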

Comment on lines 93 to 95
dc.logger.LogMessage(checkerlog.WarningMessage(
fmt.Sprintf("finalizers might be preventing deletion (%s)", strings.Join(finalizers, ", ")),
))
Reviewer (Contributor):
This is an ephemeral status message? The whole await procedure is ending at this point right? Wonder how anyone would see this.

blampe (author):
Ah, I've been looking at a lot of non-interactive output and didn't realize these aren't already persisted. I suggest we do what we do in go-provider/docker-build and treat any warnings/errors as non-status so they get shown in the final interactive output.

EronWright (Contributor) commented Aug 6, 2024

I notice that the cli-utils library has a watcher package that seems similar to the core logic in this PR. Thoughts on it?
https://github.com/kubernetes-sigs/cli-utils/blob/master/pkg/kstatus/watcher/doc.go

blampe (author) replied:
> I notice that the cli-utils library has a watcher package that seems similar to the core logic in this PR. Thoughts on it?
> https://github.com/kubernetes-sigs/cli-utils/blob/master/pkg/kstatus/watcher/doc.go

@EronWright yes, I saw that as well (more specifically the polling portion). We're already using Informers for our other awaiters and they work well enough, so the primary goal is to use them for deletes as well.

@blampe blampe changed the base branch from master to blampe/await-config August 9, 2024 22:13
Base automatically changed from blampe/await-config to master August 9, 2024 22:50
@mjeffryes mjeffryes added this to the 0.108 milestone Aug 16, 2024
Comment on lines +138 to +140
// Our context might be closed, but we still want to issue this request
// even if we're shutting down.
ctx := context.WithoutCancel(dc.ctx)
Reviewer (Contributor):
Nit: Never seen this before, seems a bit awkward. If the context is indeed canceled, does the result still matter? Maybe you could check err == context.Canceled?

dc.observer.Range(yield)
}()

dc.getClusterState()
Reviewer (Contributor):
This looks like a typo, but exists to cause a side-effect, and would make more sense if the function was called refreshClusterState.

Comment on lines +150 to +151
dc.logger.LogStatus(diag.Warning,
"unexpected error while checking cluster state: "+err.Error(),
Reviewer (Contributor):
Shouldn't an abnormal error cause the waiter to quit?

Comment on lines +90 to +92
// Attempt one last lookup if the object still exists. (This is legacy
// behavior that might be unnecessary since we're using Informers instead of
// Watches now.)
Reviewer (Contributor):
Seems unnecessary to me, like it is second-guessing the informer. But I suppose you need to fetch the object anyway to see whether any finalizers exist.

@@ -320,20 +320,26 @@ func TestAwaitDaemonSetDelete(t *testing.T) {
}

for _, tt := range tests {
tt := tt
Reviewer (Contributor):
BTW I learned that loop variables are now copied as of go 1.22 and you don't need this line anymore.

@mikhailshilkov mikhailshilkov modified the milestones: 0.108, 0.109 Aug 21, 2024
@blampe blampe merged commit adfff97 into master Aug 21, 2024
19 checks passed
@blampe blampe deleted the blampe/2996-await-delete branch August 21, 2024 21:29
Successfully merging this pull request may close these issues:

  * panic when interrupting deletion
  * Destroying a resource fails with "timed out waiting to be Ready"