-
Notifications
You must be signed in to change notification settings - Fork 370
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix ClusterInfo type ResourceExport recreation bug #4412
Fix ClusterInfo type ResourceExport recreation bug #4412
Conversation
/test-multicluster-e2e |
Codecov Report
@@ Coverage Diff @@
## main #4412 +/- ##
==========================================
- Coverage 65.56% 65.54% -0.03%
==========================================
Files 400 412 +12
Lines 56847 56941 +94
==========================================
+ Hits 37270 37320 +50
- Misses 16882 16941 +59
+ Partials 2695 2680 -15
|
multicluster/controllers/multicluster/gateway_controller_test.go
Outdated
Show resolved
Hide resolved
@@ -113,6 +113,11 @@ func (r *GatewayReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ct | |||
} | |||
return nil | |||
} | |||
|
|||
if !existingResExport.DeletionTimestamp.IsZero() { | |||
return fmt.Errorf("existing ResourceExport is being deleted, retry later") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question - if we do not add this check, will updateResourceExport() fail? Then what is the difference of returning an error here, compared to letting updateResourceExport() fail and return an error at line 122?
Have you checked how long the new active GW can be finally recreated, when the race condition happens and we have to retry here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updateResourceExport() will not fail because update is allowed when DeletionTimestamp is not empty. The updated one will be recycled by K8s because DeletionTimestamp is not zero after that.
The new active Gateway is created immediately when old Gateway is removed or down. The corresponding ClusterInfo type of ResourceExport will also be created almost immediately because it's usually one time retry unless the leader controller is down.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case, could there be another race condition that the check is passed here, but DeletionTimeStamp is set before or after update is done?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, we only have one goroutine in gateway controller to handle Gateway change considering Gateway event rarely happens.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But you get the ResourceExport from client cache right? The state can be stale?
When the Gateway changes, should we create the ResourceExport rather than update it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Disabling client cache won't resolve it as the object in apiserver may be updated after the API is called.
This situation is very common and a typical way to resolve it is, using the object in the cache without concerning if it's stale or not, updating the object with resourceVersion got from the cache (clientset automatically does it as long as the object has resourceVersion), then if the object is stale, apiserver will reject the update request with updateConflict error, and client should retrieve the latest version from apiserver directly and retry. Example: https://github.com/antrea-io/antrea/blob/main/pkg/agent/controller/egress/egress_controller.go#L561
But I wonder even if it updates the resource export successfully, could the other slow leader delete it because it thinks this is the one for itself? If yes, the deletion logic should check whether the object matches and set resourceVersion when sending the delete request.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tnqn thanks for the suggestion. I am not sure I got your last question about other slow leader
, there is only one leader controller in one ClusterSet, and each member cluster will create its own ClusterInfo type of ResourceExport and also 1-1 mapping ResourceImport. Only when member cluster submits the deletion request, the leader controller will delete corresponding ClusterInfo of ResourceExport. But I think we can set resourceVersion for delete request considering it should have no side-effect.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jianjuns I have refined the code to create ResourceExport when DeletionTimestamp != 0. After checking existing codes, it already has resourceVersion when it's trying to update existing ResourceExport, so it should fail when there is conflict and retry. To avoid stale cache, I removed the spec comparison steps and let it update anyway. For the deletion action, I feel it's Ok without resourceVersion considering Gateway webhook will guarantee that only one Gateway is allowed in a member cluster. all events should happen by sequence.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we still keep the resourceVersion in update or not? If yes, maybe add a comment before the update call about that.
@tnqn : the controller has only a single worker, so no synchronization issue or reordering.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jianjuns yes, the updateResourceExport
will reuse existing ResourceExport's resourceVersion to update. I have added a comment for this.
950ad3e
to
9ffb265
Compare
9ffb265
to
519b8a5
Compare
After Gateway HA is enabled, the ClusterInfo type of ResourceExport will be recreated when the active Gateway is changed. But there is a case that a new ClusterInfo of ResourceExport creation may fail when the leader controller process is slow and existing ResourceExport is not deleted in time. Fix the issue through retry when the existing ResourceExport's DeletionTimestamp is not zero. Signed-off-by: Lan Luo <[email protected]>
519b8a5
to
407ecda
Compare
/test-multicluster-e2e |
/skip-all |
After Gateway HA is enabled, the ClusterInfo type of ResourceExport will be recreated when the active Gateway is changed. But there is a case that a new ClusterInfo of ResourceExport creation may fail when the leader controller process is slow and existing ResourceExport is not deleted in time.
The root cause is the Gateway creation event will update a ClusterInfo of ResourceExport instead of creating a new one due to it still can get existing one from cache when the ResourceExport's DeletionTimestamp is not zero, after ResourceExport is updated, it will be recycled by kubernetes very quickly since it's stale object. From Gateway controller's perspective, Gateway change is reflecting in ResourceExport, but it's actually being deleted eventually.
Fix the issue through retry when the existing ResourceExport's DeletionTimestamp is not zero.
Signed-off-by: Lan Luo [email protected]