Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refine Node controller to support Gateway HA #4069

Merged
merged 1 commit into from
Sep 2, 2022

Conversation

luolanzone
Copy link
Contributor

@luolanzone luolanzone commented Aug 2, 2022

In order to support Gateway active-standby mode HA, the Node
controller has been refined to handle Node readiness changes.
There will be at most one Gateway in a given Namespace, and the
Gateway controller is updated correspondingly.

  1. Check Node readiness and create a Gateway CR if Node is ready
    and there is no existing Gateway.
  2. Initialize active Gateway and Gateway candidate Nodes with annotation
    'multicluster.antrea.io/gateway'.
  3. Add Gateway webhook to allow at most one Gateway in a given
    Namespace.

For #3754
Signed-off-by: Lan Luo [email protected]

@luolanzone luolanzone added the area/multi-cluster Issues or PRs related to multi cluster. label Aug 2, 2022
@luolanzone
Copy link
Contributor Author

/test-multicluster-e2e

@codecov
Copy link

codecov bot commented Aug 2, 2022

Codecov Report

Merging #4069 (49ceb32) into main (3714619) will increase coverage by 4.66%.
The diff coverage is 72.82%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #4069      +/-   ##
==========================================
+ Coverage   56.09%   60.75%   +4.66%     
==========================================
  Files         371      392      +21     
  Lines       53327    53927     +600     
==========================================
+ Hits        29913    32766    +2853     
+ Misses      21071    18716    -2355     
- Partials     2343     2445     +102     
Flag Coverage Δ
e2e-tests 58.75% <44.02%> (?)
integration-tests 34.81% <0.00%> (-0.17%) ⬇️
kind-e2e-tests 48.66% <ø> (+5.61%) ⬆️
unit-tests 41.39% <67.93%> (+1.11%) ⬆️
Impacted Files Coverage Δ
...ter/controllers/multicluster/gateway_controller.go 71.42% <20.00%> (+70.20%) ⬆️
...luster/controllers/multicluster/node_controller.go 76.09% <72.36%> (+76.09%) ⬆️
...ter/cmd/multicluster-controller/gateway_webhook.go 80.95% <80.95%> (ø)
multicluster/cmd/multicluster-controller/member.go 68.00% <100.00%> (ø)
pkg/ipfix/ipfix_process.go 81.25% <0.00%> (-18.75%) ⬇️
pkg/agent/flowexporter/exporter/exporter.go 67.82% <0.00%> (-10.15%) ⬇️
pkg/controller/networkpolicy/store/addressgroup.go 88.37% <0.00%> (-3.49%) ⬇️
pkg/controller/networkpolicy/mutate.go 63.47% <0.00%> (-3.48%) ⬇️
...ntroller/networkpolicy/networkpolicy_controller.go 82.75% <0.00%> (-0.37%) ⬇️
...luster-controller/memberclusterannounce_webhook.go 73.33% <0.00%> (ø)
... and 104 more

@jianjuns
Copy link
Contributor

jianjuns commented Aug 2, 2022

@luolanzone : could you explain any real issue can happen without the changes? If there were indeed issues, please describe them in the commit description too.

@luolanzone
Copy link
Contributor Author

luolanzone commented Aug 2, 2022

Hi @jianjuns this is actually a part of #3754 for Gateway HA, I feel it's relatively independent, so I create a separate PR for this change first.

@luolanzone luolanzone changed the title Refine Node controller [WIP]Refine Node controller Aug 5, 2022
@luolanzone
Copy link
Contributor Author

/test-multicluster-e2e

@luolanzone luolanzone changed the title [WIP]Refine Node controller Refine Node controller to support Gateway HA Aug 10, 2022
@luolanzone
Copy link
Contributor Author

Hi @jianjuns I updated the latest design for this PR on #3754, could you take a look and review? thanks.

@luolanzone luolanzone force-pushed the refine-node-controller branch 2 times, most recently from df38938 to 3f94ecb Compare August 23, 2022 09:22
if err := r.Client.List(ctx, nodeList, &client.ListOptions{}); err != nil {
return err
}
// Empty the cache in case of initializeActiveGateway failed and rerun.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

failure -> failure

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not see you added code to do the cleanup? Maybe simpler to move the code to the end of func before returning without no error.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, add the candidates initialization before return nil.

multicluster/controllers/multicluster/node_controller.go Outdated Show resolved Hide resolved
multicluster/controllers/multicluster/node_controller.go Outdated Show resolved Hide resolved
multicluster/controllers/multicluster/node_controller.go Outdated Show resolved Hide resolved
multicluster/controllers/multicluster/node_controller.go Outdated Show resolved Hide resolved
multicluster/controllers/multicluster/node_controller.go Outdated Show resolved Hide resolved
multicluster/controllers/multicluster/node_controller.go Outdated Show resolved Hide resolved
multicluster/controllers/multicluster/node_controller.go Outdated Show resolved Hide resolved
multicluster/controllers/multicluster/node_controller.go Outdated Show resolved Hide resolved
multicluster/controllers/multicluster/node_controller.go Outdated Show resolved Hide resolved
multicluster/controllers/multicluster/node_controller.go Outdated Show resolved Hide resolved
multicluster/controllers/multicluster/node_controller.go Outdated Show resolved Hide resolved
multicluster/controllers/multicluster/node_controller.go Outdated Show resolved Hide resolved
multicluster/controllers/multicluster/node_controller.go Outdated Show resolved Hide resolved
multicluster/controllers/multicluster/node_controller.go Outdated Show resolved Hide resolved
}
klog.ErrorS(err, "Failed to get Node", "node", req.Name)
return ctrl.Result{}, err
return ctrl.Result{}, client.IgnoreNotFound(err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should skip the previous log if not found, so I would return in line 103.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, fixed

multicluster/controllers/multicluster/node_controller.go Outdated Show resolved Hide resolved
isActiveGateway := r.activeGateway == node.Name
toRecreate := false
if hasGWAnnotation {
r.gatewayCandidates[req.Name] = true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should delete the node from gatewayCandidates if no annotation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

multicluster/controllers/multicluster/node_controller.go Outdated Show resolved Hide resolved
multicluster/controllers/multicluster/node_controller.go Outdated Show resolved Hide resolved
@luolanzone luolanzone force-pushed the refine-node-controller branch 2 times, most recently from 8ee2fee to bca6819 Compare August 26, 2022 07:23
multicluster/controllers/multicluster/node_controller.go Outdated Show resolved Hide resolved
multicluster/controllers/multicluster/node_controller.go Outdated Show resolved Hide resolved
multicluster/controllers/multicluster/node_controller.go Outdated Show resolved Hide resolved
multicluster/controllers/multicluster/node_controller.go Outdated Show resolved Hide resolved
multicluster/controllers/multicluster/node_controller.go Outdated Show resolved Hide resolved
multicluster/controllers/multicluster/node_controller.go Outdated Show resolved Hide resolved
multicluster/controllers/multicluster/node_controller.go Outdated Show resolved Hide resolved
multicluster/controllers/multicluster/node_controller.go Outdated Show resolved Hide resolved
jianjuns
jianjuns previously approved these changes Aug 30, 2022
Copy link
Contributor

@jianjuns jianjuns left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Several more nits for you to consider.

}
existingGW.GatewayIP = newGateway.GatewayIP
existingGW.InternalIP = newGateway.InternalIP
if err := r.Client.Update(ctx, existingGW, &client.UpdateOptions{}); err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we add a comment about if the Gateway version in the client cache is stale, the update operation will fail?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

multicluster/controllers/multicluster/node_controller.go Outdated Show resolved Hide resolved
multicluster/controllers/multicluster/node_controller.go Outdated Show resolved Hide resolved
multicluster/controllers/multicluster/node_controller.go Outdated Show resolved Hide resolved
@jianjuns
Copy link
Contributor

@luolanzone : please check if you can improve coverage to meet the target.

@luolanzone
Copy link
Contributor Author

/test-multicluster-e2e

In order to support Gateway active-standby mode HA, the Node
controller has been refined to handle Node readiness change.
There will be at most one Gateway in a given Namespace, and the
Gateway controller is updated correspondingly.

1. Check Node readiness and create a Gateway CR if Node is ready
and there is no existing Gateway.
2. Initilize active Gateway and Gateway candidate Nodes with annotation
'multicluster.antrea.io/gateway'.
3. Add Gateway webhook to allow at most one Gateway in a given
   Namespace.

Signed-off-by: Lan Luo <[email protected]>
@luolanzone
Copy link
Contributor Author

/test-multicluster-e2e

@luolanzone
Copy link
Contributor Author

@jianjuns Could you help to check again? I added a few more tests, looks like the result of codecov/patch is green but I didn't see any test coverage data change. I have checked the test data result locally, these uncovered lines are about handling error like get/list failure which is not provided by fake k8s client.

@jianjuns
Copy link
Contributor

jianjuns commented Sep 2, 2022

/skip-all

@jianjuns jianjuns merged commit 4a378d5 into antrea-io:main Sep 2, 2022
@luolanzone luolanzone deleted the refine-node-controller branch September 28, 2022 14:15
heanlan pushed a commit to heanlan/antrea that referenced this pull request Mar 29, 2023
…#4069)

In order to support Gateway active-standby mode HA, the Node
controller has been revised to handle Node readiness changes.
There will be at most one Gateway in a single Namespace.

1. Check Node readiness and create a Gateway CR if Node is ready
and there is no existing Gateway.
2. Initialize active Gateway and Gateway candidate Nodes with annotation
'multicluster.antrea.io/gateway'.
3. Add Gateway webhook to allow at most one Gateway in a single
   Namespace.

Signed-off-by: Lan Luo <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/multi-cluster Issues or PRs related to multi cluster.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants