Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support Pod to Pod connectivity in Antrea Multi-cluster #4219

Merged
merged 1 commit into from
Oct 1, 2022

Conversation

hjiajing
Copy link
Contributor

@hjiajing hjiajing commented Sep 9, 2022

Add a new field PodCIDRs in the ClusterInfo for a member cluster to export its Pod CIDRs to other clusters. antrea-agent adds flows to route traffic to remote cluster Pods through Gateway Nodes.

Signed-off-by: hujiajing [email protected]

@luolanzone luolanzone added the area/multi-cluster Issues or PRs related to multi cluster. label Sep 9, 2022
@luolanzone luolanzone added this to the Antrea v1.9 release milestone Sep 9, 2022
@codecov
Copy link

codecov bot commented Sep 13, 2022

Codecov Report

Merging #4219 (c588bdd) into main (0f77529) will increase coverage by 0.32%.
The diff coverage is 55.17%.

❗ Current head c588bdd differs from pull request most recent head 3d85046. Consider uploading reports for the commit 3d85046 to get more accurate results

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #4219      +/-   ##
==========================================
+ Coverage   62.26%   62.58%   +0.32%     
==========================================
  Files         385      385              
  Lines       54501    55017     +516     
==========================================
+ Hits        33933    34433     +500     
- Misses      18069    18073       +4     
- Partials     2499     2511      +12     
Flag Coverage Δ *Carryforward flag
e2e-tests 39.63% <47.36%> (?)
integration-tests 34.84% <24.63%> (ø) Carriedforward from 0f77529
kind-e2e-tests 48.16% <20.58%> (ø) Carriedforward from 0f77529
unit-tests 43.86% <21.95%> (ø) Carriedforward from 0f77529

*This pull request uses carry forward flags. Click here to find out more.

Impacted Files Coverage Δ
multicluster/cmd/multicluster-controller/member.go 0.00% <0.00%> (ø)
...ulticluster/cmd/multicluster-controller/options.go 9.30% <0.00%> (ø)
pkg/ovs/openflow/ofctrl_meter.go 44.77% <0.00%> (ø)
pkg/agent/openflow/multicast.go 18.60% <13.33%> (-2.59%) ⬇️
pkg/agent/openflow/service.go 91.57% <40.00%> (+0.27%) ⬆️
pkg/ovs/openflow/ofctrl_bridge.go 78.18% <47.36%> (+2.06%) ⬆️
pkg/agent/openflow/client.go 73.02% <69.23%> (+1.16%) ⬆️
pkg/ovs/openflow/ofctrl_group.go 74.07% <72.72%> (+9.43%) ⬆️
...ter/controllers/multicluster/gateway_controller.go 62.40% <75.00%> (ø)
pkg/agent/multicluster/mc_route_controller.go 50.18% <100.00%> (-2.10%) ⬇️
... and 16 more

@hjiajing hjiajing force-pushed the pod-pod branch 2 times, most recently from 5b336c6 to 0c694e9 Compare September 13, 2022 12:18
@luolanzone
Copy link
Contributor

Hi @hjiajing your DCO info is missing, and the unit test is failed, please fix them.
The patch test coverage is lower than target, could you check if possible to improve it. thanks.

multicluster/controllers/multicluster/node_controller.go Outdated Show resolved Hide resolved
multicluster/controllers/multicluster/node_controller.go Outdated Show resolved Hide resolved
return err
}
for _, node := range nodeList.Items {
podCIDRs = append(podCIDRs, node.Spec.PodCIDRs...)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you check if this can handle empty PodCIDRs properly?

@@ -142,7 +143,17 @@ func (r *NodeReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.
return ctrl.Result{}, err
}
}
return ctrl.Result{}, nil

// When a Node is added to a member cluster or removed from a member cluster.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel this comment should be updated, this block also handles Node update case.
Can we compare existing CIDRs so we can skip refresh Gateway unnecessarily?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason I used node list rather than other annotations such as "lastUpdateTime" is that we will compare the node list to skip the unnecessary refresh in the updateActiveGateway method.

If we need to compare the CIDRs, maybe we need to set the store the CIDRs in the Gateway.Spec or Annotations. I'm not sure if it's necessary since we store it in the ClusterInfo.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you guys think we can just ask users to provide the PodCIDRs? I want not to add much complexity for auto-discovering Pod CIDRs? @luolanzone

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think this is also workable. Maybe add a config item clusterCIDRs the same as Antrea agent?

Copy link
Contributor

@jianjuns jianjuns Sep 20, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, maybe still podCIDRs for easier understanding.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@luolanzone @jianjuns Maybe we can add a new field PodCIDRs in struct options inmulticluster/cmd/multicluster-controller/options.go. And the user must edit the antrea-multicluster-member.yml to specify the CIDR.

The CIDR must be the cluster Pod CIDRs or a superset of it, should we add a check about it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think no need to check. But wonder if we should check no overlapping among clusters. At least we should check in each cluster that remote cluster CIDRs cannot overlap with the local cluster, and reject that import if it does overlap. We can do a separate PR for that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hjiajing Yes, I think a new option field is required for PodCIDRs, and you should also add a config item here https://github.com/antrea-io/antrea/blob/main/multicluster/apis/multicluster/v1alpha1/multiclusterconfig_types.go#L38.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@luolanzone Thanks for the reminder. The item PodCIDRs was added to MulticlusterConfig.

@hjiajing
Copy link
Contributor Author

Hi @hjiajing your DCO info is missing, and the unit test is failed, please fix them. The patch test coverage is lower than target, could you check if possible to improve it. thanks.

Sure. I'll fix it. Thanks

@hjiajing hjiajing force-pushed the pod-pod branch 2 times, most recently from bb5f668 to 51f0332 Compare September 19, 2022 01:09
@jianjuns jianjuns changed the title Multicluster Pod to Pod connectivity Support Pod to Pod connectivity in Antrea Multi-cluster Sep 20, 2022
@hjiajing hjiajing force-pushed the pod-pod branch 2 times, most recently from 39f7227 to cedc9ee Compare September 21, 2022 15:28
@hjiajing hjiajing force-pushed the pod-pod branch 2 times, most recently from 5e15dd8 to dfd766e Compare September 22, 2022 16:22
@jianjuns
Copy link
Contributor

LGTM

klog.V(2).InfoS("No difference between new and installed ClusterInfoImports, skip updating", "clusterinfoimport", ciImport.Name)
return nil
}
}

klog.InfoS("Adding/updating remote Gateway Node flows for Multi-cluster", "gateway", klog.KObj(activeGW),
"node", c.nodeConfig.Name, "peer", tunnelPeerIPToRemoteGW)
allCIDRs := []string{ciImport.Spec.ServiceCIDR}
allCIDRs := append([]string{ciImport.Spec.ServiceCIDR}, ciImport.Spec.PodCIDRs...)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you add all Pod CIDRs here like Service CIDR, I think it will also add a SNAT flow to perform SNAT for Pod-to-Pod connectivity, I feel it would be OK to keep the behavior consistent, but seems we didn't discuss this before. @jianjuns it this expected to do SNAT for Pod to Pod access?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel we can do SNAT for now. @luolanzone Maybe you can also reach to Tao to understand his requirements?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, @hjiajing let's do SNAT for 1.9, I will check with Tao for the details.

luolanzone
luolanzone previously approved these changes Sep 26, 2022
Copy link
Contributor

@luolanzone luolanzone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@jianjuns
Copy link
Contributor

/test-all

@jianjuns
Copy link
Contributor

/test-multicluster-e2e

@jianjuns
Copy link
Contributor

@hjiajing @luolanzone : this PR did not reach the UT coverage target. Could you check any way to improve?

@hjiajing
Copy link
Contributor Author

@jianjuns Sure. I'll check and improve the unit test coverage.

@hjiajing
Copy link
Contributor Author

@jianjuns I added some UT for the options.go. The coverage increased from 3% to 75% when I tested it locally. But the codecov report shows that it decreased by 6% and there are many files I didn't edit. I have reported this to Wenqi (He said he noticed it too)

@hjiajing
Copy link
Contributor Author

@jianjuns I added some UT for the options.go. The coverage increased from 3% to 75% when I tested it locally. But the codecov report shows that it decreased by 6% and there are many files I didn't edit. I have reported this to Wenqi (He said he noticed it too)

@jianjuns Discussed with Wenqi. The total coverage decreased because some tests was still running, and some tests wasn't triggered. After more unit tests added, the unit test coverage of this PR is 93% reached the target now (70%)

@wenqiq
Copy link
Contributor

wenqiq commented Sep 27, 2022

@jianjuns I added some UT for the options.go. The coverage increased from 3% to 75% when I tested it locally. But the codecov report shows that it decreased by 6% and there are many files I didn't edit. I have reported this to Wenqi (He said he noticed it too)

**decov/patch ** — 93.75% of diff hit (target 70.00%)

Hi, @hjiajing After checking the Codecov links. I think you should just keep an eye on this patch/status data. And 93.75% is a good coverage rate.
The decreased coverage you see just now is caused by integration or e2e test cases that not ran completely thus the results are not uploaded to the Codecov.

@jianjuns
Copy link
Contributor

Yes, the coverage rate is good now.

@luolanzone : do you have extra comments?

@jianjuns
Copy link
Contributor

/test-all

@luolanzone
Copy link
Contributor

@jianjuns no more comments from me. thanks.

@luolanzone
Copy link
Contributor

luolanzone commented Sep 28, 2022

Seems there are some problems with github checks. re-trigger test-all to see if it helps.
/test-all

luolanzone
luolanzone previously approved these changes Sep 28, 2022
Copy link
Contributor

@luolanzone luolanzone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@hjiajing
Copy link
Contributor Author

/test-all

2 similar comments
@hjiajing
Copy link
Contributor Author

/test-all

@hjiajing
Copy link
Contributor Author

/test-all

@jianjuns jianjuns merged commit c7c70e6 into antrea-io:main Oct 1, 2022
heanlan pushed a commit to heanlan/antrea that referenced this pull request Mar 29, 2023
Each member cluster exports cluster PodCIDRs with ClusterInfo.
antrea-agent programs flows to route traffic to remote cluster Pods
through the tunnels between Gateway Nodes.

Signed-off-by: hujiajing <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/multi-cluster Issues or PRs related to multi cluster.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants