Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add single stack IPv6 support in Flow aggregator and ELK flow collector #1819

Merged
merged 1 commit into from
Mar 19, 2021

Conversation

dreamtalen
Copy link
Contributor

@dreamtalen dreamtalen commented Feb 4, 2021

In this commit, we add single stack IPv6 support in flow aggregator and elk flow collector.

Some screenshots of updated kibana dashboard in IPv6 cluster:
Snipaste_2021-03-12_15-46-37
Snipaste_2021-03-12_15-46-19

For dual stack, there are some issues in service name retreival at flow exporter, will add support in a new PR.

@srikartati
Copy link
Member

/test-all

@srikartati
Copy link
Member

/test-ipv6-only-e2e
/test-ipv6-e2e

return fmt.Errorf("resolved Flow Aggregator address %v is not supported", hostIPs[0])
}
// Update the collector address with resolved IP of flow aggregator
collectorAddr = net.JoinHostPort(hostIPs[0].String(), defaultIPFIXPort)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the case of dual-stack, it seems we can pick either IPv4 or IPv6 addresses depending on the DNS LookUp output. What if we pick the IPv6 address and the node network is IPv4? In that scenario, the flow exporter will not be able to connect to the flow aggregator. I believe IPv4-in-IPv6 and IPv6-in-IPv4 is possible in dual stack. @lzhecheng any input here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally, we should check the node network address format and pick the correct resolved address of the flow aggregator.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, make sense to me, will try to check the node network address format to determine the v4/v6 address of flow aggregator.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, they are possible in a dual-stack setup.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checking both addresses and picking the "correct" one should be ok. However, I am not sure the case you describe is possible. I think in order to have dual-stack Services, the Nodes need to have dual-stack networking themselves.

BTW, it's still a bit weird to me that:

  • we have a special case for strings.Contains(collectorAddr, flowAggregatorDNSName). I thought we had removed that special case.
  • the DNS lookup is not handled by the go-ipfix library

pkg/util/flowexport/flowexport.go Show resolved Hide resolved
return nil, err, isIPv6
func setupTestWithIPFIXCollector(tb testing.TB) (*TestData, bool, error) {
var isIPv6 bool
if clusterInfo.podV6NetworkCIDR == "" {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

networkPolicyOnly mode does not support dual stack. That has to be handled differently.
See this: https://github.com/vmware-tanzu/antrea/blob/e92ee627c2a218150935e41a62d43ed5a03e06b4/pkg/agent/config/node_config.go#L120

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks.

@srikartati
Copy link
Member

Please make sure that the following CI tests pass:
/test-ipv6-only-e2e
/test-ipv6-e2e

@dreamtalen dreamtalen force-pushed the local-ipv6-fa branch 2 times, most recently from 1c8979d to 74cf81e Compare February 5, 2021 23:44
@codecov-io
Copy link

codecov-io commented Feb 5, 2021

Codecov Report

Merging #1819 (003a619) into main (a9adf5a) will increase coverage by 2.46%.
The diff coverage is 58.33%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #1819      +/-   ##
==========================================
+ Coverage   65.10%   67.57%   +2.46%     
==========================================
  Files         195      197       +2     
  Lines       16875    17217     +342     
==========================================
+ Hits        10987    11634     +647     
+ Misses       4725     4384     -341     
- Partials     1163     1199      +36     
Flag Coverage Δ
e2e-tests 48.36% <55.55%> (?)
kind-e2e-tests 56.39% <55.55%> (+0.37%) ⬆️
unit-tests 41.86% <30.55%> (-0.03%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
pkg/flowaggregator/flowaggregator.go 63.63% <57.14%> (+6.98%) ⬆️
pkg/util/flowexport/flowexport.go 82.92% <100.00%> (ø)
pkg/apis/controlplane/sets.go 33.33% <0.00%> (-12.83%) ⬇️
pkg/controller/networkpolicy/tier.go 47.50% <0.00%> (-5.00%) ⬇️
pkg/agent/stats/collector.go 94.61% <0.00%> (-3.85%) ⬇️
pkg/ovs/openflow/ofctrl_packetout.go 88.78% <0.00%> (-3.14%) ⬇️
pkg/controller/networkpolicy/crd_utils.go 88.60% <0.00%> (-1.61%) ⬇️
...ntroller/networkpolicy/networkpolicy_controller.go 70.12% <0.00%> (-1.59%) ⬇️
pkg/controller/networkpolicy/external_entity.go
pkg/controller/grouping/controller.go 64.64% <0.00%> (ø)
... and 34 more

if err := ensureAntreaRunning(tb, testData); err != nil {
return nil, err, isIPv6
func setupTestWithIPFIXCollector(tb testing.TB) (*TestData, bool, error) {
var isIPv6 bool
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Maybe you can set isIPv6 to false and turn it to true in if statement.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, addressed.

@@ -462,7 +462,8 @@ func (data *TestData) deployFlowAggregator(ipfixCollector string) (string, error
}
if rc, _, _, err = provider.RunCommandOnNode(controlPlaneNodeName(), fmt.Sprintf("kubectl -n %s rollout status deployment/%s --timeout=%v", flowAggregatorNamespace, flowAggregatorDeployment, 2*defaultTimeout)); err != nil || rc != 0 {
_, stdout, _, _ := provider.RunCommandOnNode(controlPlaneNodeName(), fmt.Sprintf("kubectl -n %s describe pod", flowAggregatorNamespace))
return stdout, fmt.Errorf("error when waiting for flow aggregator rollout to complete. kubectl describe output: %v", stdout)
_, logStdout, _, _ := provider.RunCommandOnNode(controlPlaneNodeName(), fmt.Sprintf("kubectl -n %s logs -l app=flow-aggregator", flowAggregatorNamespace))
return stdout, fmt.Errorf("error when waiting for flow aggregator rollout to complete. kubectl describe output: %v, logs: %v", stdout, logStdout)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why use %v rather than %s, any specific purpose?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, addressed.

@dreamtalen dreamtalen force-pushed the local-ipv6-fa branch 6 times, most recently from d3c513c to 0ac0ab6 Compare February 9, 2021 06:39
@dreamtalen
Copy link
Contributor Author

/test-ipv6-only-e2e
/test-ipv6-e2e

@dreamtalen
Copy link
Contributor Author

/test-ipv6-only-e2e
/test-ipv6-e2e

Copy link
Contributor

@antoninbas antoninbas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what the impact for the collector / dashboards of using 2 different template sets (one for IPv4 and one for IPv6)?

@@ -289,6 +289,7 @@ function deliver_antrea {
docker tag "${DOCKER_REGISTRY}/antrea/sonobuoy-systemd-logs:v0.3" "sonobuoy/systemd-logs:v0.3"
fi
DOCKER_REGISTRY=${DOCKER_REGISTRY} make
DOCKER_REGISTRY=${DOCKER_REGISTRY} make flow-aggregator-ubuntu
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so until now the flow-aggregator image was pulled from the docker registry (instead of building the image from the checked-out repo and using that one)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For Jenkins, yes. I found this issue when debugging the Jenkins tests.

return fmt.Errorf("resolved Flow Aggregator address %v is not supported", hostIPs[0])
}
// Update the collector address with resolved IP of flow aggregator
collectorAddr = net.JoinHostPort(hostIPs[0].String(), defaultIPFIXPort)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checking both addresses and picking the "correct" one should be ok. However, I am not sure the case you describe is possible. I think in order to have dual-stack Services, the Nodes need to have dual-stack networking themselves.

BTW, it's still a bit weird to me that:

  • we have a special case for strings.Contains(collectorAddr, flowAggregatorDNSName). I thought we had removed that special case.
  • the DNS lookup is not handled by the go-ipfix library

pkg/flowaggregator/flowaggregator.go Show resolved Hide resolved
if err != nil {
fa.exportingProcess.CloseConnToCollector()
fa.exportingProcess = nil
fa.templateID = 0
fa.templateIDv4 = 0
return fmt.Errorf("sending template set failed, err: %v", err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

may specify which template set in the error message (IPv4 / IPv6), same comment below

@dreamtalen
Copy link
Contributor Author

what the impact for the collector / dashboards of using 2 different template sets (one for IPv4 and one for IPv6)?

Hi Antonin, thanks for your comments. Yes in this PR, we send two templates (one for IPv4, one for IPv6) regardless of the cluster configuration. This is the same method used when adding IPv6 support for flow exporter. It would be fine for the collector. However, it has issues in the e2e test in dual-stack cluster, for example, the source IP is IPv6 and destination is IPv4. I have talked with @lzhecheng and found that the current method doesn’t support dual stack for flow exporter too (the old e2e test for flow exporter didn’t test that scenario so IPv6 flow exporter PR passed all tests). So in this PR, we are focusing on the IPv6 only support of flow aggregator to catch up with flow exporter. And will add dual-stack support for flow exporter and aggregator in the future. How does this sound to you?

@lzhecheng
Copy link
Contributor

If source and destination are a mixture of IPv4 and IPv6, one template including both IP families should be used. It may introduce some extra effort to implement. So far, with 2 templates used, all IPv4 or all IPv6 traffic is supported in a dual-stack setup.

@antoninbas
Copy link
Contributor

@dreamtalen @lzhecheng why don't we just a single template from the get go? we already have fields, such as destinationServicePort which are not always applicable

Also speaking of dual-stack, how easy will it be to create dashboards to show traffic distribution regardless of whether the traffic is v4 or v6?

@dreamtalen dreamtalen force-pushed the local-ipv6-fa branch 5 times, most recently from f85671c to c1bc648 Compare February 16, 2021 18:56
Copy link
Contributor

@antoninbas antoninbas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Latest code LGTM. Just want to make sure that @srikartati and / or @zyiou review the Kibana changes.

Copy link
Contributor

@zyiou zyiou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will be better if you can have some screenshot on changes to Kibana dashboard. (e.g. overview and flow records dashboard)

@dreamtalen
Copy link
Contributor Author

It will be better if you can have some screenshot on changes to Kibana dashboard. (e.g. overview and flow records dashboard)

Sure, please take a look at the PR description.

srikartati
srikartati previously approved these changes Mar 13, 2021
Copy link
Member

@srikartati srikartati left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

With dual stack support, we may have to add ipFamily as another filter along with podname, podnamespace filters etc. in kibana dashboard to enable users to pick one, otherwise both will be shown.

@srikartati
Copy link
Member

/test-all

1 similar comment
@dreamtalen
Copy link
Contributor Author

/test-all

@srikartati
Copy link
Member

/test-ipv6-only-e2e
/test-ipv6-e2e
/test-networkpolicy

@dreamtalen
Copy link
Contributor Author

/test-ipv6-only-e2e

1 similar comment
@srikartati
Copy link
Member

/test-ipv6-only-e2e

@dreamtalen
Copy link
Contributor Author

/test-all

@dreamtalen
Copy link
Contributor Author

/test-ipv6-only-e2e
/test-ipv6-e2e

@dreamtalen
Copy link
Contributor Author

/test-ipv6-only-e2e

Copy link
Member

@srikartati srikartati left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.
Let's wait for PR #1959 to go in as it is a critical bug; that will also help in passing e2e tests.

@@ -304,8 +319,10 @@ func exportLogs(tb testing.TB, data *TestData, logsSubDir string, writeNodeLogs
}

func teardownFlowAggregator(tb testing.TB, data *TestData) {
if err := testData.gracefulExitFlowAggregator(testOptions.coverageDir); err != nil {
tb.Fatalf("Error when gracefully exiting Flow Aggregator: %v", err)
if testOptions.enableCoverage {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding this missing code as suggested, so we do not fail for tests that don't use flow aggregator coverage binary like ipv6 tests.

@dreamtalen
Copy link
Contributor Author

/test-all
/test-ipv6-only-e2e
/test-ipv6-e2e

@dreamtalen
Copy link
Contributor Author

/test-all
/test-ipv6-only-e2e
/test-ipv6-e2e

@dreamtalen
Copy link
Contributor Author

/test-e2e
/test-ipv6-only-e2e
/test-networkpolicy

In this commit, we add single stack IPv6 support in flow aggregator
 and elk flow collector.
For dual stack, there are some issues in service name retreival at
flow exporter, will add support in a new PR.
@dreamtalen
Copy link
Contributor Author

/test-e2e
/test-ipv6-only-e2e
/test-networkpolicy

@dreamtalen
Copy link
Contributor Author

/test-ipv6-only-e2e
/test-all

@srikartati
Copy link
Member

/test-ipv6-e2e

@srikartati
Copy link
Member

All the main tests seem to be passing. Will wait for dual-stack as well, because it involves IPv6 family.

@srikartati srikartati merged commit aa112c7 into antrea-io:main Mar 19, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants