
Flow Aggregator performance benchmarking #2188

Closed
srikartati wants to merge 1 commit into main from perf_flow_agg

Conversation

srikartati
Member

Added a test for intra-Node flow records.
Simulated 10 exporters, each of them sending 1,000 flow records.
Used Go's built-in benchmarking tools (cpuprofile, memprofile, benchmem, etc.)
to collect the performance metrics.
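
A minimal sketch of the benchmark's shape, for illustration only (helper names and constants here are made up; the actual test lives in flowaggregator_perf_test.go):

package flowaggregator_test

import (
    "sync"
    "testing"
)

// sendIntraNodeFlowRecord stands in for exporting one IPFIX flow record to
// the Flow Aggregator under test; the real test builds and sends actual records.
func sendIntraNodeFlowRecord() {}

func BenchmarkIntraNodeFlowRecords(b *testing.B) {
    const numExporters, recordsPerExporter = 10, 1000
    b.ReportAllocs()
    for i := 0; i < b.N; i++ {
        var wg sync.WaitGroup
        for e := 0; e < numExporters; e++ {
            wg.Add(1)
            go func() {
                defer wg.Done()
                for r := 0; r < recordsPerExporter; r++ {
                    sendIntraNodeFlowRecord()
                }
            }()
        }
        wg.Wait() // wait until all simulated exporters have sent their records
    }
}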

@srikartati
Member Author

vagrant@k8s-node-worker-1:~/antrea/pkg/flowaggregator$ go test -test.v -run=none -test.benchmem  -bench=. -memprofile memprofile.out -cpuprofile profile.out
goos: linux
goarch: amd64
pkg: antrea.io/antrea/pkg/flowaggregator
cpu: Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
BenchmarkIntraNodeFlowRecords
    flowaggregator_perf_test.go:165: Num messages : 10001
BenchmarkIntraNodeFlowRecords-2   	       1	5550425283 ns/op	344179432 B/op	 7338133 allocs/op
PASS
ok  	antrea.io/antrea/pkg/flowaggregator	5.746s

Used pprof to inspect the bottlenecks in the CPU profile and memory profile.
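For example:

go tool pprof profile.out      # inspect the CPU profile
go tool pprof memprofile.out   # inspect the heap profile
(pprof) top                    # show the functions with the highest flat usage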

Type: cpu
(pprof) top
Showing nodes accounting for 700ms, 57.38% of 1220ms total
Showing top 10 nodes out of 194
      flat  flat%   sum%        cum   cum%
     130ms 10.66% 10.66%      140ms 11.48%  syscall.Syscall
     120ms  9.84% 20.49%      350ms 28.69%  runtime.mallocgc
     100ms  8.20% 28.69%      210ms 17.21%  runtime.scanobject
      70ms  5.74% 34.43%       70ms  5.74%  runtime.findObject
      60ms  4.92% 39.34%       60ms  4.92%  runtime.heapBitsSetType
      50ms  4.10% 43.44%      460ms 37.70%  github.com/vmware/go-ipfix/pkg/entities.(*dataRecord).AddInfoElement
      50ms  4.10% 47.54%       80ms  6.56%  runtime.evacuate_faststr
      40ms  3.28% 50.82%       40ms  3.28%  runtime.futex
      40ms  3.28% 54.10%      140ms 11.48%  runtime.mapassign_faststr
      40ms  3.28% 57.38%       40ms  3.28%  runtime.nextFreeFast
Type: alloc_space
(pprof) top
Showing nodes accounting for 294.98MB, 89.65% of 329.02MB total
Dropped 38 nodes (cum <= 1.65MB)
Showing top 10 nodes out of 35
      flat  flat%   sum%        cum   cum%
  122.11MB 37.11% 37.11%   204.12MB 62.04%  github.com/vmware/go-ipfix/pkg/entities.(*dataRecord).AddInfoElement
   41.50MB 12.61% 49.73%    41.50MB 12.61%  github.com/vmware/go-ipfix/pkg/entities.NewInfoElementWithValue (inline)
   39.36MB 11.96% 61.69%   126.90MB 38.57%  github.com/vmware/go-ipfix/pkg/collector.(*CollectingProcess).handleTCPClient.func1
   25.50MB  7.75% 69.44%    25.50MB  7.75%  bytes.NewBuffer (inline)
      21MB  6.38% 75.82%    43.01MB 13.07%  github.com/vmware/go-ipfix/pkg/entities.EncodeToIEDataType
      14MB  4.26% 80.08%       14MB  4.26%  bytes.makeSlice
      11MB  3.34% 83.42%    25.01MB  7.60%  bytes.(*Buffer).grow
       8MB  2.43% 85.86%    32.01MB  9.73%  encoding/binary.Write
    7.50MB  2.28% 88.13%       33MB 10.03%  github.com/vmware/go-ipfix/pkg/intermediate.(*AggregationProcess).addFieldsForStatsAggregation
       5MB  1.52% 89.65%    85.04MB 25.85%  github.com/vmware/go-ipfix/pkg/collector.(*CollectingProcess).decodeDataSet

Based on the above result, fixed github.com/vmware/go-ipfix/pkg/entities.(*dataRecord).AddInfoElement in go-ipfix, where an object was being allocated unnecessarily, and got the result below. Please ignore the test duration, as there was an extra sleep. Allocations and memory usage went down.

vagrant@k8s-node-worker-1:~/antrea/pkg/flowaggregator$ go test -test.v -run=none -test.benchmem  -bench=. -memprofile memprofile.out -cpuprofile profile.out
goos: linux
goarch: amd64
pkg: antrea.io/antrea/pkg/flowaggregator
cpu: Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
BenchmarkIntraNodeFlowRecords
    flowaggregator_perf_test.go:176: Num messages : 10001
BenchmarkIntraNodeFlowRecords-2   	       1	7517858545 ns/op	312003712 B/op	 6048136 allocs/op
PASS
ok  	antrea.io/antrea/pkg/flowaggregator	7.739s
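
For illustration, the general shape of that kind of fix is sketched below; the type and field names are made up and this is not the actual go-ipfix change:

type infoElementWithValue struct {
    name  string
    value interface{}
}

type dataRecord struct {
    elements []*infoElementWithValue
}

// Before: each call copied the element into a freshly allocated object,
// costing one heap allocation per information element added.
func (d *dataRecord) addInfoElementWithCopy(ie *infoElementWithValue) {
    cp := *ie
    d.elements = append(d.elements, &cp)
}

// After: the caller-provided object is stored directly, removing the
// per-element allocation.
func (d *dataRecord) addInfoElement(ie *infoElementWithValue) {
    d.elements = append(d.elements, ie)
}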

@srikartati
Member Author

@antoninbas Added one test, where we benchmark the Flow Aggregator with unique Intra-Node flow records from 10 exporters. Currently, I added 1,000 records per exporter.
I plan to add a test for Inter-Node flow records.
Please take a look at the test and provide any early feedback you have.

@codecov-commenter

codecov-commenter commented May 18, 2021

Codecov Report

Merging #2188 (2720db5) into main (113ad78) will decrease coverage by 0.00%.
The diff coverage is 0.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #2188      +/-   ##
==========================================
- Coverage   60.66%   60.66%   -0.01%     
==========================================
  Files         285      285              
  Lines       23032    23033       +1     
==========================================
- Hits        13973    13972       -1     
- Misses       7553     7554       +1     
- Partials     1506     1507       +1     
Flag Coverage Δ
kind-e2e-tests 48.39% <0.00%> (-0.07%) ⬇️
unit-tests 40.99% <ø> (-0.02%) ⬇️

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
pkg/ipfix/ipfix_collector.go 70.00% <0.00%> (-7.78%) ⬇️
...erinformation/v1beta1/clusterinformation_client.go 50.00% <0.00%> (-8.34%) ⬇️
pkg/apiserver/storage/ram/watch.go 90.38% <0.00%> (-3.85%) ⬇️
pkg/controller/networkpolicy/store/addressgroup.go 83.01% <0.00%> (-3.78%) ⬇️
...gent/controller/noderoute/node_route_controller.go 55.61% <0.00%> (+0.28%) ⬆️
pkg/agent/flowexporter/exporter/exporter.go 80.23% <0.00%> (+0.46%) ⬆️
pkg/monitor/controller.go 29.10% <0.00%> (+1.49%) ⬆️

@antoninbas
Contributor

I haven't done any real review yet, but my first comment would be about whether we could make this more "realistic" by having 1,000 exporters (which would correspond to 1,000 Antrea Agents), and have them export 100 records per second each.

BTW, I am not sure that the Go benchmark framework is the best suited here. I feel like we want to evaluate the performance of the Flow Aggregator in a more isolated manner, without the impact of the test exporters / collector. What do you think?

@srikartati
Member Author

I haven't done any real review yet, but my first comment would be about whether we could make this more "realistic" by having 1,000 exporters (which would correspond to 1,000 Antrea Agents), and have them export 100 records per second each.

Yeah, a higher number of exporters and 100 records per second sound realistic. I am currently doing 50 records every 0.25 seconds (i.e., about 200 records per second).

BTW, I am not sure that the Go benchmark framework is the best suited here. I feel like we want to evaluate the performance of the Flow Aggregator in a more isolated manner, without the impact of the test exporters / collector. What do you think?

Yes, I thought about that. Currently, we run the benchmark for the whole application. We could do it for a single goroutine using pprof labels. What do you think of this approach?
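
For reference, a rough sketch of what the pprof-label approach could look like (assuming we can wrap the goroutine entry point, which is the part that requires touching the application code; names here are illustrative):

import (
    "context"
    "runtime/pprof"
)

// Run a goroutine under a pprof label so its samples can later be filtered
// by tag in pprof (e.g. with the tagfocus option).
func startWithLabel(start func(ctx context.Context)) {
    go pprof.Do(context.Background(),
        pprof.Labels("component", "aggregation"),
        start) // `start` stands in for the Flow Aggregator goroutine body
}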

@antoninbas
Contributor

Yes, I thought about that. Currently, we run the benchmark for the whole application. We could do it for a single goroutine using pprof labels. What do you think of this approach?

Maybe, but doesn't it constrain us to a single goroutine? The aggregator starts multiple goroutines IIRC (for the collecting process, aggregating process, etc.) and in the future, we may even want to support adding extra goroutines for the same functionality to scale horizontally. I think if you want to benchmark a specific "part" of the aggregator (e.g. the collecting process), then the go test benchmarking capabilities and the current approach overall make a lot of sense. But if we want to "scale test" the Flow Aggregator, running it as its own process and measuring resource usage this way may be better. This is also what we did for the Controller I think. We have some benchmarks for the NetworkPolicy Controller (https://github.com/antrea-io/antrea/blob/main/pkg/controller/networkpolicy/networkpolicy_controller_perf_test.go) and we ran some scale testing using the agent simulator to simulate a lot of connections and consumers.

@srikartati
Member Author

Maybe, but doesn't it constrain us to a single goroutine? The aggregator starts multiple goroutines IIRC (for the collecting process, aggregating process, etc.) and in the future, we may even want to support adding extra goroutines for the same functionality to scale horizontally.

Yes, we do have multiple goroutines in the Flow Aggregator. I read that we could propagate labels/context from the parent goroutine. I gave it a shot, but found that applying pprof labels is not straightforward in the benchmark test without changing the application code.
For now, I modified the test parameters as discussed above and changed the test-ending logic from a fixed number of records to a timer that takes in the test duration.
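
A minimal sketch of the duration-based stop (names are illustrative, not the actual test code):

import "time"

// collectFor counts received records until the configured test duration
// elapses, instead of stopping after a fixed number of records.
func collectFor(testDuration time.Duration, recordReceived <-chan struct{}) int {
    numMessages := 0
    timer := time.NewTimer(testDuration)
    defer timer.Stop()
    for {
        select {
        case <-timer.C:
            return numMessages // duration reached: end the benchmark loop
        case <-recordReceived:
            numMessages++
        }
    }
}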

But if we want to "scale test" the Flow Aggregator, running it as its own process and measuring resource usage this way may be better. This is also what we did for the Controller I think. We have some benchmarks for the NetworkPolicy Controller (https://github.com/antrea-io/antrea/blob/main/pkg/controller/networkpolicy/networkpolicy_controller_perf_test.go) and we ran some scale testing using the agent simulator to simulate a lot of connections and consumers.

Got it. With the current benchmarking and profiling methodology, we found some more bottlenecks in the go-ipfix code. We are working on fixing those issues.
For scale testing, we can take that up in a follow-up PR in the future. As you mentioned, we can use a different methodology: use the agent simulators and run the Flow Aggregator as a K8s Deployment to focus on the resource usage of its process.

@srikartati
Member Author

srikartati commented May 25, 2021

Made some fixes in go-ipfix to improve performance: vmware/go-ipfix#204
Identified a couple of performance issues that involve comprehensive refactoring of the go-ipfix code, which I believe will yield further improvements: vmware/go-ipfix#201 and vmware/go-ipfix#205

Before go-ipfix fix:

vagrant@k8s-node-worker-1:~/antrea/pkg/flowaggregator$ go test -test.v -run=none -test.benchmem  -bench=. -memprofile memprofile.out -cpuprofile profile.out
goos: linux
goarch: amd64
pkg: antrea.io/antrea/pkg/flowaggregator
cpu: Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
BenchmarkIntraNodeFlowRecords
    flowaggregator_perf_test.go:446: Num messages received: 1503966
BenchmarkIntraNodeFlowRecords-2   	       1	120001213609 ns/op	29198182520 B/op	522847721 allocs/op
PASS

After go-ipfix fix:

vagrant@k8s-node-worker-1:~/antrea/pkg/flowaggregator$ go test -test.v -run=none -test.benchmem  -bench=. -memprofile memprofile.out -cpuprofile profile.out
goos: linux
goarch: amd64
pkg: antrea.io/antrea/pkg/flowaggregator
cpu: Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
BenchmarkIntraNodeFlowRecords
    flowaggregator_perf_test.go:446: Num messages received: 1516755
BenchmarkIntraNodeFlowRecords-2   	       1	120001712897 ns/op	22902440680 B/op	431854675 allocs/op
PASS
ok  	antrea.io/antrea/pkg/flowaggregator	120.119s

21.5% reduction in memory usage and 17.5% reduction in allocations.

[EDIT]
After more fixes:

vagrant@k8s-node-worker-1:~/antrea/pkg/flowaggregator$ go test -test.v -run=none -test.benchmem  -bench=. -count=2 -memprofile memprofile.out -cpuprofile profile.out 
goos: linux
goarch: amd64
pkg: antrea.io/antrea/pkg/flowaggregator
cpu: Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
BenchmarkIntraNodeFlowRecords
    flowaggregator_perf_test.go:458: Num messages received: 1550264
BenchmarkIntraNodeFlowRecords-2   	       1	120004095225 ns/op	8876354472 B/op	332625061 allocs/op
PASS

70% reduction in memory usage, 36% reduction in allocations, and a 3% increase in messages processed and received by the external collector.
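
(For reference, these percentages appear to be relative to the "Before go-ipfix fix" run above: memory (29198182520 - 8876354472) / 29198182520 ≈ 70%, allocations (522847721 - 332625061) / 522847721 ≈ 36%, and messages (1550264 - 1503966) / 1503966 ≈ 3%.)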

@zyiou added the antrea/flow-visibility/test, area/flow-visibility, and area/flow-visibility/aggregator labels on Jun 9, 2021
@srikartati force-pushed the perf_flow_agg branch 6 times, most recently from a8252a6 to 069d93b on June 17, 2021 05:02
@srikartati
Member Author

/test-ipv6-e2e

Added a test for intra-Node flow records.
Simulated 10 exporters, each of them sending 1,000 flow records.
Used Go's built-in benchmarking tools (cpuprofile, memprofile, benchmem, etc.)
to collect the performance metrics.

Signed-off-by: Srikar Tati <[email protected]>