Change flow exporter's export expiry mechanism #2360
Conversation
Codecov Report
@@ Coverage Diff @@
## main #2360 +/- ##
==========================================
+ Coverage 60.41% 65.97% +5.56%
==========================================
Files 283 284 +1
Lines 22455 26852 +4397
==========================================
+ Hits 13566 17716 +4150
- Misses 7462 7470 +8
- Partials 1427 1666 +239
Flags with carried forward coverage won't be shown.
Found failures on 1) some of the e2e tests specific to the Kind cluster (fixed), and 2) the unit tests on Windows only (fixed). Looking into those failures...
/test-e2e
/test-e2e
The deadlock is due to the exporter goroutine accessing the connection map to update the "DoneExport" flag. This was caught in scale testing. It was resolved through a temporary fix: adding the same flag to the record data struct. The connection and record deletion logic will be re-evaluated through PR antrea-io#2360, as it refactors the related code.
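A minimal sketch of the temporary fix described above, assuming illustrative type and field names rather than the actual Antrea structs: the exporter goroutine flips a flag that lives on the record data itself, so it never has to reach back into the lock-protected connection map shared with the polling goroutine.

	// Illustrative only; the real record struct and export helper differ.
	type flowRecord struct {
		// DoneExport is duplicated on the record so the exporter goroutine can
		// mark the record as exported without taking the connection-map lock.
		DoneExport bool
		// ... other per-record fields elided ...
	}

	func exportRecord(rec *flowRecord) {
		// Send the record, then flag it locally; no connection-map access needed.
		rec.DoneExport = true
	}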
@heanlan did you run the benchmarks again after changing the function parameters to …? If there is a concern about memory usage, you can look into using the connection key instead. Build a slice of connection keys, then for every key, grab the lock again to get a pointer to the connection from the store, then prepare the set to export, release the lock and send the set. One thing to also consider is to have a max number of connections to export with each iteration. This way you pre-allocate a small amount of memory and you reduce the max memory usage:

	func (exp *flowExporter) sendFlowRecords() (time.Duration, error) {
		currTime := time.Now()
		const maxConnectionsToExport = 64
		expiredConns := make([]flowexporter.Connection, 0, maxConnectionsToExport*2)
		var expireTime1, expireTime2 time.Duration
		expiredConns, expireTime1 = exp.conntrackConnStore.GetExpiredConns(expiredConns, currTime, maxConnectionsToExport)
		expiredConns, expireTime2 = exp.denyConnStore.GetExpiredConns(expiredConns, currTime, maxConnectionsToExport)
		// Select the shorter timeout among the two connection stores to do the next round of export.
		nextExpireTime := getMinTime(expireTime1, expireTime2)
		for i := range expiredConns {
			if err := exp.exportConn(&expiredConns[i]); err != nil {
				klog.ErrorS(err, "Error when sending expired flow record")
				return nextExpireTime, err
			}
		}
		return nextExpireTime, nil
	}

You may want to try the latter one first, as it's a pretty simple change. Another real-world advantage IMO is that you hold the lock for a bounded amount of time because …
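For reference, a minimal sketch of the key-based alternative described in the comment above. GetExpiredConnKeys, GetConnByKey, prepareSet and sendSet are hypothetical helper names, not the actual store or exporter API, and each helper is assumed to do its own locking so the lock is never held while a set is sent over the network.

	func (exp *flowExporter) exportExpiredByKeys(currTime time.Time) error {
		// Copy the keys of expired connections while the store lock is held
		// (inside the hypothetical GetExpiredConnKeys helper), then release it.
		keys := exp.conntrackConnStore.GetExpiredConnKeys(currTime)
		for _, key := range keys {
			// Re-acquire the lock briefly to fetch the connection and build the set.
			conn, exists := exp.conntrackConnStore.GetConnByKey(&key)
			if !exists {
				continue // the connection was deleted in the meantime
			}
			set, err := exp.prepareSet(conn)
			if err != nil {
				return err
			}
			// The lock is released by this point; sending can take as long as it needs.
			if err := exp.sendSet(set); err != nil {
				return err
			}
		}
		return nil
	}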
Like
Thanks, Antonin and Srikar. Yes, the results listed in the chart above are after this change; the total memory usage of … I have tried the approach that pre-allocates the slice with a fixed size. In the case of exporting 50k connections, the memory usage of …
@antoninbas Holding the lock for a bounded time is good, as the polling routine can get hold of it quickly rather than waiting for a long time. I see a tradeoff where the export of the records may be delayed by a bit depending on the polling routine's runtime. You may have suggested 64 as an example--do you have any recommendation or rationale for picking this number? As an alternative, I thought about using a fraction of the flow count to define the capacity for expired connections. This has the downside of allocating a large amount of memory when the flow count is high.
I feel like anything around 100 is probably a good choice. Or if you really want to get fancy, something like …
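Purely as an illustration of the fraction-based idea (the comment above is truncated, so this is not necessarily what was being suggested, and the thresholds are made up), the per-iteration cap could scale with the flow count while staying clamped to a fixed range:

	// Illustrative sketch of a dynamic per-iteration export cap.
	func maxConnsToExport(numFlows int) int {
		const floor, ceiling = 64, 512 // illustrative bounds, not recommendations
		// Export roughly 10% of the tracked flows per iteration, clamped so that
		// the pre-allocated slice stays small even with very large flow counts.
		n := numFlows / 10
		if n < floor {
			return floor
		}
		if n > ceiling {
			return ceiling
		}
		return n
	}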
@@ -214,29 +233,29 @@ func TestConntrackConnectionStore_AddOrUpdateConn(t *testing.T) {
	addOrUpdateConnTests := []struct {
		flow flowexporter.Connection
	}{
		{testFlow1}, // To test update part of function.
the more I look at this test case, the less I understand why it is structured as it is. The code is 200+ lines long, and there is too much testcase-specific code IMO. We should break the test down into separate individual tests (possibly using subtests), and place the common code into helper functions if needed. This could be done in a follow-up PR.
Agree, marked as a follow-up to-do.
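A minimal sketch of the follow-up refactor being discussed, using subtests; setupTestConnStore and testFlow2 are hypothetical names standing in for the existing test fixtures, and the assertions are elided:

	func TestConntrackConnectionStore_AddOrUpdateConn(t *testing.T) {
		t.Run("add new connection", func(t *testing.T) {
			store := setupTestConnStore(t) // hypothetical helper holding the shared setup
			store.AddOrUpdateConn(&testFlow2)
			// assertions specific to the "add" path go here
		})
		t.Run("update existing connection", func(t *testing.T) {
			store := setupTestConnStore(t)
			store.AddOrUpdateConn(&testFlow1)
			// assertions specific to the "update" path go here
		})
	}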
LGTM
/test-all
Before the change, the flow exporter exports records periodically. Signed-off-by: heanlan <[email protected]>
Squashing the commits. /test-all
This PR changes the flow exporter's exporting mechanism from periodic export to priority-based export. We introduce activeExpireTime and idleExpireTime in every pqItem to track when it should be exported. We create two priority queues, one for the conntrack connection store and one for the deny connection store. The flow exporter manages these two priority queues and exports the expired connection items based on their expire time.
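As a rough sketch of the bookkeeping described above (field and type names follow the description; the actual implementation in the PR may differ), each queue can be a container/heap ordered by the earlier of an item's two expiry times:

	type pqItem struct {
		conn             *flowexporter.Connection
		activeExpireTime time.Time // export deadline even if the connection is still active
		idleExpireTime   time.Time // export deadline once the connection has gone idle
		index            int       // maintained by container/heap
	}

	type expirePriorityQueue []*pqItem

	func (pq expirePriorityQueue) Len() int { return len(pq) }

	func (pq expirePriorityQueue) Less(i, j int) bool {
		// An item is due at whichever of its two timers fires first.
		ti := minTime(pq[i].activeExpireTime, pq[i].idleExpireTime)
		tj := minTime(pq[j].activeExpireTime, pq[j].idleExpireTime)
		return ti.Before(tj)
	}

	func (pq expirePriorityQueue) Swap(i, j int) {
		pq[i], pq[j] = pq[j], pq[i]
		pq[i].index, pq[j].index = i, j
	}

	func (pq *expirePriorityQueue) Push(x interface{}) {
		item := x.(*pqItem)
		item.index = len(*pq)
		*pq = append(*pq, item)
	}

	func (pq *expirePriorityQueue) Pop() interface{} {
		old := *pq
		n := len(old)
		item := old[n-1]
		*pq = old[:n-1]
		return item
	}

	func minTime(a, b time.Time) time.Time {
		if a.Before(b) {
			return a
		}
		return b
	}

With this ordering, the exporter can sleep until the expiry time at the head of each queue instead of waking up on a fixed period.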
BenchmarkExportConntrackConns: testing on sendFlowRecords.
Results comparison - After change:
Before change:
Improvements - reduce by:
Runtime improvement reasons: in main, ForAllFlowRecordsDo costs 48% of the runtime, because every time we poll to check for expired records we iterate through all the records in the record map. Now we are removing the FlowRecords struct and using a priority queue to replace the record map, so there is no longer a need to iterate through all items.
Memory improvement reasons: in main, github.com/vmware/go-ipfix/pkg/entities.NewDataRecord consumes 32% of the memory, AddFlowRecordToMap consumes 12%, and sendFlowRecords also consumes more compared to the after-change version. We are getting rid of these methods by removing the FlowRecords struct.
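To make the runtime argument concrete, a simplified contrast between the two polling strategies; recordMap, isExpired, exportRecord and exportItem are illustrative stand-ins rather than the actual functions, and pq is the expirePriorityQueue sketched earlier.

	now := time.Now()

	// Before (conceptually): every poll walks the whole record map, expired or not,
	// which is why the iteration dominated the runtime profile.
	for key, record := range recordMap {
		if isExpired(&record, now) {
			exportRecord(key, &record)
		}
	}

	// After (conceptually): only the items at the head of the priority queue whose
	// expiry has passed are popped and exported; unexpired items are never visited.
	for pq.Len() > 0 {
		head := (*pq)[0]
		if minTime(head.activeExpireTime, head.idleExpireTime).After(now) {
			break
		}
		item := heap.Pop(pq).(*pqItem)
		exportItem(item)
	}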