
Modify correlation and add stats support in aggregation process #99

Merged: 3 commits merged into vmware:master from fix_correlation on Dec 8, 2020

Conversation

srikartati (Contributor) commented Dec 3, 2020:

  • Create an AggregationFlowRecord structure to store the required metadata (a sketch follows below).
  • Simplify the correlation process so that we store just one record per flow.
  • Add stats support in the aggregation process.
  • There are issues in the present code for intra-node flows, which do not need correlation; this PR fixes them.
  • Add tests for intra-node flows and stats support.
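A hypothetical sketch of the kind of metadata the new structure could carry (the field names here are illustrative, not the merged definition):

// AggregationFlowRecord keeps one correlated record per flow key plus the
// metadata the aggregation process needs.
type AggregationFlowRecord struct {
	// The single record stored and updated for this flow key.
	Record entities.Record
	// Whether a record from the source/destination node has been seen, so that
	// intra-node flows and not-yet-correlated inter-node flows can be told apart.
	IsRecordFromSrcSeen bool
	IsRecordFromDstSeen bool
}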

codecov bot commented Dec 3, 2020:

Codecov Report

Merging #99 (cd87dd3) into master (72176b5) will decrease coverage by 1.31%.
The diff coverage is 69.41%.


@@            Coverage Diff             @@
##           master      #99      +/-   ##
==========================================
- Coverage   80.26%   78.94%   -1.32%     
==========================================
  Files          13       13              
  Lines        1591     1729     +138     
==========================================
+ Hits         1277     1365      +88     
- Misses        212      242      +30     
- Partials      102      122      +20     
Flag                Coverage Δ
integration-tests   57.77% <25.32%> (-9.31%) ⬇️
unit-tests          78.36% <69.41%> (-1.27%) ⬇️

Flags with carried forward coverage won't be shown.

Impacted Files                    Coverage Δ
pkg/intermediate/aggregate.go     68.79% <66.23%> (-6.78%) ⬇️
pkg/registry/registry_antrea.go   100.00% <100.00%> (ø)

srikartati (Contributor, Author):

@zyiou
I modified the correlation part of the code to make it slightly simpler and to maintain some metadata.
I added some comments where stats can be updated and maintained. PTAL.
Since you started working on the stats part, I want to make sure the approach is as expected.
I can fill up the stats and timestamp code too, if you haven't made too much progress on that front.

srikartati (Contributor, Author) commented Dec 3, 2020:

Please treat reviewing this as a priority. It is an important PR for the flow aggregator process, and it would be great if we can ship it with go-ipfix v0.4.0.

zyiou (Contributor) left a comment:

Thanks for the PR. The current changes make sense to me. My only concern is that, for the inter-node case, if two consecutive records arrive from the source node without a record from the destination node, the second record from the source node is discarded (only the stats are updated).

> I can fill up the stats and timestamp code too, if you haven't made too much progress on that front.

Since the structure has changed quite a bit, I may need to start over on the stats. It would be helpful if you have time to do that; I can also pick it up next week. Thanks!

Comment on lines 147 to 149
} else {
// If the record from the node is already present, update the stats
// and timestamps.
zyiou (Contributor):

We will discard this record then? There would be only one data point across two intervals if we consider the visualization.

srikartati (Contributor, Author):

We are not discarding the record's information: the previous record from the same node (either source or destination) has the same flow key and metadata fields; only the stats and timestamps differ, and those are updated in the stored record.

zyiou (Contributor):

Got it. We only update the existing record based on the incoming record. What if the existing record has not been exported yet when the incoming record arrives? Will the flow aggregator export one record instead of two?

srikartati (Contributor, Author):

Yes, the exporter in the flow aggregator will export one record instead of two: the most up-to-date version of it.

dummyIP := net.IP{0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}
if ipInString == dummyIP.To16().String() {
existingIeWithValue, _ := existingRecord.GetInfoElementWithValue(field)
ipInString := existingIeWithValue.Value.(net.IP).To4().String()
zyiou (Contributor):

should be To16() right?

srikartati (Contributor, Author):

yes
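For reference, the IPv6 branch with this fix applied would read as follows; this is just the fragment from the diff with To16() substituted, and the rest of the correlation logic is unchanged:

dummyIP := net.IP{0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}
if ipInString == dummyIP.To16().String() {
	existingIeWithValue, _ := existingRecord.GetInfoElementWithValue(field)
	// Fixed: the stored value is an IPv6 address, so read it with To16(), not To4().
	ipInString := existingIeWithValue.Value.(net.IP).To16().String()
	// ... rest of the correlation logic uses ipInString as before.
}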

}
case entities.Ipv4Address:
ipInString := ieWithValue.Value.(net.IP).To4().String()
if ipInString == "0. 0. 0. 0" {
zyiou (Contributor):

There should be no spaces in the address string; it should be "0.0.0.0".

if ipInString == "0. 0. 0. 0" {
existingIeWithValue, _ := existingRecord.GetInfoElementWithValue(field)
ipInString := existingIeWithValue.Value.(net.IP).To4().String()
if ipInString != "0. 0. 0. 0" {
zyiou (Contributor):

same as above.

}
case entities.Ipv4Address:
ipInString := ieWithValue.Value.(net.IP).To4().String()
if ipInString == "0. 0. 0. 0" {
zyiou (Contributor):

And == is supposed to be !=, right?

srikartati (Contributor, Author):

yes
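Putting the two fixes together (no spaces in the address literal, and != instead of ==), the IPv4 branch would look like this; again just the fragment from the diff, with the remaining correlation logic unchanged:

case entities.Ipv4Address:
	ipInString := ieWithValue.Value.(net.IP).To4().String()
	// Fixed: "0.0.0.0" without spaces, and != so that only a non-dummy
	// incoming address is used for correlation.
	if ipInString != "0.0.0.0" {
		// ... fill up the corresponding field as before.
	}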

@@ -112,8 +117,8 @@ func createMsgwithDataSet2() *entities.Message {
ie5 := entities.NewInfoElementWithValue(entities.NewInfoElement("protocolIdentifier", 4, 1, 0, 1), proto)
ie6 := entities.NewInfoElementWithValue(entities.NewInfoElement("sourcePodName", 101, 13, 55829, 65535), srcPod)
ie7 := entities.NewInfoElementWithValue(entities.NewInfoElement("destinationPodName", 103, 13, 55829, 65535), dstPod)
ie8 := entities.NewInfoElementWithValue(entities.NewInfoElement("destinationClusterIP", 106, 18, 55829, 4), nil)
ie9 := entities.NewInfoElementWithValue(entities.NewInfoElement("destinationServicePort", 107, 2, 55829, 2), nil)
ie8 := entities.NewInfoElementWithValue(entities.NewInfoElement("destinationClusterIPv4", 106, 18, 55829, 4), net.IP{0, 0, 0, 0})
zyiou (Contributor):

net.IP{0,0,0,0} => bytes.NewBuffer(net.IP{0, 0, 0, 0})
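Applied to the test fixture above, the suggestion would read (other arguments unchanged):

ie8 := entities.NewInfoElementWithValue(entities.NewInfoElement("destinationClusterIPv4", 106, 18, 55829, 4), bytes.NewBuffer(net.IP{0, 0, 0, 0}))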

Comment on lines 274 to 275
aggregationProcess.aggregateRecord(*flowKey1, record1)
aggregationProcess.aggregateRecord(*flowKey2, record2)
zyiou (Contributor):

The logic problem when correlating IPv4 fields is not reflected in the unit tests. I think we should add a case that reverses the order of sending these two records, which will cover the case where the destination record is received first.

srikartati (Contributor, Author):

Yes, there was a typo in the uint/int cases at L187, and it carried over.
We are missing that scenario. In addition, we are also missing tests for intra-node flows; I was planning to add them. (A rough sketch of the reversed-order test follows.)
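A sketch of the reversed-order case, mirroring the existing test above; the flow-key/record helpers are placeholders for whatever the test file already provides:

// Hypothetical sketch: the destination record is aggregated before the source record.
flowKey1, record1 := createTestFlowKeyAndRecordFromSrc() // assumed helper
flowKey2, record2 := createTestFlowKeyAndRecordFromDst() // assumed helper
aggregationProcess.aggregateRecord(*flowKey2, record2)
aggregationProcess.aggregateRecord(*flowKey1, record1)
// Assert that the single stored record has correctly correlated IPv4 fields.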

- Create AggregationFlowRecord structure to store required metadata
- Change correlation process, so that we can store one record
- Can be easily extended to maintain stats.
srikartati changed the title from "Modify correlation in aggregation process" to "Modify correlation and add stats support in aggregation process" on Dec 4, 2020
srikartati requested a review from zyiou on December 4, 2020 at 20:21
zyiou (Contributor) left a comment:

Thanks for adding the stats part.
I have a question about the concept of the aggregated stats, e.g. for octetDeltaCountFromSourceNode, we are summing up all the octetDeltaCount values if the record is from the source node, no matter what the source node is, right? What would be the use cases for that? I thought we would aggregate based on the current source node name. Just want to make sure about this. Thanks!

}
// Update the corresponding source element in antreaStatsElement list.
if fillSrcStats {
existingIeWithValue, _ := existingRecord.GetInfoElementWithValue(antreaSourceStatsElements[i])
zyiou (Contributor):

The index i at L297 and L306 assumes that the provided statsElements, antreaSourceStatsElements, and antreaDestinationStatsElements lists are matched respectively, right? Maybe we can add that as a comment in type.go.

Reviewer:

Yes, I would suggest making statsElementList, antreaSourceStatsElements, and antreaDestinationStatsElements all independent string lists. Also, it looks like there should be no duplicated elements across these three lists; otherwise the stats could be double or triple counted?

srikartati (Contributor, Author) commented Dec 6, 2020:

Yes, this part needs more checks to validate the assumptions taken in the code. As Antrea is the only user, we control the input for now. In addition, the code could be made more generic, for example to handle flow records with different templates; at that point the assumptions will change.
I want to add a TODO and move on, considering the time factor. What do you say? (A sketch of such a TODO follows this thread.)

zyiou (Contributor):

I'm OK with adding a TODO comment.

Reviewer:

OK too.
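A sketch of how the TODO and the index-alignment assumption could be captured in type.go; the element names are taken from this discussion and may differ from the actual lists:

// TODO: Validate that these three lists stay index-aligned and share no
// elements; otherwise stats could be double or triple counted. Revisit once
// flow records with different templates are supported.
var (
	statsElementList = []string{
		"octetDeltaCount",
		"octetTotalCount",
	}
	antreaSourceStatsElements = []string{
		"octetDeltaCountFromSourceNode",
		"octetTotalCountFromSourceNode",
	}
	antreaDestinationStatsElements = []string{
		"octetDeltaCountFromDestinationNode",
		"octetTotalCountFromDestinationNode",
	}
)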

type AggregationInput struct {
MessageChan chan *entities.Message
WorkerNum int
CorrelateFields []string
aggregateElements *AggregationElements

Reviewer:

This is a private member, so callers cannot pass it. Do we need it in the public input struct?

srikartati (Contributor, Author):

Done.
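With the private member dropped and the aggregation elements exported so that callers can set them (matching the input.AggregateElements used in the constructor later in this diff), the input struct would look roughly like:

type AggregationInput struct {
	MessageChan       chan *entities.Message
	WorkerNum         int
	CorrelateFields   []string
	AggregateElements *AggregationElements
}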

}
case entities.Unsigned16:
if ieWithValue.Value != uint16(0) {
shihhaoli commented Dec 5, 2020:

Could 0 be a valid value?

srikartati (Contributor, Author):

Yes. For example, for a pod-to-pod flow the destinationServicePort (the K8s Service port) is not set.

Reviewer:

If 0 could be a valid value in some cases, do we still skip storing it when the incoming record has a 0 value? Or does the 0 value actually carry no useful info?

srikartati (Contributor, Author):

No, we do not skip storing it; we store everything sent by the exporter at the collector. If a field is in the IPFIX template, the exporter has to send some value; otherwise the collector cannot validate the corresponding IPFIX data record. A 0 value of course carries no useful info, but it is still needed in some cases where it has to be correlated with its corresponding record and filled up.

We would have to use and manage multiple templates if a field has no use for a specific case; for example, destinationServicePortName/destinationServicePort are not applicable for pod-to-pod flows. Currently, we use only one template in Antrea, the user of the go-ipfix library.

Reviewer:

I guess the default value for the uint16 type here is 0, so only a non-zero value from the incoming record needs to be stored.

srikartati (Contributor, Author):

Yes, we only need to correlate a non-zero uint16 value and fill up the existing record in the flowKeyRecord map.

if ieWithValue, exist := incomingRecord.GetInfoElementWithValue(field); exist {
switch ieWithValue.Element.Name {
case "flowEndSeconds":
existingIeWithValue, _ := existingRecord.GetInfoElementWithValue(field)

Reviewer:

So there is no need to check whether the field exists in the existing record?

srikartati (Contributor, Author):

Right, it's not needed here; we already check for this in addFieldsForStatsAggregation. The assumption is that all the flow records for a given flow follow the same template.

if ieWithValue, exist := record.GetInfoElementWithValue(element); exist {
// Initialize the corresponding source element in antreaStatsElement list.
if fillSrcStats {
existingIeWithValue, _ := record.GetInfoElementWithValue(antreaSourceStatsElements[i])

Reviewer:

Same question as the previous one applies here.

srikartati (Contributor, Author):

Here as well, we check in the function that adds the new fields, and we presume that records come with the same template.

func isRecordFromSrc(record entities.Record) bool {
if isRecordIntraNode(record) {
return false
}
shihhaoli commented Dec 5, 2020:

It looks to me like we may be checking dst twice in some cases, such as src != "" && dst == "". How about doing something like the following?

func isRecordFromSrc(record entities.Record) bool {
	srcIEWithValue, exist := record.GetInfoElementWithValue("sourcePodName")
	if !exist || srcIEWithValue.Value == "" {
		return false
	}
	dstIEWithValue, exist := record.GetInfoElementWithValue("destinationPodName")
	if exist && dstIEWithValue.Value != "" {
		return false
	}
	return true
}

srikartati (Contributor, Author):

Good suggestion. Done.

return false
}
ieWithValue, exist := record.GetInfoElementWithValue("destinationPodName")
if exist && ieWithValue.Value == "" {

Reviewer:

Should this be
if !exist || ieWithValue.Value == "" {
?

srikartati (Contributor, Author):

We expect destinationPodName to be present; again, we presume the same template is used on both source and destination nodes.
In the future, we will have to extend this to handle different templates for source and destination nodes, but then we would probably have to consider different cases: one template for intra-node flow records, one for inter-node source-node flow records, one for inter-node destination-node flow records, etc. That case is not supported yet.

func isRecordFromDst(record entities.Record) bool {
if isRecordIntraNode(record) {
return false
}

Reviewer:

Similarly here.

srikartati (Contributor, Author):

Done.
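A mirrored sketch for the destination-side check, following the same pattern as the suggestion above (not necessarily the exact merged code):

func isRecordFromDst(record entities.Record) bool {
	dstIEWithValue, exist := record.GetInfoElementWithValue("destinationPodName")
	if !exist || dstIEWithValue.Value == "" {
		return false
	}
	srcIEWithValue, exist := record.GetInfoElementWithValue("sourcePodName")
	if exist && srcIEWithValue.Value != "" {
		return false
	}
	return true
}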

return err
}
value := new(bytes.Buffer)
if err = util.Encode(value, binary.BigEndian, uint64(0)); err != nil {

Reviewer:

Just curious, does the 64-bit integer here cover InfoElement ID, field length, and enterprise number?

srikartati (Contributor, Author):

It just initializes the value for new stat fields added for source and destination.

srikartati (Contributor, Author):

> Thanks for adding the stats part.
> I have a question about the concept of the aggregated stats, e.g. for octetDeltaCountFromSourceNode, we are summing up all the octetDeltaCount values if the record is from the source node, no matter what the source node is, right? What would be the use cases for that? I thought we would aggregate based on the current source node name. Just want to make sure about this. Thanks!

Let me understand what you mean by a different source node. At any point in time there is only one source node for a given flow in the cluster.
The presumption is that the 5-tuple key is unique to the flow. If the pod is destroyed and comes up again on a different node, there will be a new flow: the source port varies and the IP can be different.
As we are aggregating records for the same flow key, the source node stays the same.
Ultimately, when the record from the aggregation process is exported, the user needs to reset the stats.

zyiou (Contributor) commented Dec 5, 2020:

> Let me understand what you mean by a different source node. At any point in time there is only one source node for a given flow in the cluster.
> The presumption is that the 5-tuple key is unique to the flow. If the pod is destroyed and comes up again on a different node, there will be a new flow: the source port varies and the IP can be different.
> As we are aggregating records for the same flow key, the source node stays the same.
> Ultimately, when the record from the aggregation process is exported, the user needs to reset the stats.

Sorry, I forgot the context that we are aggregating under the same flow key; then it makes sense to me. So octetTotalCountFromSourceNode is actually the total octet count from the source node for this specific flow, not the total octet count for all flows whose source is the current node, right?

srikartati (Contributor, Author):

> Sorry, I forgot the context that we are aggregating under the same flow key; then it makes sense to me. So octetTotalCountFromSourceNode is actually the total octet count from the source node for this specific flow, not the total octet count for all flows whose source is the current node, right?

Yup that's correct.

srikartati (Contributor, Author) left a comment:

Thanks for the review, @shihhaoli. Addressed comments.


srikartati requested a review from zyiou on December 7, 2020 at 21:16
zyiou (Contributor) left a comment:

LGTM. just ensure the TODO comment is added. Thanks!

Comment on lines 71 to 80
return &AggregationProcess{
make(map[FlowKey][]entities.Record),
make(map[FlowKey]AggregationFlowRecord),
sync.RWMutex{},
input.MessageChan,
input.WorkerNum,
make([]*worker, 0),
input.CorrelateFields,
input.AggregateElements,
make(chan bool),
}, nil
zyiou (Contributor):

Nit: it would be better to specify the corresponding field names here, for future maintenance.
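As a sketch of this nit, the same initializer with named fields; the names are inferred from the positional initializer above and may not match the actual struct exactly:

return &AggregationProcess{
	flowKeyRecordMap:  make(map[FlowKey]AggregationFlowRecord),
	mutex:             sync.RWMutex{},
	messageChan:       input.MessageChan,
	workerNum:         input.WorkerNum,
	workerList:        make([]*worker, 0),
	correlateFields:   input.CorrelateFields,
	aggregateElements: input.AggregateElements,
	stopChan:          make(chan bool),
}, nil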


srikartati (Contributor, Author) commented Dec 8, 2020:

> LGTM. just ensure the TODO comment is added. Thanks!

It was added in L45. Thanks for the review.

srikartati merged commit acbdcfa into vmware:master on Dec 8, 2020
srikartati deleted the fix_correlation branch on December 8, 2020 at 05:23
zyiou pushed a commit to zyiou/go-ipfix that referenced this pull request Feb 5, 2021
…re#99)

* Modify correlation in aggregation process

- Create AggregationFlowRecord structure to store required metadata
- Change correlation process, so that we can store one record
- Can be easily extended to maintain stats.
- Add stats support for intermediate process
- Cleanup of existing unit tests and add more unit tests.