Vermont does not respect flowKey/nonFlowKey configuration #112

nickbroon · 2018-06-06T08:57:30Z

While examining the code related to flow aggregation I'm not sure I understand how flowKey/nonFlowKey interacts with the aggregation.
I would assume that any field configured as nonFlowKey should be aggregated, and those configured as flowKey not aggregated.

The FlowHashtable::aggregateFlow function that performs the actual aggregation does not appear to take the key status of a field into consideration. It bases the choice to aggregate on the return value of isToBeAggregated(), which appears to use a fixed table of `type.id` values to determine this.

flowKey/nonFlowKey appears to be used only in AggregatorBaseCfg::readNonFlowKeyRule and AggregatorBaseCfg::readFlowKeyRule, which set ruleField->modifier = Rule::Field::AGGREGATE or ruleField->modifier = Rule::Field::KEEP. ruleField->modifier is then used in FlowHashtable::copyData while building a flow to be inserted into, or aggregated with, the hashtable, but AGGREGATE and KEEP are not treated any differently.

I simply don't see how the flowKey/nonFlowKey configuration affects how flows are aggregated together when a flow is found in the hash table.

(Originally discussed here: #108 (comment))

muenz · 2018-06-08T18:27:01Z

I think when the aggregators were implemented, they were not meant to be flexible enough to support arbitrary fields as flow key or non-flow key fields. I admit that the configuration is confusing, and the documentation suggests more than Vermont can provide.

In practice, however, this limitation is of little relevance. If packet header fields like IP addresses, port numbers etc. are configured for a flow record, they are always keys. If not, you would need to come up with an aggregation scheme to aggregate different IP addresses, for example. If you need this, feel free to implement it. On the other hand, attributes like packet size are typically summed up and not used as a key. So, they are always "aggregated".

Maybe it is easier to correct the documentation of Vermont :)

nickbroon · 2018-06-09T08:32:07Z

The problem comes when some field other than the traditional 5-tuple key fields is configured as nonFlowKey, for example in my case things like output interface or applicationID. These are then treated as flowKey, without any error or warning, which results in drastically more flows being created than expected.
If the behaviour is to remain as it is, then as well as updating the documentation I think the config system needs to be updated: either remove the flowKey/nonFlowKey options and replace them with a simple list of the fields to collect, or, if flowKey/nonFlowKey is to remain in the config, print an error or warning when a given field does not match the current fixed list of nonFlowKey fields that are aggregated.
I think better yet would be to actually implement the flowKey/nonFlowKey behaviour. As you mentioned, many fields don't have a meaningful aggregation semantic (unlike, say, packet size, which is simply accumulated, or TCP flags, which are OR'd together), but the traditional semantic given to these (and what Cisco/Juniper NetFlow/IPFIX probes do) is to simply take the value from the first packet in the flow.
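To make the three semantics mentioned above concrete, here is a minimal sketch. `AggRule` and `mergeField` are hypothetical names invented for illustration; they are not Vermont types, and real fields are not all 64-bit integers.

```cpp
#include <cstdint>

// Hypothetical aggregation rules, for illustration only (not Vermont code).
enum class AggRule { Sum, BitwiseOr, KeepFirst };

// Merge a newly observed value into the value already stored in the flow record.
inline std::uint64_t mergeField(AggRule rule, std::uint64_t stored, std::uint64_t incoming)
{
    switch (rule) {
    case AggRule::Sum:       return stored + incoming;  // e.g. packet size / octet counters
    case AggRule::BitwiseOr: return stored | incoming;  // e.g. TCP flags
    case AggRule::KeepFirst: return stored;             // value from the first packet wins
    }
    return stored;  // not reached for valid enum values
}
```

With `KeepFirst` as the fallback, a non-key field such as output interface would never split a flow and would never need a field-specific aggregation scheme.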

nickbroon · 2019-01-07T15:24:35Z

The default aggregation behaviour for information elements configured as 'non-key' should be to take the value from the first packet/flow.

From RFC 6728:

For example, if a non-key field specifies an Information Element whose value is determined by the first packet observed within a Flow (which is the default rule according to [RFC5102] unless specified differently in the description of the Information Element), this field MUST be included in the resulting Flow Record if it can be determined from the first packet of the Flow.

That is

vermont/src/modules/ipfix/aggregator/BaseHashtable.cpp

Line 329 in 26d4864

int BaseHashtable::isToBeAggregated(InformationElement::IeInfo& type)

BaseHashtable::isToBeAggregated() should consult the configuration instead of a hard-coded list of supported fields (that is, the function simply returns whether the field is configured as key or non-key). And, importantly,

vermont/src/modules/ipfix/aggregator/PacketHashtable.cpp

Line 1124 in aceda69

    
void PacketHashtable::aggregateField(const ExpFieldData* efd, HashtableBucket* hbucket,

PacketHashtable::aggregateField() should be changed so that its default behaviour is to keep the value from the first packet.
After that, the configuration check added in #114 can be removed.
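A rough sketch of what "consider configuration" could look like, under stated assumptions: `Modifier`, `FieldCfg`, `Record` and `sameFlow` are stand-ins invented here (the enum only mirrors the KEEP/AGGREGATE modifiers mentioned earlier in this issue), and the real Vermont signatures differ.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins, not the real Vermont types.
enum class Modifier { Keep, Aggregate };  // flowKey -> Keep, nonFlowKey -> Aggregate

struct FieldCfg {
    std::uint16_t ieId;   // information element id
    Modifier modifier;    // taken from the flowKey/nonFlowKey configuration
};

// Decide from the configuration alone, not from a fixed table of ids.
inline bool isToBeAggregated(const FieldCfg& field)
{
    return field.modifier == Modifier::Aggregate;
}

// One value per configured field; assumed to have the same length as the config.
using Record = std::vector<std::uint64_t>;

// Two records belong to the same flow if all key (non-aggregated) fields match.
inline bool sameFlow(const std::vector<FieldCfg>& cfg, const Record& a, const Record& b)
{
    for (std::size_t i = 0; i < cfg.size(); ++i) {
        if (!isToBeAggregated(cfg[i]) && a[i] != b[i]) {
            return false;
        }
    }
    return true;
}
```

In this sketch, two records that differ only in a field configured as nonFlowKey (for example output interface) land in the same flow, which is the behaviour the configuration suggests.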

The implementation now respects the flowKey/nonFlowKey configuration of fields in the aggregator. The default aggregation behaviour for nonFlowKey fields, as defined in Section 4.3.3 of RFC 6728 and Section 5 of RFC 7012, is to take the value from the first packet/sample/flow. This configuration behaviour is also seen in other IPFIX/NetFlow implementations from Cisco, Juniper, etc.

Fixes: tumi8#112

nickbroon mentioned this issue Sep 10, 2018

Error message when key/nonkey configuration doesn't match actual behaviour #114

Merged
