
Vermont does not respect flowKey/nonFlowKey configuration #112

Open
nickbroon opened this issue Jun 6, 2018 · 3 comments

Comments

@nickbroon
Contributor

While examining the code related to flow aggregation, I'm not sure I understand how the flowKey/nonFlowKey configuration interacts with aggregation.
I would assume that any field configured as nonFlowKey should be aggregated, and any field configured as flowKey should not be.

The FlowHashtable::aggregateFlow function, which performs the actual aggregation, does not appear to take the key status of a field into consideration. It bases the choice to aggregate on the return value of isToBeAggregated(), which appears to use a fixed table of `type.id` values to determine this.

flowKey/nonFlowKey appears to be used only in AggregatorBaseCfg::readNonFlowKeyRule and AggregatorBaseCfg::readFlowKeyRule, which set ruleField->modifier = Rule::Field::AGGREGATE or ruleField->modifier = Rule::Field::KEEP. ruleField->modifier is then used in FlowHashtable::copyData while building a flow that is to be inserted into, or aggregated into, the hashtable, but AGGREGATE and KEEP are not treated any differently there.

I simply don't see how the flowKey/nonFlowKey configuration affects how flows are aggregated together when a flow is found in the hash table.

(Originally discussed here: #108 (comment))

@muenz

muenz commented Jun 8, 2018

I think that when the aggregators were implemented, they were not meant to be flexible enough to support arbitrary fields as flow key or non-flow key fields. I admit that the configuration is confusing, and the documentation suggests more than Vermont can provide.

In practice, however, this limitation is of little relevance. If packet header fields like IP addresses, port numbers, etc. are configured for a flow record, they are always keys. Otherwise, you would need to come up with an aggregation scheme that combines different IP addresses, for example. If you need this, feel free to implement it. On the other hand, attributes like packet size are typically summed up and not used as a key, so they are always "aggregated".

Maybe it is easier to correct Vermont's documentation :)

@nickbroon
Contributor Author

The problem arises when a field other than the traditional 5-tuple key fields is configured as nonFlowKey (in my case, for example, output interface or applicationID). Such fields are then treated as flowKey without any error or warning, which results in drastically more flows being created than expected.
If the behaviour is to remain as it is, then in addition to updating the documentation, I think the configuration system needs to be updated. Either remove the flowKey/nonFlowKey options and replace them with a simple list of the fields to collect, or, if flowKey/nonFlowKey is to remain in the config, print an error or warning when a given field does not match the current fixed list of nonFlowKey fields that are aggregated.
Better yet, I think, would be to actually implement the flowKey/nonFlowKey behaviour. As you mentioned, many fields don't have a meaningful aggregation semantic (unlike, say, packet size, which is simply accumulated, or TCP flags, which are OR'd together), but the traditional semantic given to these (and what Cisco/Juniper NetFlow/IPFIX probes do) is simply to take the value from the first packet in the flow.

@nickbroon
Contributor Author

nickbroon commented Jan 7, 2019

The default aggregation behaviour for information elements configured as 'non-key' should be to take the value from the first packet/flow.

From RFC 6728:

For example, if a non-key field specifies an Information Element whose value is determined by the first packet observed within a Flow (which is the default rule according to [RFC5102] unless specified differently in the description of the Information Element), this field MUST be included in the resulting Flow Record if it can be determined from the first packet of the Flow.

That is

int BaseHashtable::isToBeAggregated(InformationElement::IeInfo& type)
BaseHashtable::isToBeAggregated() should consider the configuration instead of a hard-coded list of supported fields (that is, this function should simply return whether the field is configured as key or non-key). And, importantly,
void PacketHashtable::aggregateField(const ExpFieldData* efd, HashtableBucket* hbucket,
PacketHashtable::aggregateField() should be changed so that its default behaviour is to keep the field's value from the first packet.
After that, the change in #114 that checks the configuration can be removed.

nickbroon added a commit to nickbroon/vermont that referenced this issue May 27, 2020
The implementation now respects the configuration of
flowKey/nonFlowKey for fields in the aggregator.

The default aggregation behaviour for nonFlowKey fields, as defined in
RFC 6728 Section 4.3.3 and RFC 7012 Section 5, is to take the value from the
first packet/sample/flow. This behaviour is also seen in other
IPFIX/NetFlow implementations from Cisco, Juniper, etc.

Fixes: tumi8#112

2 participants