Fix throughput/octetDeltaCount issue in flow visualization #2089
@antoninbas Please have a quick check whether the fix works for your setup. It works for me locally. Thanks!
So it fixes the issue with flows not being reported at all. However, if I use connections that last 30 seconds, and keep the export interval at 60 seconds everywhere, all the flow records still have delta counts set to 0, which in turn means that the throughput graphs are still all empty.
This doesn't seem right. I don't know if this is something that should be fixed by including non-zero delta counts in the first flow record for a connection, or if this is something that needs to be fixed on the ELK configuration side. @srikartati
Codecov Report
@@            Coverage Diff             @@
##             main    #2089      +/-   ##
==========================================
- Coverage   60.97%   60.94%   -0.03%
==========================================
  Files         270      270
  Lines       20366    20365       -1
==========================================
- Hits        12418    12412       -6
- Misses       6649     6670      +21
+ Partials     1299     1283      -16
Currently the delta count of the first record is always 0. The reason is that for connections established before the flow exporter is deployed, the byte count in the conntrack table may be quite large, which would make the first throughput value a spike in the diagram.
For connections whose duration is lower than the What do you think?
I don't know if there is anything to change on the Antrea side. I do not want to report non-zero delta counts if it's going to create other issues, e.g. with bandwidth reporting, or if it is not in agreement with the IPFIX specs. It seems to me that maybe this should be fixed on the ELK configuration side: maybe for these "throughput" graphs showing the cumulative amount of traffic, we should fall back to the total amount if the delta count is 0?
I checked the RFC again and could not find any specification relevant to this scenario. Yes, even with a change on the flow exporter side, bandwidth computation for flows with only one record does not make sense when it uses the delta count and the time interval of records, which is the current method of computation. Therefore, providing a solution on the ELK flow collector side is a good alternative. I can think of two options:
@zyiou I feel the name
I agree that the name should be changed.
In theory, we can take the difference between the total count of the last flow record and that of the first flow record in the last five minutes. Instead of changing too much, we can use a mix of the total count and delta counts, i.e., use the total count for the very first flow record and the delta count for the rest. Until now we have been ignoring the bytes that come from the first flow record, which leads to an inaccurate cumulative byte count. Do you agree? Can this logic be managed in Logstash?
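To make the proposed mix concrete, here is a minimal Go sketch of the aggregation described above. It is an illustration only, not the Logstash configuration from this PR: the `FlowRecord` type and the `CumulativeBytes` function are hypothetical names, and the sketch assumes the records of one connection arrive in export order, with the first record carrying a delta count of 0.

```go
package main

import "fmt"

// FlowRecord is a hypothetical, simplified view of one exported flow record.
type FlowRecord struct {
	OctetTotalCount uint64 // bytes seen on the connection so far
	OctetDeltaCount uint64 // bytes since the previous record (assumed 0 on the first record)
}

// CumulativeBytes sums the traffic of a single connection by taking the
// total count of the first record and the delta counts of all later records.
func CumulativeBytes(records []FlowRecord) uint64 {
	var sum uint64
	for i, r := range records {
		if i == 0 {
			// First record: its delta is 0, so use the total instead.
			sum += r.OctetTotalCount
			continue
		}
		sum += r.OctetDeltaCount
	}
	return sum
}

func main() {
	// A short connection that produces a single record.
	short := []FlowRecord{{OctetTotalCount: 1500, OctetDeltaCount: 0}}
	// A longer connection exported three times.
	long := []FlowRecord{{1000, 0}, {1800, 800}, {2500, 700}}
	fmt.Println(CumulativeBytes(short), CumulativeBytes(long))
}
```

With a delta-only sum, the short connection would contribute 0 bytes and the long one 1500; with the mix, they contribute 1500 and 2500 respectively. The caveat raised earlier still applies: for a connection that existed before the flow exporter was deployed, the first total count may be large.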
The IsPresent field of a connection should be updated before the IsConnectionDying check; otherwise IsConnectionDying will always return true, which prevents existing connections from being updated and makes octetDeltaCount always return 0. This commit also changes the delta count of the first record from zero to its total delta count, modifies the throughput calculation of the first record in the Logstash config, and renames the throughput diagram from 'throughput' to 'cumulative bytes'.
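The ordering problem described in the commit message can be illustrated with a small Go sketch. This is a simplified stand-in, not the Antrea code: the `Connection` type, the `isConnectionDying` predicate, and both update functions are hypothetical, and the sketch assumes that a stored connection has `IsPresent` reset to false between polls and that the dying check treats an absent connection as dying.

```go
package main

import "fmt"

// Connection is a hypothetical, reduced connection entry.
type Connection struct {
	IsPresent       bool   // assumed to be reset to false between polls
	TCPState        string // e.g. "ESTABLISHED" or "TIME_WAIT"
	OctetTotalCount uint64
}

// isConnectionDying is an assumed predicate: a connection is dying if it
// is in TIME_WAIT or was not seen in the latest poll.
func isConnectionDying(c *Connection) bool {
	return c.TCPState == "TIME_WAIT" || !c.IsPresent
}

// updateCheckFirst runs the dying check before marking the connection
// present, so an entry with IsPresent == false is never updated.
func updateCheckFirst(existing *Connection, polled Connection) bool {
	if isConnectionDying(existing) {
		return false
	}
	existing.IsPresent = true
	existing.TCPState = polled.TCPState
	existing.OctetTotalCount = polled.OctetTotalCount
	return true
}

// updateMarkFirst marks the connection present first, so the dying check
// only rejects entries that are dying for another reason.
func updateMarkFirst(existing *Connection, polled Connection) bool {
	existing.IsPresent = true
	if isConnectionDying(existing) {
		return false
	}
	existing.TCPState = polled.TCPState
	existing.OctetTotalCount = polled.OctetTotalCount
	return true
}

func main() {
	polled := Connection{TCPState: "ESTABLISHED", OctetTotalCount: 900}
	a := Connection{TCPState: "ESTABLISHED", OctetTotalCount: 400}
	b := a
	fmt.Println(updateCheckFirst(&a, polled), a.OctetTotalCount)
	fmt.Println(updateMarkFirst(&b, polled), b.OctetTotalCount)
}
```

Under these assumptions the check-first variation leaves the stored byte count at 400, so any delta derived from it stays 0, while the mark-first variation picks up the polled value of 900.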
What are the Kibana changes (kibana.ndjson and KIBANA_DEFAULTAPPID value)?
Did you test the change in your cluster?
@@ -165,7 +165,7 @@ spec:
         - name: ELASTICSEARCH_URL
           value: "http://elasticsearch:9200"
         - name: KIBANA_DEFAULTAPPID
-          value: "dashboard/653cf1e0-2fd2-11e7-99ed-49759aed30f5"
+          value: "dashboard/3b331b30-b987-11ea-b16e-fb06687c3589"
Same question as Antonin. Why is this required?
Found that a "dashboard cannot be found" error pops up after importing the Kibana ndjson file.
Found that
/test-all
/test-ipv6-only-e2e
The IsPresent field of a connection should be updated before the IsConnectionDying check; otherwise IsConnectionDying will always return true, which prevents existing connections from being updated and makes octetDeltaCount always return 0.
This commit also changes the delta count of the first record from zero to the total delta count, modifies the throughput calculation of the first record in the Logstash config, and renames the throughput diagram from "throughput" to "cumulative bytes".
Fixes #2085