Add ovs meter metric #5165

Merged
merged 1 commit into from
Jul 25, 2023

Contributor

@mengdie-song mengdie-song commented Jun 21, 2023

We have implemented rate limiting for packet-in messages for NetworkPolicy audit logging and Traceflow.
This change adds a metric that reports the packet count obtained from the OVS meter statistics. A separate goroutine fetches the statistics every 30 seconds and updates the metric. A value greater than 0 indicates that the current rate exceeds the predefined limit (100 per second).

Fixes: #5037
Signed-off-by: Mengdie Song [email protected]
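
The polling pattern described in this comment can be sketched roughly as follows. This is only an illustration under stated assumptions: `statsSource`, `meterGauge`, `collectOnce`, and `pollMeterStats` are hypothetical names, the gauge is a plain mutex-protected map standing in for a real Prometheus gauge, and the stats source is a stub rather than a real OVS meter-stats request.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// meterGauge is a hypothetical stand-in for a labelled gauge metric.
// It is NOT the Prometheus client API.
type meterGauge struct {
	mu     sync.Mutex
	values map[string]int64
}

func newMeterGauge() *meterGauge {
	return &meterGauge{values: make(map[string]int64)}
}

// Set stores the latest value for a label.
func (g *meterGauge) Set(label string, v int64) {
	g.mu.Lock()
	defer g.mu.Unlock()
	g.values[label] = v
}

// Get returns the latest value for a label (0 if never set).
func (g *meterGauge) Get(label string) int64 {
	g.mu.Lock()
	defer g.mu.Unlock()
	return g.values[label]
}

// statsSource is a hypothetical function returning the packet count per meter name.
type statsSource func() map[string]int64

// collectOnce copies one round of statistics into the gauge.
func collectOnce(src statsSource, g *meterGauge) {
	for name, count := range src() {
		g.Set(name, count)
	}
}

// pollMeterStats calls collectOnce every interval until stop is closed.
func pollMeterStats(src statsSource, g *meterGauge, interval time.Duration, stop <-chan struct{}) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ticker.C:
			collectOnce(src, g)
		case <-stop:
			return
		}
	}
}

func main() {
	g := newMeterGauge()
	stop := make(chan struct{})
	// Stub source with made-up values; a real source would query the switch.
	src := func() map[string]int64 {
		return map[string]int64{"PacketInMeterNetworkPolicy": 0, "PacketInMeterTraceflow": 7}
	}
	// A short interval keeps the demo quick; the description above uses 30 seconds.
	go pollMeterStats(src, g, 10*time.Millisecond, stop)
	time.Sleep(50 * time.Millisecond)
	close(stop)
	fmt.Println("PacketInMeterTraceflow:", g.Get("PacketInMeterTraceflow"))
}
```

Splitting the single collection step (`collectOnce`) from the ticker loop keeps the part that matters testable without waiting on timers.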

@mengdie-song mengdie-song changed the title from "Add ovs meter metrics" to "Add ovs meter metric" Jun 21, 2023
Contributor

luolanzone commented Jun 26, 2023

@mengdie-song please add unit tests for this PR; the patch unit test coverage is quite low.
Btw, the DCO check failed; you need to provide sign-off info.

@mengdie-song mengdie-song force-pushed the prometheus-ovsmeter branch 5 times, most recently from bf22d1e to 8ee19cb Compare July 4, 2023 08:34
@mengdie-song mengdie-song force-pushed the prometheus-ovsmeter branch 2 times, most recently from 6690e28 to 63bdabb Compare July 11, 2023 09:11
@@ -44,6 +44,7 @@ var antreaAgentMetrics = []string{
"antrea_agent_ovs_flow_ops_error_count",
"antrea_agent_ovs_flow_ops_latency_milliseconds",
"antrea_agent_ovs_total_flow_count",
"antrea_agent_ovs_ovs_meter_packet_count",

Member

Is this correct? Two "ovs"?

Contributor Author

Nice catch, thanks. It seems that the Prometheus e2e test is disabled by default on CI. I will double-check this one and update.

Contributor

@antoninbas antoninbas left a comment

Go CI jobs are failing because of build errors

Comment on lines 200 to 206
handleMeterStatsReply := func(packetCounts map[int]int64) {
        for k, v := range packetCounts {
                switch k {
                case 1:
                        metrics.OVSMeterPacketDroppedCount.WithLabelValues("PacketInMeterNetworkPolicy").Set(float64(v))
                case 2:
                        metrics.OVSMeterPacketDroppedCount.WithLabelValues("PacketInMeterTraceflow").Set(float64(v))

Contributor

I don't understand why we are using Prometheus metrics at all in this unit test. This handleMeterStatsReply callback function does not need to use metrics. This file is testing the pkg/ovs/openflow package, which has nothing to do with metrics. Instead you could use a map protected by a mutex, or a channel.

Contributor Author

In the latest change, I removed the Prometheus metrics check and now only check the packet count. Could you take a look at the latest version?

}
b.mpReplyChsMutex.RUnlock()
b.MultipartReply(sw, mpMeterStatsReply)
time.Sleep(time.Second)

Member

This is nondeterministic and may be flaky. Also, having the assertion in the callback can't guarantee that the callback is called: even if the tested function does nothing, the test wouldn't fail.

Perhaps you can use a map to collect the received stats and use assert.Eventually to check whether expected stats are received in a reasonable time.

packetCounts := make(map[int]int64)
checkMeterStatsReply := func(meterId int, packetCount int64) {
        packetCounts[meterId] = packetCount
}

Member

This needs a mutex, or you can use:

var receivedCount1 int64
checkMeterStatsReply := func(meterID int, packetCount int64) {
        assert.Equal(t, 1, meterID)
        atomic.StoreInt64(&receivedCount1, packetCount)
}

...

assert.Eventually(t, func() bool {
         return int64(100) == atomic.LoadInt64(&receivedCount1)
}, ...

Note that, given your code, receiving meterID 2 is not expected.

Contributor Author

Thanks! I would like to check both meters and have added the mutex.

tnqn previously approved these changes Jul 21, 2023
Member

@tnqn tnqn left a comment

LGTM

Member

tnqn commented Jul 21, 2023

/test-all
@antoninbas @luolanzone do you have other comments?

@luolanzone
Contributor

No comment from my side, thanks.

luolanzone previously approved these changes Jul 21, 2023
Contributor

@luolanzone luolanzone left a comment

LGTM

Member

@tnqn tnqn left a comment

LGTM

Member

tnqn commented Jul 25, 2023

/test-all

Member

tnqn commented Jul 25, 2023

As @antoninbas's last comment has been addressed, I'm going to merge the PR given the tight schedule. If there are any new comments after it's merged, we can follow up with a new PR.

@tnqn tnqn added the action/release-note label (indicates a PR that should be included in release notes) Jul 25, 2023
@tnqn tnqn merged commit 2304804 into antrea-io:main Jul 25, 2023
Labels
action/release-note Indicates a PR that should be included in release notes.
Development

Successfully merging this pull request may close these issues.

Add more Prometheus metrics for better troubleshooting
6 participants