telemetry: Count and report the number of duplicate proposals and MsgDigestSkipTag messages received #4605

cce · 2022-09-29T02:42:49Z

Summary

Message filtering is used to reduce the number of duplicate large messages (>5000 bytes) sent between peers. After a node receives a large message, it sends a filter message (protocol tag MsgDigestSkipTag) containing the message's hash to all of its peers, telling them not to send it any message with this hash. However, this does not prevent a node from receiving the same large message concurrently from several peers, since it must fully receive the first copy before broadcasting the filter message.
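As a rough illustration of the receive path described above, here is a minimal sketch. The `node`, `peer`, and `digest` types and the use of SHA-256 are assumptions made for the example, not the actual network code; appending to `filtersSent` stands in for sending a MsgDigestSkipTag message.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// largeMessageThreshold mirrors the ">5000 bytes" cutoff from the summary.
const largeMessageThreshold = 5000

// digest is a hypothetical stand-in for the message hash; the real hash
// function may differ.
type digest [sha256.Size]byte

// peer records the filter digests sent to it (illustration only).
type peer struct {
	name        string
	filtersSent []digest
}

// node is a hypothetical receiver with a set of connected peers.
type node struct {
	peers []*peer
}

// onMessage handles a fully received message and reports whether a filter
// message was broadcast for it.
func (n *node) onMessage(payload []byte) bool {
	if len(payload) <= largeMessageThreshold {
		return false // small messages are not filtered this way
	}
	d := digest(sha256.Sum256(payload))
	for _, p := range n.peers {
		// Stand-in for sending a MsgDigestSkipTag message carrying d.
		p.filtersSent = append(p.filtersSent, d)
	}
	return true
}

func main() {
	n := &node{peers: []*peer{{name: "A"}, {name: "B"}}}
	fmt.Println(n.onMessage(make([]byte, 100)))  // small message: no filter
	fmt.Println(n.onMessage(make([]byte, 6000))) // large message: filter broadcast
	fmt.Println(len(n.peers[0].filtersSent))
}
```

Note that the filter is only broadcast once the whole payload has arrived, which is why concurrent copies from several peers can still get through.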

This PR counts the number of duplicate filter messages received from each peer. Nodes do not track their outgoing filter messages, but receiving nodes must track the hashes, so this wires up a counter to messageFilter.CheckDigest, which "checks if the given digest already in the collection, and returns true if it was there before the call."
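The check-then-count idea can be sketched as follows. This is a simplified stand-in, not the real messageFilter: `digestSet`, `checkAndAdd`, and `peerFilter` are hypothetical names, the set here is unbounded, and the counter assumes one goroutine handles each peer's incoming messages.

```go
package main

import (
	"fmt"
	"sync"
)

// digest is a hypothetical stand-in for a message hash.
type digest [32]byte

// digestSet is a simplified, unbounded collection of seen digests.
type digestSet struct {
	mu   sync.Mutex
	seen map[digest]struct{}
}

func newDigestSet() *digestSet {
	return &digestSet{seen: make(map[digest]struct{})}
}

// checkAndAdd records d and returns true if it was already present
// before the call.
func (s *digestSet) checkAndAdd(d digest) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if _, ok := s.seen[d]; ok {
		return true
	}
	s.seen[d] = struct{}{}
	return false
}

// peerFilter pairs the set with a duplicate counter for one peer.
type peerFilter struct {
	set        *digestSet
	duplicates int64
}

// onFilterMessage handles one incoming filter message from the peer.
func (p *peerFilter) onFilterMessage(d digest) {
	if p.set.checkAndAdd(d) {
		p.duplicates++ // the peer already sent us this digest
	}
}

func main() {
	p := &peerFilter{set: newDigestSet()}
	h := digest{1}
	p.onFilterMessage(h)
	p.onFilterMessage(h) // duplicate
	p.onFilterMessage(digest{2})
	fmt.Println(p.duplicates)
}
```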

This also adds two new counters to agreement that count the number of times a duplicate payloadPresent event occurs, meaning the same proposal payload was received by agreement and sent to the proposalStore more than once. Two cases are counted: before and after proposal validation is complete.
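A minimal sketch of the two cases follows. The `assembler` and `counters` types are hypothetical, and mapping "filled" to "received but not yet validated" and "assembled" to "validated" is an assumption made for the example rather than something stated in this PR.

```go
package main

import "fmt"

// assembler tracks the payload state for one proposal (simplified).
type assembler struct {
	filled    bool // assumption: a payload was received but is not yet validated
	assembled bool // assumption: the payload has been validated
}

// counters holds the two duplicate counts (hypothetical names).
type counters struct {
	alreadyFilled    int // duplicate arrived before validation completed
	alreadyAssembled int // duplicate arrived after validation completed
}

// onPayloadPresent records an incoming payload and returns true if it was new.
func (a *assembler) onPayloadPresent(c *counters) bool {
	switch {
	case a.assembled:
		c.alreadyAssembled++
		return false
	case a.filled:
		c.alreadyFilled++
		return false
	}
	a.filled = true
	return true
}

// onPayloadVerified marks validation as complete.
func (a *assembler) onPayloadVerified() {
	a.assembled = true
}

func main() {
	var a assembler
	var c counters
	a.onPayloadPresent(&c) // first copy: accepted
	a.onPayloadPresent(&c) // duplicate before validation
	a.onPayloadVerified()
	a.onPayloadPresent(&c) // duplicate after validation
	fmt.Println(c.alreadyFilled, c.alreadyAssembled)
}
```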

Test Plan

Telemetry only — existing tests should pass.

codecov · 2022-09-29T03:05:02Z

Codecov Report

Merging #4605 (d4587bc) into master (42b9533) will decrease coverage by 0.00%.
The diff coverage is 55.55%.

@@            Coverage Diff             @@
##           master    #4605      +/-   ##
==========================================
- Coverage   54.10%   54.10%   -0.01%     
==========================================
  Files         401      401              
  Lines       51642    51655      +13     
==========================================
+ Hits        27942    27949       +7     
- Misses      21345    21348       +3     
- Partials     2355     2358       +3

| Impacted Files | Coverage Δ | |
|---|---|---|
| agreement/proposal.go | `71.96% <ø> (ø)` | |
| network/wsNetwork.go | `64.57% <0.00%> (-0.07%)` | ⬇️ |
| network/wsPeer.go | `65.50% <25.00%> (-0.53%)` | ⬇️ |
| agreement/actions.go | `71.42% <50.00%> (-0.20%)` | ⬇️ |
| agreement/demux.go | `90.68% <100.00%> (+0.13%)` | ⬆️ |
| agreement/events.go | `61.62% <100.00%> (+0.45%)` | ⬆️ |
| agreement/proposalStore.go | `100.00% <100.00%> (ø)` | |
| crypto/merkletrie/trie.go | `66.42% <0.00%> (-2.19%)` | ⬇️ |
| crypto/merkletrie/node.go | `91.62% <0.00%> (-1.87%)` | ⬇️ |
... and 3 more

icorderi · 2022-09-29T14:51:23Z

logging/telemetryspec/event.go

@@ -300,6 +300,8 @@ type PeerConnectionDetails struct {
 	Endpoint string `json:",omitempty"`
 	// MessageDelay is the avarage relative message delay. Not being used for incoming connection.
 	MessageDelay int64 `json:",omitempty"`
+	// DuplicateFilterCount is the number of times this peer has sent us a message hash to filter that it had already sent before.
+	DuplicateFilterCount int64


when are we resetting it?

This will be reset when the peer connection is closed. The counters that get sent to telemetry today are monotonically increasing, so it is the job of the analyzer to graph the rate at whatever granularity it chooses. For this particular event, though, you would also need to spot the DisconnectPeer event between PeerConnections events to know when the reset occurred.
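For the analyzer side, here is a hedged sketch of turning successive counter readings into per-interval increases while handling resets. The `sample` type and its `reset` flag (set when a DisconnectPeer event was seen since the previous reading) are hypothetical; treating a drop in the count as a reset is an extra safeguard chosen for the example.

```go
package main

import "fmt"

// sample is one reading of a peer's counter from a telemetry event.
type sample struct {
	count int64
	reset bool // hypothetical: a disconnect was observed since the previous sample
}

// deltas returns the increase between consecutive samples. After a reset,
// the counter restarted from zero, so the new count is the whole increase.
func deltas(samples []sample) []int64 {
	out := make([]int64, 0, len(samples))
	for i := 1; i < len(samples); i++ {
		prev, cur := samples[i-1], samples[i]
		if cur.reset || cur.count < prev.count {
			out = append(out, cur.count)
			continue
		}
		out = append(out, cur.count-prev.count)
	}
	return out
}

func main() {
	readings := []sample{
		{count: 5},
		{count: 9},
		{count: 2, reset: true}, // peer reconnected, counter restarted
		{count: 2},
	}
	fmt.Println(deltas(readings))
}
```

Dividing each increase by the time between events gives the rate at whatever granularity the events were reported.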

~~The PeerConnectionDetails seems like a weird spot to put this. We connect once so how would there ever be duplicate messages?~~

Never mind, I seem to have mistaken this for a different peer connection event, which is sent once when someone connects.

The idea is for this counter to be maintained for each wsPeer (as well as globally as a metrics.Counter), and reported here along with other per-peer stats like MessageDelay.
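To make the per-peer plus global bookkeeping concrete, a small sketch follows. `peerStats`, `peerDetails`, and the plain `int64` used for the global total are hypothetical stand-ins; only the three JSON field declarations are taken from the diff above.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sync/atomic"
)

// peerStats holds the per-peer counter (stand-in for a field on the peer).
type peerStats struct {
	duplicateFilterCount int64
}

// recordDuplicate bumps both the per-peer count and the global total.
func (p *peerStats) recordDuplicate(global *int64) {
	atomic.AddInt64(&p.duplicateFilterCount, 1)
	atomic.AddInt64(global, 1)
}

// peerDetails mirrors the fields shown in the diff for the telemetry event.
type peerDetails struct {
	Endpoint             string `json:",omitempty"`
	MessageDelay         int64  `json:",omitempty"`
	DuplicateFilterCount int64
}

// details snapshots the per-peer counter for reporting.
func (p *peerStats) details(endpoint string) peerDetails {
	return peerDetails{
		Endpoint:             endpoint,
		DuplicateFilterCount: atomic.LoadInt64(&p.duplicateFilterCount),
	}
}

// encode renders the details as JSON.
func (d peerDetails) encode() (string, error) {
	b, err := json.Marshal(d)
	return string(b), err
}

func main() {
	var global int64 // stand-in for the process-wide metrics counter
	a, b := &peerStats{}, &peerStats{}
	a.recordDuplicate(&global)
	a.recordDuplicate(&global)
	b.recordDuplicate(&global)

	s, err := a.details("peer-a").encode()
	if err != nil {
		panic(err)
	}
	fmt.Println(s)
	fmt.Println("global total:", atomic.LoadInt64(&global))
}
```

Because DuplicateFilterCount has no `omitempty` tag in the diff, a zero count is still written to the event.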

ghost · 2022-09-29T15:22:11Z

network/wsPeer.go

+		// Count that this peer has sent us duplicate filter messages: this means it received the same
+		// large message concurrently from several peers, and then sent the filter message to us after
+		// each large message finished transferring.
+		duplicateNetworkFilterReceivedTotal.Inc(nil)


What is this existing duplicateNetworkFilterReceivedTotal? I wonder if that existing metric would have been useful.

This is a top-level metric across all peers, so you don't have to parse it out of the PeerConnections events to get a quick measure of how often this is happening in the network. It also makes the count available to Prometheus like our other counters.

I added duplicateNetworkFilterReceivedTotal ... do you mean one of the other existing metrics?

Sorry, I was referring to outgoingNetworkMessageFilteredOutTotal.

also DuplicateNetworkMessageReceivedTotal

Yes, I found duplicateNetworkMessageReceivedTotal while writing this. It counts the number of times you receive a "de-dupe-safe" tag message more than once, where dedupSafe() is defined as vote (AV) and transaction (TX) messages. These message types are both smaller than the 5000-byte limit used for the filtering I'm counting with proposals, and they already have their own counter, so my new counter is very similar but covers >5000-byte "skipped" messages like proposals.

outgoingNetworkMessageFilteredOutTotal counts the number of times the filter successfully prevented a duplicate proposal from being sent to a peer, so it is complementary to my new counter, which basically counts the number of times it didn't work (because the peer did not send the skip/filter message in time before the payload started sending).

winder

The metrics/telemetry integration looks correct.

brianolson

LGTM

Count and report the number of duplicate MsgDigestSkipTag messages received

2251afc

cce added the Enhancement label Sep 29, 2022

icorderi reviewed Sep 29, 2022

ghost reviewed Sep 29, 2022

add proposalAlreadyFilledCounter and proposalAlreadyAssembledCounter

d4587bc

cce changed the title ~~telemetry: Count and report the number of duplicate MsgDigestSkipTag messages received~~ telemetry: Count and report the number of duplicate proposals and MsgDigestSkipTag messages received Sep 29, 2022

cce requested review from brianolson and winder September 29, 2022 17:57

winder approved these changes Sep 30, 2022

brianolson approved these changes Sep 30, 2022

onetechnical merged commit b4fecd5 into algorand:master Sep 30, 2022

onetechnical mentioned this pull request Sep 30, 2022

go-algorand 3.10.0-beta Release PR #4612

Merged

Algo-devops-service mentioned this pull request Sep 30, 2022

go-algorand 3.10.0-stable Release PR #4618

Merged

cce deleted the duplicate-filter-message-count branch March 1, 2023 17:02
