From 53f3873ea3d21222e51383b1342a31500c75f29a Mon Sep 17 00:00:00 2001 From: Luca Joss <43531661+ljoss17@users.noreply.github.com> Date: Thu, 21 Jul 2022 16:43:29 +0200 Subject: [PATCH] Improve metrics descriptions and naming (#2409) * Improved naming and description of some telemetry metrics. Changed configuration 'tx_confirmation' default to be false * Added changelog entry * Updated guide with new telemetry changes * Renamed clear packet metrics * Updated some metric descriptions and conditional display * Fixed metric used to count cleared acknowledgments, updated latencies histogram buckets. * Clippy fix * Added new enum to TrackindId in order to track latency of packets cleared * Fixes in comments * Improved changelog entry, fixed description of wallet balance metric and updated telemetry output example in Hermes guide --- ...8-better-metrics-naming-and-description.md | 5 + config.toml | 7 +- guide/src/telemetry.md | 210 +++++++++--------- relayer/src/chain/tracking.rs | 13 ++ relayer/src/config.rs | 4 +- relayer/src/link/relay_path.rs | 55 ++--- telemetry/src/state.rs | 79 +++++-- 7 files changed, 217 insertions(+), 156 deletions(-) create mode 100644 .changelog/unreleased/improvements/ibc-telemetry/2408-better-metrics-naming-and-description.md diff --git a/.changelog/unreleased/improvements/ibc-telemetry/2408-better-metrics-naming-and-description.md b/.changelog/unreleased/improvements/ibc-telemetry/2408-better-metrics-naming-and-description.md new file mode 100644 index 0000000000..22fc1f52a5 --- /dev/null +++ b/.changelog/unreleased/improvements/ibc-telemetry/2408-better-metrics-naming-and-description.md @@ -0,0 +1,5 @@ +- Improved naming and description of some telemetry metrics and added + histogram buckets for `tx_latency` metrics. + In addition the default value of the configuration `tx_confirmation` + in Hermes `config.toml` has been changed from `true` to `false`. + ([#2408](https://github.com/informalsystems/ibc-rs/issues/2408)) \ No newline at end of file diff --git a/config.toml b/config.toml index 3bb7339e2d..1ae7c696e8 100644 --- a/config.toml +++ b/config.toml @@ -59,9 +59,10 @@ clear_on_start = true # (to any chain in this config file) have been successfully delivered. # If they have not been, and `clear_interval = 0`, then those packets are # queued up for re-submission. -# Experimental feature. Affects telemetry if set to false. -# [Default: true] -tx_confirmation = true +# Experimental feature. If set to `false`, the following telemetry metrics will be disabled: +# `ibc_acknowledgment_packets`, `ibc_receive_packets` and `ibc_timeout_packets`. +# [Default: false] +tx_confirmation = false # The REST section defines parameters for Hermes' built-in RESTful API. # https://hermes.informal.systems/rest.html diff --git a/guide/src/telemetry.md b/guide/src/telemetry.md index a2116a66f6..30d0461434 100644 --- a/guide/src/telemetry.md +++ b/guide/src/telemetry.md @@ -35,9 +35,9 @@ The following table describes the metrics currently tracked by the telemetry ser | `workers` | Number of workers per object | `i64` UpDownCounter | | `ibc_client_updates` | Number of client updates performed per client | `u64` Counter | | `ibc_client_misbehaviours` | Number of misbehaviours detected per client | `u64` Counter | -| `ibc_receive_packets` | Number of receive packets relayed per channel | `u64` Counter | -| `ibc_acknowledgment_packets` | Number of acknowledgment packets relayed per channel | `u64` Counter | -| `ibc_timeout_packets` | Number of timeout packets relayed per channel | `u64` Counter | +| `ibc_receive_packets` | Number of confirmed receive packets relayed per channel. Available if relayer runs with Tx confirmation enabled | `u64` Counter | +| `ibc_acknowledgment_packets` | Number of confirmed acknowledgment packets relayed per channel. Available if relayer runs with Tx confirmation enabled | `u64` Counter | +| `ibc_timeout_packets` | Number of confirmed timeout packets relayed per channel. Available if relayer runs with Tx confirmation enabled | `u64` Counter | | `wallet_balance` | The balance of each wallet Hermes uses per chain. Please note that when converting the balance to f64 a loss in precision might be introduced in the displayed value | `f64` ValueRecorder | | `ws_events` | How many IBC events did Hermes receive via the websocket subscription, in total since starting up, per chain. | Counter | @@ -47,10 +47,11 @@ The following table describes the metrics currently tracked by the telemetry ser | `msg_num` | How many messages Hermes submitted to a specific chain. | `u64` Counter | | `queries` | Number of queries emitted by the relayer, per chain and query type | `u64` Counter | | `query_cache_hits` | Number of cache hits for queries emitted by the relayer, per chain and query type | `u64` Counter | -| `send_packet_count` | Number of SendPacket relayed | `u64` Counter | -| `acknowledgement_count` | Number of WriteAcknowledgement relayed  | `u64` Counter | -| `cleared_count`  | Number of SendPacket relayed through ClearPendingPackets | `u64` Counter  | -| `oldest_sequence` | The sequence number of the oldest pending SendPacket. If this value is 0, it means there are no pending SendPacket | `u64` ValueRecorder | +| `send_packet_count` | Number of SendPacket events processed | `u64` Counter | +| `acknowledgement_count` | Number of WriteAcknowledgement events processed  | `u64` Counter | +| `cleared_send_packet_count`  | Number of SendPacket events processed during the initial and periodic clearing | `u64` Counter  | +| `cleared_acknowledgment_count` | Number of WriteAcknowledgement events processed during the initial and periodic clearing | `u64` Counter  | +| `oldest_sequence` | The sequence number of the oldest SendPacket event observed without its corresponding WriteAcknowledgement event. If this value is 0, it means Hermes observed a WriteAcknowledgment event for all the SendPacket events | `u64` ValueRecorder | | `oldest_timestamp` | The timestamp of the oldest sequence number in seconds | `u64` ValueRecorder | ## Integration with Prometheus @@ -65,127 +66,126 @@ see Prometheus-encoded metrics. For example, with two channels and after transferring some tokens between the chains: ```text +# HELP acknowledgement_count Number of WriteAcknowledgement events processed +# TYPE acknowledgement_count counter +acknowledgement_count{chain="ibc-0",channel="channel-0",counterparty="ibc-1",port="transfer"} 25 # HELP cache_hits Number of cache hits for queries emitted by the relayer, per chain and query type # TYPE cache_hits counter -cache_hits{chain="ibc-0",query_type="query_channel"} 8 -cache_hits{chain="ibc-0",query_type="query_client_state"} 15 -cache_hits{chain="ibc-0",query_type="query_connection"} 11 -cache_hits{chain="ibc-1",query_type="query_channel"} 9 -cache_hits{chain="ibc-1",query_type="query_client_state"} 16 -cache_hits{chain="ibc-1",query_type="query_connection"} 12 -cache_hits{chain="ibc-2",query_type="query_channel"} 10 -cache_hits{chain="ibc-2",query_type="query_client_state"} 17 -cache_hits{chain="ibc-2",query_type="query_connection"} 13 -# HELP ibc_acknowledgment_packets Number of acknowledgment packets relayed per channel +cache_hits{chain="ibc-0",query_type="query_channel"} 63 +cache_hits{chain="ibc-0",query_type="query_client_state"} 25 +cache_hits{chain="ibc-0",query_type="query_connection"} 19 +cache_hits{chain="ibc-1",query_type="query_channel"} 28 +cache_hits{chain="ibc-1",query_type="query_client_state"} 36 +cache_hits{chain="ibc-1",query_type="query_connection"} 29 +# HELP cleared_acknowledgment_count Number of WriteAcknowledgment events processed during the initial and periodic clearing +# TYPE cleared_acknowledgment_count counter +cleared_acknowledgment_count{chain="ibc-0",channel="channel-0",counterparty="ibc-1",port="transfer"} 20 +# HELP cleared_send_packet_count Number of SendPacket events processed during the initial and periodic clearing +# TYPE cleared_send_packet_count counter +cleared_send_packet_count{chain="ibc-0",channel="channel-0",counterparty="ibc-1",port="transfer"} 10 +# HELP ibc_acknowledgment_packets Number of confirmed acknowledgment packets relayed per channel. Available if relayer runs with Tx confirmation enabled # TYPE ibc_acknowledgment_packets counter -ibc_acknowledgment_packets{src_chain="ibc-0",src_channel="channel-0",src_port="transfer"} 0 -ibc_acknowledgment_packets{src_chain="ibc-1",src_channel="channel-0",src_port="transfer"} 0 -ibc_acknowledgment_packets{src_chain="ibc-2",src_channel="channel-1",src_port="transfer"} 0 -# HELP ibc_receive_packets Number of receive packets relayed per channel +ibc_acknowledgment_packets{src_chain="ibc-1",src_channel="channel-0",src_port="transfer"} 30 +# HELP ibc_receive_packets Number of confirmed receive packets relayed per channel. Available if relayer runs with Tx confirmation enabled # TYPE ibc_receive_packets counter -ibc_receive_packets{src_chain="ibc-0",src_channel="channel-0",src_port="transfer"} 0 -ibc_receive_packets{src_chain="ibc-1",src_channel="channel-0",src_port="transfer"} 0 -ibc_receive_packets{src_chain="ibc-2",src_channel="channel-1",src_port="transfer"} 0 -# HELP ibc_timeout_packets Number of timeout packets relayed per channel -# TYPE ibc_timeout_packets counter -ibc_timeout_packets{src_chain="ibc-0",src_channel="channel-0",src_port="transfer"} 0 -ibc_timeout_packets{src_chain="ibc-1",src_channel="channel-0",src_port="transfer"} 0 -ibc_timeout_packets{src_chain="ibc-2",src_channel="channel-1",src_port="transfer"} 0 +ibc_receive_packets{src_chain="ibc-0",src_channel="channel-0",src_port="transfer"} 25 # HELP msg_num How many messages Hermes submitted to the chain, per chain # TYPE msg_num counter -msg_num{chain="ibc-0"} 7 -msg_num{chain="ibc-1"} 8 -msg_num{chain="ibc-2"} 6 -# HELP oldest_sequence The sequence number of the oldest pending SendPacket. If this value is 0, it means there are no pending SendPacket. +msg_num{chain="ibc-0"} 48 +msg_num{chain="ibc-1"} 27 +# HELP oldest_sequence The sequence number of the oldest SendPacket event observed without its corresponding WriteAcknowledgement event. If this value is 0, it means Hermes observed a WriteAcknowledgment event for all the SendPacket events # TYPE oldest_sequence gauge -oldest_sequence{chain="ibc-0",channel="channel-0",counterparty="ibc-2",port="transfer"} 2037 -oldest_sequence{chain="ibc-1",channel="channel-0",counterparty="ibc-0",port="transfer"} 3167 -oldest_sequence{chain="ibc-2",channel="channel-1",counterparty="ibc-1",port="transfer"} 4954 -# HELP oldest_timestamp The timestamp of the oldest sequence number in seconds. +oldest_sequence{chain="ibc-0",channel="channel-0",counterparty="ibc-1",port="transfer"} 0 +# HELP oldest_timestamp The timestamp of the oldest sequence number in seconds # TYPE oldest_timestamp gauge -oldest_timestamp{chain="ibc-0",channel="channel-0",counterparty="ibc-2",port="transfer"} 1656420717 -oldest_timestamp{chain="ibc-1",channel="channel-0",counterparty="ibc-0",port="transfer"} 1656420719 -oldest_timestamp{chain="ibc-2",channel="channel-1",counterparty="ibc-1",port="transfer"} 1656420717 +oldest_timestamp{chain="ibc-0",channel="channel-0",counterparty="ibc-1",port="transfer"} 0 # HELP queries Number of queries emitted by the relayer, per chain and query type # TYPE queries counter -queries{chain="ibc-0",query_type="query_application_status"} 33 -queries{chain="ibc-0",query_type="query_blocks"} 1 -queries{chain="ibc-0",query_type="query_channel"} 14 -queries{chain="ibc-0",query_type="query_client_connections"} 9 -queries{chain="ibc-0",query_type="query_client_state"} 24 +queries{chain="ibc-0",query_type="query_application_status"} 71 +queries{chain="ibc-0",query_type="query_channel"} 1 +queries{chain="ibc-0",query_type="query_client_connections"} 1 +queries{chain="ibc-0",query_type="query_client_state"} 58 queries{chain="ibc-0",query_type="query_clients"} 1 -queries{chain="ibc-0",query_type="query_connection"} 2 -queries{chain="ibc-0",query_type="query_connection_channels"} 2 -queries{chain="ibc-0",query_type="query_consensus_state"} 29 -queries{chain="ibc-0",query_type="query_consensus_states"} 2 +queries{chain="ibc-0",query_type="query_connection"} 1 +queries{chain="ibc-0",query_type="query_connection_channels"} 1 +queries{chain="ibc-0",query_type="query_consensus_state"} 61 +queries{chain="ibc-0",query_type="query_consensus_states"} 1 queries{chain="ibc-0",query_type="query_latest_height"} 1 -queries{chain="ibc-0",query_type="query_packet_acknowledgements"} 3 -queries{chain="ibc-0",query_type="query_packet_commitments"} 6 +queries{chain="ibc-0",query_type="query_packet_acknowledgements"} 1 +queries{chain="ibc-0",query_type="query_packet_commitments"} 4 queries{chain="ibc-0",query_type="query_staking_params"} 2 -queries{chain="ibc-0",query_type="query_txs"} 188 -queries{chain="ibc-0",query_type="query_unreceived_packets"} 13 -queries{chain="ibc-1",query_type="query_application_status"} 33 -queries{chain="ibc-1",query_type="query_blocks"} 1 -queries{chain="ibc-1",query_type="query_channel"} 16 -queries{chain="ibc-1",query_type="query_client_connections"} 2 -queries{chain="ibc-1",query_type="query_client_state"} 24 +queries{chain="ibc-0",query_type="query_txs"} 32 +queries{chain="ibc-0",query_type="query_unreceived_acknowledgements"} 92 +queries{chain="ibc-1",query_type="query_application_status"} 70 +queries{chain="ibc-1",query_type="query_channel"} 51 +queries{chain="ibc-1",query_type="query_client_connections"} 1 +queries{chain="ibc-1",query_type="query_client_state"} 56 queries{chain="ibc-1",query_type="query_clients"} 1 -queries{chain="ibc-1",query_type="query_connection"} 2 -queries{chain="ibc-1",query_type="query_connection_channels"} 2 -queries{chain="ibc-1",query_type="query_consensus_state"} 29 -queries{chain="ibc-1",query_type="query_consensus_states"} 2 +queries{chain="ibc-1",query_type="query_connection"} 1 +queries{chain="ibc-1",query_type="query_connection_channels"} 1 +queries{chain="ibc-1",query_type="query_consensus_state"} 60 +queries{chain="ibc-1",query_type="query_consensus_states"} 1 queries{chain="ibc-1",query_type="query_latest_height"} 1 -queries{chain="ibc-1",query_type="query_packet_acknowledgements"} 3 -queries{chain="ibc-1",query_type="query_packet_commitments"} 6 +queries{chain="ibc-1",query_type="query_packet_acknowledgements"} 2 +queries{chain="ibc-1",query_type="query_packet_commitments"} 3 queries{chain="ibc-1",query_type="query_staking_params"} 2 -queries{chain="ibc-1",query_type="query_txs"} 190 -queries{chain="ibc-1",query_type="query_unreceived_packets"} 15 -queries{chain="ibc-2",query_type="query_application_status"} 33 -queries{chain="ibc-2",query_type="query_blocks"} 1 -queries{chain="ibc-2",query_type="query_channel"} 12 -queries{chain="ibc-2",query_type="query_client_connections"} 2 -queries{chain="ibc-2",query_type="query_client_state"} 24 -queries{chain="ibc-2",query_type="query_clients"} 1 -queries{chain="ibc-2",query_type="query_connection"} 2 -queries{chain="ibc-2",query_type="query_connection_channels"} 2 -queries{chain="ibc-2",query_type="query_consensus_state"} 29 -queries{chain="ibc-2",query_type="query_consensus_states"} 2 -queries{chain="ibc-2",query_type="query_latest_height"} 1 -queries{chain="ibc-2",query_type="query_packet_acknowledgements"} 3 -queries{chain="ibc-2",query_type="query_packet_commitments"} 6 -queries{chain="ibc-2",query_type="query_staking_params"} 2 -queries{chain="ibc-2",query_type="query_txs"} 192 -queries{chain="ibc-2",query_type="query_unreceived_packets"} 11 -# HELP send_packet_count Number of SendPacket relayed. +queries{chain="ibc-1",query_type="query_txs"} 38 +queries{chain="ibc-1",query_type="query_unreceived_packets"} 52 +# HELP send_packet_count Number of SendPacket events processed # TYPE send_packet_count counter -send_packet_count{chain="ibc-0",channel="channel-0",counterparty="ibc-2",port="transfer"} 5 -send_packet_count{chain="ibc-1",channel="channel-0",counterparty="ibc-0",port="transfer"} 6 -send_packet_count{chain="ibc-2",channel="channel-1",counterparty="ibc-1",port="transfer"} 7 +send_packet_count{chain="ibc-0",channel="channel-0",counterparty="ibc-1",port="transfer"} 25 +# HELP tx_latency_confirmed The latency for all transactions submitted & confirmed to a specific chain, i.e. the difference between the moment when Hermes received a batch of events until the corresponding transaction(s) were confirmed. Milliseconds. +# TYPE tx_latency_confirmed histogram +tx_latency_confirmed_bucket{chain="ibc-0",channel="channel-0",counterparty="ibc-1",port="transfer",le="1000"} 0 +tx_latency_confirmed_bucket{chain="ibc-0",channel="channel-0",counterparty="ibc-1",port="transfer",le="5000"} 2 +tx_latency_confirmed_bucket{chain="ibc-0",channel="channel-0",counterparty="ibc-1",port="transfer",le="9000"} 2 +tx_latency_confirmed_bucket{chain="ibc-0",channel="channel-0",counterparty="ibc-1",port="transfer",le="13000"} 2 +tx_latency_confirmed_bucket{chain="ibc-0",channel="channel-0",counterparty="ibc-1",port="transfer",le="17000"} 2 +tx_latency_confirmed_bucket{chain="ibc-0",channel="channel-0",counterparty="ibc-1",port="transfer",le="20000"} 2 +tx_latency_confirmed_bucket{chain="ibc-0",channel="channel-0",counterparty="ibc-1",port="transfer",le="+Inf"} 2 +tx_latency_confirmed_sum{chain="ibc-0",channel="channel-0",counterparty="ibc-1",port="transfer"} 7518 +tx_latency_confirmed_count{chain="ibc-0",channel="channel-0",counterparty="ibc-1",port="transfer"} 2 +tx_latency_confirmed_bucket{chain="ibc-1",channel="channel-0",counterparty="ibc-0",port="transfer",le="1000"} 0 +tx_latency_confirmed_bucket{chain="ibc-1",channel="channel-0",counterparty="ibc-0",port="transfer",le="5000"} 2 +tx_latency_confirmed_bucket{chain="ibc-1",channel="channel-0",counterparty="ibc-0",port="transfer",le="9000"} 2 +tx_latency_confirmed_bucket{chain="ibc-1",channel="channel-0",counterparty="ibc-0",port="transfer",le="13000"} 2 +tx_latency_confirmed_bucket{chain="ibc-1",channel="channel-0",counterparty="ibc-0",port="transfer",le="17000"} 2 +tx_latency_confirmed_bucket{chain="ibc-1",channel="channel-0",counterparty="ibc-0",port="transfer",le="20000"} 2 +tx_latency_confirmed_bucket{chain="ibc-1",channel="channel-0",counterparty="ibc-0",port="transfer",le="+Inf"} 2 +tx_latency_confirmed_sum{chain="ibc-1",channel="channel-0",counterparty="ibc-0",port="transfer"} 5291 +tx_latency_confirmed_count{chain="ibc-1",channel="channel-0",counterparty="ibc-0",port="transfer"} 2 # HELP tx_latency_submitted The latency for all transactions submitted to a specific chain, i.e. the difference between the moment when Hermes received a batch of events and when it submitted the corresponding transaction(s). Milliseconds. # TYPE tx_latency_submitted histogram -tx_latency_submitted_bucket{chain="ibc-0",channel="channel-1",counterparty="ibc-1",port="transfer",le="+Inf"} 1 -tx_latency_submitted_sum{chain="ibc-0",channel="channel-1",counterparty="ibc-1",port="transfer"} 298 -tx_latency_submitted_count{chain="ibc-0",channel="channel-1",counterparty="ibc-1",port="transfer"} 1 -tx_latency_submitted_bucket{chain="ibc-1",channel="channel-1",counterparty="ibc-2",port="transfer",le="+Inf"} 1 -tx_latency_submitted_sum{chain="ibc-1",channel="channel-1",counterparty="ibc-2",port="transfer"} 400 -tx_latency_submitted_count{chain="ibc-1",channel="channel-1",counterparty="ibc-2",port="transfer"} 1 -tx_latency_submitted_bucket{chain="ibc-2",channel="channel-0",counterparty="ibc-0",port="transfer",le="+Inf"} 1 -tx_latency_submitted_sum{chain="ibc-2",channel="channel-0",counterparty="ibc-0",port="transfer"} 301 -tx_latency_submitted_count{chain="ibc-2",channel="channel-0",counterparty="ibc-0",port="transfer"} 1 -# HELP wallet_balance The balance in each wallet that Hermes is using, per wallet, denom and chain. The amount is of unit: 10^6 * `denom` +tx_latency_submitted_bucket{chain="ibc-0",channel="channel-0",counterparty="ibc-1",port="transfer",le="200"} 0 +tx_latency_submitted_bucket{chain="ibc-0",channel="channel-0",counterparty="ibc-1",port="transfer",le="500"} 1 +tx_latency_submitted_bucket{chain="ibc-0",channel="channel-0",counterparty="ibc-1",port="transfer",le="1000"} 2 +tx_latency_submitted_bucket{chain="ibc-0",channel="channel-0",counterparty="ibc-1",port="transfer",le="2000"} 2 +tx_latency_submitted_bucket{chain="ibc-0",channel="channel-0",counterparty="ibc-1",port="transfer",le="5000"} 2 +tx_latency_submitted_bucket{chain="ibc-0",channel="channel-0",counterparty="ibc-1",port="transfer",le="10000"} 2 +tx_latency_submitted_bucket{chain="ibc-0",channel="channel-0",counterparty="ibc-1",port="transfer",le="+Inf"} 2 +tx_latency_submitted_sum{chain="ibc-0",channel="channel-0",counterparty="ibc-1",port="transfer"} 1175 +tx_latency_submitted_count{chain="ibc-0",channel="channel-0",counterparty="ibc-1",port="transfer"} 2 +tx_latency_submitted_bucket{chain="ibc-1",channel="channel-0",counterparty="ibc-0",port="transfer",le="200"} 1 +tx_latency_submitted_bucket{chain="ibc-1",channel="channel-0",counterparty="ibc-0",port="transfer",le="500"} 2 +tx_latency_submitted_bucket{chain="ibc-1",channel="channel-0",counterparty="ibc-0",port="transfer",le="1000"} 2 +tx_latency_submitted_bucket{chain="ibc-1",channel="channel-0",counterparty="ibc-0",port="transfer",le="2000"} 2 +tx_latency_submitted_bucket{chain="ibc-1",channel="channel-0",counterparty="ibc-0",port="transfer",le="5000"} 2 +tx_latency_submitted_bucket{chain="ibc-1",channel="channel-0",counterparty="ibc-0",port="transfer",le="10000"} 2 +tx_latency_submitted_bucket{chain="ibc-1",channel="channel-0",counterparty="ibc-0",port="transfer",le="+Inf"} 2 +tx_latency_submitted_sum{chain="ibc-1",channel="channel-0",counterparty="ibc-0",port="transfer"} 542 +tx_latency_submitted_count{chain="ibc-1",channel="channel-0",counterparty="ibc-0",port="transfer"} 2 +# HELP wallet_balance The balance of each wallet Hermes uses per chain. Please note that when converting the balance to f64 a loss in precision might be introduced in the displayed value # TYPE wallet_balance gauge -wallet_balance{account="cosmos1lfq283ph84d49hywahpngxueqsv4wgxt6x5d7z",chain="ibc-0",denom="stake"} 99981718 -wallet_balance{account="cosmos1mmkyea9pmqhlewrap0urpes2vx0r4gnz7eq5vl",chain="ibc-1",denom="stake"} 99984434 -wallet_balance{account="cosmos1pmypcnlkgzfuayzlxr3w9ykp7d0pd4lg027e46",chain="ibc-2",denom="stake"} 99977817 +wallet_balance{account="cosmos1a9emd54sp0xsw77e2pzcc2pjfp7jvtl64p64w7",chain="ibc-1",denom="stake"} 99940782 +wallet_balance{account="cosmos1s9jwwt60edxhy0ez84h3wfujerj5mgzhmasy23",chain="ibc-0",denom="stake"} 99962295 # HELP workers Number of workers per object # TYPE workers gauge -workers{type="client"} 6 -workers{type="packet"} 3 -workers{type="wallet"} 3 +workers{type="client"} 2 +workers{type="packet"} 2 +workers{type="wallet"} 2 # HELP ws_events How many IBC events did Hermes receive via the WebSocket subscription, per chain # TYPE ws_events counter -ws_events{chain="ibc-0"} 6 -ws_events{chain="ibc-1"} 8 -ws_events{chain="ibc-2"} 8 +ws_events{chain="ibc-0"} 57 +ws_events{chain="ibc-1"} 37 ``` diff --git a/relayer/src/chain/tracking.rs b/relayer/src/chain/tracking.rs index 83b7db3795..c69505d61c 100644 --- a/relayer/src/chain/tracking.rs +++ b/relayer/src/chain/tracking.rs @@ -14,6 +14,8 @@ pub enum TrackingId { /// corresponding event batch, eg. when performing actions from /// the CLI or during packet clearing. Static(&'static str), + /// Random identifier used to track latency of packet clearing. + ClearedUuid(Uuid), } impl TrackingId { @@ -26,6 +28,10 @@ impl TrackingId { pub fn new_static(s: &'static str) -> Self { Self::Static(s) } + + pub fn new_cleared_uuid() -> Self { + Self::ClearedUuid(Uuid::new_v4()) + } } impl fmt::Display for TrackingId { @@ -37,6 +43,13 @@ impl fmt::Display for TrackingId { s.fmt(f) } TrackingId::Static(s) => s.fmt(f), + TrackingId::ClearedUuid(u) => { + let mut uuid = "cleared/".to_owned(); + let mut s = u.to_string(); + s.truncate(8); + uuid.push_str(&s); + uuid.fmt(f) + } } } } diff --git a/relayer/src/config.rs b/relayer/src/config.rs index b64a8e407c..c7839acc49 100644 --- a/relayer/src/config.rs +++ b/relayer/src/config.rs @@ -51,7 +51,7 @@ pub mod default { } pub fn tx_confirmation() -> bool { - true + false } pub fn clear_packets_interval() -> u64 { @@ -155,7 +155,7 @@ impl Default for ModeConfig { enabled: true, clear_interval: default::clear_packets_interval(), clear_on_start: true, - tx_confirmation: true, + tx_confirmation: false, }, } } diff --git a/relayer/src/link/relay_path.rs b/relayer/src/link/relay_path.rs index d59896e03f..ba52242f49 100644 --- a/relayer/src/link/relay_path.rs +++ b/relayer/src/link/relay_path.rs @@ -399,7 +399,8 @@ impl RelayPath { } fn relay_pending_packets(&self, height: Option) -> Result<(), LinkError> { - let tracking_id = TrackingId::new_static("relay pending packets"); + let tracking_id = TrackingId::new_cleared_uuid(); + telemetry!(received_event_batch, tracking_id); for i in 1..=MAX_RETRIES { let cleared = self @@ -1090,7 +1091,7 @@ impl RelayPath { // Update telemetry info telemetry!({ for e in events_chunk.clone() { - self.record_cleared_send_packet_and_acknowledgment(e); + self.record_cleared_send_packet(e); } }); self.events_to_operational_data(TrackedEvents::new(events_chunk, tracking_id))?; @@ -1137,6 +1138,7 @@ impl RelayPath { &self.path_id, query_write_ack_events, ) { + telemetry!(self.record_cleared_acknowledgments(events_chunk.clone())); self.events_to_operational_data(TrackedEvents::new(events_chunk, tracking_id))?; } @@ -1762,28 +1764,32 @@ impl RelayPath { } #[cfg(feature = "telemetry")] - fn record_cleared_send_packet_and_acknowledgment(&self, event: IbcEvent) { - match event { - IbcEvent::SendPacket(send_packet_ev) => { - ibc_telemetry::global().send_packet_count( - send_packet_ev.packet.sequence.into(), - send_packet_ev.height().revision_height(), - &self.src_chain().id(), - self.src_channel_id(), - self.src_port_id(), - &self.dst_chain().id(), - ); - ibc_telemetry::global().cleared_count( - send_packet_ev.packet.sequence.into(), - send_packet_ev.height().revision_height(), - &self.src_chain().id(), - self.src_channel_id(), - self.src_port_id(), - &self.dst_chain().id(), - ); - } - IbcEvent::WriteAcknowledgement(write_ack_ev) => { - ibc_telemetry::global().acknowledgement_count( + fn record_cleared_send_packet(&self, event: IbcEvent) { + if let IbcEvent::SendPacket(send_packet_ev) = event { + ibc_telemetry::global().send_packet_count( + send_packet_ev.packet.sequence.into(), + send_packet_ev.height().revision_height(), + &self.src_chain().id(), + self.src_channel_id(), + self.src_port_id(), + &self.dst_chain().id(), + ); + ibc_telemetry::global().clear_send_packet_count( + send_packet_ev.packet.sequence.into(), + send_packet_ev.height().revision_height(), + &self.src_chain().id(), + self.src_channel_id(), + self.src_port_id(), + &self.dst_chain().id(), + ); + } + } + + #[cfg(feature = "telemetry")] + fn record_cleared_acknowledgments(&self, events: Vec) { + for e in events { + if let IbcEvent::WriteAcknowledgement(write_ack_ev) = e { + ibc_telemetry::global().clear_acknowledgment_packet_count( write_ack_ev.packet.sequence.into(), write_ack_ev.height().revision_height(), &self.dst_chain().id(), @@ -1792,7 +1798,6 @@ impl RelayPath { &self.src_chain().id(), ); } - _ => {} } } } diff --git a/telemetry/src/state.rs b/telemetry/src/state.rs index 258ece62c4..04d207f19c 100644 --- a/telemetry/src/state.rs +++ b/telemetry/src/state.rs @@ -100,8 +100,11 @@ pub struct TelemetryState { /// Counts the number of WriteAcknowledgement Hermes transfers. acknowledgement_count: Counter, - /// Counts the number of SendPacket Hermes transfers from ClearPacket. - cleared_count: Counter, + /// Counts the number of SendPacket events Hermes processes from ClearPendingPackets. + cleared_send_packet_count: Counter, + + /// Counts the number of WriteAcknowledgment events Hermes processes from ClearPendingPackets. + cleared_acknowledgment_count: Counter, /// Records the sequence number of the oldest SendPacket for which no /// WriteAcknowledgement has been received. The value is 0 if all the @@ -162,7 +165,9 @@ impl TelemetryState { KeyValue::new("src_port", src_port.to_string()), ]; - self.receive_packets.add(count, labels); + if count > 0 { + self.receive_packets.add(count, labels); + } } /// Number of acknowledgment packets relayed, per channel @@ -179,7 +184,9 @@ impl TelemetryState { KeyValue::new("src_port", src_port.to_string()), ]; - self.acknowledgment_packets.add(count, labels); + if count > 0 { + self.acknowledgment_packets.add(count, labels); + } } /// Number of timeout packets relayed, per channel @@ -196,7 +203,9 @@ impl TelemetryState { KeyValue::new("src_port", src_port.to_string()), ]; - self.timeout_packets.add(count, labels); + if count > 0 { + self.timeout_packets.add(count, labels); + } } /// Number of queries emitted by the relayer, per chain and query type @@ -351,7 +360,7 @@ impl TelemetryState { self.acknowledgement_count.add(1, labels); } - pub fn cleared_count( + pub fn clear_send_packet_count( &self, _seq_nr: u64, _height: u64, @@ -367,7 +376,26 @@ impl TelemetryState { KeyValue::new("port", port_id.to_string()), ]; - self.cleared_count.add(1, labels); + self.cleared_send_packet_count.add(1, labels); + } + + pub fn clear_acknowledgment_packet_count( + &self, + _seq_nr: u64, + _height: u64, + chain_id: &ChainId, + channel_id: &ChannelId, + port_id: &PortId, + counterparty_chain_id: &ChainId, + ) { + let labels: &[KeyValue; 4] = &[ + KeyValue::new("chain", chain_id.to_string()), + KeyValue::new("counterparty", counterparty_chain_id.to_string()), + KeyValue::new("channel", channel_id.to_string()), + KeyValue::new("port", port_id.to_string()), + ]; + + self.cleared_acknowledgment_count.add(1, labels); } pub fn record_send_history( @@ -493,9 +521,14 @@ impl AggregatorSelector for CustomAggregatorSelector { // Prometheus' supports only collector for histogram, sum, and last value aggregators. // https://docs.rs/opentelemetry-prometheus/0.10.0/src/opentelemetry_prometheus/lib.rs.html#411-418 // TODO: Once quantile sketches are supported, replace histograms with that. - // For the moment, disable histogram buckets since no values make sense for all use-cases. - "tx_latency_submitted" => Some(Arc::new(histogram(descriptor, &[]))), - "tx_latency_confirmed" => Some(Arc::new(histogram(descriptor, &[]))), + "tx_latency_submitted" => Some(Arc::new(histogram( + descriptor, + &[200.0, 500.0, 1000.0, 2000.0, 5000.0, 10000.0], + ))), + "tx_latency_confirmed" => Some(Arc::new(histogram( + descriptor, + &[1000.0, 5000.0, 9000.0, 13000.0, 17000.0, 20000.0], + ))), _ => Some(Arc::new(sum())), } } @@ -529,17 +562,17 @@ impl Default for TelemetryState { receive_packets: meter .u64_counter("ibc_receive_packets") - .with_description("Number of receive packets relayed per channel") + .with_description("Number of confirmed receive packets relayed per channel. Available if relayer runs with Tx confirmation enabled") .init(), acknowledgment_packets: meter .u64_counter("ibc_acknowledgment_packets") - .with_description("Number of acknowledgment packets relayed per channel") + .with_description("Number of confirmed acknowledgment packets relayed per channel. Available if relayer runs with Tx confirmation enabled") .init(), timeout_packets: meter .u64_counter("ibc_timeout_packets") - .with_description("Number of timeout packets relayed per channel") + .with_description("Number of confirmed timeout packets relayed per channel. Available if relayer runs with Tx confirmation enabled") .init(), queries: meter @@ -571,23 +604,27 @@ impl Default for TelemetryState { wallet_balance: meter .f64_value_recorder("wallet_balance") - .with_description("The balance of each wallet Hermes uses per chain. Please note that when converting the balance to f64 a loss in - precision might be introduced in the displayed value") + .with_description("The balance of each wallet Hermes uses per chain. Please note that when converting the balance to f64 a loss in precision might be introduced in the displayed value") .init(), send_packet_count: meter .u64_counter("send_packet_count") - .with_description("Number of SendPacket relayed") + .with_description("Number of SendPacket events processed") .init(), acknowledgement_count: meter .u64_counter("acknowledgement_count") - .with_description("Number of WriteAcknowledgement relayed") + .with_description("Number of WriteAcknowledgement events processed") + .init(), + + cleared_send_packet_count: meter + .u64_counter("cleared_send_packet_count") + .with_description("Number of SendPacket events processed during the initial and periodic clearing") .init(), - cleared_count: meter - .u64_counter("cleared_count") - .with_description("Number of SendPacket relayed through ClearPendingPackets") + cleared_acknowledgment_count: meter + .u64_counter("cleared_acknowledgment_count") + .with_description("Number of WriteAcknowledgment events processed during the initial and periodic clearing") .init(), tx_latency_submitted: meter @@ -615,7 +652,7 @@ impl Default for TelemetryState { oldest_sequence: meter .u64_value_recorder("oldest_sequence") - .with_description("The sequence number of the oldest pending SendPacket. If this value is 0, it means there are no pending SendPacket") + .with_description("The sequence number of the oldest SendPacket event observed without its corresponding WriteAcknowledgement event. If this value is 0, it means Hermes observed a WriteAcknowledgment event for all the SendPacket events") .init(), oldest_timestamp: meter