feat: Metrics #995

Open
brennanjl opened this issue Sep 18, 2024 · 2 comments

@brennanjl
Collaborator

A few times now, users have requested metrics support. We added basic metrics to kgw (to allow Truflation to perform analytics on their network and customer usage), but we should also support metrics on kwild so that node operators can verify their nodes are running properly. This was suggested in the TSN node operator Discord today by @mo-husseini-vc from Validation Cloud, who mentioned that it would be very helpful to have a metrics endpoint that can be integrated with Prometheus.

CometBFT already has an endpoint for this, which contains some information that would be relevant to us. The metrics there that I think would be particularly helpful:

  • consensus_validator_missed_blocks
  • consensus_missing_validators
  • consensus_missing_validators_power
  • consensus_byzantine_validators_power
  • consensus_block_interval_seconds
  • consensus_rounds
  • consensus_num_txs
  • consensus_block_size_bytes
  • mempool_tx_size_bytes
  • mempool_recheck_times
  • state_block_processing_time

Additionally, Mo had some suggestions on what else we could track:

  • Postgres round trip time
  • RPC request time

A few others that could obviously be helpful:

  • Simple RPC req/s
  • Events written per second (within the event store)

I don't think we can (or should) add this to v0.9, but we should backlog it for v0.10. Additionally, we should use the TSN testnet as an opportunity to get feedback from validators on what they expect from this endpoint and what would be helpful for them.

@brennanjl
Collaborator Author

As discussed in Slack with @outerlook, we need to expose the CometBFT metrics as part of v0.9. This won't require anything more than adding some configuration to Kwil. However, we should be careful how we do it, so that we can avoid making breaking changes when we add more Kwil-specific instrumentation.

Below are the (potential) new configuration options that we can add to Kwil to enable this. With these, we can enable usage of the current CometBFT metrics endpoint, and later add new metrics for Kwil. These generally match CometBFT's config, except that I removed the ability to configure the namespace, since we will likely want to enforce separate namespaces for Comet and Kwil.

[instrumentation]
# when true, prometheus metrics are served under /metrics
prometheus = true

# listen address for prometheus metrics
prometheus_listen_addr = "tcp://0.0.0.0:26660"

# Maximum number of simultaneous connections.
# 0 - unlimited.
max_open_connections = 1
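
As a rough illustration of how these three settings could be consumed, here is a minimal sketch using only the Go standard library. The struct, the function names, and the placeholder `/metrics` handler are all hypothetical (this is not how Kwil or CometBFT actually wires it up), and the example binds to `127.0.0.1:0` rather than the configured port so it can run anywhere.

```go
package main

import (
	"fmt"
	"net"
	"net/http"
	"net/url"
	"sync"
)

// instrumentationConfig mirrors the proposed [instrumentation] section.
// Hypothetical type for illustration only.
type instrumentationConfig struct {
	Prometheus           bool
	PrometheusListenAddr string
	MaxOpenConnections   int
}

// parseListenAddr splits "tcp://0.0.0.0:26660" into ("tcp", "0.0.0.0:26660").
func parseListenAddr(raw string) (network, address string, err error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", "", err
	}
	if u.Scheme == "" || u.Host == "" {
		return "", "", fmt.Errorf("listen address %q must look like tcp://host:port", raw)
	}
	return u.Scheme, u.Host, nil
}

// limitedListener caps the number of simultaneously open connections.
type limitedListener struct {
	net.Listener
	slots chan struct{}
}

func (l *limitedListener) Accept() (net.Conn, error) {
	l.slots <- struct{}{} // blocks while max connections are open
	c, err := l.Listener.Accept()
	if err != nil {
		<-l.slots
		return nil, err
	}
	return &limitedConn{Conn: c, release: func() { <-l.slots }}, nil
}

// limitedConn frees its slot exactly once when closed.
type limitedConn struct {
	net.Conn
	once    sync.Once
	release func()
}

func (c *limitedConn) Close() error {
	err := c.Conn.Close()
	c.once.Do(c.release)
	return err
}

// limit applies max_open_connections; 0 (or less) means unlimited.
func limit(l net.Listener, max int) net.Listener {
	if max <= 0 {
		return l
	}
	return &limitedListener{Listener: l, slots: make(chan struct{}, max)}
}

func main() {
	cfg := instrumentationConfig{
		Prometheus:           true,
		PrometheusListenAddr: "tcp://127.0.0.1:0", // the proposal uses tcp://0.0.0.0:26660
		MaxOpenConnections:   1,
	}
	if !cfg.Prometheus {
		return
	}
	network, address, err := parseListenAddr(cfg.PrometheusListenAddr)
	if err != nil {
		panic(err)
	}
	l, err := net.Listen(network, address)
	if err != nil {
		panic(err)
	}
	l = limit(l, cfg.MaxOpenConnections)

	mux := http.NewServeMux()
	mux.HandleFunc("/metrics", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintln(w, "# metrics would be rendered here")
	})
	srv := &http.Server{Handler: mux}
	go srv.Serve(l)
	defer srv.Close()

	resp, err := http.Get("http://" + l.Addr().String() + "/metrics")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	fmt.Println("GET /metrics ->", resp.Status)
}
```

Note that with `max_open_connections = 1`, a second scraper would wait until the first connection closes, so operators running more than one Prometheus instance may want a higher value.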

Adding Kwil-Specific Metrics

After digging into github.com/prometheus/client_golang/prometheus, it seems like it will be really easy to add Kwil-specific metrics at a later time without breaking any of this functionality. Prometheus's Go library uses a global registration system (see an example of Comet's usage here), so we can easily register new metrics as needed.

@brennanjl
Collaborator Author

The CometBFT metrics change described above has been merged, but I'm keeping the issue open since we plan to add Kwil-specific metrics in v0.10.
