Collect & report metrics #183

Open
2 tasks
ch1bo opened this issue Jan 30, 2022 · 3 comments
Labels
💭 idea An idea or feature request

Comments

@ch1bo
Collaborator

ch1bo commented Jan 30, 2022

What & Why

To measure the success (or failure) of the Hydra Head project and improve continuously, we need to know how many Hydra Heads are opened, how long they are used, how many UTXOs are moved into / out of a Head, etc. Most of this information is publicly available and can be derived by observing the main chain. The remainder (e.g. transaction sizes & number of UTXOs in a Head) will be collected from within the hydra-node and will be opt-out once we reach mainnet maturity.

TBD

  • Detail what we want to collect
  • Scope out reporting infrastructure
  • Is this also useful for implementing watch-tower functionality, and would it make custodial Hydra Heads more "trustworthy" if they provide this telemetry to their users (or watchtowers)?

Tasks

  • Chain observer tracking Head transactions and aggregating Head information
  • Explorer service using Head information
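
As a rough illustration of the first task, the aggregation step of a chain observer could be sketched as a fold over observed Head transactions. This is only a sketch under assumptions: the `(head_id, kind, slot, utxo_count)` tuple, the `HeadInfo` record and the status names are hypothetical and do not reflect the hydra-node's actual types; only the transaction kinds (Init, Commit, CollectCom, Close, Fanout, Abort) follow the Head lifecycle.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical observation format: (head_id, kind, slot, utxo_count).
# A real observer would decode these from main-chain transactions.
Observation = Tuple[str, str, int, int]


@dataclass
class HeadInfo:
    head_id: str
    status: str = "Initializing"
    opened_at: Optional[int] = None  # slot of the CollectCom transaction
    closed_at: Optional[int] = None  # slot of the Close transaction
    utxo_in: int = 0                 # UTxOs committed into the Head
    utxo_out: int = 0                # UTxOs fanned out of the Head


def aggregate(observations: Iterable[Observation]) -> Dict[str, HeadInfo]:
    """Fold a stream of observed Head transactions into per-Head summaries."""
    heads: Dict[str, HeadInfo] = {}
    for head_id, kind, slot, utxo_count in observations:
        info = heads.setdefault(head_id, HeadInfo(head_id))
        if kind == "Commit":
            info.utxo_in += utxo_count
        elif kind == "CollectCom":
            info.status, info.opened_at = "Open", slot
        elif kind == "Close":
            info.status, info.closed_at = "Closed", slot
        elif kind == "Fanout":
            info.status = "Finalized"
            info.utxo_out += utxo_count
        elif kind == "Abort":
            info.status = "Aborted"
        # "Init" only creates the entry; unknown kinds are ignored.
    return heads


def open_duration(info: HeadInfo) -> Optional[int]:
    """Slots between opening and closing, or None if not yet known."""
    if info.opened_at is None or info.closed_at is None:
        return None
    return info.closed_at - info.opened_at
```

An explorer service could then serve the resulting `HeadInfo` records directly, since everything in them is derived from public chain data.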
@ch1bo ch1bo added the 💬 feature A feature on our roadmap label Jan 30, 2022
@ch1bo ch1bo added this to the Testnet maturity milestone Jan 30, 2022
@ch1bo ch1bo added the green 💚 Low complexity or well understood feature label Feb 3, 2022
@abailly-iohk
Contributor

With a stateless "chain observer" available, we could host a simple "Hydra Head Explorer" service online that would show and track the state of heads running on some chain?

@ch1bo ch1bo removed this from the Testnet maturity milestone Mar 8, 2022
@abailly-iohk
Contributor

Couple of basic ideas:

  • What are interesting metrics to collect off-chain?
  • We already publish Prometheus metrics inside the hydra-node, so we could simply add a sidecar that scrapes them and sends the data to a public Grafana Cloud instance
  • The other part could be handled by observing the chain
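
To make the sidecar idea a bit more concrete, here is a minimal sketch of its core step: parsing the Prometheus text exposition format and repackaging the samples for forwarding. The function names, the JSON payload shape and the metric name used in the comments are assumptions for illustration only. Fetching the `/metrics` endpoint and pushing to the remote collector are deliberately left out.

```python
import json
from typing import Dict


def parse_prometheus_text(body: str) -> Dict[str, float]:
    """Parse samples from the Prometheus text exposition format.

    Comment lines (# HELP, # TYPE) and blank lines are skipped. Labelled
    samples keep their label set as part of the key, e.g. 'name{le="1"}'.
    An optional trailing timestamp is ignored.
    """
    samples: Dict[str, float] = {}
    for raw in body.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # A sample line is "<name>[{labels}] <value> [timestamp]".
        close = line.rfind("}")
        if close != -1:
            key, rest = line[: close + 1], line[close + 1:].split()
        else:
            parts = line.split()
            key, rest = parts[0], parts[1:]
        if rest:
            samples[key] = float(rest[0])
    return samples


def to_payload(node_id: str, samples: Dict[str, float], scraped_at: int) -> str:
    """Wrap scraped samples in a (hypothetical) JSON envelope for forwarding."""
    envelope = {"node": node_id, "scraped_at": scraped_at, "samples": samples}
    return json.dumps(envelope, sort_keys=True)
```

In practice an existing agent would likely be preferable to hand-rolled parsing. The sketch is only meant to show how little the sidecar needs to know about the node.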

@ch1bo ch1bo added this to the 0.5.0 milestone Apr 19, 2022
@abailly-iohk
Contributor

abailly-iohk commented Apr 26, 2022

I have set up and used Jaeger and Zipkin in the past, including inside Haskell apps. Having a way to track the processing of user requests across a distributed system is invaluable for understanding its behaviour.

Looking at https://github.com/ethercrow/opentelemetry-haskell, which provides support for traces. Someone pointed me at https://opentelemetry.io/docs/concepts/data-collection/, which provides a conceptual framework for all kinds of "observability" data collection. In particular, OpenTelemetry (formed by merging OpenTracing and OpenCensus) defines standards for interoperability between various kinds of services, allowing, for example, Prometheus metrics, logs and traces to be collected and exported to some other service.

We currently expose the following metrics in the node:

  • number of events
  • number of requested txs
  • number of confirmed txs
  • tx confirmation time histogram

Handling, and possibly tuning, snapshot size is important for the protocol, so we should add:

  • number of snapshots
  • number of txs per snapshot
  • snapshot confirmation time

Also:

  • event queue length, to track possible congestion/bottlenecks
  • system-level resources (CPU, RAM, Network traffic)
  • number of UTxOs in the internal ledger
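
The proposed snapshot metrics could be derived roughly as follows. This is a sketch, not the hydra-node's implementation: `SnapshotMetrics` and its method names are hypothetical, timestamps are plain seconds supplied by the caller, and the histogram uses cumulative buckets in the style of a Prometheus histogram.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SnapshotMetrics:
    snapshots: int = 0            # number of confirmed snapshots
    txs_in_snapshots: int = 0     # total txs across confirmed snapshots
    confirmation_times: List[float] = field(default_factory=list)
    # Request time per snapshot number, kept until confirmation.
    requested_at: Dict[int, float] = field(default_factory=dict)

    def snapshot_requested(self, number: int, now: float) -> None:
        self.requested_at[number] = now

    def snapshot_confirmed(self, number: int, tx_count: int, now: float) -> None:
        self.snapshots += 1
        self.txs_in_snapshots += tx_count
        started = self.requested_at.pop(number, None)
        if started is not None:
            self.confirmation_times.append(now - started)

    def txs_per_snapshot(self) -> float:
        """Average number of txs per confirmed snapshot (0.0 if none yet)."""
        if self.snapshots == 0:
            return 0.0
        return self.txs_in_snapshots / self.snapshots

    def histogram(self, bounds: List[float]) -> Dict[float, int]:
        """Cumulative counts of confirmation times at or below each bound."""
        return {
            bound: sum(1 for t in self.confirmation_times if t <= bound)
            for bound in bounds
        }
```

Event queue length and the ledger's UTxO count would be simple gauges by comparison, so they are not shown here.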

Traces could be an interesting addition: they would let us analyse how a NewTx coming from a client spreads across the network until the transaction is confirmed. This would be particularly helpful for understanding the behaviour of the network if/when we move away from a fully connected network to something more dynamic or less densely connected, with routing between the nodes. Not sure it's worthwhile to do now, though.

Tasks for this feature:

  • set up a central collection host/system with authenticated access
  • deploy an OpenTelemetry sidecar instead of a Prometheus server within the Hydra stack
  • configure OpenTelemetry to send metrics to the central host (with certificate)
  • opting out simply means not deploying the sidecar

@ch1bo ch1bo removed this from the 0.5.0 milestone Apr 26, 2022
@ch1bo ch1bo added the help wanted Issues where we could need some help label May 3, 2022
@ch1bo ch1bo added 💭 idea An idea or feature request and removed 💬 feature A feature on our roadmap help wanted Issues where we could need some help green 💚 Low complexity or well understood feature labels Mar 21, 2023