Implement regen metrics #2153

dapplion · 2021-03-09T20:40:41Z

Cache hits vs. cache misses
Cache usage frequency, last usage per item

dapplion · 2021-04-10T13:00:32Z

@dapplion I think we should close this in favor of suggestion above:
add metrics to regen module

From #2285 (comment)

g11tech · 2021-06-27T13:30:07Z

@dapplion please assign

dapplion · 2021-06-27T17:22:18Z

> @dapplion please assign

If you need help designing the metrics for this component, reach out to me or Cayman. Be aware that the regen logic is quite complex, with many entrypoints. Doing the metrics right will be much harder than for the fork choice and will require a good understanding of its mechanics.

g11tech · 2021-06-27T18:44:29Z

@dapplion thanks, will try to figure it out while I implement other, simpler metrics. Will circle back on this once I get a preliminary understanding. 👍

g11tech · 2021-07-10T09:52:36Z

OK, I think I have a preliminary understanding of the regen module, on the basis of which I propose the following metrics structure:

regenStateCacheLookupTotal: gauge<entrypointFn,callingModule>
regenStateCacheLookupHits: gauge<entrypointFn,callingModule>

regenCPStateCacheLookupTotal: gauge<entrypointFn,callingModule>
regenCPStateCacheLookupHits: gauge<entrypointFn,callingModule>

where entrypointFn label in ['getPreState', 'getCheckpointState', 'getState', 'getBlockSlotState']
callingModule label in ['validateGossipBlock', 'validateGossipAttestation', 'validateGossipAggregateAndProof', 'validateGossipVoluntaryExit', 'produceBlock', 'produceAttestationData', 'getProposerDuties', 'getAttesterDuties', 'getSyncCommitteeDuties', 'onFinalized']

This will give good slicing, dicing and rollups to figure out and debug what is happening.
In a Grafana dashboard panel they can be plotted as stacked series, and module-wise sum by (callingModule) graphs can be plotted as well.
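
A minimal sketch of how the proposed label structure could be recorded and rolled up, for illustration only. It uses a dependency-free in-memory stand-in rather than a real metrics library, and `LabeledCounter`, `recordLookup` and `hitRatioBy` are hypothetical names that do not come from the codebase. The values are modelled as monotonically increasing counters, which is an assumption (the proposal above says gauge); `sumBy` mimics a `sum by (callingModule)` rollup.

```typescript
// Label values taken from the proposal above; everything else is hypothetical.
type EntrypointFn = "getPreState" | "getCheckpointState" | "getState" | "getBlockSlotState";

interface Labels {
  entrypointFn: EntrypointFn;
  callingModule: string;
}

// In-memory stand-in for a labelled metric (assumption: not a real library API).
class LabeledCounter {
  private readonly values: Record<string, {labels: Labels; value: number}> = {};

  inc(labels: Labels, by = 1): void {
    const key = labels.entrypointFn + "|" + labels.callingModule;
    const entry = this.values[key];
    if (entry !== undefined) {
      entry.value += by;
    } else {
      this.values[key] = {labels, value: by};
    }
  }

  // Roll up all label combinations by a single label, like `sum by (<label>)`.
  sumBy(label: keyof Labels): Record<string, number> {
    const out: Record<string, number> = {};
    for (const key of Object.keys(this.values)) {
      const entry = this.values[key];
      if (entry === undefined) continue;
      const group = entry.labels[label];
      out[group] = (out[group] || 0) + entry.value;
    }
    return out;
  }
}

const lookupTotal = new LabeledCounter();
const lookupHits = new LabeledCounter();

// Record one state cache lookup and whether it was a hit.
function recordLookup(labels: Labels, hit: boolean): void {
  lookupTotal.inc(labels);
  if (hit) lookupHits.inc(labels);
}

// Hit ratio (hits / total) grouped by one label.
function hitRatioBy(label: keyof Labels): Record<string, number> {
  const totals = lookupTotal.sumBy(label);
  const hits = lookupHits.sumBy(label);
  const out: Record<string, number> = {};
  for (const key of Object.keys(totals)) {
    const total = totals[key];
    if (!total) continue;
    out[key] = (hits[key] || 0) / total;
  }
  return out;
}

recordLookup({entrypointFn: "getPreState", callingModule: "validateGossipBlock"}, true);
recordLookup({entrypointFn: "getState", callingModule: "validateGossipBlock"}, false);
recordLookup({entrypointFn: "getCheckpointState", callingModule: "produceBlock"}, true);

const ratios = hitRatioBy("callingModule");
// ratios: {validateGossipBlock: 0.5, produceBlock: 1}
```

Keeping total and hits as two separate series with identical labels means the ratio can be derived per entrypoint, per calling module, or overall without adding more metrics.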

@dapplion @wemeetagain let me know your thoughts on the same.

wemeetagain · 2021-07-13T15:08:47Z

I think the lookup / hits is a good start.

Here's how I see the regen module: there are a few different loads that may happen, from best to worst:

- the requested state is in the cache, no further processing required
- the requested state at a prior slot is in the cache, slots must be dialed forward, possibly hitting an epoch transition
- a prior state is in the cache, blocks must be replayed to the requested state
- a prior state is in the cache, blocks must be replayed to the requested state, then slots must be dialed forward

And then there's the error case, when the state can't be regenerated for whatever reason.
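
The load cases above, plus the error case, could be turned into a single outcome label. The following is only a sketch under assumptions: `RegenWork`, `classifyRegen`, `recordRegen` and the outcome names are hypothetical, and `slotsDialedForward` is taken to mean slots processed after the last replayed block. `SLOTS_PER_EPOCH = 32` is the mainnet preset value.

```typescript
// Hypothetical outcome labels, ordered from best to worst, then the error case.
type RegenOutcome = "cacheHit" | "slotsOnly" | "blocksOnly" | "blocksAndSlots" | "error";

// Hypothetical summary of the work one regen request needed.
interface RegenWork {
  blocksReplayed: number;
  slotsDialedForward: number;
  failed: boolean;
}

function classifyRegen(work: RegenWork): RegenOutcome {
  if (work.failed) return "error";
  if (work.blocksReplayed === 0 && work.slotsDialedForward === 0) return "cacheHit";
  if (work.blocksReplayed === 0) return "slotsOnly";
  if (work.slotsDialedForward === 0) return "blocksOnly";
  return "blocksAndSlots";
}

const SLOTS_PER_EPOCH = 32; // mainnet preset

// True when dialing forward from fromSlot to toSlot passes an epoch transition.
function crossesEpochBoundary(fromSlot: number, toSlot: number): boolean {
  return Math.floor(toSlot / SLOTS_PER_EPOCH) > Math.floor(fromSlot / SLOTS_PER_EPOCH);
}

// One count per outcome; in a real setup this would be a labelled metric.
const outcomeCounts: Record<RegenOutcome, number> = {
  cacheHit: 0,
  slotsOnly: 0,
  blocksOnly: 0,
  blocksAndSlots: 0,
  error: 0,
};

function recordRegen(work: RegenWork): RegenOutcome {
  const outcome = classifyRegen(work);
  outcomeCounts[outcome] += 1;
  return outcome;
}
```

A single `outcome` label keeps the cases mutually exclusive, so the per-outcome counts add up to the number of regen requests.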

I think some other things I'd like to know, where metrics could help:

- how often are we needing to reprocess slots? epochs? blocks?
  - how many blocks? (not very actionable, but maybe interesting)
- how long do regen tasks take?
- how often are there regen errors?
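
The duration and error questions could be covered by wrapping each regen task, roughly as sketched below. `RegenStats`, `createRegenStats` and `timeRegen` are hypothetical names, and the wrapper is synchronous for brevity; real regen calls are likely asynchronous, and a real setup would probably feed a histogram rather than keep running totals.

```typescript
// Hypothetical running totals for regen tasks.
interface RegenStats {
  calls: number;
  errors: number;
  totalMs: number;
  maxMs: number;
}

function createRegenStats(): RegenStats {
  return {calls: 0, errors: 0, totalMs: 0, maxMs: 0};
}

// Run a task, recording its duration and whether it threw.
// The clock is injectable so the wrapper can be tested deterministically.
function timeRegen<T>(stats: RegenStats, task: () => T, now: () => number = Date.now): T {
  const start = now();
  stats.calls += 1;
  try {
    return task();
  } catch (e) {
    stats.errors += 1;
    throw e; // rethrow so callers still see the failure
  } finally {
    const elapsedMs = now() - start;
    stats.totalMs += elapsedMs;
    stats.maxMs = Math.max(stats.maxMs, elapsedMs);
  }
}
```

Failed tasks are timed as well, so slow failures still show up in the duration figures.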

dapplion self-assigned this Mar 9, 2021

wemeetagain mentioned this issue Mar 9, 2021

Make State Cache Pruning Smarter #2154

Closed

dapplion mentioned this issue Mar 28, 2021

Add state cache basic metrics #2285

Closed

dapplion changed the title ~~Implement state cache metrics~~ Implement regen metrics Apr 10, 2021

This was referenced May 2, 2021

Review Regen module #2463

Closed

Add regen queue metrics #2466

Merged

dapplion added the scope-metrics label May 2, 2021

dapplion added the scope-performance label Jun 11, 2021

dapplion mentioned this issue Jun 27, 2021

Metrics to add tracker #2468

Open

7 tasks

dapplion added the prio-medium label Jun 27, 2021

dapplion assigned g11tech and unassigned dapplion Jun 27, 2021

g11tech mentioned this issue Jul 19, 2021

regen metrics reference impl #2852

Merged

dapplion closed this as completed Sep 3, 2021
