algod_ledger_round duplicated in /metrics after running fast catchup #3354

Closed
onetechnical opened this issue Dec 29, 2021 · 7 comments · Fixed by #3661

@onetechnical
Contributor

Subject of the issue

(As reported by another user) After running fast catchup, the algod_ledger_round metric reported by the /metrics endpoint appears twice: the stale pre-catchup value remains alongside the live one.

# Before Fast Catchup
$ curl -Ss localhost:8080/metrics  | grep -v '#' | grep algod_ledger_round
algod_ledger_round{} 33081

# After Fast Catchup
$ curl -Ss localhost:8080/metrics  | grep -v '#' | grep algod_ledger_round
algod_ledger_round{} 33081
algod_ledger_round{} 18310078

$ curl -Ss localhost:8080/metrics  | grep -v '#' | grep algod_ledger_round
algod_ledger_round{} 33081
algod_ledger_round{} 18310108

$ curl -Ss localhost:8080/metrics  | grep -v '#' | grep algod_ledger_round
algod_ledger_round{} 33081
algod_ledger_round{} 18310142

Your environment

  • Software version: 3.2.2

Steps to reproduce

  1. Start a fresh non-archival node
  2. Check the /metrics endpoint for algod_ledger_round
  3. Run fast catchup and complete it
  4. Check the /metrics endpoint for algod_ledger_round and observe two instances.

Expected behaviour

Only the latter (post-catchup) metric is reported.

Actual behaviour

Both the former and latter metrics.

A workaround is to restart the algod process.

@KevinM2k

KevinM2k commented Jan 19, 2022

Seeing the same issue for algod_ledger_transactions_total and algod_ledger_reward_claims_total, as well as the one mentioned above.

@0x4a5e1e4baab

Also seeing this issue with algod_ledger_transactions_total; the duplication is now causing problems in our monitoring agent framework.

@maestroi

maestroi commented Feb 8, 2022

+1, having the same issue as above.

@macunha1

macunha1 commented Feb 9, 2022

It seems that recent changes to the trackers introduced this bug; all of these metrics are created in the metrics tracker, and they are getting duplicated:

func (mt *metricsTracker) loadFromDisk(l ledgerForTracker, _ basics.Round) error {
	// Each call creates and registers a new set of metrics; the ones created
	// by a previous call are never unregistered.
	mt.ledgerTransactionsTotal = metrics.MakeCounter(metrics.LedgerTransactionsTotal)
	mt.ledgerRewardClaimsTotal = metrics.MakeCounter(metrics.LedgerRewardClaimsTotal)
	mt.ledgerRound = metrics.MakeGauge(metrics.LedgerRound)
	return nil
}

func (mt *metricsTracker) close() {
	// Nothing is cleaned up here, so the metrics above stay registered.
}

func (mt *metricsTracker) newBlock(blk bookkeeping.Block, delta ledgercore.StateDelta) {
	rnd := blk.Round()
	mt.ledgerRound.Set(float64(rnd), map[string]string{})
	mt.ledgerTransactionsTotal.Add(float64(len(blk.Payset)), map[string]string{})
	// TODO rewards: need to provide meaningful metric here.
	mt.ledgerRewardClaimsTotal.Add(float64(1), map[string]string{})
}

This causes our Prometheus monitoring to fail with the following error:

failed to get prometheus metrics: text format parsing error in line 215: second HELP line for metric name "algod_ledger_transactions_total"

The duplicated metrics are the same ones reported by @onetechnical, @0x4a5e1e4baab and @KevinM2k:

> curl -s $ALGORAND_HOST/metrics | awk '/algod_ledger/&&!/^#/' | sort

algod_ledger_logic_ok{} 20316
algod_ledger_reward_claims_total{} 123
algod_ledger_reward_claims_total{} 45
algod_ledger_round{} 19142768
algod_ledger_round{} 291
algod_ledger_transactions_total{} 0
algod_ledger_transactions_total{} 1023450

@algorandskiy @tsachiherman do you have any idea what may be causing this?

@tsachiherman
Contributor

That's very interesting. Thank you for reporting this.

I will look into that, but please don't expect a resolution in the upcoming release.

@algorandskiy
Contributor

What happens here:

  1. A node starts, and the metrics tracker's loadFromDisk method creates the metrics.
  2. Fast catchup completes, and the ledger reloads by calling loadFromDisk on all trackers.
  3. The metrics registry ends up with two counters with the same name.

It is not clear why this was not a problem before; I do not remember touching the metrics logic at all.
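The three steps above can be reproduced with a small self-contained Go model. The registry, gauge, makeGauge and tracker types here are hypothetical stand-ins, not the real algod metrics package; they only assume what the thread describes: creating a metric also registers it, and the registry does not deduplicate by name.

```go
package main

import "fmt"

// registry is a hypothetical stand-in for a global metrics registry: it keeps
// every registered metric in insertion order and does not deduplicate by name.
type registry struct {
	metrics []*gauge
}

type gauge struct {
	name  string
	value float64
}

var defaultRegistry = &registry{}

// makeGauge mimics a constructor that registers the new metric as a side effect.
func makeGauge(name string) *gauge {
	g := &gauge{name: name}
	defaultRegistry.metrics = append(defaultRegistry.metrics, g)
	return g
}

// tracker mirrors the shape of the metrics tracker quoted earlier.
type tracker struct {
	ledgerRound *gauge
}

// loadFromDisk creates (and therefore registers) a fresh gauge on every call.
func (t *tracker) loadFromDisk() {
	t.ledgerRound = makeGauge("algod_ledger_round")
}

// newBlock updates only the gauge created by the most recent loadFromDisk.
func (t *tracker) newBlock(rnd uint64) {
	t.ledgerRound.value = float64(rnd)
}

func main() {
	t := &tracker{}
	t.loadFromDisk() // 1. node starts
	t.newBlock(33081)
	t.loadFromDisk() // 2. fast catchup completes and trackers are reloaded
	t.newBlock(18310078)
	// 3. the registry now exposes two series with the same name
	for _, g := range defaultRegistry.metrics {
		fmt.Printf("%s{} %.0f\n", g.name, g.value)
	}
}
```

Running it prints two algod_ledger_round lines, one stuck at 33081 and one at 18310078, which matches the symptom in the original report.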

@algorandskiy
Contributor

I did some investigation, and it looks like we have always had this issue. Fixed in #3661.

tsachiherman pushed a commit that referenced this issue Feb 22, 2022
## Summary

Metrics counters were not cleared on close, which led to duplicate entries in the metrics report.

## Test Plan

Added a unit test. Tested manually.

Closes #3354
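The summary above says the counters were not cleared on close. A minimal sketch of that idea, using a hypothetical registry model rather than the actual code from #3661: close unregisters the metric created in loadFromDisk, so a close followed by a reload leaves exactly one series. It assumes, as the summary implies, that the ledger closes its trackers before calling loadFromDisk again after fast catchup.

```go
package main

import "fmt"

// registry is a hypothetical global metrics registry that supports removal.
type registry struct {
	metrics []*gauge
}

type gauge struct {
	name  string
	value float64
}

func (r *registry) register(g *gauge) { r.metrics = append(r.metrics, g) }

// deregister removes g (compared by pointer identity) from the registry.
func (r *registry) deregister(g *gauge) {
	for i, m := range r.metrics {
		if m == g {
			r.metrics = append(r.metrics[:i], r.metrics[i+1:]...)
			return
		}
	}
}

var defaultRegistry = &registry{}

type tracker struct {
	ledgerRound *gauge
}

// loadFromDisk creates and registers a fresh gauge.
func (t *tracker) loadFromDisk() {
	t.ledgerRound = &gauge{name: "algod_ledger_round"}
	defaultRegistry.register(t.ledgerRound)
}

// close unregisters the gauge so that a later loadFromDisk starts clean.
func (t *tracker) close() {
	if t.ledgerRound != nil {
		defaultRegistry.deregister(t.ledgerRound)
		t.ledgerRound = nil
	}
}

func main() {
	t := &tracker{}
	t.loadFromDisk()
	t.ledgerRound.value = 33081
	// Assumption: trackers are closed before being reloaded after catchup.
	t.close()
	t.loadFromDisk()
	t.ledgerRound.value = 18310078
	for _, g := range defaultRegistry.metrics {
		fmt.Printf("%s{} %.0f\n", g.name, g.value)
	}
}
```

With close in place, the loop prints a single algod_ledger_round line holding the post-catchup round.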
7 participants