Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Prometheus Block Fullness Metrics #3025

Merged
merged 14 commits into from
Feb 1, 2022

Conversation

gregorycoppola
Copy link
Contributor

@gregorycoppola gregorycoppola commented Jan 28, 2022

Motivation

We want to log a bunch of new metrics (#3024). This one adds "block fullness", which we have been talking about lately.

This supports ingestion of this data into prometheus without needing to launch a custom server.

Change

This PR logs the 5-dimensional ExecutionCost for the just seen block in the append_block function run by the followers and miners.

Testing

No testing yet! Since this is for diagnostics, we can hook it up to the prometheus monitoring system and see if it looks right.

src/monitoring/mod.rs Outdated Show resolved Hide resolved
@@ -91,6 +91,31 @@ lazy_static! {
"Total number of error logs emitted by node"
)).unwrap();

pub static ref LAST_EXECUTION_READ_COUNT: IntGauge = register_int_gauge!(opts!(
"execution_cost_read_count",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these could be slightly more descriptively named for prometheus as last_block_read_count, etc.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the variable name or the caption part?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the prometheus metric name, i.e.

    pub static ref LAST_EXECUTION_READ_COUNT: IntGauge = register_int_gauge!(opts!(
        "last_block_read_count",

Copy link
Member

@CharlieC3 CharlieC3 Jan 31, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we also prepend stacks_node_ to each metric name like we do for the other stacks-node metrics? It helps with search and discovery, especially when your Prometheus deployment could have hundreds of metrics from all kinds of services. Reference
Screen Shot 2022-01-31 at 3 24 23 PM

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Went with stacks_node_last_block_read_count-style prefix.

@codecov
Copy link

codecov bot commented Jan 28, 2022

Codecov Report

Merging #3025 (b88d8ce) into develop (ca36b2e) will increase coverage by 0.02%.
The diff coverage is 91.66%.

Impacted file tree graph

@@             Coverage Diff             @@
##           develop    #3025      +/-   ##
===========================================
+ Coverage    82.61%   82.64%   +0.02%     
===========================================
  Files          242      242              
  Lines       195667   195691      +24     
===========================================
+ Hits        161642   161720      +78     
+ Misses       34025    33971      -54     
Impacted Files Coverage Δ
src/chainstate/stacks/db/blocks.rs 88.29% <66.66%> (-0.02%) ⬇️
src/monitoring/mod.rs 71.09% <100.00%> (+1.83%) ⬆️
src/vm/ast/traits_resolver/mod.rs 94.44% <0.00%> (-3.97%) ⬇️
src/burnchains/bitcoin/network.rs 81.01% <0.00%> (-2.78%) ⬇️
src/net/atlas/download.rs 82.87% <0.00%> (-1.08%) ⬇️
src/net/p2p.rs 58.20% <0.00%> (-0.76%) ⬇️
src/util/pipe.rs 86.26% <0.00%> (-0.55%) ⬇️
src/net/inv.rs 70.35% <0.00%> (-0.41%) ⬇️
src/burnchains/bitcoin/blocks.rs 92.52% <0.00%> (-0.41%) ⬇️
src/chainstate/stacks/index/storage.rs 79.59% <0.00%> (-0.38%) ⬇️
... and 27 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ca36b2e...b88d8ce. Read the comment docs.

Copy link
Member

@kantai kantai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like it is failing to build with the prometheus features enabled.

You can test the build locally by enabling the monitoring_prom feature specifically, or by running:

$ cargo check --all-features

@jcnelson jcnelson self-requested a review January 31, 2022 16:23
@kantai
Copy link
Member

kantai commented Feb 1, 2022

LGTM! Thanks @gregorycoppola !

execution_cost: &ExecutionCost,
block_limit: &ExecutionCost,
) {
#[cfg(feature = "monitoring_prom")]
Copy link
Member

@jcnelson jcnelson Feb 1, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a way to just group all of these statements inside a single #[cfg(feature = "monitoring_prom")]? Something like:

#[cfg(feature = "monitoring_prom")] {
   /* do stuff here */
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did this

Copy link
Member

@jcnelson jcnelson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Just a small style nit.

@@ -5002,6 +5003,10 @@ impl StacksChainState {
None,
)?;

let block_limit = clarity_tx
.block_limit()
.unwrap_or(ExecutionCost::max_value());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if setting block_limit to ExecutionCost::max_value() in the case of an error makes sense...

Could we at least add a warn if there is an error here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm... what would it mean if block_limit() failed?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would mean that a transaction was currently being evaluated. I don't think it's possible in this code path, but it'd be good to warn --

Suggested change
.unwrap_or(ExecutionCost::max_value());
.unwrap_or_else(|| {
warn!("Failed to read transaction block limit");
ExecutionCost::max_value()
});

Copy link
Member

@CharlieC3 CharlieC3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@gregorycoppola
Copy link
Contributor Author

Thanks for the reviews, everyone.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants