
no data on Functions explorer dashboard with Rust app #66

Open
lcmgh opened this issue Jun 28, 2023 · 23 comments

@lcmgh

lcmgh commented Jun 28, 2023

First of all, thanks for making life easier :) I just gave it a first try and annotated a fn in my Rust app with autometrics, but no data is shown in the dashboard. Also, all quantile metrics are set to 0 even though the fn got some calls.

Version: autometrics = { version = "0.5.0", features = ["metrics"] }

  1. Data from my app's Prometheus endpoint
# TYPE function_calls_duration summary
function_calls_duration{function="batch_insert",module="myapp::mymod::dao",quantile="0"} 0
function_calls_duration{function="batch_insert",module="myapp::mymod::dao",quantile="0.5"} 0
function_calls_duration{function="batch_insert",module="myapp::mymod::dao",quantile="0.9"} 0
function_calls_duration{function="batch_insert",module="myapp::mymod::dao",quantile="0.95"} 0
function_calls_duration{function="batch_insert",module="myapp::mymod::dao",quantile="0.99"} 0
function_calls_duration{function="batch_insert",module="myapp::mymod::dao",quantile="0.999"} 0
function_calls_duration{function="batch_insert",module="myapp::mymod::dao",quantile="1"} 0
function_calls_duration_sum{function="batch_insert",module="myapp::mymod::dao"} 0.01188134
function_calls_duration_count{function="batch_insert",module="myapp::mymod::dao"} 3
  2. My data is scraped
  3. The dashboard shows no data

[Screenshot: Functions explorer dashboard panels showing no data]

@gagbo
Member

gagbo commented Jun 28, 2023

Hello,

Thanks for the detailed bug report. I don't think I can take a deeper look at it before the end of the week, but here is a preliminary check I can do:

  • It's weird that the function_calls_duration metric is a summary instead of a histogram (not your fault; since we set the exporter up with buckets here, it might actually be a bug/issue in the metrics-exporter-prometheus lib we use for you here). To be clear, that's why the dashboard is failing: the queries are written for a histogram metric, not a summary one (see the sketch below).
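For context, here is a rough sketch of why a summary breaks the latency panels (illustrative only; the real dashboard query is more involved): the percentile panels need the per-bucket function_calls_duration_bucket series, which a summary never exports.

histogram_quantile(
  0.99,
  sum by (le, function, module) (
    rate(function_calls_duration_bucket[5m])
  )
)
# A summary exposes precomputed quantiles instead of _bucket series,
# so a query of this shape matches nothing and the panel stays empty.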

Is there any specific reason you chose the metrics feature?

Can you try to use the prometheus feature instead and tell us if you have the same issue? If you don't have the issue anymore, I'll create an issue from here to autometrics-rs as it's a bug with the metrics feature. If you still have the issue, can you show the data sent by the Prometheus endpoint again?

@lcmgh
Author

lcmgh commented Jun 28, 2023

Hi @gagbo! Thanks for checking back. I used the metrics feature because my app already has a Prometheus exporter endpoint.

use metrics_exporter_prometheus::PrometheusBuilder;
let prom = PrometheusBuilder::new();
prom
    .install()
    .expect("failed to install recorder/exporter");

I wasn't sure whether that feature exposes the Prometheus endpoint on a port of its own, and I didn't want any interference there.

Switching to prometheus makes the autometrics metrics no longer appear on my existing exporter endpoint.

After switching back to metrics, the output is similar to before:

function_calls_duration{function="batch_insert",module="myapp::mod::dao",quantile="0"} 0
function_calls_duration{function="batch_insert",module="myapp::mod::dao",quantile="0.5"} 0
function_calls_duration{function="batch_insert",module="myapp::mod::dao",quantile="0.9"} 0
function_calls_duration{function="batch_insert",module="myapp::mod::dao",quantile="0.95"} 0
function_calls_duration{function="batch_insert",module="myapp::mod::dao",quantile="0.99"} 0
function_calls_duration{function="batch_insert",module="myapp::mod::dao",quantile="0.999"} 0
function_calls_duration{function="batch_insert",module="myapp::mod::dao",quantile="1"} 0
function_calls_duration_sum{function="batch_insert",module="myapp::mod::dao"} 0.012257653
function_calls_duration_count{function="batch_insert",module="myapp::mod::dao"} 3

@gagbo
Member

gagbo commented Jun 28, 2023

That's weird too (that using the prometheus feature alone doesn't work with the existing metrics setup). Are you using the default registry of the Prometheus client library, or a separate custom one?

For the record, the prometheus feature doesn't try to expose a scraping endpoint; the prometheus-exporter feature does. It's currently documented under "For projects already using custom Prometheus metrics" in the README; I'm not sure yet what we can do to make that clearer.

To get back on topic, can you try changing your Prometheus exporter initialization:

use metrics_exporter_prometheus::PrometheusBuilder;
let prom = PrometheusBuilder::new();
prom
+    .set_buckets(&[0.005, 0.01, 0.025, 0.05, 0.075, 0.1, 0.25, 0.5, 0.75, 1.0, 2.5, 5.0, 7.5, 10.0])
+    .expect("Failed to set histogram buckets")
    .install()
    .expect("failed to install recorder/exporter");

and see if it solves at least [1] the "function_calls_duration as summary" issue?

Footnotes

  1. I think this might be incompatible with your existing metrics (because this apparently forces all histograms to be exported as histograms instead of summaries), but if that's the case I'd rather handle it in a separate issue.

@lcmgh
Author

lcmgh commented Jun 28, 2023

I have applied your changes and the metrics endpoint now gives:

function_calls_duration_bucket{function="batch_insert",module="myapp::mod::dao",le="0.005"} 2
function_calls_duration_bucket{function="batch_insert",module="myapp::mod::dao",le="0.01"} 3
function_calls_duration_bucket{function="batch_insert",module="myapp::mod::dao",le="0.025"} 3
function_calls_duration_bucket{function="batch_insert",module="myapp::mod::dao",le="0.05"} 3
function_calls_duration_bucket{function="batch_insert",module="myapp::mod::dao",le="0.075"} 3
function_calls_duration_bucket{function="batch_insert",module="myapp::mod::dao",le="0.1"} 3
function_calls_duration_bucket{function="batch_insert",module="myapp::mod::dao",le="0.25"} 3
function_calls_duration_bucket{function="batch_insert",module="myapp::mod::dao",le="0.5"} 3
function_calls_duration_bucket{function="batch_insert",module="myapp::mod::dao",le="0.75"} 3
function_calls_duration_bucket{function="batch_insert",module="myapp::mod::dao",le="1"} 3
function_calls_duration_bucket{function="batch_insert",module="myapp::mod::dao",le="2.5"} 3
function_calls_duration_bucket{function="batch_insert",module="myapp::mod::dao",le="5"} 3
function_calls_duration_bucket{function="batch_insert",module="myapp::mod::dao",le="7.5"} 3
function_calls_duration_bucket{function="batch_insert",module="myapp::mod::dao",le="10"} 3
function_calls_duration_bucket{function="batch_insert",module="myapp::mod::dao",le="+Inf"} 3
function_calls_duration_sum{function="batch_insert",module="myapp::mod::dao"} 0.013500508000000001
function_calls_duration_count{function="batch_insert",module="myapp::mod::dao"} 3

@gagbo
Member

gagbo commented Jun 28, 2023

Yeah, that's better; at least the duration-related metrics should work now. I still have two questions:

  • Do you also see a function_calls_count metric being exposed?
  • Is the function-specific dashboard showing data, at least in the latency graph?

@lcmgh
Author

lcmgh commented Jun 29, 2023

  • There is indeed a function_calls_count metric (I forgot to include it in my previous posts):
# HELP function_calls_count Autometrics counter for tracking function calls
# TYPE function_calls_count counter
function_calls_count{function="batch_insert",module="myapp::mod::dao",caller="",result="ok"} 1

# HELP function_calls_duration Autometrics histogram for tracking function call duration
# TYPE function_calls_duration histogram
function_calls_duration_bucket{function="batch_insert",module="myapp::mod::dao",le="0.005"} 1
function_calls_duration_bucket{function="batch_insert",module="myapp::mod::dao",le="0.01"} 1
function_calls_duration_bucket{function="batch_insert",module="myapp::mod::dao",le="0.025"} 1
function_calls_duration_bucket{function="batch_insert",module="myapp::mod::dao",le="0.05"} 1
function_calls_duration_bucket{function="batch_insert",module="myapp::mod::dao",le="0.075"} 1
function_calls_duration_bucket{function="batch_insert",module="myapp::mod::dao",le="0.1"} 1
function_calls_duration_bucket{function="batch_insert",module="myapp::mod::dao",le="0.25"} 1
function_calls_duration_bucket{function="batch_insert",module="myapp::mod::dao",le="0.5"} 1
function_calls_duration_bucket{function="batch_insert",module="myapp::mod::dao",le="0.75"} 1
function_calls_duration_bucket{function="batch_insert",module="myapp::mod::dao",le="1"} 1
function_calls_duration_bucket{function="batch_insert",module="myapp::mod::dao",le="2.5"} 1
function_calls_duration_bucket{function="batch_insert",module="myapp::mod::dao",le="5"} 1
function_calls_duration_bucket{function="batch_insert",module="myapp::mod::dao",le="7.5"} 1
function_calls_duration_bucket{function="batch_insert",module="myapp::mod::dao",le="10"} 1
function_calls_duration_bucket{function="batch_insert",module="myapp::mod::dao",le="+Inf"} 1
function_calls_duration_sum{function="batch_insert",module="myapp::mod::dao"} 0.002298264
function_calls_duration_count{function="batch_insert",module="myapp::mod::dao"} 1
  • The dashboard still shows no data in all three panels.

This is with the code from #66 (comment) applied.

@gagbo
Member

gagbo commented Jun 29, 2023

I see that the output of the scraping endpoint looks more like what I'm expecting, so this part is solved.

I'm trying to understand what happened to the dashboard, but I think I'll have to test this and reproduce to understand the issue.

  • The fact that the dashboard proposes the batch_insert function name as input shows that at least the part extracting valid function labels is working fine (so the dashboard is properly connected to the Prometheus data source and can see the autometrics metrics).
  • The fact that none of the graphs show data makes me think that the issue might be with our build_info metric and how we use it in the query that generates the graphs.

Also, the sampling interval used to draw the graphs is dynamic and changes with the timespan shown. Do you still have the issue if you change the "Last 6 hours" in the top-right corner to something significantly shorter, like "Last 5 minutes"? I hope that will give enough points (depending on your scrape_interval) to show a curve.

@lcmgh
Author

lcmgh commented Jun 30, 2023

Do you still have the issue if you change the "Last 6 hours" in the top-right corner to something significantly shorter like "Last 5 minutes"?

Oh, actually I now see some lines in the Latency (95th and 99th Percentile) panel, but still no data in the other ones. The others even show an error, but only when going back 7 days:

execution: found duplicate series for the match group {instance="xxxxx:10250", job="kubelet"} on the right hand-side of the operation: [{__name__="up", endpoint="https-metrics", instance="xxxx:10250", job="kubelet", metrics_path="/metrics/cadvisor", namespace="kube-system", node="xxxxxxxxxxx", service="prometheus-operator-kubelet"}, {__name__="up", endpoint="https-metrics", instance="xxxx", job="kubelet", metrics_path="/metrics/probes", namespace="kube-system", node="xxxxx", service="prometheus-operator-kubelet"}];many-to-many matching not allowed: matching labels must be unique on one side

The fact that none of the graphs show data makes me think that the issue might be with our build_info metric and how we use it in the query that generates the graphs.

build_info{commit="",version="1.1.7",branch=""} 1

@gagbo
Member

gagbo commented Jul 4, 2023

The issue might be specifically triggered by your setup then; it will be hard for me to reproduce it.

Can you try to:

  • open the function-level dashboard,
  • edit the query for the success rate (or call rate) graph
  • edit all the on (instance, job) clauses and replace them with on (instance, job, metrics_path) (see the sketch below)

and see if that works? I don't understand how you can have (instance, job) pairs that are not unique, but it looks like that's what the error message is saying.
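For reference, a rough sketch of the shape of the join those queries do (the exact query in the dashboard is more involved, so treat this as illustrative only); the edit is just the extra label in the on() clause:

sum by (function, module, version) (
  rate(function_calls_count[$__rate_interval])
  # was: * on (instance, job) group_left(version) ...
  * on (instance, job, metrics_path) group_left(version)
  last_over_time(build_info[$__rate_interval])
)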

@lcmgh
Author

lcmgh commented Jul 4, 2023

and see if that works?

Yes, this made it work! The legend says {function="batch_insert", module="myapp::mod::dao", version="1.1.7"}.

@gagbo
Member

gagbo commented Jul 4, 2023

Thanks for bearing with me! We always assumed that (instance, job) would be enough to uniquely identify scrape jobs, but it looks like that assumption was wrong :) Now we just need to make sure that the metrics_path label is a built-in default label, present in all versions of Prometheus that matter to our use cases. Thanks again :)

@lcmgh
Author

lcmgh commented Jul 4, 2023

It's me who needs to be thankful! I'll just adapt my dashboard and my teammates will be ready to enjoy the new dashboard :) Awesome!

@lcmgh
Author

lcmgh commented Jul 4, 2023

Huh, I noticed another thing: the duration of my fn is always shown as 0. I'd expect it to be > 0 when there are data points (hovering over the yellow line also shows 0):

[Screenshot: duration graph where the curve stays at 0]

@gagbo
Member

gagbo commented Jul 4, 2023

Oh right, that's because your function calls probably always last less than the smallest bucket, which is 0.005s (on average you have about 0.0045s according to this comment).

So if you want to have more granularity there, you should change the buckets in your exporter config:

use metrics_exporter_prometheus::PrometheusBuilder;
let prom = PrometheusBuilder::new();
prom
-    .set_buckets(&[0.005, 0.01, 0.025, 0.05, 0.075, 0.1, 0.25, 0.5, 0.75, 1.0, 2.5, 5.0, 7.5, 10.0])
+    .set_buckets(&[0.0005, 0.001, 0.0025, 0.005, 0.01, 0.025, 0.05, 0.075, 0.1, 0.25, 0.5, 0.75, 1.0, 2.5])
    .expect("Failed to set histogram buckets")
    .install()
    .expect("failed to install recorder/exporter");

There's no hard requirement on the values in the buckets, just:

  • adding buckets adds data to your Prometheus instance (that's why I removed the top values here, but you can put them back if you want)
  • if you ever want to add latency SLOs later, the latency threshold must be one of the bucket values (so if you want to create an SLO that states "99% of calls in this group must last less than 0.02s", then 0.02 must appear in the buckets; see the sketch below)
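To make the second point concrete, here is a hypothetical SLO check (illustrative only, not how autometrics generates its SLO queries); it can only measure a threshold that exists as a bucket boundary because it reads that bucket directly:

# Fraction of batch_insert calls faster than 0.02s over the last 5 minutes.
# This only works if 0.02 is one of the configured buckets (le="0.02").
  sum(rate(function_calls_duration_bucket{function="batch_insert", le="0.02"}[5m]))
/
  sum(rate(function_calls_duration_count{function="batch_insert"}[5m]))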

@lcmgh
Author

lcmgh commented Jul 4, 2023

Thanks for making this clear to me. In production I will have longer runtimes, so this should be fine :)

@lcmgh
Author

lcmgh commented Jul 18, 2023

I stumbled over another issue. Metrics data is only shown when selecting "Last 7 days" in Grafana, even though there are data points for the last days/hours. For another fn call I could not get it to work at all.

@gagbo
Member

gagbo commented Jul 24, 2023

I stumbled over another issue. Metrics data is only shown when selecting "Last 7 days" in Grafana, even though there are data points for the last days/hours. For another fn call I could not get it to work at all.

What is the scrape_interval of your project? Can you try to change the query in the dashboard to replace the last_over_time(build_info[$__rate_interval]) bits with last_over_time(build_info[1h]) (1h, or really anything longer than your scrape_interval) and tell me if that works?
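In other words, only the range selector on build_info changes (shown here in isolation; your copy of the dashboard may wrap it differently):

# before: the range shrinks along with the dashboard's time range
last_over_time(build_info[$__rate_interval])
# after: a fixed range, comfortably longer than the scrape_interval
last_over_time(build_info[1h])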

I'm running into a similar issue just now (I haven't had time to look into this before, sorry), and it might be the solution. It also looks like a commit tried to solve that kind of issue recently, but I don't know whether you tried running that version of the dashboard or whether that was indeed the problem.

@lcmgh
Author

lcmgh commented Jul 24, 2023

My scrape interval is 1m :).

Using https://github.com/autometrics-dev/autometrics-shared/blob/62f4dfd0a52e23874b3925039c17c879ca7aea49/dashboards/Autometrics%20Function%20Explorer.json directly makes my dashboard show no data at all.

@gagbo
Member

gagbo commented Jul 25, 2023

That's too bad, especially the part about getting no data at all; I'd guess there's something we didn't think of when making the queries.

For the record, I just learnt that using $__rate_interval as we do in the dashboard is perfectly fine, as long as you configure the scrape interval of your Prometheus data source in Grafana to match the 1m you have:

[Screenshot: settings screen for a Prometheus data source in Grafana, showing the scrape interval field]

Source of the screenshot with the explanation

Other than that, I wasn't able to reproduce the issue locally, so I'll watch our staging environment to see if the issue shows up there, so I can debug it hands-on.

EDIT: the only way I was able to reproduce it was to have too short a timespan for the ranges in the queries. So I think that configuring the scrape interval in your Grafana "Prometheus data source" settings is going to fix most of the issues.

Gory details of my current root cause hypothesis below:

$__rate_interval is the maximum of 4 * scrape_interval and interval (the automatic interval given the time range of the graph). When you use "Last 7 days" as the range of the graph, interval became 5 minutes in my Grafana, which is comfortably more than what a 1-minute scrape interval needs. When you go lower than that, interval decreases and 4 * scrape_interval is supposed to be the anchor value. With the default of 15s, the default "anchor" is 1 minute, which means all the rate computations the dashboard does only have one point in time to guess a rate from; therefore, the dashboard can't compute rates and shows "No data". (And we never hit that issue internally because we set the interval programmatically.)
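To put numbers on it (a sketch using the formula described above and the intervals from this thread):

# Per the description above: $__rate_interval = max(interval, 4 * scrape_interval)
#
# Grafana data source scrape interval left at the default 15s, Prometheus scraping every 60s:
#   max(interval, 4 * 15s) = 60s on short time ranges
#   -> a 60s window over samples spaced 60s apart holds ~1 point, rate() has no slope to compute: "No data"
# Grafana data source scrape interval set to 60s:
#   max(interval, 4 * 60s) = 240s
#   -> every window spans ~4 samples and the panels fill in
rate(function_calls_count{function="batch_insert"}[$__rate_interval])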

All this to say: if your Prometheus data source already has 1 minute as the scrape interval in the Grafana settings, I'm stumped again.

@lcmgh
Author

lcmgh commented Jul 28, 2023

I stand corrected: the issue with no data being shown seems to be fixed with the updated query mentioned earlier in this thread.

Still, for one function I have a data point with count zero. I'm not sure if this is because I forgot to set the Prometheus buckets. The actual function call duration is around 12s, so I thought the default settings would be fine.

[Screenshot: dashboard panel showing a single data point with count zero]

Metrics endpoint output:

# HELP function_calls_count Autometrics counter for tracking function calls
# TYPE function_calls_count counter
function_calls_count{function="receive_and_send_rows",module="myservice_sl_server::server",caller=""} 1

# HELP build_info Autometrics info metric for tracking software version and build details
# TYPE build_info gauge
build_info{commit="",version="0.1.0",branch=""} 1

# TYPE health gauge
health{target="Total"} 1

# HELP function_calls_duration Autometrics histogram for tracking function call duration
# TYPE function_calls_duration summary
function_calls_duration{function="receive_and_send_rows",module="myservice_sl_server::server",quantile="0"} 0
function_calls_duration{function="receive_and_send_rows",module="myservice_sl_server::server",quantile="0.5"} 0
function_calls_duration{function="receive_and_send_rows",module="myservice_sl_server::server",quantile="0.9"} 0
function_calls_duration{function="receive_and_send_rows",module="myservice_sl_server::server",quantile="0.95"} 0
function_calls_duration{function="receive_and_send_rows",module="myservice_sl_server::server",quantile="0.99"} 0
function_calls_duration{function="receive_and_send_rows",module="myservice_sl_server::server",quantile="0.999"} 0
function_calls_duration{function="receive_and_send_rows",module="myservice_sl_server::server",quantile="1"} 0
function_calls_duration_sum{function="receive_and_send_rows",module="myservice_sl_server::server"} 13.730152187
function_calls_duration_count{function="receive_and_send_rows",module="myservice_sl_server::server"} 1

Prometheus exporter setup:

PrometheusBuilder::new()
    .install()
    .expect("failed to install recorder/exporter");

@lcmgh
Author

lcmgh commented Jul 28, 2023

When going with the following, no data is shown, but the function appears in my dropdown.

PrometheusBuilder::new()
    .set_buckets(&[
        0.005, 0.01, 0.025, 0.05, 0.075, 0.1, 0.25, 0.5, 0.75, 1.0, 2.5, 5.0, 7.5, 10.0,
    ])
    .expect("Failed to set histogram buckets")
    .install()
    .expect("failed to install recorder/exporter");
# HELP function_calls_count Autometrics counter for tracking function calls
# TYPE function_calls_count counter
function_calls_count{function="read_and_stream_rows",module="myservice_sl_server::db",caller=""} 1

# TYPE health gauge
health{target="Total"} 1

# HELP build_info Autometrics info metric for tracking software version and build details
# TYPE build_info gauge
build_info{commit="",version="0.1.0",branch=""} 1

# HELP function_calls_duration Autometrics histogram for tracking function call duration
# TYPE function_calls_duration histogram
function_calls_duration_bucket{function="read_and_stream_rows",module="myservice_sl_server::db",le="0.005"} 0
function_calls_duration_bucket{function="read_and_stream_rows",module="myservice_sl_server::db",le="0.01"} 0
function_calls_duration_bucket{function="read_and_stream_rows",module="myservice_sl_server::db",le="0.025"} 0
function_calls_duration_bucket{function="read_and_stream_rows",module="myservice_sl_server::db",le="0.05"} 0
function_calls_duration_bucket{function="read_and_stream_rows",module="myservice_sl_server::db",le="0.075"} 0
function_calls_duration_bucket{function="read_and_stream_rows",module="myservice_sl_server::db",le="0.1"} 0
function_calls_duration_bucket{function="read_and_stream_rows",module="myservice_sl_server::db",le="0.25"} 0
function_calls_duration_bucket{function="read_and_stream_rows",module="myservice_sl_server::db",le="0.5"} 0
function_calls_duration_bucket{function="read_and_stream_rows",module="myservice_sl_server::db",le="0.75"} 0
function_calls_duration_bucket{function="read_and_stream_rows",module="myservice_sl_server::db",le="1"} 0
function_calls_duration_bucket{function="read_and_stream_rows",module="myservice_sl_server::db",le="2.5"} 0
function_calls_duration_bucket{function="read_and_stream_rows",module="myservice_sl_server::db",le="5"} 0
function_calls_duration_bucket{function="read_and_stream_rows",module="myservice_sl_server::db",le="7.5"} 0
function_calls_duration_bucket{function="read_and_stream_rows",module="myservice_sl_server::db",le="10"} 0
function_calls_duration_bucket{function="read_and_stream_rows",module="myservice_sl_server::db",le="+Inf"} 1
function_calls_duration_sum{function="read_and_stream_rows",module="myservice_sl_server::db"} 15.281207034
function_calls_duration_count{function="read_and_stream_rows",module="myservice_sl_server::db"} 1

@gagbo
Member

gagbo commented Jul 28, 2023

Still, for one function I have a data point with count zero. I'm not sure if this is because I forgot to set the Prometheus buckets.

No, don't worry about this. To help with metric discovery, the library initializes the counters of autometricized functions at startup in debug builds. That allows the dashboards to show data even if nothing has happened yet.

When going with the following, no data is shown but the function appears in my dropdown.

Did you check the settings for the Prometheus data source in Grafana? I still think that changing the scrape interval there to 1m (or 60s, if it only takes seconds) like in the screenshot above would help with the "No data" issue.

@lcmgh
Author

lcmgh commented Oct 26, 2023

Our Prometheus uses a scrape interval of 60s.
