Add Boolean Metric for Instantaneous Pipeline Health #93

excalq · 2023-04-20T17:22:14Z

As far as I can tell, there is currently no way to directly verify that a pipeline is healthy at a single instant in time. Metrics such as logstash_stats_pipeline_events_out can show a drop-off in a timeseries, but I'd like a boolean _up or _healthy metric for each pipeline, with the pipeline ID as a label. I'm opening this issue to track a PR I'll create for it.

Proposed mechanism

Logstash does not directly produce such a metric. However, it does emit pipelines.[pipeline_id].reloads.last_success_timestamp and pipelines.[pipeline_id].reloads.last_failure_timestamp.

If both are null, the pipeline is considered working.
If last_failure_timestamp has a value, but last_success_timestamp is null, the pipeline is broken, and has been since the service started.
If last_failure_timestamp > last_success_timestamp, the pipeline is broken.
If last_success_timestamp > last_failure_timestamp, the pipeline is now working.
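The four rules above can be sketched as a small function. This is only an illustration, not the exporter's actual implementation: the helper names `parse_ts` and `pipeline_is_up` are hypothetical, and the sketch assumes the timestamps arrive as ISO 8601 strings (or null). It also makes two choices the rules do not spell out: a pipeline with a success timestamp but no failure timestamp counts as working, and equal timestamps count as broken.

```python
from datetime import datetime
from typing import Optional


def parse_ts(value: Optional[str]) -> Optional[datetime]:
    """Parse a timestamp string, passing None (null) through unchanged."""
    if value is None:
        return None
    # Assumption: timestamps are ISO 8601 strings, possibly ending in "Z".
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def pipeline_is_up(last_success: Optional[str], last_failure: Optional[str]) -> bool:
    """Apply the proposed rules to the two reload timestamps."""
    success = parse_ts(last_success)
    failure = parse_ts(last_failure)
    if failure is None:
        # No reload has ever failed, which covers the "both null" case.
        return True
    if success is None:
        # A failure with no success: broken since the service started.
        return False
    # Otherwise the more recent event decides.
    # Assumption: equal timestamps are treated as broken.
    return success > failure
```

For example, `pipeline_is_up(None, "2023-04-20T17:00:00Z")` evaluates to `False`, matching the "started broken" scenario.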

In my testing, I hot-reloaded a pipeline with simple invalid syntax (output: "**** INTENTIONAL BROKEN CONFIGMAP ****"), and the above works as described on Logstash 8.4.0. If there are considerations or scenarios this doesn't cover, please advise.

Proposed name

logstash_stats_pipeline_up{pipeline_id="$pipeline"}, following the existing nomenclature of logstash_info_up. Any better suggestion is welcome.
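To make the proposed name concrete, here is a rough sketch of how one sample of the metric might look in the Prometheus text exposition format. The function `render_pipeline_up` is hypothetical and uses only the standard library. The real exporter would presumably use a Prometheus client library instead.

```python
def render_pipeline_up(pipeline_id: str, up: bool) -> str:
    """Render one sample line for the proposed metric (illustrative only)."""
    # Escape backslash, double quote, and newline in the label value,
    # as the Prometheus text format requires.
    escaped = (
        pipeline_id.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
    )
    value = 1 if up else 0
    return f'logstash_stats_pipeline_up{{pipeline_id="{escaped}"}} {value}'


print(render_pipeline_up("main", True))
print(render_pipeline_up("broken-pipeline", False))
```

The pipeline IDs `main` and `broken-pipeline` are placeholders. A value of 1 means the pipeline is considered working and 0 means it is considered broken.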

Version Compatibility

These metrics were introduced in Logstash 5.0.0: elastic/logstash#5848

Screenshots of the above scenarios:

Pipeline is working

Pipeline is broken, started broken

Pipeline is broken, started healthy (hot reload)

Pipeline is working, fixed from broken state (hot reload)


kuskoman · 2023-04-21T09:38:17Z

Ah, I just realised it is a different metric.

So, it is possible that the pipeline will be down, but the instance will be up, right?

edit: deleted my previous comment as it made no sense

* Use version sort add_metrics_to_readme.sh
* #93: Adds logstash_stats_pipeline_up metrics
* Add newline to EOF
* Adds reload timestamp metrics
* Fixes #96, Last_error field isn't properly defined
* Adds timestamp metrics to test/snapshot/readme
* Remove pipeline up metric

---------

Co-authored-by: Jakub Surdej <[email protected]>

excalq · 2023-04-21T15:54:53Z

Yes, that's the idea: the instance is running fine, and even other pipelines are running, but one may be in an error state. This is what the new metric logstash_stats_pipeline_up can produce in Grafana:

kuskoman · 2023-04-21T16:57:15Z

@excalq I want to merge it, but since it is not so obvious, I will create documentation for it and cover more test cases.
edit: I may have some time tomorrow to handle that

excalq added a commit to excalq/logstash-exporter that referenced this issue Apr 20, 2023

kuskoman#93: Adds logstash_stats_pipeline_up metrics

dbb3cc4

excalq mentioned this issue Apr 20, 2023

Add metric for pipelines indicating if they are up #94

Merged

kuskoman pushed a commit to excalq/logstash-exporter that referenced this issue Apr 21, 2023

kuskoman#93: Adds logstash_stats_pipeline_up metrics

13c7a22

kuskoman self-assigned this Apr 21, 2023

kuskoman added the enhancement and documentation labels Apr 21, 2023

kuskoman closed this as completed in #94 Apr 27, 2023
