Add Boolean Metric for Instantaneous Pipeline Health #93

Closed
excalq opened this issue Apr 20, 2023 · 3 comments · Fixed by #94
Labels: documentation, enhancement

Comments

@excalq
Contributor

excalq commented Apr 20, 2023

Currently (as far as I can tell) there is no mechanism to directly verify that a pipeline is healthy at a single instant in time. Metrics such as logstash_stats_pipeline_events_out can show a timeseries drop-off, but I'd like to have a boolean _up or _healthy metric for each pipeline (with the pipeline as a label). I'm opening this issue to track a PR I'll create for it.

Proposed mechanism

Logstash does not directly produce such a metric; however, it does emit pipelines.[pipeline_id].reloads.last_success_timestamp and pipelines.[pipeline_id].reloads.last_failure_timestamp.

  1. If both are null, the pipeline is considered working.
  2. If last_failure_timestamp has a value, but last_success_timestamp is null, the pipeline is broken, and has been since the service started.
  3. If last_failure_timestamp > last_success_timestamp, the pipeline is broken.
  4. If last_success_timestamp > last_failure_timestamp, the pipeline is now working.
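The four rules above can be sketched as a small function. This is only an illustration of the proposed logic, not the exporter's actual code: the function name `pipeline_up` is hypothetical, the timestamps are assumed to be already parsed into `datetime` values (or `None` for null), and the two cases the rules do not spell out (only a success timestamp present, or both timestamps equal) are treated as "working" purely by assumption.

```python
from datetime import datetime, timezone
from typing import Optional


def pipeline_up(
    last_success: Optional[datetime],
    last_failure: Optional[datetime],
) -> bool:
    """Return True if the pipeline is considered working (hypothetical helper)."""
    # Rule 1: both null means no reload has failed, so the pipeline is working.
    # Assumption: a success timestamp with no failure timestamp is also working.
    if last_failure is None:
        return True
    # Rule 2: a failure with no success means broken since the service started.
    if last_success is None:
        return False
    # Rules 3 and 4: the more recent timestamp decides the state.
    # Assumption: equal timestamps are treated as working.
    return last_success >= last_failure


if __name__ == "__main__":
    earlier = datetime(2023, 4, 20, 12, 0, tzinfo=timezone.utc)
    later = datetime(2023, 4, 20, 12, 5, tzinfo=timezone.utc)
    print(pipeline_up(None, None))       # rule 1
    print(pipeline_up(None, earlier))    # rule 2
    print(pipeline_up(earlier, later))   # rule 3
    print(pipeline_up(later, earlier))   # rule 4
```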

In my testing of hot-reloading with a simple invalid pipeline syntax (output: "**** INTENTIONAL BROKEN CONFIGMAP ****"), the above works as described on Logstash 8.4.0. If there are considerations or scenarios this doesn't work for, please advise.

Proposed name

logstash_stats_pipeline_up{pipeline_id="$pipeline"}, following the existing nomenclature of logstash_info_up. Any better suggestion is welcome.
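For illustration, here is roughly what the proposed metric could look like in the Prometheus text exposition format, one sample per pipeline. This is a sketch under assumptions: the function names, the HELP text, and the pipeline ids `main` and `broken` are made up for the example, and a real exporter would normally use a Prometheus client library rather than building the lines by hand.

```python
from typing import Dict, List

METRIC_NAME = "logstash_stats_pipeline_up"


def escape_label_value(value: str) -> str:
    """Escape backslash, double quote and newline in a label value."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_pipeline_up(states: Dict[str, bool]) -> List[str]:
    """Render one gauge sample per pipeline: 1 for working, 0 for broken."""
    lines = [
        # The HELP text is a placeholder, not taken from the exporter.
        f"# HELP {METRIC_NAME} Whether the pipeline is considered working (1) or broken (0).",
        f"# TYPE {METRIC_NAME} gauge",
    ]
    for pipeline_id in sorted(states):
        label = escape_label_value(pipeline_id)
        value = 1 if states[pipeline_id] else 0
        lines.append(f'{METRIC_NAME}{{pipeline_id="{label}"}} {value}')
    return lines


if __name__ == "__main__":
    # Hypothetical pipeline ids, for demonstration only.
    for line in render_pipeline_up({"main": True, "broken": False}):
        print(line)
```

Exposing the state as a 0/1 gauge, rather than as a label value, keeps the series identity stable for each pipeline and makes it easy to alert on a value of 0.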

Version Compatibility

These metrics were introduced in Logstash 5.0.0: elastic/logstash#5848

Screenshots of the above scenarios:

  1. Pipeline is working

image

  2. Pipeline is broken, started broken

image

  3. Pipeline is broken, started healthy (hot reload)

image

  4. Pipeline is working, fixed from broken state (hot reload)

image

excalq added a commit to excalq/logstash-exporter that referenced this issue Apr 20, 2023
kuskoman pushed a commit to excalq/logstash-exporter that referenced this issue Apr 21, 2023
@kuskoman
Owner

kuskoman commented Apr 21, 2023

Ah, I just realised it is a different metric.

So, it is possible that the pipeline will be down, but the instance will be up, right?

edit: deleted my previous comment as it made no sense

kuskoman added a commit that referenced this issue Apr 21, 2023
* Use version sort add_metrics_to_readme.sh

* #93: Adds logstash_stats_pipeline_up metrics

* Add newline to EOF

* Adds reload timestamp metrics

* Fixes #96, Last_error field isn't properly defined

* Adds timestamp metrics to test/snapshot/readme

* Remove pipeline up metric

---------

Co-authored-by: Jakub Surdej <[email protected]>
@excalq
Contributor Author

excalq commented Apr 21, 2023

Yes, that's the idea: the instance is running fine, and even other pipelines are running, but one may be in an error state. This is what the new metric logstash_stats_pipeline_up can produce in Grafana:

image

@kuskoman
Owner

kuskoman commented Apr 21, 2023

@excalq I want to merge it, but since it is not so obvious, I will create documentation for it, as well as cover more test cases.
edit: I may have some time tomorrow to handle that

kuskoman self-assigned this Apr 21, 2023
kuskoman added the enhancement and documentation labels Apr 21, 2023