Add Boolean Metric for Instantaneous Pipeline Health #93
excalq added a commit to excalq/logstash-exporter that referenced this issue (Apr 20, 2023)
kuskoman pushed a commit to excalq/logstash-exporter that referenced this issue (Apr 21, 2023)
Ah, I just realised it is a different metric. So it is possible that the pipeline will be down while the instance is up, right? Edit: deleted my previous comment, as it made no sense.
kuskoman added a commit that referenced this issue (Apr 21, 2023)
* Use version sort add_metrics_to_readme.sh
* #93: Adds logstash_stats_pipeline_up metrics
* Add newline to EOF
* Adds reload timestamp metrics
* Fixes #96, Last_error field isn't properly defined
* Adds timestamp metrics to test/snapshot/readme
* Remove pipeline up metric
---------
Co-authored-by: Jakub Surdej <[email protected]>
@excalq I want to merge it, but since it is not so obvious, I will create documentation for it and cover more test cases.
kuskoman added the enhancement and documentation labels (Apr 21, 2023)
Currently (as far as I can tell) there is no mechanism to directly verify that a pipeline is healthy at a single instant in time. Metrics such as `logstash_stats_pipeline_events_out` can show a time-series drop-off, but I'd like to have a boolean `_up` or `_healthy` metric for each pipeline (as a label). This issue tracks a PR I'll create for this.

Proposed mechanism

Logstash does not directly produce such a metric; however, it does emit `pipelines.[pipeline_id].reloads.last_success_timestamp` and `pipelines.[pipeline_id].reloads.last_failure_timestamp`.

- If `last_failure_timestamp` is `null`, the pipeline is considered working.
- If `last_failure_timestamp` has a value, but `last_success_timestamp` is `null`, the pipeline is broken, and has been since the service started.
- If `last_failure_timestamp > last_success_timestamp`, the pipeline is broken.
- If `last_success_timestamp > last_failure_timestamp`, the pipeline is now working.

In my testing of hot-reloading with a simple invalid pipeline syntax (`output: "**** INTENTIONAL BROKEN CONFIGMAP ****"`), the above works as described on Logstash 8.4.0. If there are considerations or scenarios this doesn't work for, please advise.

Proposed name

`logstash_stats_pipeline_up{pipeline_id="$pipeline"}`, following the existing nomenclature of `logstash_info_up`. Any better suggestion is welcome.

Version Compatibility

These metrics were introduced in Logstash 5.0.0: elastic/logstash#5848
Screenshots of the above scenarios: [images not preserved]