New component: Alert Manager Receiver and Exporter #18526

Closed
2 tasks
nicolastakashi opened this issue Feb 13, 2023 · 21 comments
Labels
Sponsor Needed New component seeking sponsor

Comments

@nicolastakashi
Contributor

nicolastakashi commented Feb 13, 2023

The purpose and use-cases of the new component

It would be nice to have an Alertmanager receiver and exporter so that we can use the pipeline and processors to enrich alert content before sending it to its final destination.

Many APM vendors enrich alerts with metadata before delivering them to destination channels such as Slack or PagerDuty. Having this ability in the OpenTelemetry Collector, combined with Alertmanager, can help o11y platforms provide useful information in their alert notifications.
Below are a few examples of processors we could apply to an alert before sending it.

Graph screenshot

It is very useful to have a screenshot of the metric plot in the alert notification. This can be implemented using tools like Promplot or Grafana Image Renderer.

Check dependency alerts firing.

This could be a little bit tricky, but if you are using spanprocessors we may be able to find all the dependencies for a given service and use the Alertmanager API to check whether any alerts are firing for those dependencies (this could be achieved using the job label).

Example configuration for the component

receivers:
  alertmanager:
    address: 0.0.0.0:9093
exporters:
  alertmanager:
    address: http://monitoring.alertmanager.svc.cluster.local:9093
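
For context, here is a minimal sketch of how the proposed components might be wired into a logs pipeline. The `alertmanager` receiver and exporter are the hypothetical components proposed in this issue and do not exist in the Collector. The `attributes` processor is an existing contrib processor, and the `team` key and its value are illustrative assumptions.

```yaml
receivers:
  alertmanager:            # hypothetical: the receiver proposed in this issue
    address: 0.0.0.0:9093

processors:
  attributes/enrich:       # existing attributes processor, used here for enrichment
    actions:
      - key: team          # illustrative attribute, not part of the proposal
        value: platform
        action: insert

exporters:
  alertmanager:            # hypothetical: the exporter proposed in this issue
    address: http://monitoring.alertmanager.svc.cluster.local:9093

service:
  pipelines:
    logs:                  # alerts are modeled as log records
      receivers: [alertmanager]
      processors: [attributes/enrich]
      exporters: [alertmanager]
```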

Telemetry data types supported

Logs

Is this a vendor-specific component?

  • This is a vendor-specific component
  • If this is a vendor-specific component, I am proposing to contribute this as a representative of the vendor.

Sponsor (optional)

No response

Additional context

No response

@nicolastakashi nicolastakashi added the needs triage New item requiring triage label Feb 13, 2023
@djaglowski
Member

@nicolastakashi, I think the first step here is to clarify whether you are proposing alerts as an entirely new data type (same level as traces, metrics, logs), or if they would fit into an existing data type (most likely logs).

My opinion is that the logs data model is sufficient to represent alerts, so I'm looking at this as a proposal for a logs receiver and a logs exporter. However, it's not clear what data format you are suggesting would be sent to the receiver and sent from the exporter.

@nicolastakashi
Contributor Author

nicolastakashi commented Feb 13, 2023

Hi @djaglowski
I was in doubt when I created the issue, but after reading your comment I understand it better.
Yeah, it makes sense: the logs data type is flexible enough to handle the alert schema, especially bearing in mind that an alert is a JSON object 😄

BTW!
I updated the issue with the proper data type.

@djaglowski
Member

Thanks @nicolastakashi.

Based on the example config, it looks like the receiver would stand up a server and listen for alerts. Is that right? What protocol are you suggesting would be used for this? Would the tcplog or udplog receivers work for you?

@nicolastakashi
Contributor Author

@djaglowski yeah, exactly.

The receiver will listen for alerts, and it should be tcplog, since we need feedback about whether receiving the alert succeeded or failed.

@djaglowski
Member

The existing tcplog receiver would be a good starting point then. It may work for you as is; otherwise, please propose the specific enhancements it would need.

@nicolastakashi
Contributor Author

Cool!
@djaglowski I'll try to prepare a POC and let you know, as soon as I have something working.

@nicolastakashi
Contributor Author

Hi @djaglowski, I tested it locally.

  1. tcplog will not work in this case, because Prometheus needs to talk to a specific Alertmanager API.
    To use tcplog we would need a different flow: Prometheus sends to Alertmanager, Alertmanager sends to tcplog, and then an exporter sends back to Alertmanager. In my view, that is not the best experience.

  2. Ideally we should have an Alertmanager receiver that receives the alert payload as a log entry, which can then flow through the OpenTelemetry pipeline.
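
To illustrate point 1: Prometheus delivers alerts by posting to the Alertmanager API on each target listed under `alerting` in its configuration, so a receiver that speaks that API could be listed there directly. Below is a sketch of the Prometheus side only. The collector address `otel-collector.monitoring.svc:9093` is a hypothetical name, and it assumes a receiver like the one proposed here is listening on that port.

```yaml
# prometheus.yml (fragment)
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            # Hypothetical address of a collector running the proposed receiver.
            # Prometheus treats this target as an Alertmanager instance.
            - otel-collector.monitoring.svc:9093
```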

@djaglowski
Member

It sounds like the components you are proposing are specific to prometheus or at least to a protocol or API that prometheus uses. I think in order for your proposal to be evaluated, you need to explain this protocol or API in detail.

@nicolastakashi
Contributor Author

Hi @djaglowski, indeed Alertmanager is mostly used with Prometheus, but it is an alerting system and could be used in many different use cases.

Alertmanager has an OpenAPI definition for its API. From my understanding, we only need a receiver that handles the create alert endpoint, as you can see here

The receiver will work like a TCP receiver, exposing an HTTP endpoint.

Let me know what other information you need. If you have an example I can look at, I can provide the information using a specific standard.

@djaglowski djaglowski added the Sponsor Needed New component seeking sponsor label Feb 22, 2023
@djaglowski
Member

Thanks for the link @nicolastakashi.

Based on the fact that alert manager is a prometheus repository, I think it's important to have this context, so I would suggest these components should include the prometheus name in some way.

That said, this technology and use case are a bit outside of my wheelhouse so I do not expect to sponsor the component. The best way to find a sponsor is often to attend the Collector SIG meeting to explain the value and ask if any approvers or maintainers are willing to sponsor. There's a meeting today and every Wednesday at 5PM UTC.

@andrzej-stencel
Member

andrzej-stencel commented Mar 1, 2023

This sounds to me like shoehorning an arbitrary data type into the collector for processing. The OpenTelemetry Collector was created to process specific data types: telemetry in the form of logs, metrics and traces. You certainly can feed any type of unstructured or structured data into the collector, but is that what contrib should be concerned with?

@andrzej-stencel
Member

Dan has raised a good point during today's Collector SIG meeting: it's probably easier to justify an Alertmanager receiver than an exporter, as you could argue an alert is as good a source of telemetry as any other event. Exporting to Alertmanager is not as easily justifiable in my eyes, as the Alertmanager does not meet the criteria of a "telemetry backend".

I don't think we reached a definite conclusion during the meeting; I suppose if a sponsor decides to support this proposal then it's good to go.

@nicolastakashi
Contributor Author

Hey @astencel-sumo, thanks for sharing your thoughts, and apologies for not attending the meeting. I had a last-minute setback and had no time to let you know.

Regarding the Alertmanager exporter, I agree with you: it's harder to justify.

Maybe we should have an external, community-provided service that accepts OTel data and converts it for Alertmanager, doing this outside the Collector.

So we can receive and enrich alerts in the Collector, but exporting them to AM should be a job outside the Collector.

I'll also try to ask the Alertmanager maintainers for their opinions.

@gouthamve
Member

So I could potentially see some usefulness here:

Alertmanager Receiver: Currently, we don't have historical persistence on the notifications sent out by Prometheus. And having an AM receiver would be useful if someone wants to export that data to Loki or Elastic for further analysis down the line.

If we had a logs2metrics connector, then we could also generate custom metrics from alerts, like how many times a particular namespace has alerted. While these metrics are available in Prometheus, they are per Prometheus and this could be a global view.

Alertmanager Exporter: Now, this is harder to justify, but one use case is if we want the Collector to generate Alertmanager payloads and trigger notifications. I am not actually sure if this is in the purview of the Collector.

@nicolastakashi
Contributor Author

@gouthamve that's an amazing use case for the logs2metrics connector. Regarding the exporter, in the worst-case scenario we can build a service that receives OTLP and sends to AM.

@gouthamve
Member

The countconnector can do it today. Thanks for the idea @kovrus!
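
A rough sketch of what counting alerts per namespace with the count connector might look like. The metric name `alerts.count` is a hypothetical choice, the `alertmanager` receiver is the hypothetical component proposed in this issue, and the sketch assumes each alert arrives as a log record carrying a `namespace` attribute.

```yaml
receivers:
  alertmanager:                 # hypothetical: the receiver proposed in this issue
    address: 0.0.0.0:9093

connectors:
  count:
    logs:
      alerts.count:             # hypothetical metric name
        description: Number of alert log records, grouped by namespace
        attributes:
          - key: namespace      # assumes alerts carry this log attribute
            default_value: unknown

exporters:
  debug:                        # prints the generated metrics; swap for a real backend

service:
  pipelines:
    logs:
      receivers: [alertmanager]
      exporters: [count]        # the connector ends the logs pipeline...
    metrics:
      receivers: [count]        # ...and starts the metrics pipeline
      exporters: [debug]
```

Because the connector runs in the Collector rather than in each Prometheus server, the resulting counts can cover alerts from every Prometheus instance that sends to it.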

@atoulme atoulme removed the needs triage New item requiring triage label Mar 7, 2023
@nicolastakashi
Contributor Author

@atoulme and @djaglowski, is the idea to have something more generic? AM can post alerts using webhooks.

@djaglowski
Member

@nicolastakashi, the webhook receiver was proposed independently. Would it work for AM?

@nicolastakashi
Contributor Author

@djaglowski yeah it would also work.

Alertmanager has a feature that lets it push alerts to a webhook. Since the payload is a JSON object, we can put the entire JSON object in the log body.
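
A sketch of how the two sides might be configured. The first document is an Alertmanager fragment using its `webhook_configs` setting. The second assumes the webhook receiver mentioned above corresponds to the contrib `webhookevent` receiver, which is an assumption rather than something confirmed in this thread. The host name, port, and path are hypothetical.

```yaml
# alertmanager.yml (fragment): route notifications to the collector
route:
  receiver: otel-collector
receivers:
  - name: otel-collector
    webhook_configs:
      - url: http://otel-collector.monitoring.svc:8088/alerts  # hypothetical address
        send_resolved: true
---
# collector config (fragment): accept the webhook POST as a log record
receivers:
  webhookevent:                 # assumed to be the webhook receiver discussed here
    endpoint: 0.0.0.0:8088
    path: /alerts               # must match the path in the Alertmanager URL

exporters:
  debug:                        # prints received log records; swap for a real backend

service:
  pipelines:
    logs:
      receivers: [webhookevent]
      exporters: [debug]
```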

@djaglowski
Member

@nicolastakashi, that's great to hear.

Given that this issue has not found a sponsor, and that there are concerns about whether an AM exporter is appropriate, do you think we should close this issue?

@nicolastakashi
Contributor Author

@djaglowski yeah, I guess for now we can close this, thanks for all the support

5 participants