Local Alertmanager MVP #3252

Open
wants to merge 7 commits into
base: develop

@elipe17 elipe17 commented Oct 30, 2024

Summary of Changes

  • Added MVP implementation of Alertmanager
  • Integrated Alertmanager with SendGrid for email alerts
  • Added initial Prometheus alerting rules based on available metrics
  • Added some stub work for when this is deployed
  • Updated the Logs dashboard to also display uptime/availability
  • Updated frontend to allow proxy pass to the Alertmanager UI

Pull request closes Local Alertmanager #3242

How to Test

  • Before you bring everything up, add your email (instead of mine) and a SENDGRID_API_KEY to alertmanager.yml.
    • On line 6 in alertmanager.yml, replace {{ sendgrid_api_key }} with a valid API key.
    • On line 104 in alertmanager.yml, replace my emails with your email(s).
cd tdrs-backend && docker-compose up --build
  1. Open the alertmanager container logs and verify no new alerts are firing. Note: you may initially see a message like level=debug component=dispatcher msg="Received alert" alert="Local Backend Down[7649f89][active]" that is immediately followed by its resolution: level=debug component=dispatcher msg="Received alert" alert="Local Backend Down[7649f89][resolved]". This happens because of some tight timing tolerances. You will not get an email for this, which is expected.
  2. Let everything run for a minute or two and verify alertmanager is NOT firing any new alerts.
  3. Kill the postgres container, the web container, or both, and watch alertmanager start firing alerts. After the alert(s) have fired for at least 1 minute, you will receive emails for the alerts. You will not receive another email for the same alert for another 5 minutes.
  4. Restart the container(s) you killed and verify alertmanager marks the firing alert(s) as resolved.
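
As an alternative to reading the container logs in steps 2 through 4, the firing state can be checked over HTTP. The following is a minimal sketch, not part of this PR: it assumes Alertmanager is reachable on its default port 9093 on localhost (the compose file may publish a different port), and the names `ALERTMANAGER_URL`, `fetch_alerts`, and `active_alert_names` are hypothetical helpers introduced for illustration. It reads Alertmanager's `GET /api/v2/alerts` endpoint and keeps only alerts whose `status.state` is `active`.

```python
import json
from urllib.request import urlopen

# Assumption: Alertmanager's default port, published on localhost by docker-compose.
ALERTMANAGER_URL = "http://localhost:9093"


def active_alert_names(alerts):
    """Return the sorted alertname labels of alerts whose state is 'active'."""
    return sorted(
        alert.get("labels", {}).get("alertname", "<unnamed>")
        for alert in alerts
        if alert.get("status", {}).get("state") == "active"
    )


def fetch_alerts(base_url=ALERTMANAGER_URL):
    """Fetch the current alert list from Alertmanager's v2 HTTP API."""
    with urlopen(f"{base_url}/api/v2/alerts", timeout=5) as response:
        return json.load(response)
```

With the stack running, `active_alert_names(fetch_alerts())` should return an empty list in step 2, list the relevant alert(s) after a container is killed in step 3, and drop them again once the container is restarted in step 4.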

Deliverables

More details on how deliverables herein are assessed are included here.

Deliverable 1: Accepted Features

Checklist of ACs:

  • Prometheus connects to Alertmanager
  • Prometheus sends alerts to Alertmanager
  • Alertmanager integrated with SendGrid
  • Alert emails are received from Alertmanager/SendGrid
  • Documentation updated indicating Alertmanager integration
  • README is updated, if necessary
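
The first acceptance criterion can also be spot-checked programmatically. This is a hedged sketch rather than anything shipped in the PR: it assumes Prometheus is reachable on its default port 9090 on localhost, and `PROMETHEUS_URL`, `fetch_alertmanager_discovery`, and `active_alertmanager_urls` are hypothetical names. It relies on Prometheus's `GET /api/v1/alertmanagers` endpoint, which reports the Alertmanager instances Prometheus has discovered under `data.activeAlertmanagers`.

```python
import json
from urllib.request import urlopen

# Assumption: Prometheus's default port, published on localhost by docker-compose.
PROMETHEUS_URL = "http://localhost:9090"


def active_alertmanager_urls(payload):
    """Extract the URLs of Alertmanagers that Prometheus reports as active."""
    if payload.get("status") != "success":
        return []
    active = payload.get("data", {}).get("activeAlertmanagers", [])
    return [entry["url"] for entry in active if "url" in entry]


def fetch_alertmanager_discovery(base_url=PROMETHEUS_URL):
    """Fetch Prometheus's view of its configured Alertmanagers."""
    with urlopen(f"{base_url}/api/v1/alertmanagers", timeout=5) as response:
        return json.load(response)
```

A non-empty result from `active_alertmanager_urls(fetch_alertmanager_discovery())` suggests that Prometheus has connected to Alertmanager; an empty list points at a misconfigured `alerting` section or an unreachable container.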

Deliverable 2: Tested Code

  • Are all areas of code introduced in this PR meaningfully tested?
    • If this PR introduces backend code changes, are they meaningfully tested?
    • If this PR introduces frontend code changes, are they meaningfully tested?
  • Are code coverage minimums met?
    • Frontend coverage: [insert coverage %] (see CodeCov Report comment in PR)
    • Backend coverage: [insert coverage %] (see CodeCov Report comment in PR)

Deliverable 3: Properly Styled Code

  • Are backend code style checks passing on CircleCI?
  • Are frontend code style checks passing on CircleCI?
  • Are code maintainability principles being followed?

Deliverable 4: Accessible

  • Does this PR complete the epic?
  • Are links included to any other gov-approved PRs associated with epic?
  • Does PR include documentation for Raft's a11y review?
  • Did automated and manual testing with iamjolly and ttran-hub using Accessibility Insights reveal any errors introduced in this PR?

Deliverable 5: Deployed

  • Was the code successfully deployed via automated CircleCI process to development on Cloud.gov?

Deliverable 6: Documented

  • Does this PR provide background for why coding decisions were made?
  • If this PR introduces backend code, is that code easy to understand and sufficiently documented, both inline and overall?
  • If this PR introduces frontend code, is that code easy to understand and sufficiently documented, both inline and overall?
  • If this PR introduces dependencies, are their licenses documented?
  • Can reviewer explain and take ownership of these elements presented in this code review?

Deliverable 7: Secure

  • Does the OWASP Scan pass on CircleCI?
  • Do manual code review and manual testing detect any new security issues?
  • If new issues detected, is investigation and/or remediation plan documented?

Deliverable 8: User Research

Research product(s) clearly articulate(s):

  • the purpose of the research
  • methods used to conduct the research
  • who participated in the research
  • what was tested and how
  • impact of research on TDP
  • (if applicable) final design mockups produced for TDP development

codecov bot commented Oct 30, 2024

Codecov Report

Attention: Patch coverage is 72.72727% with 3 lines in your changes missing coverage. Please review.

Project coverage is 91.51%. Comparing base (d6c1cfc) to head (fa5f15c).
Report is 1 commit behind head on develop.

Files with missing lines | Patch % | Lines
...ackend/tdpservice/users/api/authorization_check.py | 40.00% | 3 Missing ⚠️

@@             Coverage Diff             @@
##           develop    #3252      +/-   ##
===========================================
- Coverage    91.52%   91.51%   -0.02%     
===========================================
  Files          297      297              
  Lines         8415     8416       +1     
  Branches       608      608              
===========================================
  Hits          7702     7702              
- Misses         603      604       +1     
  Partials       110      110              
Flag | Coverage Δ
dev-backend | 91.35% <50.00%> (-0.02%) ⬇️
dev-frontend | 92.66% <100.00%> (ø)

Files with missing lines | Coverage Δ
tdrs-backend/tdpservice/urls.py | 92.59% <100.00%> (ø)
tdrs-frontend/src/components/Header/Header.jsx | 95.65% <100.00%> (ø)
tdrs-frontend/src/components/SiteMap/SiteMap.jsx | 91.66% <100.00%> (ø)
tdrs-frontend/src/selectors/auth.js | 97.36% <100.00%> (ø)
...ackend/tdpservice/users/api/authorization_check.py | 74.13% <40.00%> (-1.31%) ⬇️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 53d3641...fa5f15c.

Base automatically changed from 3046-plg-cloud to develop November 1, 2024 15:18
@raftmsohani raftmsohani left a comment

Tested locally and received email! Exciting 👍

@elipe17 added the QASP Review label and removed the raft review label on Nov 1, 2024
Development

Successfully merging this pull request may close these issues.

Local Alertmanager
4 participants