[o11y] Add base alert rules to all CKF charms with metrics #1026

Closed
rgildein opened this issue Aug 13, 2024 · 2 comments
Labels
enhancement New feature or request

Comments

@rgildein

Context

Add the base alert rules (from the KF093 spec) to all charms that implement metrics.

groups:
- name: <charm-name>
  rules:
  - alert: KubeflowServiceDown
    expr: up{} < 1
    for: 5m
    labels:
      severity: critical
    annotations:
      summary: "{{ $labels.juju_charm }} service is Down ({{ $labels.juju_model }}/{{ $labels.juju_unit }})"
      description: |
        One or more targets of {{ $labels.juju_charm }} charm are down on unit {{ $labels.juju_model }}/{{ $labels.juju_unit }}.
        LABELS = {{ $labels }}

  - alert: KubeflowServiceIsNotStable
    expr: avg_over_time(up{}[10m]) < 0.5
    for: 0m
    labels:
      severity: warning
    annotations:
      summary: "{{ $labels.juju_charm }} service is not stable ({{ $labels.juju_model }}/{{ $labels.juju_unit }})"
      description: |
        {{ $labels.juju_charm }} unit {{ $labels.juju_model }}/{{ $labels.juju_unit }} has been unreachable more than 50% of the time over the last 10 minutes.
        LABELS = {{ $labels }}
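
To make the two conditions concrete, here is a rough Python sketch of what each expression checks. It is a simplification, not how Prometheus evaluates rules: it assumes the `up` samples for one target are already collected into a list covering the relevant window, and the function names `service_down` and `service_not_stable` are hypothetical.

```python
from typing import Sequence


def service_down(up_samples_5m: Sequence[int]) -> bool:
    """Approximate `up{} < 1` with `for: 5m`.

    Simplification: treat the alert as firing when every sample seen
    during the 5 minute window is below 1.
    """
    return len(up_samples_5m) > 0 and all(s < 1 for s in up_samples_5m)


def service_not_stable(up_samples_10m: Sequence[int]) -> bool:
    """Approximate `avg_over_time(up{}[10m]) < 0.5` with `for: 0m`.

    The mean of the 0/1 samples is the fraction of scrapes that
    succeeded, so the condition holds when fewer than half succeeded.
    """
    if not up_samples_10m:
        return False
    return sum(up_samples_10m) / len(up_samples_10m) < 0.5
```

With these definitions a target that is down for exactly half of its samples does not count as unstable, because the comparison is strict.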

What needs to get done

  1. Add KubeflowServiceDown and KubeflowServiceIsNotStable alert rules to charms
  2. Test their presence with assert_alert_rules.
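
The presence check in step 2 could look roughly like the sketch below. The signature of assert_alert_rules is not shown in this issue, so the sketch does not call it. It uses a hypothetical helper, `missing_base_alerts`, that scans the text of a rule file for the two alert names. `SAMPLE_RULES` and the charm name `example-charm` are made up for illustration.

```python
import re
from typing import Set

# The two base alerts named in this issue.
BASE_ALERTS = {"KubeflowServiceDown", "KubeflowServiceIsNotStable"}


def alert_names(rules_text: str) -> Set[str]:
    """Collect every `- alert: <name>` entry found in a rule file's text."""
    pattern = r"^\s*-\s*alert:\s*(\S+)\s*$"
    return set(re.findall(pattern, rules_text, flags=re.MULTILINE))


def missing_base_alerts(rules_text: str) -> Set[str]:
    """Return the base alerts that the rule file does not define."""
    return BASE_ALERTS - alert_names(rules_text)


# Hypothetical rule file content, shortened from the rules above.
SAMPLE_RULES = """\
groups:
- name: example-charm
  rules:
  - alert: KubeflowServiceDown
    expr: up{} < 1
  - alert: KubeflowServiceIsNotStable
    expr: avg_over_time(up{}[10m]) < 0.5
"""

if __name__ == "__main__":
    missing = missing_base_alerts(SAMPLE_RULES)
    assert not missing, f"missing base alerts: {sorted(missing)}"
```

A text scan like this only confirms that the names appear in the file. It does not validate the expressions or confirm that the rules are loaded by Prometheus.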

Definition of Done

  1. Alerts are present in each charm
  2. These changes are backported to 1.9.

@rgildein added the enhancement (New feature or request) label on Aug 13, 2024

Thank you for reporting your feedback to us!

The internal ticket has been created: https://warthogs.atlassian.net/browse/KF-6125.

This message was autogenerated

This was referenced Aug 15, 2024
@rgildein

All CKF charms were updated.
