Skip to content

Latest commit

 

History

History
66 lines (53 loc) · 1.93 KB

README.md

File metadata and controls

66 lines (53 loc) · 1.93 KB

Overview

Systemd-doctor is a health monitoring service designed to track and manage the health of various services on an embedded Linux device.

It integrates with Systemd to automatically restart services when abnormalities are detected, making sure your custom services are working.

Additionally, Systemd-doctor stores metrics in a time-series database, allowing users to view metrics and charts. It is helpful for System Analysis when we need a comprehensive data to evaluate out custom services and resouce, good information for debugging too.

Systemd-doctor service is able to reset itself by Systemd Watchdog

Features

  • Monitors CPU load, memory usage, disk space, and service status of "services"...
  • Tracks global metrics like CPU temperature, board temperature, and network bandwidth...
  • Journal-logging for each of service and kernel log
  • Automatically restarts services if thresholds are breached.
  • Validates if the services specified for tracking are valid systemd services.
  • Stores metrics in a time-series database for visualization in Grafana.

Configuration

Tracking Services Registration

The configuration file (config.toml) allows users to specify the services to monitor and their respective thresholds. Example

[services]
list = ["ota", "mqtt-client", "can-parser", "logging"]

[thresholds.ota]
cpu = 80.0
memory = 70.0
disk = 90

[thresholds.mqtt-client]
cpu = 60.0
memory = 50.0
disk = 85

[thresholds.can-parser]
cpu = 75.0
memory = 65.0
disk = 88

[thresholds.logging]
cpu = 70.0
memory = 60.0
disk = 85

[global_thresholds]
cpu_temperature = 80.0
board_temperature = 70.0
network_bandwidth = 1000.0 

Service file for Systemd-doctor

[Unit]
Description=Doctor Viet - Health Monitoring Service
After=network.target

[Service]
Type=simple
ExecStart=/usr/local/bin/systemd-doctor --config=/path/to/config.toml
WatchdogSec=10
Restart=always

[Install]
WantedBy=multi-user.target