Warning
This documentation and these stacks are currently a work in progress.
Documentation on how to get started with Docker Swarm monitoring.
- grafana-loki: A high-availability Grafana Loki deployment for Docker Swarm.
- grafana: Docker Stack deployment for the Grafana dashboard.
- grafana-tempo: A high-availability Grafana Tempo deployment for Docker Swarm.
- grafana-mimir: A high-availability Grafana Mimir deployment for Docker Swarm.
- promtail: Docker Stack deployment for Grafana Loki's Promtail.
- prometheus: A high-availability Prometheus stack for Docker Swarm.
- alertmanager: A high-availability Alertmanager stack for Docker Swarm.
- promagents: Docker Stack deployment for cAdvisor & node-exporter.
- A Docker Swarm cluster with at least 3 managers and 7 workers.
- Object Storage (MinIO or Amazon S3)
- Database (PostgreSQL)
This is an example of a 10-node cluster with 3 managers and 7 workers and their associated labels.
$ docker node ls
# ID     HOSTNAME               STATUS  AVAILABILITY  MANAGER STATUS   ENGINE VERSION   Labels
# 2pan * monitoring-manager-1   Ready   Active        Leader           24.0.6           etcd=true, consul=true, alertmanager=true
# cxz3   monitoring-manager-2   Ready   Active        Reachable        24.0.6           etcd=true, consul=true, prometheus=true, alertmanager=true
# p8le   monitoring-manager-3   Ready   Active        Reachable        24.0.6           etcd=true, consul=true, prometheus=true, alertmanager=true
# wd0l   monitoring-worker-1    Ready   Active                         24.0.6           loki=true, mimir=true
# tex5   monitoring-worker-2    Ready   Active                         24.0.6           loki=true, mimir=true
# ne00   monitoring-worker-3    Ready   Active                         24.0.6           loki=true, mimir=true
# lfnj   monitoring-worker-4    Ready   Active                         24.0.6           minio=true, postgres=true
# mols   monitoring-worker-5    Ready   Active                         24.0.6           minio=true, postgres=true
# 4ljr   monitoring-worker-6    Ready   Active                         24.0.6           minio=true, postgres=true
# pogc   monitoring-worker-7    Ready   Active                         24.0.6           grafana=true
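Note that docker node ls does not print labels by itself; the Labels column above is shown for illustration. To list the labels that are actually applied, you can inspect the nodes directly with the plain Docker CLI:
$ docker node inspect --format '{{ .Description.Hostname }}: {{ .Spec.Labels }}' $(docker node ls -q)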
To add node labels, run the following command:
$ docker node update --label-add <key>=<value> <node-id>
For example, to add the consul=true label to the three manager nodes:
$ docker node update --label-add consul=true monitoring-manager-1
$ docker node update --label-add consul=true monitoring-manager-2
$ docker node update --label-add consul=true monitoring-manager-3
Note
Repeat the same process for the other labels. See the server placement section of each stack you plan to deploy.
You might need to create a swarm-scoped overlay network called dockerswarm_monitoring so that all the stacks can communicate, if you haven't already.
On the manager node, run the following command:
$ docker network create --scope swarm --driver overlay --attachable dockerswarm_monitoring
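To confirm the network was created with the expected scope and driver, you can inspect it:
$ docker network inspect dockerswarm_monitoring --format '{{ .Name }}: driver={{ .Driver }}, scope={{ .Scope }}, attachable={{ .Attachable }}'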
This will distribute promtail and promagents to all available nodes.
No configuration is needed for promtail and promagents; they automatically discover the nodes and start collecting metrics and logs.
promtail:
# https://github.com/YouMightNotNeedKubernetes/promtail
$ gh repo clone YouMightNotNeedKubernetes/promtail
$ cd promtail
$ make deploy
promagents:
# https://github.com/YouMightNotNeedKubernetes/promagents
$ gh repo clone YouMightNotNeedKubernetes/promagents
$ cd promagents
$ make deploy
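Once both stacks are up, you can verify that a task is running on every node. The stack names promtail and promagents below are assumptions based on the repository names; adjust them if your deployment uses different names.
$ docker stack ps promtail --format 'table {{.Name}}\t{{.Node}}\t{{.CurrentState}}'
$ docker stack ps promagents --format 'table {{.Name}}\t{{.Node}}\t{{.CurrentState}}'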
The Prometheus and Alertmanager stacks will be deployed as shown in the diagram.
By default, it deploys 3 replicas of Alertmanager. More than 3 replicas would be overkill for a small cluster.
# https://github.com/YouMightNotNeedKubernetes/alertmanager
$ gh repo clone YouMightNotNeedKubernetes/alertmanager
$ cd alertmanager
$ make deploy
By default, it deploys 2 replicas of Prometheus. More than 2 replicas would be overkill for a small cluster.
# https://github.com/YouMightNotNeedKubernetes/prometheus
$ gh repo clone YouMightNotNeedKubernetes/prometheus
$ cd prometheus
$ make deploy
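To confirm the replica counts after deployment, list the services of each stack (again, the stack names prometheus and alertmanager are assumptions based on the repository names):
$ docker stack services prometheus
$ docker stack services alertmanager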
Note
MinIO object storage is required by the grafana-mimir and grafana-loki stacks and needs to be deployed first. If you plan to use Amazon S3, you can skip this stack.
See https://github.com/YouMightNotNeedKubernetes/minio for how to deploy MinIO.
Note
The PostgreSQL cluster based on Spilo requires etcd to be deployed first. If you plan to use an alternative solution, you can skip this stack.
See https://github.com/YouMightNotNeedKubernetes/etcd for how to deploy etcd.
See https://github.com/YouMightNotNeedKubernetes/postgresql-spilo for how to deploy PostgreSQL.
Important
HashiCorp Consul is required by the grafana-mimir and grafana-loki stacks to deploy a high-availability cluster.
# https://github.com/YouMightNotNeedKubernetes/hashicorp-consul
$ gh repo clone YouMightNotNeedKubernetes/hashicorp-consul
$ cd hashicorp-consul
$ make deploy
See https://github.com/YouMightNotNeedKubernetes/hashicorp-consul for how to deploy HashiCorp Consul.
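Once the stack is running, you can check that the Consul servers have formed a cluster by running consul members inside one of the tasks. The container name is a placeholder; pick one from the docker ps output on a node labeled consul=true.
$ docker ps --filter name=consul --format '{{ .Names }}'
$ docker exec -it <consul-container-name> consul members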
# https://github.com/YouMightNotNeedKubernetes/grafana-loki
$ gh repo clone YouMightNotNeedKubernetes/grafana-loki
$ cd grafana-loki
$ make deploy
You will need to create a bucket on MinIO or Amazon S3 for storing the logs (see the example commands below):
- loki
Note
You can change the bucket name using the configuration file.
See https://github.com/YouMightNotNeedKubernetes/grafana-loki for how to configure Grafana Loki.
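If you use MinIO, one way to create the bucket is with the MinIO client (mc); the alias name, endpoint and credentials below are placeholders for your own deployment. On Amazon S3 you can use the AWS CLI instead.
$ mc alias set monitoring http://<minio-endpoint>:9000 <access-key> <secret-key>
$ mc mb monitoring/loki
# or, on Amazon S3:
$ aws s3 mb s3://loki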
# https://github.com/YouMightNotNeedKubernetes/grafana-mimir
$ gh repo clone YouMightNotNeedKubernetes/grafana-mimir
$ cd grafana-mimir
$ make deploy
You will need to create the following buckets on MinIO or Amazon S3 for storing the metrics (see the example commands below):
- mimir
- mimir-blocks
- mimir-ruler
- mimir-alertmanager
Note
You can change the bucket names using the configuration file.
See https://github.com/YouMightNotNeedKubernetes/grafana-mimir for how to configure Grafana Mimir.
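The buckets can be created the same way as for Loki, for example with the MinIO client using the alias from the earlier example (adjust the names if you changed them in the configuration):
$ mc mb monitoring/mimir
$ mc mb monitoring/mimir-blocks
$ mc mb monitoring/mimir-ruler
$ mc mb monitoring/mimir-alertmanager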
This is a generic Grafana Dashboard deployment; by default an embedded SQLite database is used. To achieve a high-availability Grafana Dashboard, you will need to use an external database such as PostgreSQL or MySQL (a sketch follows at the end of this section).
# https://github.com/YouMightNotNeedKubernetes/grafana
$ gh repo clone YouMightNotNeedKubernetes/grafana
$ cd grafana
$ make deploy
See https://github.com/YouMightNotNeedKubernetes/grafana for how to configure Grafana Dashboard.
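As a sketch of pointing Grafana at an external PostgreSQL database, you can set Grafana's standard GF_DATABASE_* environment variables on the running service. The service name grafana_grafana, the host and the credentials below are assumptions; the stack may provide its own configuration mechanism, so check its README first.
$ docker service update \
    --env-add GF_DATABASE_TYPE=postgres \
    --env-add GF_DATABASE_HOST=<postgres-host>:5432 \
    --env-add GF_DATABASE_NAME=grafana \
    --env-add GF_DATABASE_USER=grafana \
    --env-add GF_DATABASE_PASSWORD=<password> \
    grafana_grafana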