feat: Update Loki monitoring docs to new meta monitoring helm #13176

Merged Jun 12, 2024 (21 commits):
- `dc88752` part update (Jayclifford345, Jun 5, 2024)
- `346033a` Merge https://github.com/grafana/loki into update-monitoring-docs (Jayclifford345, Jun 6, 2024)
- `951cbd3` feat: Updated Helm loki monitor with new meta-monitor helm (Jayclifford345, Jun 7, 2024)
- `9f8ff80` feat: Added link to repo (Jayclifford345, Jun 7, 2024)
- `7b1c12f` Merge branch 'main' into update-monitoring-docs (Jayclifford345, Jun 7, 2024)
- `1865bcd` Update docs/sources/setup/install/helm/monitor-and-alert/with-grafana… (Jayclifford345, Jun 11, 2024)
- `0d94877` Update docs/sources/setup/install/helm/monitor-and-alert/with-grafana… (Jayclifford345, Jun 11, 2024)
- `2e9e379` Update docs/sources/setup/install/helm/monitor-and-alert/with-grafana… (Jayclifford345, Jun 11, 2024)
- `675cf6c` Update docs/sources/setup/install/helm/monitor-and-alert/with-grafana… (Jayclifford345, Jun 11, 2024)
- `b889859` Update docs/sources/setup/install/helm/monitor-and-alert/with-local-m… (Jayclifford345, Jun 11, 2024)
- `f30f4a1` Update docs/sources/setup/install/helm/monitor-and-alert/with-local-m… (Jayclifford345, Jun 11, 2024)
- `b09c737` Update docs/sources/setup/install/helm/monitor-and-alert/with-local-m… (Jayclifford345, Jun 11, 2024)
- `436d0b3` Update docs/sources/setup/install/helm/monitor-and-alert/with-local-m… (Jayclifford345, Jun 11, 2024)
- `c6e5a4a` Updated Cloud desciption and fixed wording in MinIO section (Jayclifford345, Jun 11, 2024)
- `553ba95` Merge branch 'main' into update-monitoring-docs (Jayclifford345, Jun 11, 2024)
- `2e21355` Apply suggestions from code review (Jayclifford345, Jun 12, 2024)
- `e68cc39` Added definition for LGTM (Jayclifford345, Jun 12, 2024)
- `7d575a6` Merge branch 'main' into update-monitoring-docs (Jayclifford345, Jun 12, 2024)
- `60c5489` Fixed title issue (Jayclifford345, Jun 12, 2024)
- `f26e21c` Apply suggestions from code review (Jayclifford345, Jun 12, 2024)
- `702d877` Apply suggestions from code review (Jayclifford345, Jun 12, 2024)
---
title: Monitor Loki with Grafana Cloud
menuTitle: Monitor Loki with Grafana Cloud
description: Configuring monitoring for Loki using Grafana Cloud.
aliases:
- ../../../../installation/helm/monitor-and-alert/with-grafana-cloud
weight: 200
keywords:
- grafana cloud
---

# Monitor Loki with Grafana Cloud

This guide will walk you through using Grafana Cloud to monitor a Loki installation set up with the Helm chart. This method takes advantage of many of the chart's self-monitoring features, sending metrics, logs, and traces from the Loki deployment to Grafana Cloud. Monitoring Loki with Grafana Cloud offers the added benefit of troubleshooting Loki issues even when the Helm-installed Loki is down, as the telemetry data will remain available in the Grafana Cloud instance.

These instructions are based on the [meta-monitoring-chart repository](https://github.com/grafana/meta-monitoring-chart/tree/main).

## Before you begin

- Helm 3 or above. See [Installing Helm](https://helm.sh/docs/intro/install/).
- A Grafana Cloud account and stack (including Cloud Grafana, Cloud Metrics, and Cloud Logs).
- [Grafana Kubernetes Monitoring using Agent Flow](/docs/grafana-cloud/monitor-infrastructure/kubernetes-monitoring/configuration/config-k8s-agent-flow/) configured for the Kubernetes cluster.
- A running Loki deployment installed in that Kubernetes cluster via the Helm chart.
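You can quickly confirm these prerequisites from the command line. This is a minimal sketch; the `loki` release name and the `default` namespace are assumptions that match the examples later in this guide, so substitute the values you used when installing the Loki Helm chart:

```bash
# Check the Helm client version (should be 3.x or above).
helm version --short

# Confirm the Loki release exists and its pods are running.
# Assumes a release named "loki" installed in the "default" namespace; adjust as needed.
helm status loki -n default
kubectl get pods -n default
```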

## Configure the meta namespace

The meta-monitoring stack will be installed in a separate namespace called `meta`. To create this namespace, run the following command:

```bash
kubectl create namespace meta
```

## Grafana Cloud Connection Credentials

The meta-monitoring stack sends metrics, logs, and traces to Grafana Cloud. This requires that you know your connection credentials to Grafana Cloud. To obtain connection credentials, follow the steps below:

1. Create a new Cloud Access Policy in Grafana Cloud.
   1. Sign into [Grafana Cloud](https://grafana.com/auth/sign-in/).
   1. In the main menu, select **Administration > Users and Access > Cloud Access Policies**.
   1. Click **Create access policy**.
   1. Give the policy a **Name** and select the following permissions:
      - Logs: Write
      - Metrics: Write
      - Traces: Write
   1. Click **Create**.

   1. Once the policy is created, select the policy and click **Add token**.
   1. Name the token, select an expiration date, then click **Create**.
   1. Copy the token to a secure location as it will not be displayed again.

2. Next, in the Grafana Cloud portal, select your stack and collect the `Username / Instance ID` and `URL` for the following components:

   - **Logs (Loki):** Select **Send Logs** and note the `User` and `URL` values in the *Using Grafana with Logs* section.
   - **Metrics (Prometheus):** Select **Send Metrics** and note the `User` and `URL` values in the *Using a self-hosted Grafana instance with Grafana Cloud Metrics* section.
   - **Traces (OTLP):** Select **Configure** and note the `Instance ID` and `Endpoint` values in the *OTLP Endpoint* section.

3. Finally, generate a secret for each telemetry type (logs, metrics, and traces) in your Kubernetes cluster:
```bash
kubectl create secret generic logs -n meta \
--from-literal=username=<USERNAME LOGS> \
--from-literal=password=<ACCESS POLICY TOKEN> \
--from-literal=endpoint='https://<LOG URL>/loki/api/v1/push'

kubectl create secret generic metrics -n meta \
--from-literal=username=<USERNAME METRICS> \
--from-literal=password=<ACCESS POLICY TOKEN> \
--from-literal=endpoint='https://<METRICS URL>/api/prom/push'

kubectl create secret generic traces -n meta \
--from-literal=username=<OTLP INSTANCE ID> \
--from-literal=password=<ACCESS POLICY TOKEN> \
--from-literal=endpoint='https://<OTLP URL>/otlp'
```
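As a quick sanity check (not part of the chart's own instructions), you can confirm that the three secrets exist in the `meta` namespace and decode one of the values:

```bash
# List the secrets that the meta-monitoring chart will reference.
kubectl get secrets logs metrics traces -n meta

# Decode the logs endpoint to confirm it matches the URL you collected.
kubectl get secret logs -n meta -o jsonpath='{.data.endpoint}' | base64 -d
```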

## Configuration and Installation

To install the meta-monitoring Helm chart, you must create a `values.yaml` file. At a minimum, this file should define:
* The namespaces to monitor
* Which cloud destinations (logs, metrics, and traces) to enable

This example `values.yaml` file provides the minimum configuration to monitor the `default` namespace:

```yaml
namespacesToMonitor:
  - default

cloud:
  logs:
    enabled: true
    secret: "logs"
  metrics:
    enabled: true
    secret: "metrics"
  traces:
    enabled: true
    secret: "traces"
```
For further configuration options, refer to the [reference file](https://github.com/grafana/meta-monitoring-chart/blob/main/charts/meta-monitoring/values.yaml).
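If you prefer to browse the defaults locally rather than in the repository, one option is to export them with Helm. This assumes the `grafana` Helm repository has already been added, which the install step below does:

```bash
# Write the chart's default values to a local file for reference.
helm show values grafana/meta-monitoring > meta-monitoring-defaults.yaml
```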

To install the meta-monitoring Helm chart, run the following commands:

```bash
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update
helm install meta-monitoring grafana/meta-monitoring -n meta -f values.yaml
```
or, when upgrading the configuration of an existing installation:
```bash
helm upgrade meta-monitoring grafana/meta-monitoring -n meta -f values.yaml
```

To verify the installation, run the following command:

```bash
kubectl get pods -n meta
```
It should return the following pods:
```bash
NAME           READY   STATUS    RESTARTS   AGE
meta-alloy-0   2/2     Running   0          23h
meta-alloy-1   2/2     Running   0          23h
meta-alloy-2   2/2     Running   0          23h
```
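If the pods are not in the `Running` state, the Alloy logs are usually the quickest way to spot a misconfigured secret or endpoint. For example:

```bash
# Describe the pod and inspect recent logs from all of its containers.
kubectl describe pod meta-alloy-0 -n meta
kubectl logs meta-alloy-0 -n meta --all-containers | tail -n 50
```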


## Enable Loki Tracing

By default, Loki does not have tracing enabled. To enable it, edit the Loki Helm chart's `values.yaml` file and set the `tracing.enabled` configuration to `true`:
```yaml
loki:
  tracing:
    enabled: true
```

Next, instrument each of the Loki components to send traces to the meta-monitoring stack by adding the following configuration to each component:

```yaml
ingester:
  replicas: 3
  extraEnv:
    - name: JAEGER_ENDPOINT
      value: "http://mmc-alloy-external.default.svc.cluster.local:14268/api/traces"
      # This sets the Jaeger endpoint where traces will be sent.
      # The endpoint points to the mmc-alloy service in the default namespace at port 14268.

    - name: JAEGER_AGENT_TAGS
      value: 'cluster="prod",namespace="default"'
      # This specifies additional tags to attach to each span.
      # Here, the cluster is labeled as "prod" and the namespace as "default".

    - name: JAEGER_SAMPLER_TYPE
      value: "ratelimiting"
      # This sets the sampling strategy for traces.
      # "ratelimiting" means that traces will be sampled at a fixed rate.

    - name: JAEGER_SAMPLER_PARAM
      value: "1.0"
      # This sets the parameter for the sampler.
      # For ratelimiting, "1.0" typically means one trace per second.
```

Since the meta-monitoring stack is installed in the `meta` namespace, the Loki components need to be able to reach it. To do this, create a new ExternalName service in the `default` namespace that points to the `meta` namespace by running the following command:

```bash
kubectl create service externalname mmc-alloy-external --external-name meta-alloy.meta.svc.cluster.local -n default
```
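Before upgrading Loki, you can verify that the ExternalName service was created and points at the Alloy service in the `meta` namespace:

```bash
# Prints the external name the service resolves to.
kubectl get service mmc-alloy-external -n default -o jsonpath='{.spec.externalName}'
```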

Finally, upgrade the Loki installation with the new configuration:

```bash
helm upgrade --values values.yaml loki grafana/loki
```
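After the upgrade, you can spot-check that the Jaeger environment variables reached the pods. The label selector below is an assumption based on the labels the Loki chart typically applies; adjust it, or reference a pod by name, to match your deployment:

```bash
# Print the names of the environment variables set on the first ingester pod.
# The component label is an assumption; verify it with `kubectl get pods --show-labels`.
kubectl get pods -n default -l app.kubernetes.io/component=ingester \
  -o jsonpath='{.items[0].spec.containers[0].env[*].name}'
```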

## Import the Loki Dashboards to Grafana Cloud

The meta-monitoring stack includes a set of dashboards that can be imported into Grafana Cloud. The dashboard definitions are located in the [meta-monitoring repository](https://github.com/grafana/meta-monitoring-chart/tree/main/charts/meta-monitoring/src/dashboards).
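For example, you can clone the repository and review the dashboard JSON files before importing them into Grafana Cloud:

```bash
# Fetch the dashboards shipped with the meta-monitoring chart.
git clone https://github.com/grafana/meta-monitoring-chart.git
ls meta-monitoring-chart/charts/meta-monitoring/src/dashboards
```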


## Installing Rules

The meta-monitoring stack includes a set of rules for monitoring the Loki installation. These rules are located in the [meta-monitoring repository](https://github.com/grafana/meta-monitoring-chart/). To install the rules:

1. Clone the repository:
```bash
git clone https://github.com/grafana/meta-monitoring-chart/
```
1. Install `mimirtool` by following the instructions in the [Mimirtool documentation](https://grafana.com/docs/mimir/latest/manage/tools/mimirtool/).
1. Create a new access policy token in Grafana Cloud with the following permissions:
- Rules: Write
- Rules: Read
1. Create a token for the access policy and copy it to a secure location.
1. Install the rules:
```bash
mimirtool rules load --address=<your_cloud_prometheus_endpoint> --id=<your_instance_id> --key=<your_cloud_access_policy_token> *.yaml
```
1. Verify that the rules have been installed:
```bash
mimirtool rules list --address=<your_cloud_prometheus_endpoint> --id=<your_instance_id> --key=<your_cloud_access_policy_token>
```
It should return a list of rules that have been installed.
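If you would rather not repeat the connection flags on every invocation, `mimirtool` can also read them from environment variables. The variable names below are assumptions based on the Mimir tooling conventions; verify them against the Mimirtool documentation before relying on them:

```bash
# Assumed equivalents of passing --address, --id, and --key on each command.
export MIMIR_ADDRESS=<your_cloud_prometheus_endpoint>
export MIMIR_TENANT_ID=<your_instance_id>
export MIMIR_API_KEY=<your_cloud_access_policy_token>

mimirtool rules list
```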
## Kube-state-metrics

Metrics about Kubernetes objects are scraped from [kube-state-metrics](https://github.com/kubernetes/kube-state-metrics), which must be installed in the cluster. Set the `kubeStateMetrics.endpoint` entry in the meta-monitoring `values.yaml` to its address (without the `/metrics` part of the URL):

```yaml
kubeStateMetrics:
  # Scrape https://github.com/kubernetes/kube-state-metrics by default
  enabled: true
  # This endpoint is created when the helm chart from
  # https://artifacthub.io/packages/helm/prometheus-community/kube-state-metrics/
  # is used. Change this if kube-state-metrics is installed somewhere else.
  endpoint: kube-state-metrics.kube-state-metrics.svc.cluster.local:8080
```
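To check whether kube-state-metrics is already running in your cluster, and at what address, you can search for its service by the standard chart label. This is a sketch; the label value assumes the prometheus-community chart defaults:

```bash
# Locate an existing kube-state-metrics service across all namespaces.
kubectl get svc --all-namespaces -l app.kubernetes.io/name=kube-state-metrics
```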
