Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add enableMetrics configuration #1380

Merged
merged 1 commit into from
Sep 23, 2022
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
8 changes: 8 additions & 0 deletions charts/aws-ebs-csi-driver/templates/controller.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -79,6 +79,9 @@ spec:
{{- with .Values.controller.k8sTagClusterId }}
- --k8s-tag-cluster-id={{ . }}
{{- end }}
{{- if and (.Values.controller.enableMetrics) (not .Values.controller.httpEndpoint) }}
- --http-endpoint=0.0.0.0:3301
{{- end}}
{{- with .Values.controller.httpEndpoint }}
- --http-endpoint={{ . }}
{{- end }}
Expand Down Expand Up @@ -133,6 +136,11 @@ spec:
- name: healthz
containerPort: 9808
protocol: TCP
{{- if .Values.controller.enableMetrics }}
- name: metrics
containerPort: 3301
protocol: TCP
{{- end}}
livenessProbe:
httpGet:
path: /healthz
Expand Down
40 changes: 40 additions & 0 deletions charts/aws-ebs-csi-driver/templates/metrics.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,40 @@
{{- if .Values.controller.enableMetrics -}}
---
apiVersion: v1
kind: Service
metadata:
name: ebs-csi-controller
namespace: kube-system
labels:
app: ebs-csi-controller
spec:
selector:
app: ebs-csi-controller
ports:
- name: metrics
port: 3301
targetPort: 3301
type: ClusterIP
---
{{- if (.Capabilities.APIVersions.Has "monitoring.coreos.com/v1") -}}
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
name: ebs-csi-controller
namespace: kube-system
labels:
app: ebs-csi-controller
release: prometheus
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be great if these labels were configurable / overridable without needing to use helm's kustomize post-renderers support.

In my setup release label should have a different value. Maybe app label can remain fixed, but release: prometheus one should ideally be part of e.g. controller.serviceMonitor.additionalLabels, to allow for overrides. Then also enable switch can be moved under controller.serviceMonitor.enabled

WDYT @torredil ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@stevo-f3 Thanks for the feedback, included this improvement in #1419.

spec:
selector:
matchLabels:
app: ebs-csi-controller
namespaceSelector:
matchNames:
- kube-system
endpoints:
- targetPort: 3301
path: /metrics
interval: 15s
{{- end }}
{{- end }}
8 changes: 8 additions & 0 deletions charts/aws-ebs-csi-driver/values.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -126,6 +126,14 @@ controller:
# key2: value2
extraVolumeTags: {}
httpEndpoint:
# (deprecated) The TCP network address where the prometheus metrics endpoint
# will run (example: `:8080` which corresponds to port 8080 on local host).
# The default is empty string, which means metrics endpoint is disabled.
# ---
enableMetrics: false
# If set to true, AWS API call metrics will be exported to the following
# TCP endpoint: "0.0.0.0:3301"
# ---
# ID of the Kubernetes cluster used for tagging provisioned EBS volumes (optional).
k8sTagClusterId:
logLevel: 2
Expand Down
81 changes: 81 additions & 0 deletions docs/metrics.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,81 @@
# Driver Metrics

## Prerequisites

1. Install [Prometheus Operator](https://github.com/prometheus-operator/prometheus-operator) in your cluster:
```sh
$ helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
$ helm repo update
$ helm install prometheus prometheus-community/kube-prometheus-stack
```
2. Enable metrics by setting `enableMetrics: true` in [values.yaml](https://github.com/kubernetes-sigs/aws-ebs-csi-driver/blob/master/charts/aws-ebs-csi-driver/values.yaml).

3. Deploy EBS CSI Driver:
```sh
$ helm upgrade --install aws-ebs-csi-driver --namespace kube-system ./charts/aws-ebs-csi-driver --values ./charts/aws-ebs-csi-driver/values.yaml
```

## Overview

Installing the Prometheus Operator and enabling metrics will deploy a [Service](https://kubernetes.io/docs/concepts/services-networking/service/) object that exposes the EBS CSI Driver's controller metric port through a `ClusterIP`. Additionally, a [ServiceMonitor](https://github.com/prometheus-operator/prometheus-operator/blob/main/Documentation/user-guides/getting-started.md#:~:text=Alertmanager-,ServiceMonitor,-See%20the%20Alerting) object is deployed which updates the Prometheus scrape configuration and allows scraping metrics from the endpoint defined. For more information, see the manifest [metrics.yaml](/charts/aws-ebs-csi-driver/templates/metrics.yaml)

## AWS API Metrics

The EBS CSI Driver will emit [AWS API](https://docs.aws.amazon.com/AWSEC2/latest/APIReference/OperationList-query.html) metrics to the following TCP endpoint: `0.0.0.0:3301/metrics` if `enableMetrics: true` has been configured in the Helm chart.

The metrics will appear in the following format:
```sh
# HELP cloudprovider_aws_api_request_duration_seconds [ALPHA] Latency of AWS API calls
# TYPE cloudprovider_aws_api_request_duration_seconds histogram
cloudprovider_aws_api_request_duration_seconds_bucket{request="AttachVolume",le="0.005"} 0
cloudprovider_aws_api_request_duration_seconds_bucket{request="AttachVolume",le="0.01"} 0
cloudprovider_aws_api_request_duration_seconds_bucket{request="AttachVolume",le="0.025"} 0
cloudprovider_aws_api_request_duration_seconds_bucket{request="AttachVolume",le="0.05"} 0
cloudprovider_aws_api_request_duration_seconds_bucket{request="AttachVolume",le="0.1"} 0
cloudprovider_aws_api_request_duration_seconds_bucket{request="AttachVolume",le="0.25"} 0
cloudprovider_aws_api_request_duration_seconds_bucket{request="AttachVolume",le="0.5"} 0
cloudprovider_aws_api_request_duration_seconds_bucket{request="AttachVolume",le="1"} 1
cloudprovider_aws_api_request_duration_seconds_bucket{request="AttachVolume",le="2.5"} 1
cloudprovider_aws_api_request_duration_seconds_bucket{request="AttachVolume",le="5"} 1
cloudprovider_aws_api_request_duration_seconds_bucket{request="AttachVolume",le="10"} 1
cloudprovider_aws_api_request_duration_seconds_bucket{request="AttachVolume",le="+Inf"} 1
cloudprovider_aws_api_request_duration_seconds_sum{request="AttachVolume"} 0.547694574
cloudprovider_aws_api_request_duration_seconds_count{request="AttachVolume"} 1
...
```

To manually scrape AWS metrics:
```sh
$ export ebs_csi_controller=$(kubectl get lease -n kube-system ebs-csi-aws-com -o=jsonpath="{.spec.holderIdentity}")
$ kubectl port-forward $ebs_csi_controller 3301:3301 -n kube-system
$ curl 127.0.0.1:3301/metrics
```

## Volume Stats Metrics

The EBS CSI Driver emits Kubelet mounted volume metrics for volumes created with the driver.

The following metrics are currently supported:

| Metric name | Metric type | Description | Labels |
|-------------|-------------|-------------|-------------|
|kubelet_volume_stats_capacity_bytes|Gauge|The capacity in bytes of the volume|namespace=\<persistentvolumeclaim-namespace\> <br/> persistentvolumeclaim=\<persistentvolumeclaim-name\>|
|kubelet_volume_stats_available_bytes|Gauge|The number of available bytes in the volume|namespace=\<persistentvolumeclaim-namespace\> <br/> persistentvolumeclaim=\<persistentvolumeclaim-name\>|
|kubelet_volume_stats_used_bytes|Gauge|The number of used bytes in the volume|namespace=\<persistentvolumeclaim-namespace\> <br/> persistentvolumeclaim=\<persistentvolumeclaim-name\>|
|kubelet_volume_stats_inodes|Gauge|The maximum number of inodes in the volume|namespace=\<persistentvolumeclaim-namespace\> <br/> persistentvolumeclaim=\<persistentvolumeclaim-name\>|
|kubelet_volume_stats_inodes_free|Gauge|The number of free inodes in the volume|namespace=\<persistentvolumeclaim-namespace\> <br/> persistentvolumeclaim=\<persistentvolumeclaim-name\>|
|kubelet_volume_stats_inodes_used|Gauge|The number of used inodes in the volume|namespace=\<persistentvolumeclaim-namespace\> <br/> persistentvolumeclaim=\<persistentvolumeclaim-name\>|

For more information about the supported metrics, see `VolumeUsage` within the CSI spec documentation for the [NodeGetVolumeStats](https://github.com/container-storage-interface/spec/blob/master/spec.md#nodegetvolumestats) RPC call.

For more information about metrics in Kubernetes, see the [Metrics For Kubernetes System Components](https://kubernetes.io/docs/concepts/cluster-administration/system-metrics/#metrics-in-kubernetes) documentation.

## CSI Operations Metrics

The `csi_operations_seconds metrics` reports a latency histogram of kubelet-initiated CSI gRPC calls by gRPC status code.

To manually scrape Kubelet metrics:
```sh
$ kubectl proxy
$ kubectl get --raw /api/v1/nodes/<insert_node_name>/proxy/metrics
```