Skip to content

Commit

Permalink
Add metrics exposing extended md RAID info (prometheus#958)
Browse files Browse the repository at this point in the history
Add metrics that expose more information about MD RAID devices and
disks:

- the RAID level in use
- the RAID set that a disk belongs to

This allows for things like alert on unusually high I/O
utilisation for a disk compared to other disks in the same RAID set,
which usually means the disk is failing, and for comparing
write/read latency across RAID sets.

Output looks like:

    node_md_disk_info{disk_device="/dev/dm-0", md_device="md1", md_set="A"} 1
    node_md_disk_info{disk_device="/dev/dm-3", md_device="md1", md_set="B"} 1
    node_md_disk_info{disk_device="/dev/dm-2", md_device="md1", md_set="A"} 1
    node_md_disk_info{disk_device="/dev/dm-1", md_device="md1", md_set="B"} 1
    node_md_disk_info{disk_device="/dev/dm-4", md_device="md1", md_set="A"} 1
    node_md_disk_info{disk_device="/dev/dm-5", md_device="md1", md_set="B"} 1
    node_md_info{md_device="md1", md_name="foo", raid_level="10", md_metadata_version="1.2"} 1

The `node_md_info` metric, which gives additional information about the
RAID array, is intentionally separate to avoid adding all of those
labels to each disk. If you need to query using the labels contained in
`node_md_info`, you can do that using PromQL:
https://www.robustperception.io/how-to-have-labels-for-machine-roles/

I looked at adding the array UUID, but there's no sysfs entry for it and
I'm not sure there's a strong use case for it.

This patch to add a sysfs entry for the UUID was apparently not
accepted:
https://www.spinics.net/lists/raid/msg40667.html

Add these metrics as a textfile script rather than adding them to the Go
'md' module as they're perhaps less commonly useful. If lots of people
find them useful, we can later rewrite this in Go.

Signed-off-by: Matt Bostock <[email protected]>
  • Loading branch information
mattbostock authored and oblitorum committed Apr 9, 2024
1 parent be2fe3d commit a569ad1
Showing 1 changed file with 56 additions and 0 deletions.
56 changes: 56 additions & 0 deletions text_collector_examples/md_info.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,56 @@
#!/usr/bin/env bash
set -eu

for MD_DEVICE in /dev/md/*; do
# Subshell to avoid eval'd variables from leaking between iterations
(
# Resolve symlink to discover device, e.g. /dev/md127
MD_DEVICE_NUM=$(readlink -f "${MD_DEVICE}")

# Remove /dev/ prefix
MD_DEVICE_NUM=${MD_DEVICE_NUM#/dev/}
MD_DEVICE=${MD_DEVICE#/dev/md/}

# Query sysfs for info about md device
SYSFS_BASE="/sys/devices/virtual/block/${MD_DEVICE_NUM}/md"
MD_LAYOUT=$(cat "${SYSFS_BASE}/layout")
MD_LEVEL=$(cat "${SYSFS_BASE}/level")
MD_METADATA_VERSION=$(cat "${SYSFS_BASE}/metadata_version")
MD_NUM_RAID_DISKS=$(cat "${SYSFS_BASE}/raid_disks")

# Remove 'raid' prefix from RAID level
MD_LEVEL=${MD_LEVEL#raid}

# Output disk metrics
for RAID_DISK in ${SYSFS_BASE}/rd[0-9]*; do
DISK=$(readlink -f "${RAID_DISK}/block")
DISK_DEVICE=$(basename "${DISK}")
RAID_DISK_DEVICE=$(basename "${RAID_DISK}")
RAID_DISK_INDEX=${RAID_DISK_DEVICE#rd}
RAID_DISK_STATE=$(cat "${RAID_DISK}/state")

DISK_SET=""
# Determine disk set using logic from mdadm: https://github.com/neilbrown/mdadm/commit/2c096ebe4b
if [[ ${RAID_DISK_STATE} == "in_sync" && ${MD_LEVEL} == 10 && $((MD_LAYOUT & ~0x1ffff)) ]]; then
NEAR_COPIES=$((MD_LAYOUT & 0xff))
FAR_COPIES=$(((MD_LAYOUT >> 8) & 0xff))
COPIES=$((NEAR_COPIES * FAR_COPIES))

if [[ $((MD_NUM_RAID_DISKS % COPIES == 0)) && $((COPIES <= 26)) ]]; then
DISK_SET=$((RAID_DISK_INDEX % COPIES))
fi
fi

echo -n "node_md_disk_info{disk_device=\"${DISK_DEVICE}\", md_device=\"${MD_DEVICE_NUM}\""
if [[ -n ${DISK_SET} ]]; then
SET_LETTERS=({A..Z})
echo -n ", md_set=\"${SET_LETTERS[${DISK_SET}]}\""
fi
echo "} 1"
done

# Output RAID array metrics
# NOTE: Metadata version is a label rather than a separate metric because the version can be a string
echo "node_md_info{md_device=\"${MD_DEVICE_NUM}\", md_name=\"${MD_DEVICE}\", raid_level=\"${MD_LEVEL}\", md_metadata_version=\"${MD_METADATA_VERSION}\"} 1"
)
done

0 comments on commit a569ad1

Please sign in to comment.