Add check state for mdadm arrays via node_md_state metric #1810
@SuperQ @discordianfish With prometheus/procfs#321 already merged, this PR now makes sense and is functional. Could you take a look and tell me whether I need to make changes for it to be merged?
There's another proposal to enhance mdstat parsing: prometheus/procfs#329. Maybe we should review that?
@SuperQ Do you see any reason to hold back on merging this PR in the meantime, or is there something I should add to it as a result of the further additions in prometheus/procfs#329? This PR is currently only about adding another state an md can be in, and there will always be more metrics that could be added with yet another PR.
No, I guess this is fine without any more updates. But it looks like we need to cut a procfs release anyway to pull in the other change.
I've cut a new procfs release and opened #1852 to merge it into node_exporter.
Would you please rebase this against head?
Signed-off-by: Christian Rohmann <[email protected]>
…te and the new state=check labeled metric for all other md Signed-off-by: Christian Rohmann <[email protected]>
Force-pushed from 7f75cbf to 683e75f.
Sorry for the delay, @SuperQ. I just did the rebase.
LGTM
* Update Build
  - Update CircleCI orb.
  - Update CIrcleCI Machine image.
  - Use golang-builder 1.15.
* Update Go modules.
* Fixup fixtures for XFS bug.

Changes:
* [CHANGE] Improve filter flag names. #1743
* [CHANGE] Add btrfs and powersupplyclass to list of exporters enabled by default #1897
* [FEATURE] Add fibre channel collector #1786
* [FEATURE] Expose cpu bugs and flags as info metrics. #1788
* [FEATURE] Add network_route collector #1811
* [FEATURE] Add zoneinfo collector #1922
* [ENHANCEMENT] Add more InfiniBand counters #1694
* [ENHANCEMENT] Add flag to aggr ipvs metrics to avoid high cardinality metrics #1709
* [ENHANCEMENT] Adding backlog/current queue length to qdisc collector #1732
* [ENHANCEMENT] Include TCP OutRsts in netstat metrics #1733
* [ENHANCEMENT] Add pool size to entropy collector #1753
* [ENHANCEMENT] Remove CGO dependencies for OpenBSD amd64 #1774
* [ENHANCEMENT] bcache: add writeback_rate_debug stats #1658
* [ENHANCEMENT] Add check state for mdadm arrays via node_md_state metric #1810
* [ENHANCEMENT] Expose XFS inode statistics #1870
* [ENHANCEMENT] Expose zfs zpool state #1878
* [ENHANCEMENT] Added an ability to pass collector.supervisord.url via SUPERVISORD_URL environment variable #1947
* [BUGFIX] filesystem_freebsd: Fix label values #1728
* [BUGFIX] Fix various procfs parsing errors #1735
* [BUGFIX] Handle no data from powersupplyclass #1747
* [BUGFIX] udp_queues_linux.go: s/upd/udp/ in two error strings #1769
* [BUGFIX] Fix node_scrape_collector_success behaviour #1816
* [BUGFIX] Fix NodeRAIDDegraded to not use a string rule expressions #1827
* [BUGFIX] fix: node_md_disks state label from fail to failed #1862
* [BUGFIX] Handle EPERM for syscall in timex collector #1938
* [BUGFIX] bcache: fix typo #1943
* [BUGFIX] Fix XFS read/write stats (prometheus/procfs#343)

Signed-off-by: Ben Kochie <[email protected]>
* Update Build
  - Update CircleCI orb.
  - Update CIrcleCI Machine image.
  - Use golang-builder 1.15.
* Update Go modules.
* Fixup fixtures for XFS bug.

NOTE: We have improved some of the flag naming conventions (PR #1743). The old names are deprecated and will be removed in 2.0. They will continue to work for backwards compatibility.

* [CHANGE] Improve filter flag names #1743
* [CHANGE] Add btrfs and powersupplyclass to list of exporters enabled by default #1897
* [FEATURE] Add fibre channel collector #1786
* [FEATURE] Expose cpu bugs and flags as info metrics. #1788
* [FEATURE] Add network_route collector #1811
* [FEATURE] Add zoneinfo collector #1922
* [ENHANCEMENT] Add more InfiniBand counters #1694
* [ENHANCEMENT] Add flag to aggr ipvs metrics to avoid high cardinality metrics #1709
* [ENHANCEMENT] Adding backlog/current queue length to qdisc collector #1732
* [ENHANCEMENT] Include TCP OutRsts in netstat metrics #1733
* [ENHANCEMENT] Add pool size to entropy collector #1753
* [ENHANCEMENT] Remove CGO dependencies for OpenBSD amd64 #1774
* [ENHANCEMENT] bcache: add writeback_rate_debug stats #1658
* [ENHANCEMENT] Add check state for mdadm arrays via node_md_state metric #1810
* [ENHANCEMENT] Expose XFS inode statistics #1870
* [ENHANCEMENT] Expose zfs zpool state #1878
* [ENHANCEMENT] Added an ability to pass collector.supervisord.url via SUPERVISORD_URL environment variable #1947
* [BUGFIX] filesystem_freebsd: Fix label values #1728
* [BUGFIX] Fix various procfs parsing errors #1735
* [BUGFIX] Handle no data from powersupplyclass #1747
* [BUGFIX] udp_queues_linux.go: change upd to udp in two error strings #1769
* [BUGFIX] Fix node_scrape_collector_success behaviour #1816
* [BUGFIX] Fix NodeRAIDDegraded to not use a string rule expressions #1827
* [BUGFIX] Fix node_md_disks state label from fail to failed #1862
* [BUGFIX] Handle EPERM for syscall in timex collector #1938
* [BUGFIX] bcache: fix typo in a metric name #1943
* [BUGFIX] Fix XFS read/write stats (prometheus/procfs#343)

Signed-off-by: Ben Kochie <[email protected]>
…#1810)
* Expose metric for state=check for node_md_state
* Added new e2e output fixture including md201 which is in checking state and the new state=check labeled metric for all other md

Signed-off-by: Christian Rohmann <[email protected]>
I added parsing of the checking state for mdstat to procfs (prometheus/procfs#321), and this PR now exposes it as the `node_md_state` metric with the label `state=check`.
This metric is useful first of all for seeing which arrays are currently in this state, but also as a condition in alerts, e.g. on disk read rate, which is naturally very high during a full read of all blocks.
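To illustrate the shape of the resulting metric, here is a minimal Go sketch of a state-set style gauge: one sample per known state, with value 1 for the state the array is in and 0 for the rest. It is not the collector's actual code. The `device` label name, the state values other than `check`, and the helper `stateSamples` are assumptions made for illustration.

```go
package main

import "fmt"

// mdStates lists the label values assumed for node_md_state. "check" is the
// value this PR adds; the other names are assumptions for this sketch.
var mdStates = []string{"active", "inactive", "recovering", "resync", "check"}

// stateSamples (hypothetical helper) returns one text-format sample per known
// state for a device: 1 for the current state, 0 for every other state.
func stateSamples(device, current string) []string {
	samples := make([]string, 0, len(mdStates))
	for _, s := range mdStates {
		v := 0
		if s == current {
			v = 1
		}
		samples = append(samples,
			fmt.Sprintf("node_md_state{device=%q,state=%q} %d", device, s, v))
	}
	return samples
}

func main() {
	// md201 is the array used in the test fixture as being in the checking state.
	for _, line := range stateSamples("md201", "check") {
		fmt.Println(line)
	}
}
```

With samples of this shape, an alert on disk read rate could exclude arrays whose `state="check"` sample equals 1.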
The tests will likely fail until the procfs PR (prometheus/procfs#321) is merged, as there is no proper parsing of the md201 test case until then.
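For context on what that parsing has to recognise, the sketch below guesses the sync activity from an mdstat progress line. It is a simplified illustration, not the procfs implementation. The function name `activityFromLine`, the returned strings, and the sample line are assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

// activityFromLine (hypothetical helper) inspects a progress line from
// /proc/mdstat and returns the sync activity it describes, or "" if the
// line reports none. The keyword matching is an assumption for this sketch.
func activityFromLine(line string) string {
	switch {
	case strings.Contains(line, "check ="):
		return "check"
	case strings.Contains(line, "resync ="):
		return "resync"
	case strings.Contains(line, "recovery ="):
		return "recovering"
	default:
		return ""
	}
}

func main() {
	// Illustrative progress line for an array that is being checked.
	line := "      [=>...................]  check =  5.7% (114176/1993728) finish=0.2min speed=114176K/sec"
	fmt.Println(activityFromLine(line))
}
```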
Signed-off-by: Christian Rohmann [email protected]
@SuperQ PTAL