A metric name is built with: [_GRAPHITE_PRE.][_GRAPHITE_GROUP.]host_name.[data-source.]service_description.perfdata_name[._GRAPHITE_POST]
_GRAPHITE_PRE is used as the root hierarchy for the metrics: - same value for all hosts - proposition: client_name (all the metrics are issued from the client_name infrastructure)
_GRAPHITE_GROUP is used as a sub-root hierarchy for the metrics - value depending upon host type - proposition: windows, for windows hosts linux-snmp, for linux SNMP monitored devices linux-nrpe, for linux NRPE monitored devices switch, for switches and routers san-switch, for SAN switches hvb, for UVB hosts
data-source: - same value for all hosts - proposition: shinken (all the metrics are issued from Shinken)
_GRAPHITE_POST is used as a postfix for the metric name - not used!
Http.size
Http.time
Https.size
Https.time
client_name.srv-win-001.ADReplications.Number_of_Failed_Replicas
client_name.san-switch.san-switch-001.san_switch_sensors._FAN_.3
client_name.san-switch.san-switch-001.san_switch_sensors._FAN_.2
client_name.san-switch.san-switch-001.san_switch_sensors._FAN_.1
client_name.san-switch.san-switch-001.san_switch_sensors.SLOT__0__TEMP_.1
client_name.san-switch.san-switch-001.san_switch_sensors.SLOT__0__TEMP_.3
client_name.san-switch.san-switch-001.san_switch_sensors.SLOT__0__TEMP_.2
client_name.san-switch.san-switch-001.san_switch_sensors._Power_Supply_.1
; Host check
client_name.windows.srv-win-01.__HOST__.rta
client_name.windows.srv-win-01.__HOST__.pl
; Uptime
client_name.windows.srv-win-01.Reboot.Uptime_Minutes
; Memory
client_name.windows.srv-win-01.Memory.Physical_Memory_Utilisation
client_name.windows.srv-win-01.Memory.Physical_Memory_Used
; Network
; For each network connection name (Net_connexion_name)
client_name.windows.srv-win-01.Network_Interface.Net_connexion_name_PacketsReceivedPersec
client_name.windows.srv-win-01.Network_Interface.Net_connexion_name_OutputQueueLength
client_name.windows.srv-win-01.Network_Interface.Net_connexion_name_Receive_Utilisation
client_name.windows.srv-win-01.Network_Interface.Net_connexion_name_PacketsSentPersec
client_name.windows.srv-win-01.Network_Interface.Net_connexion_name_Send_Utilisation
client_name.windows.srv-win-01.Network_Interface.Net_connexion_name_BytesSentPersec
client_name.windows.srv-win-01.Network_Interface.Net_connexion_name_PacketsReceivedErrors
client_name.windows.srv-win-01.Network_Interface.Net_connexion_name_BytesReceivedPersec
; Services
client_name.windows.srv-win-01.Services.Total_Service_Count
client_name.windows.srv-win-01.Services.Excluded_Service_Count
client_name.windows.srv-win-01.Services.Service_Count_Problem_State
client_name.windows.srv-win-01.Services.Service_Count_OK_State
; Load
client_name.windows.srv-win-01.LoadAverage.Avg_CPU_Queue_Length
; Cpu
client_name.windows.srv-win-01.Cpu.Avg_CPU_Utilisation
; Each Cpu
; Overall ...
client_name.windows.srv-win-01.EachCpu.Avg_Utilisation_CPU_Total
; For each CPU ...
client_name.windows.srv-win-01.EachCpu.Avg_Utilisation_CPU0
client_name.windows.srv-win-01.EachCpu.Avg_Utilisation_CPU1
client_name.windows.srv-win-01.EachCpu.Avg_Utilisation_CPU2
client_name.windows.srv-win-01.EachCpu.Avg_Utilisation_CPU3
; Disk
; Overall ...
client_name.windows.srv-win-01.Disks.Overall_Disk_Space
client_name.windows.srv-win-01.Disks.Overall_Disk_Utilisation
; For each disk ...
client_name.windows.srv-win-01.Disks.C__Utilisation
client_name.windows.srv-win-01.Disks.C__Space
; Disks I/O
client_name.windows.srv-win-01.DisksIO.CurrentDiskQueueLengthHarddiskVolume1 0 1446602424
client_name.windows.srv-win-01.DisksIO.CurrentDiskQueueLengthC_ 0 1446602424
client_name.windows.srv-win-01.DisksIO.CurrentDiskQueueLength_Total 0 1446602424
client_name.windows.srv-win-01.DisksIO._AvgDiskQueueLengthHarddiskVolume1
client_name.windows.srv-win-01.DisksIO._AvgDiskReadQueueLengthHarddiskVolume1
client_name.windows.srv-win-01.DisksIO._AvgDiskWriteQueueLengthHarddiskVolume1 0 1446602424
client_name.windows.srv-win-01.DisksIO._PercentDiskReadTimeHarddiskVolume1
client_name.windows.srv-win-01.DisksIO._PercentIdleTimeHarddiskVolume1
client_name.windows.srv-win-01.DisksIO._PercentBusyTimeHarddiskVolume1 0 1446602424
client_name.windows.srv-win-01.DisksIO._PercentDiskTimeHarddiskVolume1 0 1446602424
client_name.windows.srv-win-01.DisksIO._PercentDiskWriteTimeHarddiskVolume1 0 1446602424
client_name.windows.srv-win-01.DisksIO._DiskWritesPersecHarddiskVolume1 0 1446602424
client_name.windows.srv-win-01.DisksIO._DiskReadsPersecHarddiskVolume1 0 1446602424
client_name.windows.srv-win-01.DisksIO._DiskReadBytesPersecHarddiskVolume1 0 1446602424
client_name.windows.srv-win-01.DisksIO._DiskWriteBytesPersecHarddiskVolume1
client_name.windows.srv-win-01.DisksIO._AvgDiskReadQueueLength_Total 0 1446602424
client_name.windows.srv-win-01.DisksIO._AvgDiskWriteQueueLength_Total 0 1446602424
client_name.windows.srv-win-01.DisksIO._AvgDiskReadQueueLengthC_ 0 1446602424
client_name.windows.srv-win-01.DisksIO._AvgDiskWriteQueueLengthC_ 0 1446602424
client_name.windows.srv-win-01.DisksIO._DiskReadsPersec_Total 1 1446602424
client_name.windows.srv-win-01.DisksIO._DiskWritesPersec_Total 3 1446602424
client_name.windows.srv-win-01.DisksIO._DiskReadsPersecC_ 1 1446602424
client_name.windows.srv-win-01.DisksIO._DiskWriteBytesPersecC_ 60294 1446602424
client_name.windows.srv-win-01.DisksIO._PercentDiskTimeC_ 3 1446602424
client_name.windows.srv-win-01.DisksIO._PercentDiskTime_Total 1 1446602424
client_name.windows.srv-win-01.DisksIO._PercentDiskReadTimeC_ 3 1446602424
client_name.windows.srv-win-01.DisksIO._PercentDiskReadTime_Total 1 1446602424
client_name.windows.srv-win-01.DisksIO._PercentDiskWriteTimeC_ 0 1446602424
client_name.windows.srv-win-01.DisksIO._PercentDiskWriteTime_Total 0 1446602424
client_name.windows.srv-win-01.DisksIO._PercentIdleTimeC_ 99 1446602424
client_name.windows.srv-win-01.DisksIO._PercentIdleTime_Total 100 1446602424
client_name.windows.srv-win-01.DisksIO._PercentBusyTimeC_ 1 1446602424
client_name.windows.srv-win-01.DisksIO._PercentBusyTime_Total 0 1446602424
client_name.windows.srv-win-01.DisksIO._DiskReadBytesPersec_Total 4254 1446602424
client_name.windows.srv-win-01.DisksIO._DiskWritesPersecC_ 3 1446602424
client_name.windows.srv-win-01.DisksIO._AvgDiskQueueLength_Total 0 1446602424
client_name.windows.srv-win-01.DisksIO._AvgDiskQueueLengthC_ 0 1446602424
client_name.windows.srv-win-01.DisksIO._DiskWriteBytesPersec_Total 60294 1446602424
client_name.windows.srv-win-01.DisksIO._DiskReadBytesPersecC_ 4254 1446602424
; Host check
client_name.linux-snmp.srv-snmp-001.__HOST__.rta
client_name.linux-snmp.srv-snmp-001.__HOST__.pl
; Memory
client_name.linux-snmp.srv-snmp-001.Memory.ram_free
client_name.linux-snmp.srv-snmp-001.Memory.ram_buffered
client_name.linux-snmp.srv-snmp-001.Memory.ram_cached
client_name.linux-snmp.srv-snmp-001.Memory.ram_total
; Network
; For each interface
client_name.linux-snmp.srv-snmp-001.NetworkUsage.eth0_in_error
client_name.linux-snmp.srv-snmp-001.NetworkUsage.eth0_out_error
client_name.linux-snmp.srv-snmp-001.NetworkUsage.eth0_in_discard
client_name.linux-snmp.srv-snmp-001.NetworkUsage.eth0_out_prct
client_name.linux-snmp.srv-snmp-001.NetworkUsage.eth0_out_discard
client_name.linux-snmp.srv-snmp-001.NetworkUsage.eth0_in_prct
; Load
client_name.linux-snmp.srv-snmp-001.Load.load_5_min
client_name.linux-snmp.srv-snmp-001.Load.load_15_min
client_name.linux-snmp.srv-snmp-001.Load.load_1_min
; Cpu
client_name.linux-snmp.srv-snmp-001.Cpu.cpu_prct_used
; Disks
client_name.linux-snmp.srv-snmp-001.Disks._
client_name.linux-snmp.srv-snmp-001.Disks._dev
client_name.linux-snmp.srv-snmp-001.Disks._boot
; NTP time syn
client_name.linux-snmp.srv-snmp-001.TimeSync.offset
; Host check
client_name.linux-nrpe.srv-nrpe-001.__HOST__.rta 1.089 1446602431
client_name.linux-nrpe.srv-nrpe-001.__HOST__.pl 0 1446602431
; Memory
client_name.linux-nrpe.srv-nrpe-001.nrpe-Memory.ram_free
client_name.linux-nrpe.srv-nrpe-001.nrpe-Memory.ram_buffered
client_name.linux-nrpe.srv-nrpe-001.nrpe-Memory.ram_cached
client_name.linux-nrpe.srv-nrpe-001.nrpe-Memory.ram_total
; Network
; For each interface
client_name.linux-nrpe.srv-nrpe-001.NetworkUsage.eth0_in_error
client_name.linux-nrpe.srv-nrpe-001.NetworkUsage.eth0_out_error
client_name.linux-nrpe.srv-nrpe-001.NetworkUsage.eth0_in_discard
client_name.linux-nrpe.srv-nrpe-001.NetworkUsage.eth0_out_prct
client_name.linux-nrpe.srv-nrpe-001.NetworkUsage.eth0_out_discard
client_name.linux-nrpe.srv-nrpe-001.NetworkUsage.eth0_in_prct
; Load
client_name.linux-nrpe.srv-nrpe-001.nrpe-Load.load1 0.09 1446602546
client_name.linux-nrpe.srv-nrpe-001.nrpe-Load.load15 0.07 1446602546
client_name.linux-nrpe.srv-nrpe-001.nrpe-Load.load5 0.1 1446602546
; Users
client_name.linux-nrpe.srv-nrpe-001.nrpe-Users.users 0 1446602679
; Cpu
client_name.linux-nrpe.srv-nrpe-001.nrpe-Cpu.cpu_prct_used
; Disks
client_name.linux-nrpe.srv-nrpe-001.nrpe-Disk-hda1._
The Graphite broker module is listening to all the hosts and services checks to fetch performance data from the checks results and send the performance data as metrics to Graphite.
It exists two versions of the Graphite broker module:
- the first version
graphite
is quite an old version still compatible with Shinken 1.4. - the second version, no more compatible with Shinken 1.4, and fully tested with the new Web UI
This recent version has rich features, such as:
-
do not manage metrics until initial hosts/services status are received This to avoid feeding Graphite with metrics without their prefixes/postfixes which are defined in hosts and services configuration.
-
maintain a cache for the packets not sent because of connection problems When connection is restored, the cached packets are sent to Carbon/Graphite. Cache size and cached packets burst volume are configurable.
-
metrics threshold filtering Allows to define if warning/critical, min/max metrics thresholds are to be sent to Carbon/Graphite. This to allow storing 4 extra metrics for each performance data. Theresholds can be defined as fixed lines in the graphs and it is not necessary to send those fixed values to Carbon/Graphite.
-
metrics filtering Allows to define for a service_description which performance data will not be sent to Carbon/Graphite As an example, for a
Load
service, you may filter1m
and15m
and send only the5m
performance data value. -
define a specific data source for Shinken If you have several applications feeding Graphite, you can define a specific data source for Shinken sent metrics.
-
manage hosts and services custom variables Host variable _GRAPHITE_PRE is used as the root hierarchy for the metric name Host variable _GRAPHITE_GROUP is used as a sub-root hierarchy for the metric name Service variable _GRAPHITE_POST is used as a postfix for the metric name A metric name is built with: [_GRAPHITE_PRE.][_GRAPHITE_GROUP.]host_name.[data-source.]service_description.perfdata_name[._GRAPHITE_POST]
The module configuration is heavily documented to use the configuration parameters.
# Installing forked mod-graphite
su - shinken
shinken install graphite2
# Declare broker module
vi /etc/shinken/brokers/broker-master.cfg
=> modules webui2, graphite2
# Configure module
vi /etc/shinken/modules/graphite2.cfg
=>
# Set Graphite host address
host graphite
=>
# Set Shinken data source
graphite_data_source shinken
=> Specify filters:
# This configuration allow to filter all the metrics of all the default defined services
# The only metrics sent to Carbon/Graphite are hosts checks metrics (ping rta and pl)
# Http / Https
filter Http:
filter Https:
# Windows WMI
filter Reboot:
filter Memory:
filter Services:
filter Network Interface:
filter LoadAverage:
filter Cpu:
filter EachCpu:
filter Disks:
filter DisksIO:
filter InactiveSessions:
filter BigProcesses:
# Linux SNMP
filter Memory:
filter Load:
filter NetworkUsage:
filter Cpu:
filter Disks:
filter TimeSync:
# Linux NRPE
filter nrpe-Memory:
filter nrpe-NetworkUsage:
filter nrpe-Load:
filter nrpe-Cpu:
filter nrpe-Disk-hda1:
filter nrpe-Users:
filter nrpe-Procs:
filter nrpe-Zombies:
# Switches
filter Network Interface:
When using the Hosted Graphite service, the usual configuration is to be adapted.
The Graphite server address is the endpoint address of the Hosted Graphite account. The hosts must be prefixed with the Hosted Graphite API key.
# Configure module
vi /etc/shinken/modules/graphite2.cfg
=>
# Set hostedgraphite.com endpoint adress as Graphite host
host 01234567.carbon.hostedgraphite.com
We must used the _GRAPHITE_PRE
custom variables to declare the Hosted Graphite API Key.
As we are interested by the metrics of all the monitored hosts, the best solution is to set the _GRAPHITE_PRE
in the generic-host
template ... all the hosts will inherit this prefix.
# Configure the hosts to be considered
vi /etc/shinken/templates/generic-host.cfg
=>
define host{
name generic-host
# Use Graphite prefix feature for Hosted Graphite API key
_GRAPHITE_PRE a607e7fe-d2e8-40bd-9cf4-24652f4b1f9d
The Web UI Graphite module is using the Graphite graphs rendering API to display graphs in the Shinken Web UI.
It exists two versions of the Graphite broker module:
- the first version
ui-graphite
is quite an old version still compatible with Shinken 1.4. - the second version, no more compatible with Shinken 1.4, and fully tested with the new Web UI
This recent version has rich features, such as:
-
define graph and font size for dashboard and element page graphs The space allowed for the graphs is limited in the Dashboard view. As of it it is possible to define the size of the graphs and of the font used in the graphs.
-
metrics threshold filtering Allows to define if warning/critical, min/max thresholds are present in the graphs. And if so, which color is to be used for each one of them.
-
metrics filtering Allows to define for a service_description which performance data will not be sent to Carbon/Graphite As an example, for a
Load
service, you may filter1m
and15m
and send only the5m
performance data value. -
define the graphs line mode
connected
(default) for a line connecting the pointsslope
staircase
Test the modes to see which one looks better to your metrics ... -
define the graphs Time Zone Defaults to
Europe/Paris
to request to Graphite for this time zone in the provided graphs.
The module configuration is heavily documented to use the configuration parameters.
su - shinken
# Installing official mod-ui-graphite (to get the templates ...)
shinken install ui-graphite2
# Declare Web UI module
vi /etc/shinken/modules/webui2.cfg
=> modules ui-graphite2
# Configure module
vi /etc/shinken/modules/ui-graphite2.cfg
=>
# Set Graphite host address
uri http://graphite/
=>
# Set Shinken data source
graphite_data_source shinken