Nimsoft Monitoring is a monitoring solution from CA Technologies.
If you want to monitor a service with nimsoft, you'll first have to install a robot (the agent software) on your target machine or on a proxy machine (in case of agentless monitoring). Once the connection between the robot and your master server (or nearest nimsoft hub) is established, you can deploy probes onto your robot to monitor a specific service.
Each probe that you deploy on a robot is responsible for one or more dedicated
areas, like checking disk and cpu utilization (cdm probe) or parsing logfiles
for errors (logmon probe). Most probes will gather quality of service metrics
(e.g. publish the current disk utilization every 5 minutes) and can also be
configured to send alarms once a certain threshold is exceeded (e.g. one disk
is more than 90% full). Each probe is independent of the others and runs as a
separate process on your target machine.
The configuration of each probe is stored in a flat configuration file
(<probename>.cfg) on the same host that is running the probe. So while
nimsoft comes with tools to distribute those monitoring configurations,
the agent ultimately is completely independent from the server component.
This is important because it allows puppet to change the monitoring policy for one server locally.
The ultimate goal of this project is to free the administrator (specifically
myself) of any repetitive task that might occur when a new server is
provisioned or decommissioned. The second goal is to use the available puppet
infrastructure (e.g. hiera) to give the consumers of your monitoring
landscape greater insight and the ability to tweak certain aspects of
your monitoring configuration.
Imagine you want to run an apache webserver and you already use puppet
to make sure that the apache package is installed and the correct
apache configuration files (e.g. vhost configurations) are in place.
If you now think about monitoring your new website, you'll notice that
accurate monitoring needs a lot of information that you already have in
puppet, like the name of the vhost instance or the port your vhost is
listening on (e.g. to monitor the website with the netconnect probe
and to parse error logs with the logmon probe).
So instead of separating the provisioning process and the monitoring
configuration, you will now be able to combine both: the apache puppet
class will automatically configure the necessary probes to set up the
monitoring, and whenever you add a vhost you know that it will be monitored.
This also increases trust in the infrastructure, because you can stop asking
yourself whether you really monitor everything.
To accomplish these goals of automatic monitoring, this project lets you treat your monitoring rules as puppet resources.
Beyond the described goals, experience has shown that keeping your monitoring rules in the same place as your configuration definitions has two more benefits. It becomes easier to check that all parts of a service are monitored, and easier to see which monitoring changes are necessary when you change the service itself: when you modify a class that references a monitor class with nimsoft resources, it is harder to forget to adapt your monitoring.
(currently none)
(currently none)
The nimsoft_disk type can be used to describe a filesystem you want to
monitor. It will modify the disk/alarm/fixed section of your cdm
configuration file. Example:
Make sure a certain device is not monitored:
    nimsoft_disk { '/dev':
      ensure => absent,
    }
Set explicit thresholds on another device:
    nimsoft_disk { '/var':
      ensure   => present,
      warning  => 20,
      critical => 10,
    }
Deactivate the warning threshold and make sure to raise an alarm when the device is absent:
    nimsoft_disk { '/var/lib/mysql':
      ensure   => present,
      warning  => absent,
      critical => '10',
      missing  => 'yes',
    }
The nimsoft_* types all implement an instances method, so
you can run puppet resource nimsoft_disk on a machine with the cdm probe
installed and see a list of all relevant parameters and how puppet interprets
the current configuration file.
The nimsoft_queue type can be used to describe a queue on your hub.
    nimsoft_queue { 'HUB-alarm':
      ensure  => present,
      active  => yes,
      type    => attach,
      subject => 'alarm',
    }
With exported resources you are now able to automatically add new hubs into the infrastructure and create the necessary get and attach queues on both sides.
Run puppet resource nimsoft_queue to see how puppet currently interprets
your hub configuration.
The nimsoft_dirscan type can be used to describe a profile for the
dirscan probe. It can be used to check the size of a file or a group
of files and can also be used to check the number of files in a given
directory (and optionally all subdirectories).
Possible use case: You deploy an application with puppet and this application writes a specific logfile. You now want nimsoft to trigger an alarm if this logfile exceeds a certain size limit (e.g. you expect the size to be less than 10 megabytes). You also want to trigger an alarm if the log directory or the logfile is absent:
    nimsoft_dirscan { 'foo logfile':
      ensure      => present,
      active      => yes,
      description => 'Check debug.log of application foo',
      directory   => '/opt/foo/log',
      pattern     => 'debug.log',
      recurse     => 'no',
      direxists   => 'yes',
      nofiles     => '1',
      size        => '< 10M',
    }
The nimsoft_logmon_profile type can be used to describe a profile for
the logmon probe. The logmon probe is able to monitor a logfile, to
execute a command and check its error code, or to check a url. The
nimsoft_logmon_profile type can currently only be used to monitor
a logfile.
Example:
    nimsoft_logmon_profile { 'system log':
      ensure       => present,
      active       => yes,
      file         => '/var/log/messages',
      mode         => updates,
      qos          => no,  # do not generate Quality of Service messages
      alarm        => yes, # allow creation of alarm messages
      alarm_maxsev => critical,
    }
Note that you are only defining the general profile here. You also
have to add watcher rules for your profile. You can do that with
the nimsoft_logmon_watcher resource type.
The nimsoft_logmon_watcher type can be used to describe a watcher rule for
a specific logmon profile. A watcher rule describes a pattern that can appear
in a logfile and the message that will be sent if such an entry appears.
A watcher rule always belongs to exactly one profile.
Example:
    nimsoft_logmon_watcher { 'system log/failed root login':
      ensure   => present,
      active   => yes,
      match    => '/FAILED su for root by (.*)/',
      message  => 'Possible breakin attempt detected: ${msg}',
      severity => 'warning',
    }
The name of the resource must be of the form profile_name/watcher_name.
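The composite title convention can be illustrated with a few lines of Ruby. This is only a sketch: split_title is a hypothetical helper and not part of the module, and splitting at the first slash is an assumption about how such a title could be taken apart.

```ruby
# Hypothetical helper (not part of the module): split a composite title
# of the form "profile_name/watcher_name" into its two parts.
# Assumption: the title is split at the first slash.
def split_title(title)
  profile, name = title.split('/', 2)
  if profile.nil? || profile.empty? || name.nil? || name.empty?
    raise ArgumentError, "title must look like profile_name/watcher_name, got #{title.inspect}"
  end
  [profile, name]
end

profile, watcher = split_title('system log/failed root login')
puts "profile=#{profile} watcher=#{watcher}"
```

A title without a slash, or with an empty part on either side, is rejected early instead of silently ending up in the wrong profile.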
The nimsoft_logmon_exclude type can be used to describe a single exclude rule
for a specific logmon profile. An exclude rule defines a pattern that will be
checked against each line or block before any watcher rules are processed.
Example:
    nimsoft_logmon_exclude { 'system log/ignore failed su':
      ensure => present,
      active => yes,
      match  => '/FAILED su for \S+ by/',
    }
The name of the resource must be of the form profile_name/exclude_name.
The nimsoft_process type can be used to describe a profile for the processes
probe. Example:
    nimsoft_process { 'cron':
      ensure      => present,
      description => 'Make sure cron is running (managed by puppet)',
      active      => yes,
      pattern     => '/usr/sbin/cron',
      match       => nameonly,
      trackpid    => yes,
      count       => '>= 1',
      alarm_on    => [ 'down', 'restart' ],
    }
The nimsoft_oracle_connection type can be used to describe a database
connection that can be used by the oracle probe to monitor your oracle
database instances. Example:
    nimsoft_oracle_connection { 'PROD':
      ensure      => present,
      description => 'The production database',
      connection  => 'host.example.com:1521/PROD', # or some tnsnames.ora entry
      user        => 'nmuser',
      password    => 'secret',
      retry       => '0',
      retry_delay => '10 sec',
    }
The connection string can either be a service entry that can be resolved
through the tnsnames.ora file or an easy connect string of the form
host[:port]/service_name.
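To make the two accepted forms concrete, here is a minimal Ruby sketch that tells them apart. It is an illustration only: parse_connection and EASY_CONNECT are hypothetical names, the module itself may not inspect the string at all, and falling back to port 1521 (the usual Oracle listener default) is an assumption of this sketch.

```ruby
# Hypothetical pattern for an easy connect string: host[:port]/service_name
EASY_CONNECT = %r{\A(?<host>[^:/\s]+)(?::(?<port>\d+))?/(?<service>\S+)\z}

# Hypothetical helper: classify a connection string.
# Anything that does not look like host[:port]/service_name is treated
# as a name that has to be resolved through tnsnames.ora.
def parse_connection(value)
  match = EASY_CONNECT.match(value)
  return { type: :tns_alias, name: value } unless match

  {
    type: :easy_connect,
    host: match[:host],
    port: (match[:port] || 1521).to_i, # assumption: default listener port
    service: match[:service]
  }
end

p parse_connection('host.example.com:1521/PROD')
p parse_connection('PROD')
```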
The nimsoft_oracle_profile type can be used to describe a monitoring
profile that is used to monitor a database instance. You cannot define custom
checkpoints at the moment, so every new profile that is created through puppet
will inherit all monitoring options from your template. You can however define
custom checkpoints in the oracle probe GUI and puppet will not destroy these.
Example:
    nimsoft_oracle_profile { 'PROD':
      ensure      => present,
      active      => yes,
      description => 'Billing database',
      connection  => 'PROD',
      source      => 'host.example.com',
      heartbeat   => '5 sec',
      interval    => '5 min',
    }
Hint: If the connection name of your nimsoft_oracle_profile instance matches
the name of a nimsoft_oracle_connection resource, the connection will be
autorequired and you do not have to define an explicit require.
The sapbasis_agentil probe can be used to monitor SAP instances. The probe
is available through CA but has been developed by Agentil.
NOTE: The agentil probe uses a new JSON-based configuration file format
since version 4.00. The puppet types can only handle this new format, so
if you are using an older version of the sapbasis_agentil probe, you'll most
likely destroy your configuration file!
The custom types for handling different aspects of your sapbasis_agentil
probe allow a very abstract view of the configuration file. They are able to
add, remove, and modify systems and landscapes and to create the necessary
relationships.
The agentil_landscape type can be used to describe a landscape. A landscape
is like a container and describes one system identifier; each landscape
can consist of one or more systems. If you are familiar with the
sapbasis_agentil probe interface, a landscape represents the first
hierarchy level inside the configuration GUI.
    agentil_landscape { 'sapdev.example.com':
      ensure      => present,
      sid         => 'DEV',
      company     => 'My Company',
      description => 'managed by puppet',
    }
The above example will make sure that the sapdev.example.com landscape
exists and that properties like system identifier, company, and description
have the correct value. Please note that if you set ensure => absent,
puppet will make sure that the landscape is absent but will not automatically
remove any assigned system. So make sure you have appropriate agentil_system
resources with ensure => absent for every assigned system, too.
The agentil_user type can be used to describe a SAP user. The
sapbasis_agentil probe needs a designated user to connect to your SAP
systems in order to gather the different metrics. Instead of providing valid
credentials each time you add a SAP system, you can describe one user
that is valid on every system and then simply reference this user in
each of your system definitions. You can also create multiple users if
you want to use different credentials for production and development boxes,
for example.
Example:
    agentil_user { 'SAP_PROBE':
      ensure   => present,
      password => 'encrypted_password',
    }
Note: The password encryption algorithm is not public. In order to get the encrypted password you currently have to set the password in the probe GUI manually and then check the configuration file afterwards. Once you know the encrypted representation of your password, you can use puppet to make sure it stays the same.
The agentil_template resource describes a template. A template consists of
a collection of jobs and monitors to easily choose what aspects of your SAP
system you want to monitor. There are three types of templates:
- Vendor templates, created by the probe vendor, have an id between 1 and 999999 and are shipped together with the probe.
- Custom templates have ids starting at 1000000. These are normally created with the probe UI.
- System templates are implicit and cannot be seen directly in the probe UI. A system template inherits the monitors and jobs from the assigned vendor and custom templates and also holds system specific customizations. Each system has exactly one system template.
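The id ranges from the list above can be written down as a small Ruby sketch. The names are hypothetical and not taken from the module. Since the list does not say which ids system templates use, the sketch assumes they are identified by an explicit flag (mirroring the system parameter of the type) rather than by their id.

```ruby
# Hypothetical constants mirroring the id ranges described above.
VENDOR_IDS      = (1..999_999)
CUSTOM_ID_START = 1_000_000

# Hypothetical helper: decide what kind of template we are looking at.
# Assumption: system templates are marked by a flag, not by their id.
def template_kind(id, system: false)
  return :system if system
  return :vendor if VENDOR_IDS.cover?(id)
  return :custom if id >= CUSTOM_ID_START

  raise ArgumentError, "unexpected template id #{id}"
end

puts template_kind(42)
puts template_kind(1_000_000)
puts template_kind(1_000_007, system: true)
```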
The puppet type agentil_template currently ignores vendor templates completely
but can be used to create custom templates and system templates. If you specify
a system template you should not set jobs explicitly since these
are inherited from the assigned templates. But you can use the agentil_template
type to establish customizations like custom tablespace utilization thresholds.
Example:
    agentil_template { 'Custom Template':
      ensure => present,
      system => false,
      jobs   => [ 4, 5, 12, 177, 3 ],
    }
    agentil_template { 'System template for System sap01':
      ensure             => present,
      system             => true,
      expected_instances => [ 'PRO_sap01_00', 'PRO_sap01_01' ],
      tablespace_used    => {
        'PSAPSR3'  => '80',
        'PSAPUNDO' => '98',
      },
    }
Again you can use puppet resource agentil_template on a system with a
configured sapbasis_agentil probe and see how puppet interprets your
configuration file.
The agentil_system resource can be used to describe an agentil system. If you are familiar with the probe GUI, these are basically your ABAP and SAP connectors and the second hierarchy level after the landscape.
The agentil system basically tells the probe how to reach an instance and what jobs and monitors should be used to monitor the instance. To do that you can define the user that is able to login and the client to connect to. You can also assign different templates that the probe GUI merges into a system template (with puppet you have to define both the original template and the system template).
Example:
    agentil_system { 'PRO_sap01':
      ensure          => present,
      landscape       => 'PRO',
      sid             => 'PRO',
      host            => 'sap01.example.com',
      ip              => '192.168.0.1',
      stack           => 'abap',
      user            => 'SAP_PROBE',
      client          => '000',
      group           => 'LOGON_GROUP_01',
      system_template => 'System template for System sap01',
      templates       => [
        'Custom ABAP Production',
        'Custom ABAP Generic',
      ],
    }
The landscape, the user, and all templates have to be present so that the puppet type is able to translate the names into the corresponding ids and create a valid configuration file. Puppet will raise an error if a name cannot be found.
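The name-to-id translation can be pictured with the following Ruby sketch. It is not the provider's code: the lookup tables, the ids in them, and the names resolve_id, system_ids, and UnknownNameError are all made up for illustration. It only shows the idea that every referenced name must already be known, and that an unknown name is an error rather than something to guess.

```ruby
# Hypothetical error class for names that cannot be translated.
class UnknownNameError < StandardError; end

# Look up the id for a name, or fail loudly if the name is unknown.
def resolve_id(kind, name, known_ids)
  known_ids.fetch(name) do
    raise UnknownNameError, "#{kind} #{name.inspect} cannot be found"
  end
end

# Made-up lookup tables; in reality these would come from the
# already parsed configuration file.
LANDSCAPES = { 'PRO' => 1 }.freeze
USERS      = { 'SAP_PROBE' => 3 }.freeze
TEMPLATES  = {
  'Custom ABAP Production' => 1_000_000,
  'Custom ABAP Generic'    => 1_000_001
}.freeze

# Translate all names referenced by one system into ids.
def system_ids(landscape:, user:, templates:)
  {
    landscape: resolve_id('landscape', landscape, LANDSCAPES),
    user:      resolve_id('user', user, USERS),
    templates: templates.map { |name| resolve_id('template', name, TEMPLATES) }
  }
end

p system_ids(
  landscape: 'PRO',
  user:      'SAP_PROBE',
  templates: ['Custom ABAP Production', 'Custom ABAP Generic']
)
```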
After you make configuration changes you have to restart the probe. So if puppet
modifies a file, it'll also have to restart the affected probe. You can use the
restart_probe.sh script to do that (you can find it in the files directory).
You may want to use it inside a manifest, e.g.
    # Make sure the script is present on your robot
    file { '/opt/nimsoft/scripts/restart.sh':
      ensure => file,
      owner  => 'root',
      group  => 'root',
      mode   => '0755',
      source => 'puppet:///modules/nimsoft/restart.sh',
    }
    # Define an exec resource with refreshonly
    exec { 'restart_cdm_probe':
      command     => '/opt/nimsoft/scripts/restart.sh cdm',
      refreshonly => true,
    }
    # Trigger the exec resource if something changes
    nimsoft_disk { '/dev':
      ensure => absent,
      notify => Exec['restart_cdm_probe'],
    }
This project requires the puppetlabs_spec_helper gem (available on rubygems.org)
to run the spec tests. You can run them by executing rake spec.
The nimsoft providers all work in a similar way:
- Read the configuration file and convert it into a tree structure. All resources can be checked very efficiently against this in-memory-representation of your configuration.
- If a resource is out of sync, the tree is modified and written back to disk.
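The two steps above can be sketched in plain Ruby. This is a toy illustration and not the module's parser: it assumes a simplified file layout of nested <section> ... </section> blocks with key = value lines, and parse_cfg and dump_cfg are hypothetical names.

```ruby
# Simplified sample file (assumed layout, made-up content).
SAMPLE = <<~CFG
  <disk>
     <alarm>
        <fixed>
           <data>
              active = yes
              description = data volume
           </data>
        </fixed>
     </alarm>
  </disk>
CFG

# Step 1: read the text and convert it into a tree of nested hashes.
def parse_cfg(text)
  root = {}
  stack = [root]
  text.each_line do |line|
    line = line.strip
    case line
    when ''                         then next
    when %r{\A</.+>\z}              then stack.pop                          # closing tag
    when /\A<(.+)>\z/               then stack.push(stack.last[$1] = {})    # opening tag
    when /\A([^=]+?)\s*=\s*(.*)\z/  then stack.last[$1] = $2                # attribute
    end
  end
  root
end

# Step 2: turn the (possibly modified) tree back into text.
def dump_cfg(tree, indent = 0)
  pad = '   ' * indent
  tree.map do |key, value|
    if value.is_a?(Hash)
      "#{pad}<#{key}>\n#{dump_cfg(value, indent + 1)}#{pad}</#{key}>\n"
    else
      "#{pad}#{key} = #{value}\n"
    end
  end.join
end

tree = parse_cfg(SAMPLE)
section = tree.dig('disk', 'alarm', 'fixed', 'data')
section['active'] = 'no' if section['active'] != 'no' # only touch what is out of sync
puts dump_cfg(tree)
```

Because all checks run against the in-memory tree, the file only has to be read once and written once, no matter how many resources are managed.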
Parsing configuration files is done by the Puppet::Util::NimsoftConfig
class. Here is a simple way to parse a configuration file:
    config = Puppet::Util::NimsoftConfig.add('cdm.cfg') # should be an absolute path
    config.parse unless config.loaded?
The first line will either create a new Puppet::Util::NimsoftConfig object or,
if the file was already added before, return the already present object that
represents the configuration file. This way separate providers can modify the
same configuration file, and modifications of the tree structure made by
provider 1 can directly be seen by provider 2, thus eliminating the need to
parse the configuration file multiple times. So you can e.g. create a cdm_disk
and a cdm_cpu provider, both managing the cdm.cfg file at the same time.
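The sharing behaviour described here is essentially a registry keyed by file name. The following Ruby sketch shows the pattern in isolation. ConfigRegistry is a hypothetical stand-in and not the real Puppet::Util::NimsoftConfig class, and its parse method only pretends to read a file.

```ruby
# Hypothetical stand-in for a config class that hands out one shared
# object per file name.
class ConfigRegistry
  @loaded = {}

  class << self
    # Return the object already registered for this path,
    # or create and remember a new one.
    def add(path)
      @loaded[path] ||= new(path)
    end
  end

  attr_reader :path, :tree

  def initialize(path)
    @path = path
    @tree = {}
    @parsed = false
  end

  def loaded?
    @parsed
  end

  # A real implementation would read the file here.
  def parse
    @parsed = true
  end
end

disk_view = ConfigRegistry.add('/opt/nimsoft/probes/system/cdm/cdm.cfg')
disk_view.parse unless disk_view.loaded?
cpu_view = ConfigRegistry.add('/opt/nimsoft/probes/system/cdm/cdm.cfg')

disk_view.tree['changed_by'] = 'disk provider'
puts cpu_view.tree['changed_by']  # both variables point at the same object
puts cpu_view.loaded?             # so the second caller does not parse again
```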
If you want to develop a new provider for a new custom type you should
inherit from the Puppet::Provider::Nimsoft provider.
Let's take the nimsoft_disk type as a step-by-step example. You'll first have to
create the basic sketch of your provider:
    require 'puppet/provider/nimsoft'
    Puppet::Type.type(:nimsoft_cdm_disk).provide(:nimsoft, :parent => Puppet::Provider::Nimsoft) do
      register_config '/opt/nimsoft/probes/system/cdm/cdm.cfg', 'disk/alarm/fixed'
    end
The register_config method is inherited from the Puppet::Provider::Nimsoft
provider. It will trigger the parsing of the specified configuration file
and will take the specified section as the root section for your provider.
Each subsection within that new root section is processed as a new instance
of your custom type. The element title will be the name of that instance.
At the class level you can use the class method root to get a
Puppet::Util::NimsoftSection object that represents the root section you
have defined earlier, and config to get the representation of the whole
configuration file.
Each provider instance can use the method element to get the subtree that
is mapped to the specific provider instance.
You can modify the tree as you like and then run the class method
config.sync to save your changes back to disk.
If each section within your root section represents a provider
instance and your resource properties are simple attributes within
these sections, you can use the method map_property to save you a lot of
typing and create getter and setter methods.
E.g. for our nimsoft_disk type every section within the root section
disk/alarm/fixed represents one disk. The description attribute of each
subsection can be mapped to a description property of our custom type,
so let's modify our provider:
    require 'puppet/provider/nimsoft'
    Puppet::Type.type(:nimsoft_cdm_disk).provide(:nimsoft, :parent => Puppet::Provider::Nimsoft) do
      register_config '/opt/nimsoft/probes/system/cdm/cdm.cfg', 'disk/alarm/fixed'
      map_property :description
    end
If the property name is different from the attribute name, we can define a custom attribute name.
    require 'puppet/provider/nimsoft'
    Puppet::Type.type(:nimsoft_cdm_disk).provide(:nimsoft, :parent => Puppet::Provider::Nimsoft) do
      register_config '/opt/nimsoft/probes/system/cdm/cdm.cfg', 'disk/alarm/fixed'
      map_property :description
      map_property :device, :attribute => :disk
    end
We can also define a section within the subtree:
    require 'puppet/provider/nimsoft'
    Puppet::Type.type(:nimsoft_cdm_disk).provide(:nimsoft, :parent => Puppet::Provider::Nimsoft) do
      register_config '/opt/nimsoft/probes/system/cdm/cdm.cfg', 'disk/alarm/fixed'
      map_property :description
      map_property :device, :attribute => :disk
      map_property :warning, :section => 'warning', :attribute => :threshold
      map_property :critical, :section => 'error', :attribute => :threshold
    end
We can also instruct the provider to symbolize the attribute value:
    require 'puppet/provider/nimsoft'
    Puppet::Type.type(:nimsoft_cdm_disk).provide(:nimsoft, :parent => Puppet::Provider::Nimsoft) do
      register_config '/opt/nimsoft/probes/system/cdm/cdm.cfg', 'disk/alarm/fixed'
      map_property :description
      map_property :device, :attribute => :disk
      map_property :warning, :section => 'warning', :attribute => :threshold
      map_property :critical, :section => 'error', :attribute => :threshold
      map_property :active, :symbolize => true
      map_property :missing, :attribute => :active, :section => 'missing', :symbolize => :yes
    end
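To give a feeling for what a helper like map_property does behind the scenes, here is a self-contained Ruby sketch that generates getters and setters with define_method. It is not the implementation from Puppet::Provider::Nimsoft: SketchProvider and DiskSketch are hypothetical classes, the element is a plain hash instead of a section object, and the symbolize option is reduced to a simple true/false switch, which is an assumption about its meaning.

```ruby
# Hypothetical base class that mimics the idea of map_property.
class SketchProvider
  # Generate a getter and a setter for a property that is stored as an
  # attribute of the element, optionally inside a named subsection.
  def self.map_property(property, attribute: property, section: nil, symbolize: false)
    define_method(property) do
      node = section ? (element[section] || {}) : element
      value = node[attribute.to_s]
      symbolize && value ? value.to_sym : value
    end

    define_method("#{property}=") do |new_value|
      node = section ? (element[section] ||= {}) : element
      node[attribute.to_s] = new_value.to_s
    end
  end

  attr_reader :element

  def initialize(element)
    @element = element
  end
end

# Hypothetical provider using the helper, loosely following the example above.
class DiskSketch < SketchProvider
  map_property :description
  map_property :device,  attribute: :disk
  map_property :warning, section: 'warning', attribute: :threshold
  map_property :active,  symbolize: true
end

element = {
  'description' => 'root filesystem',
  'disk'        => '/dev/sda1',
  'active'      => 'yes',
  'warning'     => { 'threshold' => '20' }
}

disk = DiskSketch.new(element)
p disk.device
p disk.warning
p disk.active
disk.warning = 30
p element['warning']['threshold']
```

The point of the design is that each property declaration stays a single line, while the lookup logic for sections and attribute names lives in one place.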