This repository contains scripts and documentation related to the integration of Ansible Tower with Nagios.
The main script is meant to be used as a Nagios event handler that triggers jobs in Ansible Tower.
Many people use event handlers in Nagios to fix problems preemptively, before anyone is alerted. There are some limitations on how those handlers can or should be deployed, and on what kind of actions they can execute. Using Ansible Tower to execute recovery tasks gives a lot more flexibility and provides better integration with the established automation environment. On top of that, statistics on those recovery jobs can easily be generated using Tower's built-in capabilities.
This script runs on the Nagios server and uses tower-cli to trigger jobs in Ansible Tower. Since those jobs are standard Ansible playbooks running from within Ansible Tower, they can easily be used as a service self-healing method, by running the same playbooks your operations or DevOps team would already use to recover the service. In addition, since those playbooks run outside the failed host, they can be used to reboot, re-provision or even auto-scale (provided your Ansible Tower has already been properly configured for those tasks).
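The flow described above can be sketched roughly as follows. This is a minimal illustration and not the actual tower_handler.py: the function names, the `soft_attempt` threshold and the choice to act only on CRITICAL states are assumptions made for the example. It assumes Nagios passes the standard `$SERVICESTATE$`, `$SERVICESTATETYPE$` and `$SERVICEATTEMPT$` macros to the handler, and that tower-cli is installed and already configured to reach your Tower instance.

```python
import subprocess


def should_trigger(state, state_type, attempt, soft_attempt=3):
    """Decide whether a recovery job should be launched.

    Hypothetical policy: act on CRITICAL only, either once the state
    is HARD or once the SOFT retry counter reaches `soft_attempt`.
    """
    if state != "CRITICAL":
        return False
    if state_type == "HARD":
        return True
    return int(attempt) >= soft_attempt


def build_command(job_template, host):
    """Build the tower-cli invocation for one job template.

    `--limit` restricts the playbook run to the failing host.
    """
    return [
        "tower-cli", "job", "launch",
        "--job-template", str(job_template),
        "--limit", host,
    ]


def handle_event(state, state_type, attempt, job_template, host):
    """Launch the Tower job if the event qualifies.

    Returns the tower-cli exit code, or None if nothing was launched.
    """
    if not should_trigger(state, state_type, attempt):
        return None
    return subprocess.call(build_command(job_template, host))
```

In such a setup, the Nagios command definition would pass the macros plus a job template identifier and the host name to a small wrapper that calls `handle_event`. Keeping the decision logic separate from the tower-cli call makes the policy easy to adjust without touching the launch code.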
Red Hat IT developed this script to reduce the burden on the operations team by automatically fixing problems without human intervention and shortening the time to recover.
This is not a silver bullet; it will not solve all your problems. It is merely a tool to help you automate your event management and service recovery.
- Python 2.7
- Nagios 3.5 or higher
- Ansible Tower 3.2 or higher
Please refer to the Wiki section of this repository.
tower_handler.py was developed by Mauricio Teixeira.