Argus is a new way of thinking about systems monitoring. Systems monitoring is much more than disk space, CPU load, and API responsiveness. It's all about what the customer sees.
For example:
- Your public API times out when clients access it. Your internal monitoring system is green. Is the whole system healthy? Obviously not, and remediation needs to start immediately.
- Your public API is responding within normal tolerances. Your internal monitoring system is red: you just lost 7 of your 21 Cassandra nodes. Remediation can probably wait.
- Predictor says you will run out of capacity in 6 months. Not a show stopper; start the ordering process.
- Predictor says you will run out of capacity in 3 days. A show stopper unless averted. This could, in theory, be passed to Reactor to create 100 new Cassandra nodes in 2 days' time.
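The four scenarios above amount to a small triage rule: what the customer sees outranks internal check status, and a capacity forecast only becomes urgent when it is close. The sketch below is one possible way to express that rule, not how Argus implements it. The `Snapshot` fields, the `Urgency` levels, and both thresholds are hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Urgency(Enum):
    NONE = "none"
    PLAN = "plan"            # e.g. start the ordering process
    DEFERRED = "deferred"    # remediation can probably wait
    IMMEDIATE = "immediate"  # start remediation now


@dataclass
class Snapshot:
    customer_facing_ok: bool   # what the customer sees, e.g. the public API
    internal_checks_ok: bool   # disk, CPU, node counts, ...
    days_to_exhaustion: Optional[float] = None  # capacity forecast, if any


# Assumed thresholds; the source does not define any.
SHOW_STOPPER_DAYS = 7
PLANNING_HORIZON_DAYS = 365


def triage(s: Snapshot) -> Urgency:
    # The customer's view wins, even if every internal check is green.
    if not s.customer_facing_ok:
        return Urgency.IMMEDIATE
    # A near-term capacity forecast is a show stopper unless averted.
    if s.days_to_exhaustion is not None and s.days_to_exhaustion <= SHOW_STOPPER_DAYS:
        return Urgency.IMMEDIATE
    # Internal failures that customers do not notice can wait.
    if not s.internal_checks_ok:
        return Urgency.DEFERRED
    # A distant forecast only calls for planning.
    if s.days_to_exhaustion is not None and s.days_to_exhaustion <= PLANNING_HORIZON_DAYS:
        return Urgency.PLAN
    return Urgency.NONE
```

For instance, `triage(Snapshot(customer_facing_ok=True, internal_checks_ok=False))` covers the lost-Cassandra-nodes case and yields `Urgency.DEFERRED`, while a healthy system with a 3-day forecast yields `Urgency.IMMEDIATE`.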
Runs on the target system
Forms a limited-size mesh (n5) to ensure system data is delivered to the Messenger
AMQP server
Detects failures and triggers a reaction
Predictive analyser
Reacts to triggers from either Observer or Predictor (or any other source that supplies enough information to carry out an action)
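A component that reacts to triggers from several sources can be sketched as a handler registry keyed by action name. This is a minimal illustration under assumptions: the `Trigger` fields, the `Reactor` class shape, and the `add_nodes` action are all hypothetical, and the handler only records the request rather than provisioning anything.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class Trigger:
    source: str                      # e.g. "Observer" or "Predictor"
    action: str                      # name of the action to carry out
    params: Dict[str, Any] = field(default_factory=dict)


class Reactor:
    """Hypothetical dispatcher: maps action names to handlers."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Callable[[Trigger], None]] = {}

    def register(self, action: str, handler: Callable[[Trigger], None]) -> None:
        self._handlers[action] = handler

    def react(self, trigger: Trigger) -> bool:
        # Returns False when no handler knows how to carry out the action.
        handler = self._handlers.get(trigger.action)
        if handler is None:
            return False
        handler(trigger)
        return True


provisioning_log: List[str] = []


def add_nodes(trigger: Trigger) -> None:
    # Stand-in for real provisioning: just record what was asked for.
    provisioning_log.append(
        f"{trigger.source} requested {trigger.params['count']} "
        f"{trigger.params['kind']} nodes"
    )


reactor = Reactor()
reactor.register("add_nodes", add_nodes)
handled = reactor.react(
    Trigger("Predictor", "add_nodes", {"kind": "cassandra", "count": 100})
)
```

Keying on the action rather than the source means any sender that supplies a known action and its parameters can drive the same handler.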