eventuate-tools' dropwizard-healthchecks provides health monitoring facilities for Eventuate components, based on Dropwizard's healthchecks library. The following can be monitored:
- health of the replication from remote source logs, based on Available/Unavailable messages.
- health of the connection to the storage backend for persisting new events, based on information from an event log's circuit breaker.
- health of Eventuate's actors, based on Akka's death watch.
The following artifact is published to JFrog's snapshot and release repositories:
- Artifact Id: dropwizard-healthchecks_<scala-version>
- Group Name: com.rbmhtechnology.eventuate-tools
Settings for an sbt build:
libraryDependencies += "com.rbmhtechnology.eventuate-tools" %% "dropwizard-healthchecks" % "<version>"
// for snapshots
resolvers += "OJO Snapshots" at "https://oss.jfrog.org/oss-snapshot-local"
// for releases
resolvers += "OJO Releases" at "https://oss.jfrog.org/oss-release-local"
Given a ReplicationEndpoint (endpoint) to be monitored and a HealthCheckRegistry (healthRegistry), health checks can be registered under a given optional prefix (namePrefix) for each of the components listed above as follows:
- replication health:
val monitor = new ReplicationHealthMonitor(endpoint, healthRegistry, namePrefix)
- circuit breaker health:
val monitor = new CircuitBreakerHealthMonitor(endpoint, healthRegistry, namePrefix)
- actor health:
val monitor = new ActorHealthMonitor(endpoint, healthRegistry, namePrefix)
There is also a convenience class to register all at once in a single HealthCheckRegistry under a single namePrefix:
val monitor = new ReplicationEndpointHealthMonitor(endpoint, healthRegistry, namePrefix)
In each case, health monitoring can be stopped, which also removes the registered health checks, as follows:
monitor.stopMonitoring()
When the actor system stops without monitoring having been stopped first, all registered health checks turn unhealthy and indicate that the monitored component is in an unknown state. This ensures that after an unexpected actor system stop (triggered, for example, by Eventuate's Cassandra extension when the database cannot be accessed at startup) all components are reported as unhealthy.
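Once checks are registered, their state can be read back through the registry like any other Dropwizard health check. The following is a minimal sketch of that, assuming the Dropwizard metrics-healthchecks library is on the classpath. The HealthReport helper, the check name myapp.actor.acceptor, and the hand-written stand-in check are hypothetical and only serve to keep the example self-contained; in a real setup the monitors described above register the checks.

```scala
import com.codahale.metrics.health.{HealthCheck, HealthCheckRegistry}
import scala.collection.JavaConverters._

// Hypothetical helper (not part of eventuate-tools): runs all registered
// checks and returns name -> message for those that report unhealthy.
object HealthReport {
  def unhealthy(registry: HealthCheckRegistry): Map[String, String] =
    registry.runHealthChecks().asScala.collect {
      case (name, result) if !result.isHealthy =>
        name -> String.valueOf(result.getMessage)
    }.toMap
}

object HealthReportExample {
  def main(args: Array[String]): Unit = {
    val healthRegistry = new HealthCheckRegistry()

    // Stand-in for a check that a monitor would normally register.
    // The name is an assumed example, not one produced by the library here.
    healthRegistry.register("myapp.actor.acceptor", new HealthCheck {
      override protected def check(): HealthCheck.Result =
        HealthCheck.Result.unhealthy("acceptor terminated")
    })

    // Print every unhealthy check with its message.
    HealthReport.unhealthy(healthRegistry).foreach {
      case (name, message) => println(s"$name: $message")
    }
  }
}
```

Collecting only the unhealthy entries keeps the report short when many logs and remote endpoints are monitored under one prefix.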
For a given prefix the individual monitors register the following health checks:
ReplicationHealthMonitor registers for each local log that is replicated to a remote endpoint:
<prefix>.replication-from.<remote-endpoint-id>.<log-name>
This turns unhealthy as soon as an Unavailable message for this particular log arrives and back to healthy when a corresponding Available message arrives. See also the corresponding section in the Eventuate documentation.
CircuitBreakerHealthMonitor registers for each local log:
<prefix>.circuit-breaker-of.<log-id>
This turns unhealthy as soon as the circuit breaker opens and healthy when it closes. Currently only the CassandraEventLog uses the circuit breaker and it only applies when persisting of locally emitted events fails.
ActorHealthMonitor registers for each local log:
<prefix>.actor.eventlog.<log-id>
and for the acceptor:
<prefix>.actor.acceptor
These turn unhealthy as soon as the corresponding actors terminate.
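Because the names follow fixed patterns, a consumer of the registry can group checks by the kind of component they describe. The sketch below is one hypothetical way to do that with plain string matching. HealthCheckNames and its members are illustrative names, not part of eventuate-tools, and the approach assumes that a non-empty prefix is used and that neither the prefix nor the ids contain the marker segments themselves.

```scala
// Hypothetical helper (not part of eventuate-tools): classifies a health
// check name by the component kind encoded in the documented name patterns.
object HealthCheckNames {
  sealed trait Component
  case object Replication extends Component
  case object CircuitBreaker extends Component
  case object Actor extends Component

  def classify(name: String): Option[Component] =
    if (name.contains(".replication-from.")) Some(Replication)
    else if (name.contains(".circuit-breaker-of.")) Some(CircuitBreaker)
    else if (name.contains(".actor.eventlog.") || name.endsWith(".actor.acceptor")) Some(Actor)
    else None

  // Groups names by component kind; names matching no pattern are dropped.
  def byComponent(names: Iterable[String]): Map[Component, List[String]] =
    names.toList
      .flatMap(n => classify(n).map(c => c -> n))
      .groupBy(_._1)
      .map { case (component, pairs) => component -> pairs.map(_._2) }
}
```

For example, HealthCheckNames.byComponent could be applied to the names held by a HealthCheckRegistry to report replication, circuit breaker, and actor health separately.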