Simmy is a chaos-engineering and fault-injection tool, integrating with the Polly resilience project for .NET. It is releasing April 2019 and works with Polly v7.0.0 onwards.
Simmy allows you to introduce a chaos-injection policy or policies at any location where you execute code through Polly.
There are a lot of questions when it comes to chaos-engineering and making sure that a system is actually ready to face the worst possible scenarios:
- Is my system resilient enough?
- Am I handling the right exceptions/scenarios?
- How will my system behave if X happens?
- How can I test without waiting for a handled (or even unhandled) exception to happen in my production environment?
Using Polly helps me introduce resilience to my project, but I don't want to have to wait for expected or unexpected failures to test it out. My resilience could be wrongly implemented; testing the scenarios is not straight forward; and mocking failure of some dependencies (for example a cloud SaaS or PaaS service) is not always straightforward.
What do I need, to simulate chaotic scenarios in my production environment?
- A way to mock failures of dependencies (any service dependency for example).
- Define when to fail based on some external factors - maybe global configuration or some rule.
- A way to revert easily, to control the blast radius.
- Production grade, to run this in a production or near-production system with automation.
Simmy offers the following chaos-injection policies:
Policy | What does the policy do? |
---|---|
Exception | Injects exceptions in your system. |
Result | Substitute results to fake faults in your system. |
Latency | Injects latency into executions before the calls are made. |
Behavior | Allows you to inject any extra behaviour, before a call is placed. |
var chaosPolicy = MonkeyPolicy.InjectException(Action<InjectOutcomeOptions<Exception>>);
For example:
// Following example causes the policy to throw SocketException with a probability of 5% if enabled
var fault = new SocketException(errorCode: 10013);
var chaosPolicy = MonkeyPolicy.InjectException(with =>
with.Fault(fault)
.InjectionRate(0.05)
.Enabled()
);
var chaosPolicy = MonkeyPolicy.InjectResult(Action<InjectOutcomeOptions<TResult>>);
For example:
// Following example causes the policy to return a bad request HttpResponseMessage with a probability of 5% if enabled
var result = new HttpResponseMessage(HttpStatusCode.BadRequest);
var chaosPolicy = MonkeyPolicy.InjectResult<HttpResponseMessage>(with =>
with.Result(result)
.InjectionRate(0.05)
.Enabled()
);
var chaosPolicy = MonkeyPolicy.InjectLatency(Action<InjectLatencyOptions>);
For example:
// Following example causes policy to introduce an added latency of 5 seconds to a randomly-selected 10% of the calls.
var isEnabled = true;
var chaosPolicy = MonkeyPolicy.InjectLatency(with =>
with.Latency(TimeSpan.FromSeconds(5))
.InjectionRate(0.1)
.Enabled(isEnabled)
);
var chaosPolicy = MonkeyPolicy.InjectBehaviour(Action<InjectBehaviourOptions>);
For example:
// Following example causes policy to execute a method to restart a virtual machine; the probability that method will be executed is 1% if enabled
var chaosPolicy = MonkeyPolicy.InjectBehaviour(with =>
with.Behaviour(() => restartRedisVM())
.InjectionRate(0.01)
.EnabledWhen((ctx, ct) => isEnabled(ctx, ct))
);
All the parameters are expressed in a Fluent-builder syntax way.
Determines whether the policy is enabled or not.
- Configure that the monkey policy is enabled.
PolicyOptions.Enabled();
- Receives a boolean value indicating whether the monkey policy is enabled.
PolicyOptions.Enabled(bool);
- Receives a delegate which can be executed to determine whether the monkey policy should be enabled.
PolicyOptions.EnabledWhen(Func<Context, CancellationToken, bool>);
A decimal between 0 and 1 inclusive. The policy will inject the fault, randomly, that proportion of the time, eg: if 0.2, twenty percent of calls will be randomly affected; if 0.01, one percent of calls; if 1, all calls.
- Receives a double value between [0, 1] indicating the rate at which this monkey policy should inject chaos.
PolicyOptions.InjectionRate(Double);
- Receives a delegate which can be executed to determine the rate at which this monkey policy should inject chaos.
PolicyOptions.InjectionRate(Func<Context, CancellationToken, Double>);
The fault to inject. The Fault
api has overloads to build the policy in a generic way: PolicyOptions.Fault<TResult>(...)
- Receives an exception to configure the fault to inject with the monkey policy.
PolicyOptions.Fault(Exception);
- Receives a delegate representing the fault to inject with the monkey policy.
PolicyOptions.Fault(Func<Context, CancellationToken, Exception>);
The result to inject.
- Receives a generic TResult value to configure the result to inject with the monkey policy.
PolicyOptions.Result<TResult>(TResult);
- Receives a delegate representing the result to inject with the monkey policy.
PolicyOptions.Result<TResult>(Func<Context, CancellationToken, TResult>);
The latency to inject.
- Receives a TimeSpan value to configure the latency to inject with the monkey policy.
PolicyOptions.Latency(TimeSpan);
- Receives a delegate representing the latency to inject with the monkey policy.
PolicyOptions.Latency(Func<Context, CancellationToken, TimeSpan>);
The behaviour to inject.
- Receives an Action to configure the behaviour to inject with the monkey policy.
PolicyOptions.Behaviour(Action);
- Receives a delegate representing the Action to inject with the monkey policy.
PolicyOptions.Behaviour(Action<Context, CancellationToken>);
All parameters are available in a Func<Context, ...>
form. This allows you to control the chaos injected:
- in a dynamic manner: by eg driving the chaos from external configuration files
- in a targeted manner: by tagging your policy executions with a
Context.OperationKey
and introducing chaos targeting particular tagged operations
The example app demonstrates both these approaches in practice.
// Executes through the chaos policy directly
chaosPolicy.Execute(() => someMethod());
// Executes through the chaos policy using Context
chaosPolicy.Execute((ctx) => someMethod(), context);
// Wrap the chaos policy inside other Polly resilience policies, using PolicyWrap
var policyWrap = Policy
.Wrap(fallbackPolicy, timeoutPolicy, chaosLatencyPolicy);
policyWrap.Execute(() => someMethod())
// All policies are also available in async forms.
var chaosLatencyPolicy = MonkeyPolicy.InjectLatencyAsync(with =>
with.Latency(TimeSpan.FromSeconds(5))
.InjectionRate(0.1)
.Enabled()
);
var policyWrap = Policy
.WrapAsync(fallbackPolicy, timeoutPolicy, chaosLatencyPolicy);
var result = await policyWrap.ExecuteAsync(token => service.GetFoo(parametersBar, token), myCancellationToken);
// For general information on Polly policy syntax see: https://github.com/App-vNext/Polly
It is usual to place the Simmy policy innermost in a PolicyWrap. By placing the chaos policies innermost, they subvert the usual outbound call at the last minute, substituting their fault or adding extra latency. The existing Polly policies - further out in the PolicyWrap - still apply, so you can test how the Polly resilience you have configured handles the chaos/faults injected by Simmy.
Note: The above examples demonstrate how to execute through a Simmy policy directly, and how to include a Simmy policy in an individual PolicyWrap. If your policies are configured by .NET Core DI at StartUp, for example via HttpClientFactory, there are also patterns which can configure Simmy into your app as a whole, at StartUp. See the Simmy Sample App discussed below.
This Simmy sample app shows different approaches/patterns for how you can configure Simmy to introduce chaos policies in a project. Patterns demonstrated are:
- Configuring
StartUp
so that Simmy chaos policies are only introduced in builds for certain environments (for instance, Dev but not Prod). - Configuring Simmy chaos policies to be injected into the app without changing any existing Polly configuration code.
- Injecting faults or chaos by modifying external configuration.
The patterns shown in the sample app are intended as starting points but are not mandatory. Simmy is very flexible, and we would love to hear how you use it!
All chaos policies (Monkey policies) are designed to inject behavior randomly (faults, latency or custom behavior), so a Monkey policy allows you to specify an injection rate between 0 and 1 (0-100%) thus, the higher is the injection rate the higher is the probability to inject them. Also it allows you to specify whether or not the random injection is enabled, that way you can release/hold (turn on/off) the monkeys regardless of injection rate you specify, it means, if you specify an injection rate of 100% but you tell to the policy that the random injection is disabled, it will do nothing.
See Issues for latest discussions on taking Simmy forward!
Simmy was the brainchild of @mebjas and @reisenberger. The major part of the implementation was by @vany0114 and @mebjas, with contributions also from @reisenberger of the Polly team.
- Simmy, the monkey for making chaos -by Geovanny Alzate Sandoval
- Simmy and Azure App Configuration -by Geovanny Alzate Sandoval
- Simmy Chaos Engine for .NET – Part 1, Injecting Faults -by Bryan Hogan
- Simmy Chaos Engine for .NET – Part 2, Resilience and Injected Faults -by Bryan Hogan
- Chaos Engineering your .NET applications using Simmy -by Joseph Woodward
-
Dylan Reisenberger presents an intentionally simple example .NET Core WebAPI app demonstrating how we can set up Simmy chaos policies for certain environments and without changing any existing configuration code injecting faults or chaos by modifying external configuration.
-
Geovanny Alzate Sandoval made a microservices based sample application to demonstrate how chaos engineering works with Simmy using chaos policies in a distributed system and how we can inject even a custom behavior given our needs or infrastructure, this time injecting custom behavior to generate chaos in our Service Fabric Cluster.
-
Bjørn Einar Bjartnes made a red-green load-testing resilience workshop to understand how errors and resiliency mechanisms affect a system under load. It has been used to run workshops at for example NDC Oslo and there is a video from the workshop at DotNext.