
Discrete event simulation of a cluster of machines with .NET Core async/await


abdullin/sim-cluster


sim-cluster

This is a work in progress on running event-driven distributed systems inside a discrete event simulation.

The purpose of this research is to run a distributed application inside a deterministic simulation while bombarding it with various faults that are hard to reproduce in the real world (but are still disruptive to production systems).
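The core idea of a deterministic discrete event simulation can be sketched in a few lines. The `MiniSim` class below is a hypothetical illustration, not the sim-cluster API: pending events sit in a queue ordered by virtual time, the clock jumps straight to the next event instead of waiting, and the only randomness comes from a seeded `Random`, so the same seed replays the same run. One way to model a fault such as a sudden power loss is to drop every pending event that belongs to the affected machine (`EraseFuture` here is also an assumed name).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical minimal discrete event simulator (illustration only; not the
// sim-cluster API). Virtual time is a plain tick counter.
public sealed class MiniSim {
    // Pending events keyed by (virtual time, insertion order); the sequence
    // number breaks ties so events at the same time run in FIFO order.
    readonly SortedDictionary<(long Time, long Seq), (string Owner, Action Act)> _future =
        new SortedDictionary<(long Time, long Seq), (string Owner, Action Act)>();
    long _seq;

    public long Now { get; private set; }
    public Random Rand { get; }   // the only source of randomness; seeded

    public MiniSim(int seed) { Rand = new Random(seed); }

    // Schedule `act` to run `delay` ticks from now on behalf of `owner`.
    public void Schedule(string owner, long delay, Action act) =>
        _future.Add((Now + delay, _seq++), (owner, act));

    // Crude power outage: drop every pending event that belongs to `owner`.
    public void EraseFuture(string owner) {
        var doomed = _future.Where(kv => kv.Value.Owner == owner)
                            .Select(kv => kv.Key)
                            .ToList();
        foreach (var key in doomed) _future.Remove(key);
    }

    // Repeatedly take the earliest event, jump the clock to it and run it,
    // until the queue is empty or the next event lies beyond `until`.
    public void Run(long until) {
        while (_future.Count > 0) {
            var next = _future.First();
            if (next.Key.Time > until) break;
            _future.Remove(next.Key);
            Now = next.Key.Time;
            next.Value.Act();
        }
    }
}
```

For example, scheduling `api1` at tick 10, `api2` at tick 5 and an outage of `api1` at tick 7 runs only the `api2` event: the outage erases `api1`'s future before its turn comes.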

This project builds upon the SimAsync project, extending it with more features (including a simplified networking stack).

If you want to discuss this project, don't hesitate to write me an email at rinat@abdullin.

Try it out

This is a .NET Core 2.0 project. You should be able to open it in an IDE (e.g. JetBrains Rider) and run the Runtime/SimMach.csproj project.

The output should be something like this:

[Screenshot: sample console output of a simulation run (screenshot.png)]

Alternatively, you could try launching everything from the CLI with something like:

$ dotnet run --project Runtime

Details

Sim-cluster builds upon previous work:

  1. SimCPU - simulate a CPU job scheduler (easier than it sounds);
  2. SimRing - simulate a ring benchmark;
  3. SimAsync - plug into .NET Core async/await to simulate processes running in parallel;
  4. SimCluster - this.
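To make the SimAsync idea of plugging into async/await concrete, here is a minimal sketch under our own assumptions (it is not the SimAsync source). A custom single-threaded `SynchronizationContext` owns a virtual-time event queue; `await` captures that context, and a simulated `Delay` completes its task from a queued event instead of a real timer. The names `SimContext`, `Delay`, `Start` and `Run` are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical sketch (not the SimAsync source): a single-threaded
// SynchronizationContext that runs await continuations in virtual time.
public sealed class SimContext : SynchronizationContext {
    readonly SortedDictionary<(long Time, long Seq), Action> _future =
        new SortedDictionary<(long Time, long Seq), Action>();
    long _seq;

    public long Now { get; private set; }   // virtual clock, in ticks

    void At(long time, Action act) => _future.Add((time, _seq++), act);

    // Continuations that `await` posts back to this context are queued
    // at the current virtual time.
    public override void Post(SendOrPostCallback d, object state) =>
        At(Now, () => d(state));

    // Simulated sleep: nothing waits on the wall clock; a queued event
    // completes the task once virtual time reaches Now + ticks.
    public Task Delay(long ticks) {
        var tcs = new TaskCompletionSource<bool>();
        At(Now + ticks, () => tcs.SetResult(true));
        return tcs.Task;
    }

    // Register a simulated process; it starts when Run() is called.
    public void Start(Func<Task> process) => At(Now, () => process());

    public void Run() {
        var previous = Current;
        SetSynchronizationContext(this);   // so `await` captures this context
        try {
            while (_future.Count > 0) {
                var next = _future.First();
                _future.Remove(next.Key);
                Now = next.Key.Time;
                next.Value();
            }
        } finally {
            SetSynchronizationContext(previous);
        }
    }
}
```

Two processes that sleep for 3 and 5 ticks in a loop interleave as `fast@3, slow@5, fast@6, slow@10`, always in that order, because every step is driven by one queue on one thread. In this sketch an exception thrown inside a process is captured by its `Task` and silently lost; a real simulator would want to surface it.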

This project introduces:

  • Simplified simulation of TCP/IP. This includes the connection handshake, SEQ/ACK numbers and reorder buffers. There is no proper shutdown sequence and no packet re-transmissions.
  • Durable node storage in the form of per-machine folders used by the LMDB database.
  • Configurable system topology - machines, services and network connections.
  • Simulation plans that specify how we want to run the simulated topology. This includes a graceful chaos monkey.
  • Simulating power outages by erasing the future of the affected systems.
  • Network profiles - the ability to configure latency, packet loss ratio and logging per network connection.
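A receive-side reorder buffer, one of the TCP/IP pieces listed above, can be sketched as follows. This is a hypothetical illustration, not the project's networking code: to keep it short, SEQ counts whole segments rather than bytes and never wraps around, unlike real TCP. Segments may arrive in any order, but the application sees them strictly in SEQ order, and `Ack` is the next SEQ still expected (a cumulative ACK).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical receive-side reorder buffer. Simplification: SEQ numbers
// count segments (not bytes) and do not wrap around.
public sealed class ReorderBuffer<T> {
    readonly SortedDictionary<uint, T> _pending = new SortedDictionary<uint, T>();
    uint _nextSeq;   // next SEQ the application expects

    public ReorderBuffer(uint initialSeq) { _nextSeq = initialSeq; }

    // Cumulative ACK to send back: everything below this has been delivered.
    public uint Ack => _nextSeq;

    // Accept one segment and return every payload that became deliverable.
    public List<T> Receive(uint seq, T payload) {
        var ready = new List<T>();
        // Old or already-buffered segment: a duplicate, ignore it.
        if (seq < _nextSeq || _pending.ContainsKey(seq)) return ready;
        _pending[seq] = payload;
        // Drain the contiguous run that starts at the expected SEQ.
        while (_pending.TryGetValue(_nextSeq, out var next)) {
            _pending.Remove(_nextSeq);
            ready.Add(next);
            _nextSeq++;
        }
        return ready;
    }
}
```

If segment 2 arrives before segment 1, nothing is delivered at first; when segment 1 shows up, both come out in order and the ACK advances to 3. A duplicate of segment 1 is then dropped.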

Dive in

To dive in, take a look at Program.cs. It generates a simulation scenario, which is then executed.

A scenario could look like this:

public static ScenarioDef InventoryMoverBotOver3GConnection() {
    var test = new ScenarioDef();
    // define network connections and provide network profiles for them
    test.Connect("botnet", "public", NetworkProfile.Mobile3G);
    test.Connect("public", "internal", NetworkProfile.AzureIntranet);
    // install services on the machines
    test.AddService("cl.internal", InstallCommitLog);
    test.AddService("api1.public", InstallBackend("cl.internal"));
    test.AddService("api2.public", InstallBackend("cl.internal"));
    // configure a bot that will create workload and verify results 
    var mover = new InventoryMoverBot {
        Servers = new []{"api1.public", "api2.public"},
        RingSize = 7,
        Iterations = 30,
        Delay = 4.Sec(),
        HaltOnCompletion = true
    };
    
    test.AddBot(mover);
    
    // define a plan for the simulation (who will control the machines)
    // this is optional, but a chaos monkey is cute...
    var monkey = new GracefulChaosMonkey {
        ApplyToMachines = s => s.StartsWith("api"),
        DelayBetweenStrikes = r => r.Next(5,10).Sec()
    };
    test.Plan = monkey.Run;
    return test;
}
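The chaos monkey in this scenario strikes machines whose names start with "api", waiting `r.Next(5,10)` seconds between strikes (`Random.Next` excludes its upper bound, so that is 5 to 9 seconds). The sketch below only computes *when* and *whom* to strike from a seed; it is a hypothetical helper, not the `GracefulChaosMonkey` source, and it uses plain integer seconds instead of the `.Sec()` extension. What a strike actually does to the machine is out of scope here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch, not the GracefulChaosMonkey source: compute a seeded
// strike schedule. Same seed and inputs => identical schedule on every run.
public static class ChaosPlan {
    public static List<(int AtSec, string Machine)> Strikes(
        int seed,
        IEnumerable<string> machines,
        Func<string, bool> applyToMachines,       // e.g. s => s.StartsWith("api")
        Func<Random, int> delayBetweenStrikesSec, // e.g. r => r.Next(5, 10)
        int horizonSec) {
        var rand = new Random(seed);
        // Sort candidates so the result does not depend on input order.
        var targets = machines.Where(applyToMachines)
                              .OrderBy(m => m, StringComparer.Ordinal)
                              .ToArray();
        var plan = new List<(int AtSec, string Machine)>();
        if (targets.Length == 0) return plan;
        // Alternate: draw a delay, then draw a target, until the horizon.
        for (var t = delayBetweenStrikesSec(rand); t <= horizonSec; t += delayBetweenStrikesSec(rand))
            plan.Add((t, targets[rand.Next(targets.Length)]));
        return plan;
    }
}
```

Because the schedule is a pure function of the seed, a failing run can be replayed exactly by reusing its seed, which is the main point of driving chaos from the simulation's random source.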

Installer functions bring together the necessary dependencies and return an instance of IEngine:

static Func<IEnv, IEngine> InstallBackend(string cl) {
    return env => {
        var client = new CommitLogClient(env, cl + ":443");
        return new BackendServer(env, 443, client);
    };
}
static IEngine InstallCommitLog(IEnv env) {
    return new CommitLogServer(env, 443);
}

BackendServer is a simplistic event-driven server that has its own projection thread and a (command) request handler. It commits data to the CommitLog, from which other server instances can read the same data.
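The commit-then-project flow described above can be illustrated with an in-memory stand-in. Everything here is hypothetical (`MemCommitLog`, `MemBackend`, the `item:qty` event format) and much simpler than the real `CommitLogServer`/`BackendServer` pair: the command handler only appends events to a shared log, and each server's projection keeps its own cursor into that log, so every instance converges on the same state.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory stand-in for the CommitLog: an append-only list
// of events that any reader can scan from an offset.
public sealed class MemCommitLog {
    readonly List<string> _events = new List<string>();

    // Append an event and return its offset in the log.
    public int Commit(string e) { _events.Add(e); return _events.Count - 1; }

    // Copy of all events from `offset` to the current end of the log.
    public IReadOnlyList<string> ReadFrom(int offset) =>
        _events.GetRange(offset, _events.Count - offset);
}

// Each backend keeps its own projection and a cursor into the shared log.
public sealed class MemBackend {
    readonly MemCommitLog _log;
    int _position;  // how far this server's projection has read
    public Dictionary<string, int> Stock { get; } = new Dictionary<string, int>();

    public MemBackend(MemCommitLog log) { _log = log; }

    // Command handler: validate, then commit. Local state is NOT touched
    // here; it changes only when the projection catches up.
    public void HandleAddStock(string item, int qty) {
        if (qty <= 0) throw new ArgumentOutOfRangeException(nameof(qty));
        _log.Commit($"{item}:{qty}");
    }

    // Projection step: apply every event committed since the last call.
    public void Project() {
        foreach (var e in _log.ReadFrom(_position)) {
            var parts = e.Split(':');
            Stock.TryGetValue(parts[0], out var current);
            Stock[parts[0]] = current + int.Parse(parts[1]);
            _position++;
        }
    }
}
```

If `api1` commits 3 apples and `api2` commits 2 apples and 1 pear, both servers report 5 apples and 1 pear after projecting, regardless of which server handled which command.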

In theory, the same business logic should also be able to run in a real-world environment. I haven't gotten to that part yet.

Licenses

This project is licensed under the MIT license and uses:
