This project is a backend oriented demo with websocket capabilities that relies heavily in the Actor Model Framework implementation of Microsoft Orleans. I gave myself the luxury of experimenting with some technologies and practices that might be useful on a real world distributed system.
Click to view it on Loom with sound.
The domain contains two aggregate roots: Teams
and Tournaments
.
A Team
contains a name, a list of players (strings) and a list of tournament IDs on which the team is registered.
A Tournament
contains a name, a list of team IDs participating on it, and a Fixture
that contains information of each different Phase
: quarter finals, semi finals and finals.
For a Tournament
to start, the list of teams must have a count of 8. When it starts, the participating teams are shuffled randomly, and the Fixture
's quarter Phase
is populated.
A Phase
is a container for a list of Match
es that belong to the bracket. Each Match
contains the LocalTeamId
, the AwayTeamId
and a MatchResult
. In order not to use null values, the MatchResult
is created with default values of 0 goals for each team and the property Played
set to false.
When the user intents the command SetMatchResult
the correct result will be assigned to the corresponding Match
on the corresponding Phase
. When all the Match
es on a Phase
are played, the next Phase
is generated. Meanwhile the quarter finals are generated by a random shuffle of the participating teams, the semifinals and the finals are not generated randomly but considering the results of previous brackets.
In Actor Model Framework, each Actor (instance of an aggregate root) contains a state that mutates when applying events. Orleans is not different, and each Grain (Actor) acts the same way.
- Each
TournamentGrain
contains aTournamentState
. - When a
TournamentGrain
receives aCommand
, it first checks if the command is valid business wise. - If the
Command
is valid, it will publish anEvent
informing what happened and modifying theState
accordingly.
The TournamentState
is not coupled to Orleans framework. This means that the class is just a plain one that exposes methods that acts upon certain events. This allows for replayability of events and event sourcing as a whole. The state is then mutable, but that does not mean that the properties exposed by it are mutable as well. For example the TournamentState
contains the Fixture
.
The Fixture
is a value object implemented with C# records. This means that the methods exposed by it does not cause side effects and instead returns a new Fixture
with the changes reflected. This enables an easier testability and predictability as all the methods are deterministic; you can execute a method a hundred times and the result is the same.
One of the advantages of using an Actor Model Framework is the innate segregation of commands and queries. The commands are executed against a certain Grain and causes side effects, such as publishing an Event. However, the queries do not go against a Grain, instead they go against a read database that gets populated via projections.
This is, in fact, eventual consistency. There are some milliseconds between the source of truth changes (Grain) are reflected in the projections which one can query. This should not be a problem, but one needs to make sure to enable retry strategies in case the database is down while consuming an event from the stream, etc.
In the scenario of a Grain being killed because of inactivity, or the Silo resetting and losing the in memory state of the Grain; each Grain can be recovered by applying the events in order:
TournamentGrain
is not on memory.- Initialize the
TournamentGrain
by replaying the Events stored in the write database. - Executes the
SetMatchResult
command, etc.
As you can see, the Grain will never use the read state as the source of truth, and instead it will rely on the event sourcing mechanism, to apply all the events again one after another until the state is up to date.
Saga
is a pattern for a distributed transaction that impacts more than one aggregate. In this case, lets look at the scenario when a Team
wants to join a Tournament
.
TeamGrain
receives the commandCreateTeam
and theTeamState
is initialized.TournamentGrain
receives the commandAddTeam
for the team created above, validations kick in:- Does the team exist? (Note: here you are not supposed to query your read database, as it not the source of truth, but actually send a message to the
TeamGrain
). - Does the tournament contain less than 8 teams?
- Is the tournament already started?
- Does the team exist? (Note: here you are not supposed to query your read database, as it not the source of truth, but actually send a message to the
- If validations are ok, publishes the
TeamAdded
event.
So far, the TournamentGrain
is aware of the Team joining, but the TeamGrain
is not aware of the participation on the Tournament. Enter the TeamAddedSaga
Grain.
TeamAddedSaga
subscribes implicitly to an Orleans Stream looking forTeamAdded
events.- When it receives an event, it gets a reference for the
TeamGrain
with the corresponding ID. - Sends the
JoinTournament
command to theTeamGrain
. - There are no extra validations needed, as everything was already validated before.
TeamGrain
publishes theTeamJoinedTournament
event.
In this case there is not a rollback strategy implemented but these capabilities can definitely be handled with an "intermediate" Grain such as the Saga.
Given the async nature of Orleans, the commands executed by a user should not return results on how does the resource looks after a command succeeded, not even IF the command succeeded.
Instead the response will contain a TraceId
as a GUID representation on the intent of the user. The user will receive a message via websockets indicating what happened for that TraceId
. The frontend can reflect the changes accordingly.
For the graph above the work, the user should establish a websocket connection with the API. The user will only get the results for those commands he/she invoked and not for everything happening in the system.
As not all users should receive the results for all the commands, there is a need for distinguishing users. For this purpose a super simple authentication and authorization mechanism is implemented. If you need to implement auth in a real world app, please do not reinvent the wheel and check existing solutions such as AD or Okta.
As mentioned before, each command executed will result on a TraceId
, but internally there is also an InvokerUserId
propagated that can also serve as an audit log for each event.
The user will only get websocket messages for those events on which the InvokerUserId
matches with the one on the JWT Token.
Fallback mechanism
It makes sense to have a fallback mechanism in case a websocket event did not arrive due to network issues. This could be a key value database where a user can search for a TraceId
to find the response in order to react to a command executed. This is not currently implemented but will be added later.
The solution consists on different projects that need to be executed at the same time:
- API.Identity: Create user and generate token.
- API: Entrypoint for the user to invoke commands, connects to the Orleans cluster as a client.
- Silo: Orleans cluster, handles everything related to Orleans such as placement, streams, etc.
- Silo.Dashboard: Orleans client that displays metrics and information.
It also requires an instance of Postgres to be running and accepting connections.
Taking in consideration that the API could be split in smaller pieces and the Silos count being scalable, I decided to create Kubernetes manifests to host this application. As it is not a common practice to have your own Kubernetes cluster locally, I also decided on using Kind
which uses the docker engine to host a single node of Kubernetes locally.
I am not going to enter into details on how this works, but just so you know that Docker
and Kind
are required to be installed to run this solution. Of course, one can choose to run locally instead, bear in mind that in that scenario you need to be able to connect to a Postgres instance as mentioned above.
All the Kubernetes configuration files can be found on the kubernetes
folder. NOTE: There are also "secrets" on the mentioned folder, for a real world application please do not store your secrets in plain text on your repository.
As mentioned before, it is required to have docker
installed as well as kind
command line and kubectl
. It is also suggested to have make
support, so you don't have to manually go through the commands one by one.
Initial bootstrap:
make run
Trigger rebuild of images and recreation of all the pods:
make restart
If you see something like this, everything should be up and running:
Use
kubectl
instead ofkb
as it is an alias that I am used to.
Now you can access with the following endpoints:
You can use the Postman collection below to try out the functionality as you can see on the demo video. This collection assigns values from previous responses so you don't have to modify the requests manually but just execute them in order.
As this client is not containerized, dotnet
sdk is required. After creating a user and getting the token, you can execute on the utils/Websockets.Client
directory:
dotnet run
Paste your user token, and when choosing an environment type K
and press enter key. A message saying connected should appear. Now you should be able to get the results of all the requests fired through the API using that user token.
Tests are not yet implemented, but you can see a simple example on the Domain.Tests
project. I would like the unit tests just to cover the domain functionality:
- Given a state, apply an event, assert the modified state.
- Given a value object, invoke a method, assert the returned record.