Distributed Signup


This project demonstrates a partitioned signup flow based on an example from the book "Designing Data-Intensive Applications" by Martin Kleppmann @martinkl (Thank you!). Users sign up at the Account service, which requires a username. So many people want to register that a single PostgreSQL database can't hold all account records, but three servers are enough for this hypothetical load. Therefore we split (partition) user accounts across three databases and make sure a username is unique across all of them.

The idea is to write signup requests to the account.signup_request Kafka topic, which is partitioned by username. All attempts to claim the username Bob therefore land in the same Kafka partition, because the partition is derived from a hash of the username. For example, the message {username: Bob, request_id: 13rUw7cUfrGO9Go9xbZearzuuAu} is written to partition hash('Bob') % partitions_count. Since we have three PostgreSQL instances, we split the account.signup_request topic into three partitions (0, 1, 2).
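The hash('Bob') % partitions_count rule can be sketched in a few lines of Go. This is only an illustration: the function name partitionFor and the choice of FNV-1a are assumptions made for the sketch, and a real Kafka client applies its own partitioner to the message key.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// partitionFor maps a username to one of partitionsCount partitions.
// FNV-1a is an assumption chosen for this sketch; it is not necessarily
// the hash a Kafka client uses for message keys.
func partitionFor(username string, partitionsCount uint32) uint32 {
	h := fnv.New32a()
	h.Write([]byte(username)) // Write on a hash.Hash never returns an error
	return h.Sum32() % partitionsCount
}

func main() {
	// The same username always maps to the same partition.
	for _, name := range []string{"Bob", "Alice", "Bob"} {
		fmt.Printf("%s -> partition %d\n", name, partitionFor(name, 3))
	}
}
```

Because the mapping depends only on the username and the partition count, every request for the same name is handled by the same signup-server, which is what makes the uniqueness check local to one database.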

Each signup-server process reads Kafka messages sequentially from its own partition and stores user accounts in its own PostgreSQL database. If the username Bob already exists in Postgres, the program emits a failure message to the account.signup_response topic. Otherwise, Bob's account is created and a success message is written to the topic. For example:

  • {username: Bob, success: false, request_id: 13rVCgpmD0UgKH6zNHdfcPG63Df}
  • {username: Bob, success: true, request_id: 13rUw7cUfrGO9Go9xbZearzuuAu}
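The per-partition decision described above can be sketched as follows. This is a simplified model, not the project's code: the accountStore type is a hypothetical in-memory stand-in for one PostgreSQL instance, and the Kafka consumer is replaced by a plain slice of requests processed in order.

```go
package main

import "fmt"

// SignupRequest and SignupResponse mirror the fields of the messages
// shown above; the Go type names are assumptions for this sketch.
type SignupRequest struct {
	Username  string
	RequestID string
}

type SignupResponse struct {
	Username  string
	Success   bool
	RequestID string
}

// accountStore is a hypothetical in-memory stand-in for one PostgreSQL
// instance that owns a single partition.
type accountStore map[string]struct{}

// handle processes one request: the first claim of a username succeeds,
// every later claim of the same username fails.
func (s accountStore) handle(req SignupRequest) SignupResponse {
	_, taken := s[req.Username]
	if !taken {
		s[req.Username] = struct{}{}
	}
	return SignupResponse{Username: req.Username, Success: !taken, RequestID: req.RequestID}
}

func main() {
	store := accountStore{}
	// Requests are handled strictly in partition order.
	requests := []SignupRequest{
		{Username: "Bob", RequestID: "13rUw7cUfrGO9Go9xbZearzuuAu"},
		{Username: "Bob", RequestID: "13rVCgpmD0UgKH6zNHdfcPG63Df"},
	}
	for _, req := range requests {
		fmt.Printf("%+v\n", store.handle(req))
	}
}
```

Sequential processing matters here: since a single process handles one request at a time for its partition, two concurrent claims of the same username cannot both succeed.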

Note that request_id is generated by the client that sends the signup request. Request IDs are needed to deduplicate messages. IDs are kept for a certain duration (until a message ages out) or until a storage size limit is reached. I have not tried deduplication in this project, although I was curious which storage would be the way to go. For instance, Segment shared how they leverage RocksDB in Delivering Billions of Messages Exactly Once, while CockroachDB uses RocksDB as a Storage Layer. Now I know what to try next!

A word about artificial keys in PostgreSQL. UUID v4 is a common choice for generating a random unique ID for an entity, e.g., an invoice ID. Indexing highly random values causes write amplification, so INSERTs become slow. In SQL Keys in Depth the author shows the superior performance of the UUID v1 algorithm, which combines the node MAC address with a timestamp to produce monotonically increasing values. In this demo I used a K-Sortable Unique IDentifier (KSUID: timestamp + randomly generated payload) to assign user IDs in PostgreSQL. Segment goes into KSUID details in A Brief History of the UUID.

Get Started

Let's run three PostgreSQL Docker containers on ports 5433, 5434, and 5435, each with an account database created. We also need Kafka with the account.signup_request and account.signup_response topics, each with 3 partitions and 1 replica. Docker Compose takes care of that. The only caveat is that you should set KAFKA_ADVERTISED_HOST_NAME to an address of your host; the command below reads it from the en0 interface on macOS.

$ cd ./docker/
$ KAFKA_ADVERTISED_HOST_NAME=$(ipconfig getifaddr en0) docker-compose up

Install dependencies using the dep package manager and build all commands.

$ dep ensure
$ make build

Create the PostgreSQL schema in every database with the schema command.

$ ./schema -pgport=5433 && ./schema -pgport=5434 && ./schema -pgport=5435

Run three signup-server processes, one per account.signup_request partition, to process signup requests.

$ ./signup-server -partition=0 -pgport=5433
$ ./signup-server -partition=1 -pgport=5434
$ ./signup-server -partition=2 -pgport=5435

Finally, run signup-ctl and type usernames to send signup requests. Note that both programs have a debug mode that shows more logs.

$ ./signup-ctl
bob
2:0 13rUw7cUfrGO9Go9xbZearzuuAu bob ✅
alice
2:1 13rUwm0PI5tMT3FEx4OwW905yWw alice ✅
john
2:2 13rUyyeODTy1GDdvRhtgLjC5sbG john ✅
lloyd
0:0 13rV46Yp6Ng0uEPuUmsF51S5pi2 lloyd ✅
aaron
0:1 13rV4lXEoSQrcSpaFYamWMBaDWt aaron ✅
peter
1:0 13rVCAFeRJxK671227gtFSq069F peter ✅
bob
2:3 13rVCgpmD0UgKH6zNHdfcPG63Df bob ❌
lloyd
0:2 13rVEyTTAh7P76aKQ3ZEomDEzqX lloyd ❌
sam
2:4 13rVFmwyaw2u5UXXNKIKMplycqb sam ✅

Signup responses are printed in the format partition_id:offset request_id username, followed by a success (✅) or failure (❌) mark. As you can see, bob registered successfully, and the second attempt to sign up as bob failed.

2:0 13rUw7cUfrGO9Go9xbZearzuuAu bob ✅
...
2:3 13rVCgpmD0UgKH6zNHdfcPG63Df bob ❌
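If you want to post-process this output, the line format can be parsed with a few lines of Go. This is a sketch for illustration only: responseLine and parseResponseLine are hypothetical names, not part of signup-ctl, and the parser assumes the four space-separated fields shown above.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// responseLine holds the fields of one printed signup response.
type responseLine struct {
	Partition int
	Offset    int64
	RequestID string
	Username  string
	Success   bool
}

// parseResponseLine parses "partition_id:offset request_id username mark".
func parseResponseLine(line string) (responseLine, error) {
	fields := strings.Fields(line)
	if len(fields) != 4 {
		return responseLine{}, fmt.Errorf("want 4 fields, got %d", len(fields))
	}
	pos := strings.SplitN(fields[0], ":", 2)
	if len(pos) != 2 {
		return responseLine{}, fmt.Errorf("bad position %q", fields[0])
	}
	partition, err := strconv.Atoi(pos[0])
	if err != nil {
		return responseLine{}, err
	}
	offset, err := strconv.ParseInt(pos[1], 10, 64)
	if err != nil {
		return responseLine{}, err
	}
	return responseLine{
		Partition: partition,
		Offset:    offset,
		RequestID: fields[1],
		Username:  fields[2],
		Success:   fields[3] == "✅", // any other mark is treated as failure
	}, nil
}

func main() {
	r, err := parseResponseLine("2:3 13rVCgpmD0UgKH6zNHdfcPG63Df bob ❌")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", r)
}
```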

Testing

To run the tests you will need Postgres running and the test environment variables set up.

$ make docker_run_postgres
$ make test