Commit

Launched version 1.2
jnluis committed Mar 29, 2024
1 parent 2bdffde commit 5deadb7
Showing 19 changed files with 1,026 additions and 121 deletions.
# Action Flow
<br/>
<p align="center">
![architecture diagram](../../../static/img/backend.png)
</p>

<p align="center">
The Backend technologies and their connections.
</p>

---

<!-- START doctoc generated TOC please keep comment here to allow auto update -->
<!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE -->
## Table of Contents

- [Action Flow](#action-flow)
- [Table of Contents](#table-of-contents)
- [General Flow Description](#general-flow-description)

<!-- END doctoc generated TOC please keep comment here to allow auto update -->

---

## General Flow Description

The **backend** is composed of multiple services and technologies that manage **three internal functions**:
- **Load Balancing**
- **Data Manipulation**
- **Data Persistence**

The **data** is provided in the form of **HTTP requests** created by the **frontend** of the application.

These requests are first intercepted by the **load balancer**, which distributes the incoming requests across a set of **API instances**.

These instances perform the necessary **data processing** and analysis and send the response back to the load balancer, which promptly **forwards the response** back to the initial **requester**.

To **persist** and save the **data**, the API instances are connected to a set of **three main databases**.
Each database has a **distinct function**: the **main database** stores the **data models**, the **cloud storage** stores the **provided files**, and the **configuration database** stores the instance **configurations** and "magic numbers".
# Load Balancer
<br/>
<p align="center">
![architecture diagram](../../../static/img/nginx.png)
</p>

---

<!-- START doctoc generated TOC please keep comment here to allow auto update -->
<!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE -->
## Table of Contents

- [Load Balancer](#load-balancer)
- [Table of Contents](#table-of-contents)
- [NGINX](#nginx)

<!-- END doctoc generated TOC please keep comment here to allow auto update -->

---


## NGINX

For the **load balancer**, we chose a simple [**NGINX**](https://nginx.org/en/) configuration that allows us to tune how the **distribution of work** is made between the **API instances** (round-robin, response time, etc.) without introducing a large processing **overhead** into our system.

Since the system is **user-facing**, the backend must provide responses in a **timely manner**, so any overhead introduced must be kept to a minimum.

NGINX also allows us to automatically **intercept and filter large files** as soon as the requests carrying them reach the backend, preventing these files from consuming excessive processing power during analysis.

So with NGINX we can:
* **Balance the processing load** more efficiently across our API Instances;
* **Prevent large files** from having to be processed, discarding them as soon as possible;
* **Reduce the overhead** introduced into the system by other large load balancers.
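
As a rough illustration, a minimal NGINX configuration of this kind could look like the sketch below — the upstream service names, ports and size limit are assumptions, not the project's actual values:

```nginx
# Group of API instances; requests are distributed round-robin by default.
upstream api_instances {
    server api_1:8000;
    server api_2:8000;
}

server {
    listen 80;
    client_max_body_size 10m;   # reject requests carrying overly large files early

    location / {
        proxy_pass http://api_instances;
    }
}
```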
# API Instances
<br/>
<p align="center">
![architecture diagram](../../../static/img/python_fastapi.png)
</p>

---

<!-- START doctoc generated TOC please keep comment here to allow auto update -->
<!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE -->
## Table of Contents

- [API Instances](#api-instances)
- [Table of Contents](#table-of-contents)
- [Why Python and FastAPI](#why-python-and-fastapi)
- [The FastAPI implementation](#the-fastapi-implementation)
- [Endpoints](#endpoints)
- [Models](#models)
- [Repositories](#repositories)
- [Utils](#utils)
- [Templates](#templates)
- [Tests](#tests)

<!-- END doctoc generated TOC please keep comment here to allow auto update -->

---

## Why Python and FastAPI

For the API itself, we chose to use [**FastAPI**](https://fastapi.tiangolo.com/).

This choice was heavily influenced by the fact that **all the code must be extremely well documented** and easy to change in the future.

FastAPI also provides **more functionality** than some of its more "barebones" competitors (for example, **Flask**), while **omitting most of the less used features** of more complex alternatives (for example, **Django**).

The option of creating a **Spring-based API** or choosing a programming language **other than Python** was ruled out, as the code itself **must remain maintainable** by people who may not have the specific knowledge to quickly apply the changes that might be needed in the future.

**Future-proofing** also had a large impact on our decision: every component of this project may need perhaps even **decades of support**, and if just one of the components used is discontinued, the whole action flow will be disrupted, causing **problems in the whole pipeline**.

Therefore, **Python** and **FastAPI** were chosen because of their **feature sets**, easy **maintainability**, relatively **proven** longevity, and good **performance**.

---

## The FastAPI implementation

<p align="center">
![architecture diagram](../../../static/img/fastapi.png)
</p>

The final API codebase was divided into **multiple subsections**:

---

### Endpoints

The endpoints consist of all the **accessible functions of the API**, along with their parameters and responses.

All the endpoints' documentation can be found in the **Swagger documentation** served by the API itself, at [**localhost:8080/docs**](http://localhost:8080/docs) (the application must be running for this page to work).

The endpoints consist of **CRUD** and **other functions** that operate on the **models** and **local files** that have been inserted into the backend.
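
A minimal sketch of what one such endpoint looks like in FastAPI — the route and the in-memory store are hypothetical and merely stand in for the real repository layer:

```python
from fastapi import FastAPI, HTTPException

app = FastAPI(title="Action Flow API")

# Hypothetical in-memory store standing in for the repository layer described below.
_dissertations: dict[str, dict] = {"d1": {"id": "d1", "title": "Example dissertation"}}

@app.get("/dissertations/{dissertation_id}")
async def get_dissertation(dissertation_id: str) -> dict:
    """A typical read endpoint: path parameter in, JSON out, auto-documented at /docs."""
    dissertation = _dissertations.get(dissertation_id)
    if dissertation is None:
        raise HTTPException(status_code=404, detail="Dissertation not found")
    return dissertation
```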

---

### Models

The models are the **Python/Database** representation of the **real-life objects** that we are working with, providing a simpler and clearer approach to **data manipulation**.

There are multiple models, for example:
- **User** Model;
- **Dissertation** Model;
- **Notification** Model;
- etc.

Using the [**pydantic**](https://pydantic.dev/) library, all the models automatically apply **data consistency checks**, valid value checks, **default values**, etc. for all the parameters and variables stored inside a given model object, allowing for a much **easier and cleaner code** implementation.
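
As a sketch — the field names here are illustrative, not the project's actual schema — a pydantic model looks like this:

```python
from datetime import datetime
from pydantic import BaseModel, EmailStr, Field

class User(BaseModel):
    """Hypothetical User model; every field is validated automatically on creation."""
    id: str
    email: EmailStr                                                # needs the email-validator extra
    name: str = Field(min_length=1)                                # rejects empty names
    created_at: datetime = Field(default_factory=datetime.utcnow)  # default value when omitted

user = User(id="u1", email="someone@example.org", name="Some Student")  # bad values raise ValidationError
```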

---

### Repositories

For **interacting with the main database**, a set of repository functions was implemented that **abstracts** the database components away from the rest of the application.

These repositories apply the necessary database operations to the main database given a set of predefined parameters, and return the semi-processed results.

These are essential to **ensure that the database is not overloaded** with bad requests from other functions, that the database threads are used **efficiently** and that access to the database is **easy to alter**.
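
A minimal sketch of the pattern, assuming pymongo and illustrative collection and field names:

```python
from pymongo import MongoClient
from pymongo.collection import Collection

class DissertationRepository:
    """Thin abstraction over the main database; callers never build queries themselves."""

    def __init__(self, client: MongoClient, db_name: str = "actionflow"):
        self._collection: Collection = client[db_name]["dissertations"]

    def find_by_user(self, user_id: str) -> list[dict]:
        # Only the repository knows the query shape; callers just pass domain parameters.
        return list(self._collection.find({"owner_id": user_id}))

    def insert(self, dissertation: dict) -> str:
        return str(self._collection.insert_one(dissertation).inserted_id)
```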

---

### Utils

The utils are a set of functions that provide **extended functionality** to other parts of the API.

These functions include:
- **Authenticating** users;
- Creating and sending **email notifications**;
- Getting the **configurations** from the Configuration Database;
- Managing **local files** from the Filesystem;
- etc.
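
As an example of one such helper, here is a minimal authentication dependency — it assumes JWT bearer tokens decoded with PyJWT, which may not be the mechanism actually used:

```python
import jwt  # PyJWT — an assumption about the token library
from fastapi import Depends, HTTPException, status
from fastapi.security import OAuth2PasswordBearer

oauth2_scheme = OAuth2PasswordBearer(tokenUrl="token")
SECRET_KEY = "change-me"  # hypothetical; in practice fetched from the Configuration Database

def get_current_user(token: str = Depends(oauth2_scheme)) -> dict:
    """Decode the bearer token and return its claims, rejecting invalid tokens."""
    try:
        return jwt.decode(token, SECRET_KEY, algorithms=["HS256"])
    except jwt.PyJWTError:
        raise HTTPException(status_code=status.HTTP_401_UNAUTHORIZED, detail="Invalid token")
```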

---

### Templates

These templates are used by [**Jinja**](https://jinja.palletsprojects.com/en/3.1.x/) and the API to **generate simple HTML code**.

The generated file can then be sent in an **email notification** or used to **list all the data collected** throughout the year (useful for collecting information from the previous year for backups).
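
A minimal rendering helper — `templates/notification.html` is a hypothetical template path, not necessarily the project's actual file:

```python
from jinja2 import Environment, FileSystemLoader, select_autoescape

env = Environment(loader=FileSystemLoader("templates"), autoescape=select_autoescape(["html"]))

def render_notification(user_name: str, message: str) -> str:
    """Render the HTML body that is attached to an email notification."""
    template = env.get_template("notification.html")
    return template.render(user_name=user_name, message=message)
```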

---

### Tests

The test portion is still to be implemented.
# Databases
<br/>
<p align="center">
![architecture diagram](../../../static/img/databases_nobg.png)
</p>

<p align="center">
The database technologies utilized
</p>

---

<!-- START doctoc generated TOC please keep comment here to allow auto update -->
<!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE -->
## Table of Contents

- [Databases](#databases)
- [Table of Contents](#table-of-contents)
- [Databases](#databases-1)
- [Main Database](#main-database)
- [File Storage](#file-storage)
- [Configuration Database](#configuration-database)

<!-- END doctoc generated TOC please keep comment here to allow auto update -->

---

## Databases

For better **separation, maintainability, scalability and performance**, the persistence of data was divided into **three separate databases**.

This allows better use of each database's **core strengths** across the multitude of tasks that the **API Instances** are required to perform.

---

### Main Database

<p align="center">
![architecture diagram](../../../static/img/mongodb.png)
</p>

The **models** mentioned previously as well as most of the data processed are **saved in the Main Database**.

Due to the **volatile** and **highly customizable** nature of these models, a database that could easily be adapted to new model formats was necessary.

This means that the typical SQL databases are **not adequate** for our type of data structures, so a **[NoSQL](https://en.wikipedia.org/wiki/NoSQL) database was chosen instead**.

Most NoSQL databases are much more recent than the SQL databases we are used to, and since all the components in this project must have certified **long term support**, the **[MongoDB](https://www.mongodb.com/) database was chosen**.

MongoDB allows the **models to be changed without affecting the compatibility of old data models**, meaning that changes made to the database will still always allow for old data to be processed along with new data.

Another key feature that drove our decision is the ability to easily convert a **MongoDB document into a JSON object** that can be worked on natively inside our Python codebase, **without the need for complicated operations** such as joins, aggregations, etc.
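
For instance, a fetched document is already a plain Python dictionary that can be returned as JSON almost as-is — the hostname, database and collection names below are assumptions:

```python
from pymongo import MongoClient

client = MongoClient("mongodb://mongodb:27017")
db = client["actionflow"]

def get_dissertation(dissertation_id: str) -> dict | None:
    """Fetch one document; the result is a plain dict, ready to be returned as JSON."""
    doc = db["dissertations"].find_one({"_id": dissertation_id})
    if doc is not None:
        doc["_id"] = str(doc["_id"])  # ObjectIds are not JSON-serialisable as-is
    return doc
```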

---

### File Storage

<p align="center">
![architecture diagram](../../../static/img/cloud_storage.png)
</p>

**Storing large files** (e.g. PDF files from dissertations or dissertation logos) would take a lot of **I/O throughput** away from the rest of the Main Database as well as use a lot of the machine's available **storage space**.

This means that another way of storing large singular files had to be chosen.

For this, we used a **cloud storage system** mounted into the machine's filesystem, which is then linked into the necessary **Docker volumes**.

This allows the asynchronous storage and retrieval of files **without impacting the Main Database's** throughput capacity or performance.
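
A sketch of how an API instance could write a file onto that bind-mounted path — the mount point and helper name are hypothetical:

```python
from pathlib import Path
from fastapi import UploadFile

FILES_ROOT = Path("/data/files")  # hypothetical mount point of the cloud-storage bind volume

async def save_file(dissertation_id: str, upload: UploadFile) -> Path:
    """Persist an uploaded file on the bind-mounted filesystem and return its path."""
    target = FILES_ROOT / dissertation_id / upload.filename
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(await upload.read())
    return target
```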

---

### Configuration Database

<p align="center">
![architecture diagram](../../../static/img/mariadb.png)
</p>

Since **many API Instances can be running at any given time**, configuration changes must be made in such a way that every instance picks up the new values as soon as it next uses them.

So for **configurations and "magic numbers"**, we needed a single place that all the instances can reach to get the necessary values.

To do this, we chose to add a **[MariaDB](https://mariadb.org/) instance**, so that we do not have to worry about **concurrency** between all the instances and the data changes, while still allowing all the configuration data to be **stored in the same, easy-to-edit file**.
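
A sketch of how an instance might read one value — the connection details, table and column names are assumptions, and the driver (pymysql here) may differ from the one actually used:

```python
import pymysql

def get_config_value(key: str) -> str | None:
    """Read a single configuration value from the shared MariaDB instance."""
    connection = pymysql.connect(host="mariadb", user="api", password="secret", database="config")
    try:
        with connection.cursor() as cursor:
            cursor.execute("SELECT value FROM configurations WHERE name = %s", (key,))
            row = cursor.fetchone()
            return row[0] if row else None
    finally:
        connection.close()
```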
# Docker And Kubernetes
<br/>
<p align="center">
![architecture diagram](../../../static/img/docker_all.png)
</p>

<p align="center">
The containerization technologies utilized
</p>

---

<!-- START doctoc generated TOC please keep comment here to allow auto update -->
<!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE -->
## Table of Contents

- [Docker And Kubernetes](#docker-and-kubernetes)
- [Table of Contents](#table-of-contents)
- [Docker Compose](#docker-compose)
- [Docker Containers](#docker-containers)
- [Docker Networks](#docker-networks)
- [Docker Volumes](#docker-volumes)
- [Kubernetes](#kubernetes)

<!-- END doctoc generated TOC please keep comment here to allow auto update -->

---

## Docker Compose

<p align="center">
![architecture diagram](../../../static/img/docker-compose.png)
</p>

[Docker Compose](https://docs.docker.com/compose/) was used as the primary **container orchestrator**.

Docker Compose allows us to define all the necessary deployment parameters, such as **ports**, **volumes** and image **versions**.

It also allows us to **manage**, **launch** and **stop** all the containers at once.

---

### Docker Containers

<p align="center">
![architecture diagram](../../../static/img/docker.png)
</p>

For the dockerization of our application, the following containers were defined:
- **Application** (frontend)
- **API Instance** (multiple can be created at once)
- **Load Balancer**
- **MongoDB**
- **MariaDB**

---

### Docker Networks

Another advantage of Docker Compose is the ability to create multiple **separate networks between containers**.

In our application the following networks were created:
- **frontend_network** (App ⇄ Frontend Server)
- **backend_network** (API Instances ⇄ Databases)
- **request_network** (Load Balancer ⇄ API Instances)

---

### Docker Volumes

Both MongoDB and the Cloud Service require the use of volumes, and the Cloud Service's volume must be **mapped to the host's filesystem** (bind volume).

In our application the following volumes were created:
- **mongodbdata** (MongoDB Data)
- **file-bind** (Bind volume that utilizes the host's filesystem, which is then managed by the cloud server)
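
Putting the containers, networks and volumes above together, a trimmed `docker-compose.yml` could look like the sketch below — image tags, build paths, ports and host paths are assumptions:

```yaml
services:
  load-balancer:
    image: nginx:stable
    ports:
      - "8080:80"
    networks: [request_network]
  api:
    build: ./api
    networks: [request_network, backend_network]
    volumes:
      - ./files:/data/files        # file-bind: host filesystem used by the cloud service
  mongodb:
    image: mongo
    volumes:
      - mongodbdata:/data/db
    networks: [backend_network]
  mariadb:
    image: mariadb
    networks: [backend_network]

networks:
  request_network:
  backend_network:

volumes:
  mongodbdata:
```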

---

## Kubernetes

<p align="center">
![architecture diagram](../../../static/img/kubernetes.png)
</p>

A [Kubernetes](https://kubernetes.io/) deployment was **partially implemented**, but would require more work before it is production ready.

The specific Kubernetes distribution used was the simpler [K3s](https://k3s.io/), providing all the required functionality at a smaller setup-time cost.

Since the final production environment will only consist of a **single server**, Kubernetes' main feature of load balancing between machines would not be of any use.

The ability to **detect crashes on containers** and **relaunch the affected service(s)** is very useful to our application, but more discussion is needed to determine whether this is a priority for our system.

The final production version will include a **Kubernetes manifest** which will launch the containers and do the appropriate load balancing, but more **testing must be done** before applying these changes to production.
9 changes: 9 additions & 0 deletions versioned_docs/version-1.2/The Platform/1-architecture.md
# Architecture

## Architecture Diagram

A simple diagram to illustrate our architecture

<p align="center">
![architecture diagram](../../../static/img/architecture.png)
</p>
9 changes: 9 additions & 0 deletions versioned_docs/version-1.2/The Platform/2-Deployment.md
# Deployment

## Deployment and Structural Diagram

A simple deployment diagram and infrastructure mapping

<p align="center">
![architecture diagram](../../../static/img/deployment.png)
</p>