Launched version 1.2

pi-dsd · Mar 29, 2024 · 5deadb7
1 parent 2bdffde
commit 5deadb7

Showing 19 changed files with 1,026 additions and 121 deletions.
diff --git a/versioned_docs/version-1.2/The Backend Implementation/1-action_flow.md b/versioned_docs/version-1.2/The Backend Implementation/1-action_flow.md
@@ -0,0 +1,39 @@
+# Action Flow
+<br/>
+<p align="center">
+    ![architecture diagram](../../../static/img/backend.png)
+</p>
+
+<p align="center">
+  The Backend technologies and their connections.
+</p>
+
+---
+
+<!-- START doctoc generated TOC please keep comment here to allow auto update -->
+<!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE -->
+## Table of Contents
+
+- [Action Flow](#action-flow)
+  - [Table of Contents](#table-of-contents)
+  - [General Flow Description](#general-flow-description)
+
+<!-- END doctoc generated TOC please keep comment here to allow auto update -->
+
+---
+
+## General Flow Description
+
+The **backend** is composed of multiple services and technologies that manage **three internal functions**:
+    - **Load Balancing**
+    - **Data Manipulation**
+    - **Data Persistence**
+
+ The **data** is provided in the form of **HTTP requests** created by the **frontend** of the application.
+
+ These requests are first intercepted by the **load balancer**, which will distribute all the incoming requests to a set of **API instances**.
+
+ These instances perform the necessary **data processing** and analysis and return the response to the load balancer, which promptly **forwards the response** back to the initial **requester**.
+
+ To **persist** and save the **data**, the API instances are connected to a set of **three main databases**.
+ Each database has a **distinct function**: saving the **data models** (main database), saving the **provided files** (cloud storage), and saving the **configurations** of the instances and "magic numbers" (configuration database).
diff --git a/versioned_docs/version-1.2/The Backend Implementation/2-load_balancer.md b/versioned_docs/version-1.2/The Backend Implementation/2-load_balancer.md
@@ -0,0 +1,33 @@
+# Load Balancer
+<br/>
+<p align="center">
+    ![architecture diagram](../../../static/img/nginx.png)
+</p>
+
+---
+
+<!-- START doctoc generated TOC please keep comment here to allow auto update -->
+<!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE -->
+## Table of Contents
+
+- [Load Balancer](#load-balancer)
+  - [Table of Contents](#table-of-contents)
+  - [NGINX](#nginx)
+
+<!-- END doctoc generated TOC please keep comment here to allow auto update -->
+
+---
+
+
+## NGINX
+
+ For the **load balancer**, we chose a simple [**NGINX**](https://nginx.org/en/) configuration that lets us tune how the **work is distributed** between the **API instances** (round-robin, response time, etc.) without introducing a large processing **overhead** into our system.
+
+ Since the system is **user-facing**, the backend must respond in a **timely manner**, so any overhead introduced must be kept to a minimum.
+
+ NGINX also allows us to automatically **intercept and filter large files** as soon as the requests carrying them are received in the backend, preventing these files from taking up too much processing power to analyze.
+
+So with NGINX we can:
+* **Balance the processing load** more efficiently across our API Instances;
+* **Prevent large files** from having to be processed, discarding them as soon as possible;
+* **Reduce the overhead** that a heavier load balancer would introduce into the system.
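+
+ The first two points above can be modelled in a few lines of Python. This is only a toy sketch of the behaviour, not the NGINX configuration itself: the instance names and the size limit are hypothetical, and it assumes the round-robin strategy mentioned earlier. In NGINX, a body size limit of this kind is commonly set with `client_max_body_size`; whether this project uses that directive is an assumption.
+
+```python
+from itertools import cycle
+
+# Hypothetical values for illustration; the real ones live in the NGINX config.
+MAX_BODY_BYTES = 10 * 1024 * 1024
+API_INSTANCES = ["api-1:8000", "api-2:8000", "api-3:8000"]
+
+
+class RoundRobinBalancer:
+    """Toy model of round-robin dispatch with an early size check."""
+
+    def __init__(self, instances, max_body_bytes):
+        self._next_instance = cycle(instances)
+        self._max_body_bytes = max_body_bytes
+
+    def route(self, content_length):
+        """Return the instance that should handle the request, or None if rejected."""
+        # Oversized uploads are discarded before any API instance sees them.
+        if content_length > self._max_body_bytes:
+            return None
+        return next(self._next_instance)
+
+
+balancer = RoundRobinBalancer(API_INSTANCES, MAX_BODY_BYTES)
+```
+
+ Each accepted request goes to the next instance in turn, while a rejected request does not consume a turn, so the oversized upload costs the API instances nothing.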
diff --git a/versioned_docs/version-1.2/The Backend Implementation/3-api_Instances.md b/versioned_docs/version-1.2/The Backend Implementation/3-api_Instances.md
@@ -0,0 +1,111 @@
+# API Instances
+<br/>
+<p align="center">
+    ![architecture diagram](../../../static/img/python_fastapi.png)
+</p>
+
+---
+
+<!-- START doctoc generated TOC please keep comment here to allow auto update -->
+<!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE -->
+## Table of Contents
+
+- [API Instances](#api-instances)
+  - [Table of Contents](#table-of-contents)
+  - [Why Python and FastAPI](#why-python-and-fastapi)
+  - [The FastAPI implementation](#the-fastapi-implementation)
+    - [Endpoints](#endpoints)
+    - [Models](#models)
+    - [Repositories](#repositories)
+    - [Utils](#utils)
+    - [Templates](#templates)
+    - [Tests](#tests)
+
+<!-- END doctoc generated TOC please keep comment here to allow auto update -->
+
+---
+
+## Why Python and FastAPI
+
+ For the API itself, we chose to use [**FastAPI**](https://fastapi.tiangolo.com/).
+
+ This choice was heavily influenced by the fact that **all the code must be extremely well documented** and easy to change in the future.
+
+ FastAPI also provides **more functionality** than some of its more "barebones" competitors (for example, **Flask**), while **omitting most of the less used features** of more complex alternatives (for example, **Django**).
+
+ Creating a **Spring-based API** or choosing a programming language **other than Python** was not an option, as the code **must be maintainable** by people who may not have the specialized knowledge needed to quickly apply future changes.
+
+ **Future proofing** also weighed heavily on our decision: every component of this project may need **decades of support**, because if even one component is discontinued, the whole action flow is disrupted, causing **problems in the whole pipeline**.
+
+ Therefore, **Python** and **FastAPI** were chosen because of their **feature sets**, easy **maintainability**, relatively **proven** future, and good **performance**.
+
+---
+
+## The FastAPI implementation
+
+<p align="center">
+    ![architecture diagram](../../../static/img/fastapi.png)
+</p>
+
+ The final API codebase was divided into **multiple subsections**:
+
+---
+
+### Endpoints
+
+ The endpoints consist of all the **accessible functions of the API**, along with their parameters and responses.
+
+ All of the endpoints' documentation can be found in the **Swagger documentation** served by the API itself, at [**localhost:8080/docs**](http://localhost:8080/docs) (the application must be running for this page to work).
+
+ The endpoints consist of **CRUD** and **other functions** that operate on the **models** and **local files** that have been inserted into the backend.
+
+---
+
+### Models
+
+ The models are the **Python/Database** representation of the **real-life objects** that we are working with, providing a simpler and clearer approach to **data manipulation**.
+
+ There are multiple models, for example:
+  - **User** Model;
+  - **Dissertation** Model;
+  - **Notification** Model;
+  - etc.
+
+ Using the [**pydantic**](https://pydantic.dev/) library, all the models automatically apply **data consistency checks**, valid value checks, **default values**, etc., to the parameters and variables stored inside a given model object, allowing for a much **easier and cleaner code** implementation.
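+
+ As a minimal sketch of what such a model can look like, the example below defines a hypothetical `Dissertation` model. The field names, limits, and default are assumptions made for illustration and are not taken from the project's code.
+
+```python
+from pydantic import BaseModel, Field, ValidationError
+
+
+class Dissertation(BaseModel):
+    # Field names and limits are hypothetical.
+    title: str = Field(min_length=1)   # valid value check: must not be empty
+    year: int = Field(ge=1900)         # valid value check: lower bound
+    status: str = "draft"              # default value
+
+
+# A valid object picks up the default for the omitted field.
+valid = Dissertation(title="Graph Search", year=2024)
+
+# Invalid input is rejected at construction time, one error per bad field.
+try:
+    Dissertation(title="", year=1800)
+except ValidationError as error:
+    problems = error.errors()
+```
+
+ Because the checks run when the object is created, the rest of the code can assume that any model instance it receives is already consistent.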
+
+---
+
+### Repositories
+
+ For **interacting with the main database**, a set of repository actions was implemented that **abstracts** the database components from the rest of the application.
+
+ Given a set of predefined parameters, these repositories apply the necessary operations to the Main Database and return the semi-processed result.
+
+ These are essential to **ensure that the database is not overloaded** with bad requests from other functions, that the database threads are used **efficiently** and that access to the database is **easy to alter**.
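+
+ The abstraction can be sketched as follows. The interface, class, and function names are hypothetical, and an in-memory dictionary stands in for the Main Database so that the example runs on its own.
+
+```python
+from typing import Dict, Optional, Protocol
+
+
+class DissertationRepository(Protocol):
+    """What the rest of the application is allowed to know about storage."""
+
+    def get(self, dissertation_id: str) -> Optional[dict]: ...
+
+    def save(self, dissertation_id: str, data: dict) -> None: ...
+
+
+class InMemoryDissertationRepository:
+    """Stand-in backend; a real one would talk to the database instead."""
+
+    def __init__(self) -> None:
+        self._documents: Dict[str, dict] = {}
+
+    def get(self, dissertation_id: str) -> Optional[dict]:
+        document = self._documents.get(dissertation_id)
+        # Return a copy so callers cannot mutate stored data by accident.
+        return dict(document) if document is not None else None
+
+    def save(self, dissertation_id: str, data: dict) -> None:
+        self._documents[dissertation_id] = dict(data)
+
+
+def rename_dissertation(repo: DissertationRepository, dissertation_id: str, title: str) -> bool:
+    """Application code depends only on the repository interface."""
+    document = repo.get(dissertation_id)
+    if document is None:
+        return False
+    document["title"] = title
+    repo.save(dissertation_id, document)
+    return True
+```
+
+ Swapping the storage backend then only requires a new class with the same two methods, which is what keeps access to the database easy to alter.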
+
+---
+
+### Utils
+
+ The utils are a set of functions that provide **extended functionality** to other parts of the API.
+
+ These functions include:
+  - **Authenticating** users;
+  - Creating and sending **email notifications**;
+  - Getting the **configurations** from the Configuration Database;
+  - Managing **local files** from the Filesystem;
+  - etc.
+
+---
+
+### Templates
+
+ These templates are used by [**Jinja**](https://jinja.palletsprojects.com/en/3.1.x/) and the API to **generate simple HTML code**.
+
+ The generated file can then be sent as an **email notification** or used to **list all the data collected** throughout the year (useful for collecting information from the previous year for backups).
+
+---
+
+### Tests
+
+ The test portion is still to be implemented.
diff --git a/versioned_docs/version-1.2/The Backend Implementation/4-databases.md b/versioned_docs/version-1.2/The Backend Implementation/4-databases.md
@@ -0,0 +1,82 @@
+# Databases
+<br/>
+<p align="center">
+    ![architecture diagram](../../../static/img/databases_nobg.png)
+</p>
+
+<p align="center">
+  The database technologies utilized
+</p>
+
+---
+
+<!-- START doctoc generated TOC please keep comment here to allow auto update -->
+<!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE -->
+## Table of Contents
+
+- [Databases](#databases)
+  - [Table of Contents](#table-of-contents)
+  - [Databases](#databases-1)
+    - [Main Database](#main-database)
+    - [File Storage](#file-storage)
+    - [Configuration Database](#configuration-database)
+
+<!-- END doctoc generated TOC please keep comment here to allow auto update -->
+
+---
+
+## Databases
+
+For better **separation, maintainability, scalability and performance**, the persistence of data was divided into **three separate databases**.
+
+This makes better use of each database's **core strengths** across the multitude of tasks that the **API Instances** are required to perform.
+
+---
+
+### Main Database
+
+  <p align="center">
+      ![architecture diagram](../../../static/img/mongodb.png)
+  </p>
+
+ The **models** mentioned previously as well as most of the data processed are **saved in the Main Database**.
+
+ Due to the **volatile** and **highly customizable** nature of these models, a database that could easily be adapted to new model formats was necessary.
+
+ This means that the typical SQL databases are **not adequate** for our type of data structures, so a **[NoSQL](https://en.wikipedia.org/wiki/NoSQL) database was chosen instead**.
+
+ Most NoSQL databases are much more recent than the SQL databases we are used to, and since all the components in this project must have proven **long-term support**, the **[MongoDB](https://www.mongodb.com/) database was chosen**.
+
+ MongoDB allows the **models to be changed without affecting the compatibility of old data models**, meaning that changes made to the database will still always allow for old data to be processed along with new data.
+
+ Another key feature that drove our decision is the ability to easily convert a **MongoDB document into a JSON object** that can be worked on natively inside our Python codebase, **without the need for complicated operations** such as joins, aggregations, etc.
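+
+ The sketch below illustrates both points with plain dictionaries, which is how documents typically appear in Python code. The documents and the `keywords` field are hypothetical, and string ids are used for simplicity (an assumption; real MongoDB ids need converting before they can be serialized to JSON).
+
+```python
+import json
+
+# Two hypothetical documents: one saved before a "keywords" field existed, one after.
+old_document = {"_id": "d1", "title": "Graph Search"}
+new_document = {"_id": "d2", "title": "Stream Joins", "keywords": ["databases"]}
+
+
+def to_model_dict(document):
+    """Normalize a stored document, filling in fields that older records lack."""
+    return {
+        "id": document["_id"],
+        "title": document["title"],
+        "keywords": document.get("keywords", []),
+    }
+
+
+# Old and new documents are processed together and serialized as one JSON payload.
+payload = json.dumps([to_model_dict(old_document), to_model_dict(new_document)])
+```
+
+ No migration of the old document is needed: the missing field is simply given a default when the document is read.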
+
+---
+
+### File Storage 
+
+  <p align="center">
+      ![architecture diagram](../../../static/img/cloud_storage.png)
+  </p>
+
+ **Storing large files** (e.g., PDF files from dissertations or dissertation logos) would take a lot of **I/O throughput** away from the Main Database and use a lot of the machine's available **storage space**.
+
+ This means that another way of storing large singular files had to be chosen.
+
+ For this, we used a **cloud storage system** that is included inside the machine's filesystem, which is then linked inside the necessary **docker volumes**.
+
+ This allows the asynchronous storage and retrieval of files **without impacting the Main Database's** throughput capacity or performance.
+
+---
+
+### Configuration Database 
+
+  <p align="center">
+      ![architecture diagram](../../../static/img/mariadb.png)
+  </p>
+
+ Since **there can be a lot of API Instances running at any given time**, configuration changes must be made in such a way that every instance picks up the new values the next time it uses them.
+
+ So for **configurations and "magic numbers"**, we had to implement a place where all the instances could reach and get the necessary values.
+
+ To do this, we chose to add a **[MariaDB](https://mariadb.org/) implementation**, so that we did not have to worry about **concurrency** between the instances and the data changes, and so that all the configuration data could be **stored in the same, easy-to-edit file**.
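+
+ The idea of reading a value at the moment it is needed, rather than caching it at startup, can be sketched as follows. Python's built-in `sqlite3` stands in for MariaDB so that the example runs without a server, and the table layout and the `max_upload_mb` setting are hypothetical.
+
+```python
+import sqlite3
+
+# sqlite3 is a stand-in for MariaDB; the schema below is an assumption.
+connection = sqlite3.connect(":memory:")
+connection.execute("CREATE TABLE config (name TEXT PRIMARY KEY, value TEXT NOT NULL)")
+connection.execute("INSERT INTO config VALUES ('max_upload_mb', '10')")
+connection.commit()
+
+
+def get_config(name, default=None):
+    """Look the value up on every call, so edits are visible immediately."""
+    row = connection.execute(
+        "SELECT value FROM config WHERE name = ?", (name,)
+    ).fetchone()
+    return row[0] if row else default
+
+
+before = get_config("max_upload_mb")
+
+# An administrator changes the setting while the instances keep running.
+connection.execute("UPDATE config SET value = '25' WHERE name = 'max_upload_mb'")
+connection.commit()
+
+after = get_config("max_upload_mb")
+```
+
+ The second lookup returns the updated value without any restart, which is the behaviour every API instance needs when it shares one configuration store.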
diff --git a/versioned_docs/version-1.2/The Backend Implementation/5-docker_kubernetes.md b/versioned_docs/version-1.2/The Backend Implementation/5-docker_kubernetes.md
@@ -0,0 +1,93 @@
+# Docker And Kubernetes
+<br/>
+<p align="center">
+    ![architecture diagram](../../../static/img/docker_all.png)
+</p>
+
+<p align="center">
+  The containerization technologies utilized
+</p>
+
+---
+
+<!-- START doctoc generated TOC please keep comment here to allow auto update -->
+<!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE -->
+## Table of Contents
+
+- [Docker And Kubernetes](#docker-and-kubernetes)
+  - [Table of Contents](#table-of-contents)
+  - [Docker Compose](#docker-compose)
+    - [Docker Containers](#docker-containers)
+    - [Docker Networks](#docker-networks)
+    - [Docker Volumes](#docker-volumes)
+  - [Kubernetes](#kubernetes)
+
+<!-- END doctoc generated TOC please keep comment here to allow auto update -->
+
+---
+
+## Docker Compose
+
+<p align="center">
+    ![architecture diagram](../../../static/img/docker-compose.png)
+</p>
+
+ [Docker Compose](https://docs.docker.com/compose/) was used as the primary **container orchestrator**.
+
+ Docker Compose allows us to define all the necessary deployment parameters, such as **ports**, **volumes** and image **versions**.
+
+ It also allows us to **manage**, **launch** and **stop** all the containers at once.
+
+---
+
+### Docker Containers
+
+<p align="center">
+    ![architecture diagram](../../../static/img/docker.png)
+</p>
+
+ For the dockerization of our application, the following containers were defined:
+  - **Application** (frontend)
+  - **API Instance** (multiple can be created at once)
+  - **Load Balancer**
+  - **MongoDB**
+  - **MariaDB**
+
+---
+
+### Docker Networks
+
+ Another advantage of Docker Compose is the ability to create multiple **separate networks between containers**.
+
+ In our application the following networks were created:
+  - **frontend_network** (App ⇄ Frontend Server)
+  - **backend_network** (API Instances ⇄ Databases)
+  - **request_network** (Load Balancer ⇄ API Instances)
+
+---
+
+### Docker Volumes
+
+ Both MongoDB and the Cloud Service require the use of volumes, and the Cloud Service's volume must be **mapped to the host's filesystem** (bind volume).
+
+ In our application the following volumes were created:
+  - **mongodbdata** (MongoDB Data)
+  - **file-bind** (Bind volume that utilizes the host's filesystem, which is then managed by the cloud server)
+
+---
+
+## Kubernetes
+
+<p align="center">
+    ![architecture diagram](../../../static/img/kubernetes.png)
+</p>
+
+ A [Kubernetes](https://kubernetes.io/) deployment was **partially implemented**, but it needs more work before it is production-ready.
+
+ The specific Kubernetes distribution used was the simpler [K3s](https://k3s.io/), which provides all the required functionality with less setup time.
+
+ Since the final production environment will consist of only a **single server**, Kubernetes' main feature of balancing load between machines would not be of any use.
+
+ The ability to **detect crashes in containers** and **relaunch the affected service(s)** is very useful to our application, but more discussion is needed to determine whether this is a priority for our system.
+
+ The final production version will include a **Kubernetes file** that launches the containers and does the appropriate load balancing, but more **testing must be done** before these changes are applied to production.
diff --git a/versioned_docs/version-1.2/The Platform/1-architecture.md b/versioned_docs/version-1.2/The Platform/1-architecture.md
@@ -0,0 +1,9 @@
+# Architecture
+
+## Architecture Diagram
+
+A simple diagram to illustrate our architecture.
+
+<p align="center">
+    ![architecture diagram](../../../static/img/architecture.png)
+</p>
diff --git a/versioned_docs/version-1.2/The Platform/2-Deployment.md b/versioned_docs/version-1.2/The Platform/2-Deployment.md
@@ -0,0 +1,9 @@
+# Deployment
+
+## Deployment and Structural Diagram
+
+A simple deployment diagram and infrastructure mapping.
+
+<p align="center">
+    ![architecture diagram](../../../static/img/deployment.png)
+</p>