Update AEP
tschneider-aneo committed Oct 1, 2024
1 parent 66dce52 commit 87a91d5
Showing 1 changed file (AEP/aep-00004.md) with 15 additions and 15 deletions.
## Simple standalone architecture
![standalone schema](./images/standalone.png)

In a standalone architecture, only one MongoDB instance (mongod process) is available for all operations. The database user directly reaches that instance. If that instance somehow becomes unavailable, data cannot be accessed.

## Single replica set architecture
![replica set schema](./images/replicaset.png)
Sharding the database would alleviate the tension on all the nodes. For the primary nodes, write operations would be distributed among the shards' primary nodes; for the secondary nodes, there are situations where only the relevant shard is queried (this is explained below in this AEP). As mentioned above, read operations are faster on a sharded cluster since the data is distributed among multiple instances.

This MongoDB deployment (sharded or not) would be defined in our example infrastructure code as a Helm release resource (called by Terraform). A Helm release is an instance of a Helm chart. Helm charts are templates that allow deploying resources within a Kubernetes cluster in a configurable way.
Using Helm charts delegates the complexity of defining a deployment to the authors of the Helm chart, in accordance with the "Don't reinvent the wheel" principle.

Instead of manually defining all the Kubernetes resources and their configuration for a MongoDB deployment within Terraform modules, as is currently done with ArmoniK's example infrastructure, Helm charts allow us to focus only on passing the desired configuration for our MongoDB deployment. Nevertheless, it is possible that some Kubernetes resources or bash scripts will still be needed inside Terraform for specific configuration purposes. Helm charts also offer that capability.
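For illustration, such a Helm release could look roughly like this in Terraform (a minimal sketch: the release name, namespace, and values shown are hypothetical, and the value keys should be checked against the chart's values file, not taken as ArmoniK's actual module code):

```hcl
# Hypothetical declaration of a Bitnami mongodb-sharded Helm release.
resource "helm_release" "mongodb" {
  name       = "mongodb"                            # release name (assumption)
  repository = "https://charts.bitnami.com/bitnami"
  chart      = "mongodb-sharded"
  namespace  = "storage"                            # namespace (assumption)

  # Configuration is passed as chart values instead of hand-written resources.
  set {
    name  = "shards"
    value = "2"
  }
}
```

This illustrates the point above: the module only selects configuration values, while the resource definitions live in the chart.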

Other alternatives to address this issue were considered:
- Adding some resources and modifying others in our Terraform module. To respect good practices (the KISS and "don't reinvent the wheel" principles), we found this solution irrelevant, as other developers can provide us with out-of-the-box MongoDB deployments; this is often a better way of building software, as the complexity is shifted to the Helm chart definition.
- Using a Kubernetes operator for MongoDB. Unfortunately, no operator supporting sharding has been found.

## Overview
Until now, ArmoniK's table storage was described with multiple Kubernetes resources and bash scripts. Using Helm charts for table storage will reduce maintenance and improve the reusability of our example infrastructure by other organisations.

In the example infrastructure, [Bitnami's Helm chart](https://artifacthub.io/packages/helm/bitnami/mongodb-sharded) for MongoDB (sharded or not) will be used.

Here is what the Helm chart would handle, independently of the chart used (sharded or not):
- The creation of the Kubernetes resources needed to deploy MongoDB, including the persistent volumes and the underlying logic to use them (see [PersistentVolumeClaims documentation](https://kubernetes.io/docs/concepts/storage/persistent-volumes/#persistentvolumeclaims))
- MongoDB initialization and launching

## WARNING
**As the developments and tests supporting this AEP were done using Bitnami's Helm charts and images for MongoDB, the following specifications apply to these charts and images, and it is absolutely not guaranteed that they are valid with other charts or images.**

Thus, we strongly advise you to use Bitnami's Helm charts for MongoDB ([sharded](https://artifacthub.io/packages/helm/bitnami/mongodb-sharded), [unsharded](https://artifacthub.io/packages/helm/bitnami/mongodb)) as well as Bitnami's MongoDB images. Note that each chart has a specific image: one for the sharded chart and another for the unsharded chart.
As Bitnami's images are not verified with Docker Scout, we advise adapting Bitnami's mongodb-sharded image to make it compliant with your personal or business security requirements.

As quick-deploy is an example of an ArmoniK deployment meant to help understand what ArmoniK needs to run correctly, we do not warrant full operability in a production environment, especially from a security-compliance point of view.

## Technical Specifications
**The principal modifications will be**:
- The possibility to choose whether to deploy MongoDB sharded or not.
- The data pods would be deployed within a StatefulSet object in Kubernetes, which is more suited for stateful applications (for more information, check the [StatefulSet documentation](https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/)).
- Since Helm takes over the whole MongoDB initialization, the *configmap.tf* file inside the MongoDB module, which defines the local file resources containing the initialization scripts, would be removed.
- The possibility to configure the Helm charts used to deploy MongoDB, although it is advised to use the default ones.
- TLS certificates would no longer be manually created inside the Terraform module.
- The capacity to configure a timeout proportional to the number of MongoDB instances to be deployed at infrastructure setup. Since deploying a replicated or sharded MongoDB can take quite some time, this is an important parameter to consider, as the default 5-minute timeout can easily be exceeded. The default value is 4 minutes per instance (the number of replicas, or shards times replicas for a sharded deployment).
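The timeout rule above can be sketched as follows (a minimal illustration: the 4-minutes-per-instance default and the shards-times-replicas count come from this AEP, while the function name and signature are hypothetical):

```python
def deployment_timeout_minutes(replicas: int, shards: int = 0,
                               minutes_per_instance: int = 4) -> int:
    """Timeout proportional to the number of MongoDB instances to deploy.

    For a plain replica set the instance count is the number of replicas;
    for a sharded deployment it is shards * replicas, as described above.
    """
    instances = shards * replicas if shards > 0 else replicas
    return instances * minutes_per_instance

# A 3-replica set waits 12 minutes; 2 shards of 3 replicas wait 24 minutes.
```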

**What would be specific to the classic unsharded MongoDB Helm chart ?**
- It would be possible to deploy an out-of-the-box MongoDB instance, either with a replica set architecture or with a simple standalone architecture, even though the latter is strongly discouraged in most cases.
- The deployment would be exposed with a headless Kubernetes service, which is more suited for stateful applications that don't require load balancing, as it allows reaching the pods directly, since they have a persistent and identifying DNS record within the Kubernetes cluster (see the [headless services documentation](https://kubernetes.io/docs/concepts/services-networking/service/#headless-services) in Kubernetes).
- As a result, the connection string to the service starts with the `mongodb+srv://` scheme, which performs dynamic DNS resolution of the pods (for more information, check out the [connection string documentation](https://www.mongodb.com/docs/manual/reference/connection-string/#srv-connection-format) in MongoDB).
- Since the server is now preferably accessed with SRV strings, the port output is still present for backward-compatibility reasons but hardcoded to 27017 (MongoDB's default port).
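For illustration, an SRV connection string of this form could be assembled as follows (a sketch: the user, password, service hostname, and database name are placeholders, not ArmoniK's actual values):

```python
from urllib.parse import quote_plus

def mongodb_srv_uri(user: str, password: str, host: str, database: str) -> str:
    """Build a mongodb+srv:// connection string.

    With the SRV scheme, the driver resolves the individual mongod hosts
    dynamically from DNS SRV records, so only the service hostname is given
    and no port is specified. Credentials must be URL-encoded.
    """
    return (
        f"mongodb+srv://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}/{database}"
    )

# Hypothetical headless-service hostname inside the cluster:
uri = mongodb_srv_uri("app", "p@ss",
                      "mongodb-headless.storage.svc.cluster.local", "database")
# -> mongodb+srv://app:p%40ss@mongodb-headless.storage.svc.cluster.local/database
```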

**What would be specific to the sharded MongoDB Helm chart ?**
- It would be possible to deploy an out-of-the-box sharded MongoDB instance with a configurable, but rather static, number of shards. We are currently working on a way to implement autoscaling. The number of config server replicas and mongos "replicas" would also be configurable.
- TLS still has to be managed.
- The *database* namespace in MongoDB as well as non-root MongoDB users are created via a custom initialization script.
- ArmoniK Core would authenticate as MongoDB administrator as sharding a MongoDB collection requires an admin role (see [MongoDB documentation](https://www.mongodb.com/docs/manual/reference/command/shardCollection/#mongodb-dbcommand-dbcmd.shardCollection)).
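The `shardCollection` admin command mentioned above takes the fully qualified collection name and a shard key; its command document can be sketched like this (the database, collection, and key names are illustrative, not ArmoniK's actual schema):

```python
def shard_collection_command(db: str, collection: str, shard_key: dict) -> dict:
    """Build the MongoDB shardCollection admin command document.

    The command must be run against the admin database, which is why
    ArmoniK Core would authenticate as a MongoDB administrator.
    """
    return {"shardCollection": f"{db}.{collection}", "key": shard_key}

# Hypothetical collection sharded on a hashed _id key:
cmd = shard_collection_command("database", "TaskData", {"_id": "hashed"})
```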

## Inside Kubernetes
In our case, since the MongoDB cluster is deployed within a Kubernetes cluster thanks to a Helm chart, we need to describe the Kubernetes objects defined by Bitnami's chart to manage the different components of the MongoDB cluster.

Bitnami's mongodb-sharded Helm chart offers the out-of-the-box infrastructure setup needed to host a sharded MongoDB instance and mainly creates the following Kubernetes objects:
