This project deploys a churn detection machine learning model (repo here) into a GCP production environment, using cloud technologies such as Docker and Kubernetes.
Table of Contents
- Getting started
- Simplified deployment
- Deployment on Google Kubernetes Engine
- Testing
- CI/CD
- Local postgres db (test purpose)
- API config files
- Project tree
poetry config --local virtualenvs.in-project true
poetry install
Create a configuration file config.yml
in the subfolder infrastructure/config.
Copy/paste the content of the access to Gcloud SQL section (see API config files) and fill in the correct password in config.yml.
source .venv/bin/activate
gcloud auth application-default login # give access to GCS to find the pkl model
make run-server
Try the API at the address http://0.0.0.0:8000/docs
NB: only the detect route will work if no SQL database is running locally. To try the other routes (the 'customer' routes), you can either:
- use a docker-compose deployment as described in the Docker section build and run
- use the real SQL database through a proxy. Use the
make proxy start
command; more help can be found in proxy SQL connection
The following instructions cover a simple deployment without load balancing and without SQL database support: just the FastAPI app running in a container connected to Google Cloud Storage.
- You need to give Docker access to your GitHub private key by setting the environment variable SSH_PRIVATE_KEY
- Make sure your GOOGLE_APPLICATION_CREDENTIALS is set before running the image.
- Fill in a config.yml file (see API config files)
make build-docker-simple
make run-app-simple
The API docs are at the address http://0.0.0.0:8000/docs. You can try the detect route.
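As a sketch of how the detect route could be called programmatically, the helper below builds the request; the payload fields are hypothetical, since the real schema is the one documented at /docs:

```python
import json

API_URL = "http://0.0.0.0:8000"

def build_detect_request(customer: dict) -> tuple[str, bytes]:
    """Build the URL and JSON body for a POST to the detect route.

    The customer fields are hypothetical examples; check /docs for the
    actual schema exposed by the FastAPI app.
    """
    url = f"{API_URL}/detect"
    body = json.dumps(customer).encode("utf-8")
    return url, body

url, body = build_detect_request({"customer_id": 42, "total_spent": 120.5})
# To actually send it (requires the server to be running):
#   import urllib.request
#   req = urllib.request.Request(
#       url, data=body, headers={"Content-Type": "application/json"})
#   print(urllib.request.urlopen(req).read())
```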
The churn API is deployed on a Kubernetes cluster.
The service is deployed with 2 replicas, one load balancer, and 2 containers per pod (one for the API and one for the proxy giving access to the SQL database).
The deployment also has access to a Google Cloud Storage bucket where the pickled model is stored, and to a Google Cloud SQL database holding the customer data.
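The setup above could look roughly like the following deployment manifest. This is a minimal sketch, not the project's actual deployment/deployment.yml: the names, image tag, proxy version, and the Cloud SQL instance string are placeholders to adapt.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: chaos-api
spec:
  replicas: 2
  selector:
    matchLabels:
      app: chaos-api
  template:
    metadata:
      labels:
        app: chaos-api
    spec:
      containers:
        - name: api                      # the FastAPI churn app
          image: eu.gcr.io/coyotta-2022/chaos-1:<tag>
          ports:
            - containerPort: 8000
        - name: cloud-sql-proxy          # sidecar giving access to Cloud SQL
          image: gcr.io/cloudsql-docker/gce-proxy:<version>
          command:
            - /cloud_sql_proxy
            - -instances=<project>:<region>:<instance>=tcp:5432
```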
kubectl create secret generic chaos-secrets-1 \
--from-file=key.json=<path to json of the service account private key> \
--from-file=<path to the config.yml>
If you want to try a fully functional API locally, you need to containerize the API and a postgres db with docker-compose.
Here are the steps you can follow for that purpose.
Details about the local containerized postgres db setup can be found here: Local postgres db
Make sure your GOOGLE_APPLICATION_CREDENTIALS is set before running the image.
export GOOGLE_APPLICATION_CREDENTIALS=<path to json of the service account private key>
export SSH_PRIVATE_KEY=<path_to_ssh_key> # Gitlab ssh key needed to import the churn repo
make containerize-and-start-app
This command builds an image of your application; the image tag is the short git SHA-1. It also creates a local postgres database that the app will query. Don't forget to add the CSV data if this is the first time.
The following command builds the whole environment required to run the unit tests and the functional tests. Functional tests are very important because they exercise the application against real elements (a real database, a real model, etc.). To preserve production database performance, we build a local database (postgres) with Docker, so don't forget to add the CSV data!
export GOOGLE_APPLICATION_CREDENTIALS=<path to json of the service account private key>
export SSH_PRIVATE_KEY=<path_to_ssh_key> # Gitlab ssh key needed to import the churn repo
make containerize-and-run-tests
Exactly the same as build and run, except the containers are not started.
export GOOGLE_APPLICATION_CREDENTIALS=<path to json of the service account private key>
export SSH_PRIVATE_KEY=<path_to_ssh_key> # Gitlab ssh key needed to import the churn repo
make build-docker-image
If you want to push your generated image directly to Google Container Registry without going through CI/CD, simply do:
export SHORT_SHA=$(git rev-parse --short=8 HEAD)
docker push eu.gcr.io/coyotta-2022/chaos-1:$SHORT_SHA
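The tagging convention above (first 8 characters of the commit SHA-1) can be sketched in Python; `image_ref` is a hypothetical helper, the registry path is the one used in the push command:

```python
REGISTRY = "eu.gcr.io/coyotta-2022/chaos-1"

def image_ref(full_sha: str) -> str:
    """Return the fully qualified image reference for a given commit.

    Mirrors `git rev-parse --short=8 HEAD`: the tag is the first
    8 characters of the commit SHA-1.
    """
    short_sha = full_sha[:8]
    return f"{REGISTRY}:{short_sha}"

print(image_ref("a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6e7f8a9b0"))
# eu.gcr.io/coyotta-2022/chaos-1:a1b2c3d4
```

Note that `git rev-parse --short=8` may emit more than 8 characters if the prefix is ambiguous; this sketch ignores that edge case.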
make coverage-unit
Then check the coverage of the unit tests in coverage/coverage.txt.
NB: the functional tests improve this coverage.
make run-perf-tests
This test checks that the final API performance matches the expected F1 score.
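For reference, the F1 score the performance test compares against is the harmonic mean of precision and recall. A minimal sketch of such a check, where the threshold value is purely illustrative (not the project's real target):

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = 2 * precision * recall / (precision + recall)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

EXPECTED_F1 = 0.80  # illustrative threshold only

score = f1_score(tp=80, fp=20, fn=20)
assert score >= EXPECTED_F1, f"API performance regressed: {score:.2f}"
print(f"{score:.2f}")  # 0.80
```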
Check the section run all tests
Steps | CI/CD jobs | Trigger
---|---|---
Push feature branch | unit tests | Developer
MR into develop branch | unit tests, build docker | Developer
Push develop branch | unit tests, build docker image & push it to registry | Gitlab on MR success
MR into main branch | functional tests | Developer
Push main branch | build & push image, deployment | Gitlab on MR success
CI/CD jobs | Triggers
---|---
unit tests | push feature branch, MR into develop branch, push develop branch
build docker | MR into develop branch, push develop branch, push main branch
push image to registry | push develop branch, push main branch
functional tests | MR into main branch
deployment | push main branch
- BASE64_GOOGLE_CREDENTIALS: (unused) base64-encoded service account
- CONFIG_YML: YAML file holding the config variables for the Gitlab environment
- GCP_SA_KEY: GCP service account private key (used to access the GCP registry and Kubernetes)
- SSH_CHURN_ACCESS: private SSH key for the churn model Gitlab repo
In order not to affect production data, local environments use a postgres container emulating the database. We need to build and launch the container, then add data to it.
First, don't forget the previous export:
export GOOGLE_APPLICATION_CREDENTIALS=./proxy/gcp_key.json
Then launch the database container only:
make containerize-and-start-bdd
If this is the first time you create this database locally, you need to insert the CSV data to play with. In a new shell, launch the following command (NB: if you want to give custom test_sample_customer and test_sample_indicators filepaths, use the "-c" and "-i" options of this util):
python3 utils/postgres_manager.py
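The insertion step can be sketched as follows: read the CSV and turn each row into a parameterized INSERT statement. This is a simplified illustration of what such a utility does, with hypothetical table and column names; the real ones come from the test sample files.

```python
import csv
import io

def build_inserts(csv_text: str, table: str) -> list[tuple[str, tuple]]:
    """Turn CSV rows into parameterized INSERT statements.

    Returns (sql, params) pairs, ready to pass to a DB-API
    cursor.execute(); nothing is executed here.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = reader.fieldnames
    placeholders = ", ".join(["%s"] * len(columns))
    sql = f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})"
    return [(sql, tuple(row[c] for c in columns)) for row in reader]

sample = "customer_id,churned\n1,0\n2,1\n"  # hypothetical columns
for sql, params in build_inserts(sample, "customer"):
    print(sql, params)
```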
After this operation, don't forget to kill the previous shell with the db running; then you can build and launch your app, or run the unit and functional tests. See the sections Docker: build and run, or run all tests.
- \? : get help
- \l : list databases
- \dt : list tables
Here are 2 examples of config file content to copy/paste into your config.yml, depending on your use case:
- The first one (access to Gcloud SQL) is used for the Kubernetes deployment (in the secrets) and to run the API locally using the Gcloud resources (GCS and SQL)
- The second one is used for the docker-compose deployment (functional tests)
postgresql:
username: coyotta-2022-group-1
password: <xxxx>
hostname: 127.0.0.1
port: 5432
database: churnapi
api:
port: 8000
host: 0.0.0.0
gcs:
bucket: "chaos-1"
blob: "model/ChurnModelFinal.pkl"
postgresql: # these settings are used when a docker container communicates with another container
username: postgres
password: postgres
hostname: db
port: 5432
database: churnapi
external_postgres: # these settings are used when your code (executed on your laptop) communicates with your db
username: postgres
password: postgres
hostname: 127.0.0.1
port: 5442
database: churnapi
api:
port: 8000
host: 0.0.0.0
gcs: # your gcp storage configuration
project: "churn"
bucket: "churn_bucket"
blob: "model/ChurnModelFinal.pkl"
server:
historicize: false
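Once parsed (e.g. with PyYAML), the postgresql sections above map straightforwardly to a connection URL. A minimal sketch using a plain dict so the example stays self-contained; `make_dsn` is a hypothetical helper, not part of the project:

```python
def make_dsn(cfg: dict) -> str:
    """Build a SQLAlchemy-style postgres URL from a config.yml section."""
    return (
        f"postgresql://{cfg['username']}:{cfg['password']}"
        f"@{cfg['hostname']}:{cfg['port']}/{cfg['database']}"
    )

# the external_postgres section of the docker-compose config above
cfg = {
    "username": "postgres",
    "password": "postgres",
    "hostname": "127.0.0.1",
    "port": 5442,
    "database": "churnapi",
}
print(make_dsn(cfg))  # postgresql://postgres:postgres@127.0.0.1:5442/churnapi
```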
.
├── chaos
│ ├── application
│ │ └── server.py
│ ├── domain
│ │ └── customer.py
│ ├── infrastructure
│ │ ├── config
│ │ │ └── config.py
│ │ ├── connexion.py
│ │ └── customer_loader.py
│ └── test
│ ├── conftest.py
│ ├── data
│ │ ├── test_sample_customer.csv
│ │ └── test_sample_indicators.csv
│ ├── functional
│ │ ├── test_bdd.py
│ │ └── test_whole_api.py
│ └── unit
│ ├── test_customer.py
│ └── test_unit_server.py
├── coverage
│ └── coverage.txt
├── deployment
│ ├── deployment.yml
│ └── load_balancer.yml
├── docker-compose.yml
├── Dockerfile
├── docs
│ ├── _build
│ ├── conf.py
│ ├── index.rst
│ ├── make.bat
│ ├── Makefile
│ ├── _static
│ └── _templates
├── images
│ └── churn.png
├── Makefile
├── poetry.lock
├── proxy
├── pyproject.toml
├── README.md
├── setup.py
└── utils
└── postgres_manager.py