MLOps is an emerging field of ML research that aims to enable and automate the deployment of ML models into production. According to sig-mlops, MLOps is defined as:
the extension of the DevOps methodology to include Machine Learning and Data Science assets as first class citizens within the DevOps ecology
In this repository we won't discuss the benefits and limitations of MLOps, but we provide some references for those who are interested in using it:
- A very detailed tutorial on MLOps
- The difference between DevOps and MLOps
- AutoML organization and tools
Note: AutoML is a technology that enables non-expert ML practitioners to build and deploy ML models. It can be used in conjunction with MLOps. However, it is still in its early stages, and we're not going to discuss it here.
Based on our research and the requirements of the project, we decided to use the following pipeline:
- Model and dataset versioning: As ML-based software is fundamentally different from traditional software, model and dataset versioning is an issue that cannot be handled by git alone (as the amount of data is too large).
- Automatic model training: We will automate the training of a face detection/recognition model.
- Automatic build: The process of packaging will be automated (creation of Docker images, building of Docker containers, etc.).
- Automatic deployment: The Docker images will be deployed to a local server automatically.
- Model monitoring: We will provide simple logging and monitoring tools to monitor the performance of the model.
- Metadata gathering: During the whole pipeline execution, some metadata will be gathered and stored in a database.
- Triggering mechanisms: The pipeline will be triggered both on pipeline changes and manually.
- Choosing edge devices: Learning about various edge devices, their limits, and the various models that can be used with them.
- Testing datasets: Examining and evaluating a few datasets that have been processed by edge devices.
A discussion of currently available tools for each stage of the pipeline is provided below.
As mentioned briefly above, ML-based software differs from traditional software in that code alone is not enough: one also needs the whole dataset to reproduce the exact model. Moreover, the explicit relationship between input and output is not known. So, versioning requires special attention.
Git is widely used for versioning and source control of traditional software. However, it is not suitable for ML-based software on its own: the dataset is too large to index in git, models are binary, and switching between different versions of a model is not easy. There are other reasons as well; you can refer to this for more information.
Tools for versioning ML-based software:
- DVC: An open-source git-based version control system for ML projects. It is by far the most popular version control system in the wild.
- dotmesh: According to dotmesh
dotmesh (dm) is like git for your data volumes (databases, files etc) in Docker and Kubernetes
dotmesh doesn't have an active community and the latest release was in 2020. So, DVC is the best and pretty much the only solution for version control. Some important features of DVC are (according to DVC features):
- Git-compatible
- Storage agnostic
- Reproducible
- Low friction branching
- Metric tracking
- ML pipeline framework
- Language and framework agnostic
- Track failures
So, we are considering using DVC both for versioning and for CI/CD.
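To make the "ML pipeline framework" feature above concrete, here is a minimal sketch of a `dvc.yaml` pipeline file. The stage names, scripts, and paths are hypothetical placeholders, not files from this repository:

```yaml
stages:
  prepare:
    cmd: python prepare.py data/raw data/prepared
    deps:
      - prepare.py
      - data/raw
    outs:
      - data/prepared
  train:
    cmd: python train.py data/prepared model.onnx
    deps:
      - train.py
      - data/prepared
    outs:
      - model.onnx
    metrics:
      - metrics.json:
          cache: false
```

With a file like this in place, `dvc repro` re-runs only the stages whose dependencies have changed, which is what makes DVC usable as both a versioning and a lightweight CI/CD tool.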
Tools for CI/CD
The other option for us is Jenkins. This open-source software can provide a pipeline that runs right after the version-control app, but unlike GitLab and DVC, it does not provide any version control itself.
Another option is Docker Compose, but it has the same problem as Jenkins: it does not provide any version control.
We are considering the choice between DVC and GitLab, as both of these tools are very useful in our case.
We are focusing on developing MLOps techniques for edge devices. Edge devices are quite versatile and designed by different manufacturers, so we need a tool that is compatible with different edge devices, i.e. device-agnostic. Another important feature is to be framework-agnostic: we should be able to use models trained with different frameworks without any modification. To tackle these two issues, we use the following pipeline:
The ONNX standard helps us be framework-agnostic. Almost all training frameworks support ONNX, and one can convert the final model to the .onnx format and later use it in inference frameworks that support this format (such as OpenVINO and ONNXRuntime). For the inference side, we are going to use ONNXRuntime. It is a cross-platform inference engine that supports multiple frameworks and hardware accelerators, so it's a great choice for edge devices.
Plus, we are using Docker to package our application and its dependencies. It also helps us create a CI/CD pipeline, which is essential for MLOps. As Docker itself could be inefficient for edge devices, we are using balenaOS, a lightweight OS tailored for each hardware platform with the capability to run Docker containers. Under the hood, balenaOS uses Yocto to build the image file. As of writing this doc, it supports more than 80 devices. More information about balenaOS can be found here.
For the model in this project, we decided to use the IMDB movie reviews dataset. This dataset contains movie reviews from users, each labeled as either positive or negative. Each entry in the dataset is an array of integers, where each integer represents a word in the dataset's dictionary.
For example the word "film" is indexed as integer 13 in this dataset.
An example:
print(data[0])
[1, 14, 22, 16, 43, 530, 973, 1622, 1385, 65, 458, 4468, 66, 3941, 4, 173, 36, 256, 5, 25, 100, 43, 838, 112, 50, 670, 2, 9, 35, 480, 284, 5, 150, 4, 172, 112, 167, 2, 336, 385, 39, 4, 172, 4536, 1111, 17, 546, 38, 13, 447, 4, 192, 50, 16, 6, 147, 2025, 19, 14, 22, 4, 1920, 4613, 469, 4, 22, 71, 87, 12, 16, 43, 530, 38, 76, 15, 13, 1247, 4, 22, 17, 515, 17, 12, 16, 626, 18, 2, 5, 62, 386, 12, 8, 316, 8, 106, 5, 4, 2223, 5244, 16, 480, 66, 3785, 33, 4, 130, 12, 16, 38, 619, 5, 25, 124, 51, 36, 135, 48, 25, 1415, 33, 6, 22, 12, 215, 28, 77, 52, 5, 14, 407, 16, 82, 2, 8, 4, 107, 117, 5952, 15, 256, 4, 2, 7, 3766, 5, 723, 36, 71, 43, 530, 476, 26, 400, 317, 46, 7, 4, 2, 1029, 13, 104, 88, 4, 381, 15, 297, 98, 32, 2071, 56, 26, 141, 6, 194, 7486, 18, 4, 226, 22, 21, 134, 476, 26, 480, 5, 144, 30, 5535, 18, 51, 36, 28, 224, 92, 25, 104, 4, 226, 65, 16, 38, 1334, 88, 12, 16, 283, 5, 16, 4472, 113, 103, 32, 15, 16, 5345, 19, 178, 32]
If we translate the entry above, we get this:
# this film was just brilliant casting location scenery story direction everyone's really suited the part they played and you could just imagine being there robert # is an amazing actor and now the same being director # father came from the same scottish island as myself so i loved the fact there was a real connection with this film the witty remarks throughout the film were great it was just brilliant so much that i bought the film as soon as it was released for # and would recommend it to everyone to watch and the fly fishing was amazing really cried at the end it was so sad and you know what they say if you cry at a film it must have been good and this definitely was also # to the two little boy's that played the # of norman and paul they were just brilliant children are often left out of the # list i think because the stars that play them all grown up are such a big profile for the whole film but these children are amazing and should be praised for what they have done don't you think the whole story was so lovely because it was true and was someone's life after all that was shared with us all
The "#" characters are the ones that are not available in model's dictionary.
The directory saved_model contains the saved TensorFlow model, and the directory convert_model contains the ONNX model. To get the ONNX output, use the command below:
$ python -m tf2onnx.convert --saved-model ./saved_model/ --opset 12 --output ./convert_model/output.onnx
We have cross compiled ORT for the armv7 architecture and tested it on a Raspberry Pi 400. First, clone the onnxruntime repository and a custom protoc version for cross compiling (refer to the ORT documentation for more details). You can either follow these steps to compile ORT manually or use the Dockerfile provided in this repository. The manual steps are explained first, and then the Docker approach is introduced.
I have used the following tool.cmake file for cross compiling:
SET(CMAKE_SYSTEM_NAME Linux)
SET(CMAKE_SYSTEM_VERSION 1)
SET(CMAKE_SYSTEM_PROCESSOR armv7-a)
SET(CMAKE_SYSROOT <path to sysroot>)
SET(CMAKE_C_COMPILER arm-none-linux-gnueabihf-gcc)
SET(CMAKE_CXX_COMPILER arm-none-linux-gnueabihf-g++)
SET(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-psabi")
SET(CMAKE_FIND_ROOT_PATH_MODE_PROGRAM NEVER)
SET(CMAKE_FIND_ROOT_PATH_MODE_LIBRARY ONLY)
SET(CMAKE_FIND_ROOT_PATH_MODE_INCLUDE ONLY)
SET(CMAKE_FIND_ROOT_PATH_MODE_PACKAGE ONLY)
Make sure that the ARM toolchain is accessible from PATH (otherwise provide absolute paths). You might want to transfer the linker and some executables to your /usr path (such as ld-linux-armhf.so.3 to /usr/arm-linux-gnueabihf/).
I compiled v1.12.1 of ORT. It seems that v1.13 has some issues with CMake, so make sure to use v1.12.1:
$ git checkout v1.12.1
Next, run the following command to cross compile ORT:
./build.sh --config Release --parallel --arm --update --build --build_shared_lib --cmake_extra_defines ONNX_CUSTOM_PROTOC_EXECUTABLE=<path to bin/protoc> CMAKE_TOOLCHAIN_FILE=<path to tool.cmake>
After waiting a long time, dynamic and static libraries will be generated. You can find them in the build/Linux/Release/ directory. Set this path in CMakeLists.txt to compile this project. Also set the include directory path in CMakeLists.txt and change TC-arm.cmake accordingly. Finally, build the project (from the build folder):
$ cmake -DCMAKE_TOOLCHAIN_FILE=<path to TC-arm.cmake> -DCMAKE_INSTALL_PREFIX=<install prefix> ..
$ make
$ make install
And you're all set!
You can use the Dockerfile provided in this repository to build a Docker image that will cross compile your project with the ORT libraries. The final image can be used either manually or as a base image for your project. The image is built on the Ubuntu:22.04 base image and is around 4GB. You can build it with the following command (make sure that you are in this repository's root directory):
$ docker build . -t edgemlops:1.0.0
Building the image can take about two hours (depending on your machine and internet speed). After building, the ORT libraries are in the /ORT/onnxruntime/ directory.
In the mqtt folder, there are two programs: one for the host machine that manages the connected devices which are supposed to run the model, and one for the clients on the edge. These programs need a broker to communicate with each other; you can use a broker such as mosquitto to set up your own.
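The actual programs are written against the Eclipse Paho MQTT C library; the following broker-free Python sketch only illustrates the publish/subscribe pattern they rely on. A plain dict of callbacks stands in for the broker, and the topic names and payload format are hypothetical:

```python
# Broker-free sketch of the publish/subscribe flow between the
# host manager and an edge client. A dict of callbacks stands in
# for the MQTT broker; topic names and payloads are made up.
import json

subscriptions = {}  # topic -> list of subscriber callbacks

def subscribe(topic, callback):
    subscriptions.setdefault(topic, []).append(callback)

def publish(topic, payload):
    # Deliver the JSON payload to every subscriber of the topic.
    for cb in subscriptions.get(topic, []):
        cb(json.loads(payload))

received = []

# An edge client would subscribe to its command topic...
subscribe("edge/pi-400/run", lambda msg: received.append(msg["model"]))

# ...and the host manager would publish commands to it.
publish("edge/pi-400/run", json.dumps({"model": "output.onnx"}))

print(received)  # -> ['output.onnx']
```

With a real broker, `subscribe` and `publish` map onto the corresponding MQTT client calls, and the broker routes messages between the host and the edge devices.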
To use these programs, you need to compile them using the Eclipse Paho MQTT C library.
https://github.com/eclipse/paho.mqtt.c
Make sure to edit the CMakeLists file to build the static libraries as well.
To connect to the program, simply enter the IP of the broker as an argument:
$ ./hostManager "192.168.1.110"
The host manager program must be in the same folder as the other folders, such as scripts and inference. To make the inference program, you need to already have the Docker image so the program can use it.
The scripts folder contains simple implementations of the operations that we want to run from the host, such as moving files to the edge device, instructions for compiling the inference program, and other similar scripts.
When it comes to choosing the right edge device, it's important to consider our specific use case. There are several options available on the market, but two popular choices are the Raspberry Pi and the Jetson. In the following, we'll provide a comparative analysis of these devices.
The Raspberry Pi is a popular choice for edge computing due to its low cost and versatility. It is a credit-card sized computer that can run various operating systems, including Linux and Windows.
Running an ML program on a Raspberry Pi requires a significant amount of memory (RAM) to process calculations. The latest and preferred model for ML applications is the Raspberry Pi 4 Model B.
Typical ML projects for the Raspberry Pi involve classifying items, including different visual, vocal, or statistical patterns. The backbone of all ML models is a software library and its dependencies. There are currently a variety of free ML frameworks. Some of the most well-known platforms include the following:
- TensorFlow: A flexible platform for building general ML models.
- OpenCV: A library dedicated to computer vision and related object detection tasks.
- Google Assistant: A library dedicated to voice recognition tasks.
- Edge Impulse: A cloud-based platform that simplifies ML app development.
Raspberry Pi can be used to train and run ML models for image classification. For example, you can use TensorFlow to train a model on a dataset of images and then use it on a Raspberry Pi to classify new images in real-time.
Here is a tutorial on building real-time object recognition on a Raspberry Pi using TensorFlow and OpenCV:
Other varied uses, such as voice recognition and anomaly detection, are covered in the tutorials and examples here:
For more information about the Raspberry Pi, click here
Jetson is a line of embedded systems designed by NVIDIA specifically for edge computing applications. Jetson devices are equipped with a powerful GPU, which makes them ideal for tasks such as image and video processing, machine learning, and deep learning. Jetson devices are more expensive than Raspberry Pi, but they offer better performance and capabilities for demanding edge computing tasks.
As was the case with the Raspberry Pi, ML applications require a sizable amount of memory (RAM); therefore, the Jetson Nano is the device most commonly used for ML applications.
Several ML frameworks are compatible with Jetson, just like Raspberry Pi. In addition to the frameworks listed in the Raspberry Pi section, Jetson also supports PyTorch. PyTorch is known for its ease of use and flexibility, and is widely used in computer vision and natural language processing applications.
Like the Raspberry Pi, the Jetson can be used with many kinds of ML models. Models for object detection, facial recognition, audio recognition, natural language processing, and several other applications are just a few examples.
Here are a few Jetson ML model examples and tutorials:
For more information about the Jetson Nano, click here
When building ML models on resource-limited devices such as the Raspberry Pi or Jetson Nano, one of the main challenges that can arise is a lack of available RAM. ML models often require a significant amount of memory to operate, and if there isn't enough RAM available, the models may not be able to run properly or may even crash.
There are several strategies that can be employed to mitigate RAM problems when building ML models on these devices. One approach is to use a smaller model architecture that requires less memory. This can be achieved by reducing the number of layers or neurons in the model.
Another strategy is to reduce the batch size used during training. By using a smaller batch size, less memory is required to store the intermediate activations of the model during training. However, this can also result in longer training times and reduced training accuracy.
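Back-of-the-envelope arithmetic makes both strategies concrete. The layer sizes and batch sizes below are arbitrary illustrative numbers, not values from this project:

```python
# Rough memory arithmetic for the two strategies above.
# Layer and batch sizes are arbitrary illustrative numbers.
BYTES_PER_FLOAT32 = 4

def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out

# Strategy 1: shrinking a layer shrinks its weights roughly
# quadratically (1024->256 units cuts memory by ~16x).
big = dense_params(1024, 1024)    # 1,049,600 parameters
small = dense_params(256, 256)    # 65,792 parameters
print(big * BYTES_PER_FLOAT32 // 1024, "KiB vs",
      small * BYTES_PER_FLOAT32 // 1024, "KiB of weights")

# Strategy 2: activation memory scales linearly with batch size.
activations = 1024                # outputs of one layer, per sample
for batch in (64, 8):
    mem = batch * activations * BYTES_PER_FLOAT32
    print(f"batch {batch}: {mem // 1024} KiB of activations")
```

Real models multiply these numbers across many layers, which is why both knobs matter on a device with 1-4GB of RAM.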
One possible solution is to use a swapfile. A swapfile is a file on the system's hard drive that is used as virtual memory when the system runs out of physical RAM. When the system needs more memory than what is available in RAM, it swaps out the least-used memory pages to the swapfile, freeing up space in RAM for more important processes. However, it's important to note that using a swapfile can slow down the system's performance, as accessing the hard drive is slower than accessing RAM. Therefore, it's recommended to use a swapfile only as a temporary solution when running memory-intensive processes on these devices.
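As a sketch, a swapfile can be set up on a Raspberry Pi or Jetson with the standard Linux tools. The size and path below are example configuration values; adjust them to your device and run the commands as root:

```shell
# Create and enable a 2 GB swapfile (example size and path).
sudo fallocate -l 2G /swapfile
sudo chmod 600 /swapfile     # swap files must not be world-readable
sudo mkswap /swapfile        # format the file as swap space
sudo swapon /swapfile        # enable it immediately
swapon --show                # verify the new swap area is active
```

To make the swapfile permanent, an entry would also be added to /etc/fstab; as noted above, this is best treated as a temporary workaround rather than a substitute for enough RAM.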
In this discussion, we'll look at some interesting datasets that have been, or could be, analyzed using the devices we've talked about, as well as the conclusions drawn from such investigations.
CIFAR (Canadian Institute for Advanced Research) is a collection of datasets that are commonly used for image recognition. The most popular is CIFAR-10, which consists of 60,000 32x32 color images in 10 classes, with 6,000 images per class.
The CIFAR-10 dataset can be downloaded from the official website here.
The CIFAR-10 dataset can be used on a Raspberry Pi for various image recognition tasks, such as object recognition and image classification. The small size of images in this dataset makes it easy to work with.
Numerous studies have been conducted in this direction, and one excellent thorough study with time and memory usage results is available here.
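To get a feel for the scale involved, the raw, uncompressed size of CIFAR-10 is easy to compute (in the dataset's binary format, each record is one label byte plus one byte per channel per pixel):

```python
# Raw, uncompressed size of CIFAR-10: 60,000 RGB images of
# 32x32 pixels, one byte per channel, plus one label byte each.
images = 60_000
record_bytes = 32 * 32 * 3 + 1   # 3,073 bytes per record
total = images * record_bytes
print(round(total / 1024 ** 2), "MiB")  # -> 176 MiB
```

That fits in a Raspberry Pi 4's RAM, but once model weights, activations, and the OS are added, the memory strategies discussed above start to matter.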
The MNIST dataset is a classic dataset of handwritten digits, often used as a benchmark for image classification tasks. It consists of 70,000 grayscale images of size 28x28 pixels, each representing a single digit from 0 to 9. The dataset is split into 60,000 training images and 10,000 test images.
The MNIST dataset can be downloaded from the official website here.
Several repositories have used this dataset in some interesting ways. This dataset has a TensorFlowLite version that utilizes a camera, and the setup procedures are available here.
This dataset is a collection of short audio clips, each containing a spoken command. It is often used for speech recognition tasks, where the goal is to identify the spoken command from the audio clip. The dataset contains different spoken commands such as "yes", "no", "up", "down", and "stop".
This dataset can be downloaded from here.
There is source code for this dataset using TensorFlow for classification and data processing here.
The UrbanSound8K dataset is a popular dataset used for sound classification tasks. It consists of 8732 labeled sound clips, each of which is 4 seconds long, and is classified into 10 classes of urban sounds. The 10 classes of urban sounds in the dataset are: air conditioner, car horn, children playing, dog bark, drilling, engine idling, gun shot, jackhammer, siren and street music.
Visit this page for additional details about this dataset and to download it.
One instance of classification of this dataset can be found here. Besides classification, there is a guide to creating a Docker image in that repository.
This dataset contains 50,000 movie reviews, split into 25,000 reviews for training and 25,000 reviews for testing. Each review is labeled as either positive or negative, based on its overall sentiment. The dataset is often used for sentiment analysis, building recommendation systems, and product research, as it provides valuable insights into customer opinions and preferences.
To learn more about this dataset and to download it, visit this website.
This dataset contains over 100,000 question-answer pairs based on Wikipedia articles. The dataset is designed to test the ability of machine learning models to answer human-generated questions by providing a large corpus of text and a set of associated questions. This dataset can be used to train question-answering models and build chatbots.
This dataset can be downloaded from here.
One instance of question answering of this dataset can be found here.
We provide documentation on common "How to" questions. You can refer to one of the following docs for more information: