AI-powered Human-Drone Pose Estimation Aboard Ultra-low Power Autonomous Flying Nano-UAVs.
PULP-Frontnet is the result of a collaboration among an amazing team spanning four academic institutions: Nicky Zimmerman2, Elia Cereda1, Alessio Burrello3, Francesco Conti3, Hanna Müller4, Alessandro Giusti1, Jérôme Guzzi1, and Daniele Palossi1,4.
1 Dalle Molle Institute for Artificial Intelligence (IDSIA), USI and SUPSI, Switzerland.
2 Institute of Geodesy and Geoinformation (IGG) of University of Bonn, Germany.
3 Department of Electrical, Electronic and Information Engineering (DEI) of University of Bologna, Italy.
4 Integrated Systems Laboratory (IIS) of ETH Zürich, Switzerland.
If you use PULP-Frontnet in an academic context, we kindly ask you to cite the following publication:
- D. Palossi et al., "Fully Onboard AI-powered Human-Drone Pose Estimation on Ultra-low Power Autonomous Flying Nano-UAVs", IEEE Internet of Things Journal, 2021, doi: 10.1109/JIOT.2021.3091643.
@article{palossi2021,
author = {Palossi, Daniele and Zimmerman, Nicky and Burrello, Alessio and Conti, Francesco and Müller, Hanna and Gambardella, Luca Maria and Benini, Luca and Giusti, Alessandro and Guzzi, Jérôme},
title = {Fully Onboard {AI}-powered Human-Drone Pose Estimation on Ultra-low Power Autonomous Flying Nano-{UAVs}},
journal = {{IEEE} Internet of Things Journal},
issn = {2327-4662},
doi = {10.1109/JIOT.2021.3091643},
date = {2021},
keywords = {Aerospace electronics, Aircraft, Computational modeling, Drones, Internet of Things, Robots, Task analysis},
}
This work has been partially funded by the Swiss National Science Foundation (SNSF) Spark (grant no. 190880), by the Swiss National Centre of Competence in Research (NCCR) Robotics, and by the EU H2020 projects 1-SWARM and ALOHA (grant no. 871743 and 780788).
YouTube video
- Ubuntu 20.04
- NVIDIA CUDA 11.1
- Python 3.7.10
- GAP SDK 3.9.1
- Bitcraze Crazyflie v2.1 quadrotor
- Bitcraze AI-deck v1.1
- Bitcraze Flow-deck v2
- Olimex ARM-USB-OCD-H JTAG debugger + ARM-JTAG-20-10 adapter board
The first step to set up PULP-Frontnet on your computer is to clone the source code, initialize the submodules, and install the required Python dependencies:
$ git clone https://github.com/idsia-robotics/pulp-frontnet.git
$ cd pulp-frontnet
$ git submodule update --init --recursive
$ pip install -r requirements.txt
Next, download the datasets used for the experiments. Due to their large size, the training and test sets are not stored inside the Git repo; they can be downloaded and extracted with the following commands:
$ cd PyTorch
$ curl https://drive.switch.ch/index.php/s/FMQOLsBlbLmZWxm/download -o pulp-frontnet-data.zip
$ unzip pulp-frontnet-data.zip
$ rm pulp-frontnet-data.zip
To ensure the files have been downloaded correctly, check their SHA256 checksums against those stored in the repo:
$ sha256sum -c Data/checksums.txt
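If the sha256sum tool is not available (for example on macOS), the same check can be done from Python with the standard hashlib module. This is a minimal sketch that assumes checksums.txt uses the usual "&lt;hash&gt;  &lt;path&gt;" format, with paths relative to the PyTorch/ directory:

```python
# Minimal sketch: verify Data/checksums.txt without the sha256sum tool.
# Assumes the standard "<hash>  <path>" format, with paths relative to PyTorch/.
import hashlib

with open("Data/checksums.txt") as listing:
    for line in listing:
        expected, _, path = line.strip().partition("  ")
        sha = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):  # read in 1 MiB chunks
                sha.update(chunk)
        print(path, "OK" if sha.hexdigest() == expected else "FAILED")
```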
The process to produce a deployable PULP-Frontnet model is composed of several steps:
- First, we train a full-precision floating-point model using native PyTorch code.
- In the second step, we use the NEMO open-source library to create a fake-quantized copy of the network and fine-tune it, so that the drop in regression performance due to quantization is minimized.
- After fine-tuning, we use NEMO again to transform the network into its integer-deployable form, in which all tensors (weights, inputs, outputs, and intermediate activations) are stored as integers and inference is performed entirely with integer arithmetic operations (a conceptual sketch of this is shown below).
After each step, the model is evaluated against the test set to ensure that regression performance doesn't drop due to a mistake. More details about the quantization process can be found in the PULP-Frontnet paper and in the NEMO technical report.
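To make the last step more concrete, the sketch below shows, in plain NumPy, how a single quantized layer can be evaluated with integer-only arithmetic. It is a conceptual illustration of the idea, not NEMO's actual deployment format: the layer shapes and scale factors are made up.

```python
# Conceptual sketch of integer-only inference for one linear layer.
# Not NEMO's actual deployment format: shapes and scale factors are invented.
import numpy as np

def quantize(x, scale, bits=8):
    """Map a float tensor to signed integers such that x ≈ scale * x_int."""
    qmax = 2 ** (bits - 1) - 1
    return np.clip(np.round(x / scale), -qmax - 1, qmax).astype(np.int32)

rng = np.random.default_rng(0)
x, w = rng.normal(size=(1, 16)), rng.normal(size=(16, 4))
s_x, s_w = 0.05, 0.02                     # per-tensor scale factors (assumed)

x_int, w_int = quantize(x, s_x), quantize(w, s_w)
acc = x_int @ w_int                       # integer-only accumulation
y_approx = acc * (s_x * s_w)              # single rescaling at the end
print(np.abs(y_approx - x @ w).max())     # quantization error stays small
```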
One of the objectives of PULP-Frontnet is to explore the relationship between a model's regression performance and its memory and computational requirements. For this reason, we compare three model architecture variants, which differ along two orthogonal dimensions: the size of the input images fed to the network and the number of output channels of the first convolutional layer.
The desired variant can be selected by supplying one of these values to the {variant} parameter of the following scripts:
- 160x32: 160×96 px input images, first convolutional layer with 32 output channels (max. memory and computational requirements);
- 160x16: 160×96 px input images, first convolutional layer with 16 output channels (min. memory requirements);
- 80x32: 80×48 px input images, first convolutional layer with 32 output channels (min. computational requirements).
Already-trained model checkpoints for each variant are supplied in the PyTorch/Models/ sub-directory of this repo, named as follows:
- Frontnet{variant}.pt: full-precision model checkpoints;
- Frontnet{variant}.Q.pt: fake-quantized model checkpoints.
These checkpoints can be used to experiment with PULP-Frontnet without repeating the entire training process.
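As a quick sanity check, a checkpoint can be inspected directly from Python. The sketch below assumes the .pt files are readable with torch.load; their exact contents (a state_dict versus a whole pickled model) may differ, so it only prints what it finds. If a whole model was pickled, the network classes must be importable, so run it from the PyTorch/ directory.

```python
# Sketch: peek inside a pre-trained checkpoint (its exact contents are an assumption).
import torch

ckpt = torch.load("Models/Frontnet160x32.pt", map_location="cpu")
if isinstance(ckpt, dict):
    # Likely a state_dict (or a dict wrapping one): list tensor names and shapes.
    for name, value in ckpt.items():
        shape = tuple(value.shape) if torch.is_tensor(value) else type(value).__name__
        print(name, shape)
else:
    # Otherwise a whole nn.Module was pickled; print its layer structure.
    print(ckpt)
```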
The datasets used in the experiments, consisting of a training set and a test set, are stored in the PyTorch/Data/ sub-directory of this repo. For performance and reproducibility reasons, data augmentation and image downscaling have been performed offline as a pre-processing step, and the resulting datasets, ready to be used for training, have been publicly released. For this reason, four files can be found in PyTorch/Data/:
- 160x96OthersTrainsetAug.pickle and 160x96StrangersTestset.pickle, with 160×96 px input images, used for the 160x32 and 160x16 model variants;
- 80x48OthersTrainsetAug.pickle and 80x48StrangersTestset.pickle, with 80×48 px input images, used for the 80x32 model variant.
The two pairs of datasets are identical except for the image scale.
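The released .pickle files can be opened directly from Python to inspect their contents before training. The sketch below assumes they are plain pickles (e.g. pandas objects); the exact internal structure (column names, image layout) is not documented here, so it only reports what it finds.

```python
# Sketch: inspect one of the released dataset files, run from the PyTorch/ directory.
# Assumes the .pickle files are plain pickles; their internal structure is an assumption.
import pandas as pd

data = pd.read_pickle("Data/160x96StrangersTestset.pickle")  # also unpickles non-pandas objects
print(type(data))
if hasattr(data, "columns"):
    print("columns:", list(data.columns))
    print("samples:", len(data))
```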
Five scripts implement the training and quantization pipeline: FPTraining.py, FPTesting.py, QTraining.py, QTesting.py, and QExport.py. Their default configuration replicates the exact setup used in the paper. A single command-line parameter must always be supplied to specify the model variant to be trained; additional optional arguments are available to further customize the training process, as shown by the --help flag of each script. All five commands must be run from the PyTorch/ subdirectory of this repo:
$ cd PyTorch
Train a full-precision floating-point neural network using ordinary PyTorch code:
$ python Scripts/FPTraining.py {variant}
This will train the model for 100 epochs and save the model weights with the lowest validation loss to Results/{variant}/Frontnet.pt. Already-trained full-precision models can be found in Models/Frontnet{variant}.pt. You can save the trained model to a custom location with the --save-model PATH flag.
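For orientation, the core of what FPTraining.py does corresponds to a standard PyTorch training loop like the sketch below; the model, data loaders, loss function, and optimizer settings are placeholders, not the script's actual configuration.

```python
# Simplified sketch of the full-precision training loop: train for 100 epochs
# and keep the weights with the lowest validation loss. The loss function and
# optimizer are placeholders, not FPTraining.py's actual configuration.
import copy
import torch

def train(model, train_loader, val_loader, epochs=100, lr=1e-3):
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    criterion = torch.nn.L1Loss()
    best_loss, best_state = float("inf"), None
    for epoch in range(epochs):
        model.train()
        for images, targets in train_loader:
            optimizer.zero_grad()
            loss = criterion(model(images), targets)
            loss.backward()
            optimizer.step()
        model.eval()
        with torch.no_grad():
            val_loss = sum(criterion(model(x), y).item() for x, y in val_loader) / len(val_loader)
        if val_loss < best_loss:
            best_loss, best_state = val_loss, copy.deepcopy(model.state_dict())
    # FPTraining.py saves the best weights to Results/{variant}/Frontnet.pt
    return best_state, best_loss
```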
Measure the regression performance of the full-precision model against the test set:
$ python Scripts/FPTesting.py {variant}
A different full-precision model can be evaluated by supplying the --load-model PATH flag. Evaluation is performed by computing three regression performance metrics for each output component: Mean Absolute Error (MAE), Mean Squared Error (MSE), and the R² score.
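The same three metrics are easy to reproduce offline, for example with scikit-learn; this sketch assumes predictions and ground truth collected as arrays with one column per output component (random placeholder data is used here).

```python
# Sketch: per-component MAE, MSE, and R² for arrays of shape (n_samples, n_components).
# The arrays below are random placeholders, not real predictions.
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

y_true = np.random.randn(100, 4)                   # placeholder ground-truth poses
y_pred = y_true + 0.1 * np.random.randn(100, 4)    # placeholder predictions

for i in range(y_true.shape[1]):
    mae = mean_absolute_error(y_true[:, i], y_pred[:, i])
    mse = mean_squared_error(y_true[:, i], y_pred[:, i])
    r2 = r2_score(y_true[:, i], y_pred[:, i])
    print(f"component {i}: MAE={mae:.3f}  MSE={mse:.3f}  R2={r2:.3f}")
```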
Create a fake-quantized copy of the network and fine tune it on the training set:
$ python Scripts/QTraining.py {variant}
Fake-quantized models are still floating-point, but they introduce differentiable quantization operations on weights and intermediate activations. This results in a model that performs similarly to a fully quantized one but is still trainable, making it possible to fine-tune its weights so that the performance drop caused by quantization is minimal. Refer to Section 2 of the NEMO technical report for details.
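To give an intuition of what such a differentiable quantization operation looks like, the sketch below implements a generic uniform quantizer with a straight-through gradient estimator; NEMO's actual PACT-based quantizers are more sophisticated, so treat this purely as an illustration.

```python
# Simplified sketch of a differentiable ("fake") uniform quantizer with a
# straight-through estimator. NEMO's PACT-based quantizers differ in detail.
import torch

class FakeQuantize(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, scale, bits=8):
        qmax = 2 ** (bits - 1) - 1
        x_int = torch.clamp(torch.round(x / scale), -qmax - 1, qmax)
        return x_int * scale  # still floating-point, but restricted to a discrete grid

    @staticmethod
    def backward(ctx, grad_output):
        # Straight-through estimator: treat quantization as the identity in the backward pass.
        return grad_output, None, None

x = torch.randn(4, requires_grad=True)
y = FakeQuantize.apply(x, 0.05)
y.sum().backward()
print(y)        # values lie on multiples of the scale factor
print(x.grad)   # gradients flow as if no quantization had happened
```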
By default, QTraining.py will load the full-precision model Results/{variant}/Frontnet.pt, convert it to fake-quantized form, and fine-tune it for 10 epochs. A different full-precision model can be loaded using the --load-model PATH flag. The tuned model weights are then saved as Results/{variant}/Frontnet.Q.pt; as before, the trained model can be saved to a custom location with --save-model PATH. Already-trained fake-quantized models can be found in Models/Frontnet{variant}.Q.pt.
Measure the regression performance of the fake-quantized model against the test set:
$ python Scripts/QTesting.py {variant}
A different fake-quantized model can be evaluated by supplying the --load-model PATH flag.
Transform the network to the integer-deployable form and export it to the format expected by DORY:
$ python Scripts/QExport.py {variant}
Refer to Section 3 of the NEMO technical report for details about this step.
By default, QExport.py will load the fake-quantized model Results/{variant}/Frontnet.Q.pt; a different fake-quantized model can be loaded by supplying the --load-model PATH flag. The output files for the deployable model are saved in Results/{variant}/Export, ready to be used with DORY to generate the C code for inference on the GAP8 SoC. In particular, NEMO exports an ONNX file with the network architecture and weights, plus a set of text files containing the expected activation values for one example input image. DORY computes checksums of these expected activations and uses them to perform a runtime sanity check, ensuring that the activations produced by the generated code match the expected ones.
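As a quick check of the export, the ONNX graph can be inspected with the onnx Python package before handing it to DORY; the exact file name inside Results/{variant}/Export is an assumption, so the sketch simply picks the first .onnx file it finds.

```python
# Sketch: inspect the ONNX model produced by QExport.py (here for the 160x32 variant).
# The exact file name inside the Export directory is an assumption, so glob for it.
import glob
import onnx

path = glob.glob("Results/160x32/Export/*.onnx")[0]
model = onnx.load(path)
print(onnx.helper.printable_graph(model.graph))  # layers, shapes, and weight initializers
```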
[coming soon]