Repo will be cleaned up soon.
This repository contains the implementation of AEGIS-Net in PyTorch.
AEGIS-Net is an indoor place recognition network extended from our previous work CGiS-Net. It is a two-stage network: it first learns a semantic encoder-decoder to extract semantic features from coloured point clouds, and then learns a feature embedding module to generate global descriptors for place recognition.
This implementation has been tested on Ubuntu 18.04, 20.04 and 22.04.
- For Ubuntu 18.04 installation, please see the instructions in INSTALL.md from the official KP-Conv repository.
- For Ubuntu 20.04 and 22.04, the installation procedure is basically the same, except that different package versions are used (an example install command is shown after this list):
  - Ubuntu 20.04: PyTorch 1.8.0, torchvision 0.9.0, CUDA 11.1, cuDNN 8.6.0
  - Ubuntu 22.04: PyTorch 1.13.0, torchvision 0.14.0, CUDA 11.7
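For instance, the Ubuntu 20.04 combination matches PyTorch's historical CUDA 11.1 wheels and can be installed roughly as `pip install torch==1.8.0+cu111 torchvision==0.9.0+cu111 -f https://download.pytorch.org/whl/torch_stable.html` (a sketch, not the repository's official setup; the wheel index may have changed since).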
The ScanNetPR dataset can be downloaded here. The data is organised as follows:
```
├── ScanNetPR
│   ├── scans                         # folder to hold all the data
│   │   ├── scene0000_00
│   │   │   ├── input_pcd_0mean
│   │   │   │   ├── scene0000_00_0_sub.ply   # zero-meaned point cloud stored as [x, y, z, r, g, b]
│   │   │   │   ├── ...
│   │   │   ├── pose
│   │   │   │   ├── 0.txt             # pose corresponding to the point cloud
│   │   │   │   ├── ...
│   │   │   ├── scene0000_00.txt      # scene information
│   │   ├── ...
│   ├── views/Tasks/Benchmark         # stores all the data split files from the ScanNet dataset
│   ├── VLAD_triplets                 # stores all the files necessary for generating training tuples
├── batch_limits.pkl                  # calibration file for KP-Conv
├── max_in_limits.pkl                 # calibration file for KP-Conv
├── neighbors_limits.pkl              # calibration file for KP-Conv
└── other ScanNet related files ...
```
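For reference, a single sample can be inspected with a few lines of Python. This is a minimal sketch assuming the `plyfile` package; the exact vertex property names in the .ply files and the 4x4 pose layout are assumptions based on ScanNet conventions:

```python
import numpy as np
from plyfile import PlyData  # assumption: pip install plyfile

def load_sample(ply_path, pose_path):
    # Read the zero-meaned, coloured point cloud.
    vertex = PlyData.read(ply_path)['vertex']
    points = np.stack([vertex['x'], vertex['y'], vertex['z']], axis=1)
    colours = np.stack([vertex['red'], vertex['green'], vertex['blue']], axis=1)
    # Read the corresponding camera pose (assumed to be a 4x4 matrix).
    pose = np.loadtxt(pose_path)
    return points, colours, pose

points, colours, pose = load_sample(
    'ScanNetPR/scans/scene0000_00/input_pcd_0mean/scene0000_00_0_sub.ply',
    'ScanNetPR/scans/scene0000_00/pose/0.txt')
```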
In the first stage, we train the semantic encoder and decoder on a SLAM-Segmentation task, i.e. semantic segmentation on coloured point clouds in a local coordinate system.
- Change the `self.path` variable in the `datasets/ScannetSLAM.py` file to the path of the complete ScanNet dataset (see the sketch after this list).
- Run the following to train the semantic encoder and decoder:

```
python train_ScannetSLAM.py
```
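The path change in the first step amounts to something like this (a hypothetical excerpt; the attribute lives in the dataset configuration inside `datasets/ScannetSLAM.py`):

```python
# Hypothetical excerpt from datasets/ScannetSLAM.py: point the dataset at
# your local copy of the complete ScanNet dataset.
self.path = '/data/ScanNet'  # replace with your own ScanNet root
```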
The training usually takes a day. We also provide our pretrained encoder-decoder here if you want to skip the first training stage. Please download the folder and put it in the `results` directory. In the folder `Log_2021-06-16_02-31-04` we provide the model trained on the complete ScanNet dataset WITHOUT colour, and in the folder `Log_2021-06-16_02-42-30` we provide the model trained on the complete ScanNet dataset WITH colour.
In the second stage, we train the feature embedding module to generate the global descriptors.
- Change the `self.path` variable in the `datasets/ScannetTriple.py` file to the path of the ScanNetPR dataset.
- Run the training file as:

```
python feature_embedding_main.py --train
```
Train the model with different settings:
- `--num_feat`: change the number of feature layers; default 3, choosing from [3, 1] for the attention version and [1, 3, 5] for the no-attention version
- `--optimiser`: change the optimiser; default Adam, choosing from [SGD, Adam]
- `--loss`: change the loss function; default lazy_quadruplet, choosing from [triplet, lazy_triplet, lazy_quadruplet]
- `--no_att`: set to use the no-attention version
- `--no_color`: set to use point clouds without colour

An example command and a sketch of the lazy quadruplet loss are given below.
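For example, `python feature_embedding_main.py --train --no_att --no_color --optimiser SGD --loss lazy_triplet` trains the no-attention variant on colourless point clouds. The lazy losses follow the formulation popularised by PointNetVLAD; below is a minimal PyTorch sketch of the lazy quadruplet variant, with assumed margin values, not the repository's exact implementation:

```python
import torch

def lazy_quadruplet_loss(query, positive, negatives, other_neg, m1=0.5, m2=0.2):
    """query, positive, other_neg: (D,) descriptors; negatives: (N, D).
    other_neg is a negative sampled far from the query and the other negatives.
    Margins m1, m2 are assumptions."""
    d_pos = torch.norm(query - positive)            # query-to-positive distance
    d_negs = torch.norm(query - negatives, dim=1)   # query-to-negative distances
    # "Lazy" means only the hardest (closest) negative is penalised.
    triplet_term = torch.clamp(m1 + d_pos - d_negs.min(), min=0.0)
    d_other = torch.norm(other_neg - negatives, dim=1)
    quadruplet_term = torch.clamp(m2 + d_pos - d_other.min(), min=0.0)
    return triplet_term + quadruplet_term
```

`--loss lazy_triplet` corresponds to keeping only the first term.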
Run the file with the additional `--test` flag to test the trained model, perform evaluation with the `--evaluate` flag, and visualise the results with the `--visualise` flag:

```
python feature_embedding_main.py --test --evaluate --visualise
```
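Evaluation of this kind boils down to nearest-neighbour search over the global descriptors. A hedged sketch of the retrieval step, using scikit-learn's KDTree as an assumed search backend (the repository may use a different one):

```python
import numpy as np
from sklearn.neighbors import KDTree  # assumed backend for this example

# Placeholder descriptors: 1000 database clouds and 10 queries, 256-D each.
database = np.random.rand(1000, 256).astype(np.float32)
queries = np.random.rand(10, 256).astype(np.float32)

tree = KDTree(database)
dist, idx = tree.query(queries, k=3)  # Top-3 nearest database descriptors per query
```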
- Kernel visualisation: the kernel deformations can be displayed using the visualisation script from the KP-Conv repository.
Our AEGIS-Net is compared to a traditional baseline using SIFT + BoW and 5 deep learning-based methods: NetVLAD, PointNetVLAD, MinkLoc3D, Indoor DH3D and CGiS-Net.
| Model \ Average Recall Rate | Top-1 | Top-2 | Top-3 | Epochs / Time Trained |
| --- | --- | --- | --- | --- |
| AEGIS-Net (default) | 65.09% | 74.26% | 79.06% | 20 epochs (4 days) |
| AEGIS-Net (no attention) | 55.13% | 66.19% | 71.95% | 20 epochs (4 days) |
| CGiS-Net (default) | 56.82% | 66.46% | 71.74% | 20 epochs (7 days) |
| CGiS-Net (default) | 61.12% | 70.23% | 75.06% | 60 epochs (21 days) |
| SIFT + BoW | 16.16% | 21.17% | 24.38% | - |
| NetVLAD | 21.77% | 33.81% | 41.49% | - |
| PointNetVLAD | 5.31% | 7.50% | 9.99% | - |
| MinkLoc3D | 3.32% | 5.81% | 8.27% | - |
| Indoor DH3D | 16.10% | 21.92% | 25.30% | - |

NOTE: AEGIS-Net (no attention) = CGiS-Net (3 feats, using feats 2, 4, 5)
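Here, Top-N average recall counts a query as correct if at least one of its N nearest database descriptors is a true match, averaged over all queries. A minimal sketch of the metric (assumed names, not the repository's evaluation code):

```python
import numpy as np

def average_recall_at_n(retrieved, ground_truth, n_values=(1, 2, 3)):
    """retrieved: (Q, K) database indices sorted by descriptor distance;
    ground_truth: per-query sets of database indices that count as correct."""
    return {n: float(np.mean([bool(set(row[:n]) & gt)
                              for row, gt in zip(retrieved, ground_truth)]))
            for n in n_values}
```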
In this project, we use parts of the official implementations of the following works:
- KP-FCNN (Semantic Encoder-Decoder)
- PointNetVLAD-Pytorch (NetVLAD Layer)
- Test on the NAVER Indoor Localisation Dataset Link.
- Test on outdoor datasets (Oxford RobotCar Dataset, etc.).
- Explore the attention module for better feature selection before constructing global descriptors.