This repository provides an official implementation for:
- Camera Pose Auto-Encoders (PAEs), accepted to ECCV 2022
- Iterative Absolute Pose Regression (iAPR), an extension of our ECCV22 work: a new class of APRs that combines absolute pose regression and relative pose regression without extra image or pose storage.
Camera Pose Auto-Encoders (PAEs) are multi-layer perceptrons (MLPs), trained via a Teacher-Student approach to encode camera poses, using Absolute Pose Regressors (APRs) as their teachers (Fig. 1). Once trained, PAEs closely reproduce their teachers' performance across outdoor and indoor environments, whether learning from multi- or single-scene APR teachers with different architectures.
Fig. 1: Training PAEs
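To make the training scheme concrete, below is a minimal PyTorch sketch of the teacher-student idea. The class name, layer sizes, and MSE objective are illustrative assumptions, not the exact architecture and loss used in the paper: the PAE takes a 7D camera pose and is trained to reproduce the latent encoding that the frozen APR teacher computes from the corresponding image.

```python
import torch
import torch.nn as nn

class PoseAutoEncoder(nn.Module):
    """Illustrative MLP encoding a 7D camera pose (position + orientation quaternion)
    into the latent space of the teacher APR (dimensions are assumptions)."""
    def __init__(self, latent_dim=256):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(7, 128), nn.ReLU(),
            nn.Linear(128, 256), nn.ReLU(),
            nn.Linear(256, latent_dim),
        )

    def forward(self, pose):
        return self.encoder(pose)

def train_step(pae, pose, teacher_latent, optimizer):
    """One teacher-student step: the frozen APR teacher supplies target latents
    computed from images; the PAE learns to reproduce them from poses alone."""
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(pae(pose), teacher_latent.detach())
    loss.backward()
    optimizer.step()
    return loss.item()
```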
Below we provide instructions for running our code to train and evaluate teacher APRs and student PAEs. We also provide pre-trained models.
Once a PAE is trained, we can use it as a means of extending pose regression with visual and spatial information at minimal cost.
Iterative Absolute Pose Regression (iAPR) is a new class of APRs, which combines absolute pose regression and relative pose regression, without additional image or pose storage. Specifically, it applies a PAE-based RPR on the initial APR estimate for one or more iterations (Fig. 2). iAPR achieves a new state-of-the-art (SOTA) localization accuracy for APRs on the 7Scenes dataset, even when trained with only 30% of the data.
Fig. 2: Our proposed iAPR method, combining absolute pose regression with PAE-based relative pose regression.
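The refinement loop can be sketched at a high level as follows. The apr/pae/rpr callables and the simplified pose composition are assumptions for illustration, not the actual code in main_iapr.py:

```python
import torch

def qmult(q1, q2):
    """Hamilton product of two quaternions stored as (w, x, y, z) tensors."""
    w1, x1, y1, z1 = q1.unbind(-1)
    w2, x2, y2, z2 = q2.unbind(-1)
    return torch.stack([
        w1 * w2 - x1 * x2 - y1 * y2 - z1 * z2,
        w1 * x2 + x1 * w2 + y1 * z2 - z1 * y2,
        w1 * y2 - x1 * z2 + y1 * w2 + z1 * x2,
        w1 * z2 + x1 * y2 - y1 * x2 + z1 * w2,
    ], dim=-1)

def compose_poses(abs_pose, rel_pose):
    """Apply a relative pose (dt, dq) to an absolute 7D pose (t, q).
    Simplified composition: translations are added, orientations multiplied."""
    t, q = abs_pose[..., :3], abs_pose[..., 3:]
    dt, dq = rel_pose[..., :3], rel_pose[..., 3:]
    return torch.cat([t + dt, qmult(q, dq)], dim=-1)

def iapr_inference(img, apr, pae, rpr, num_iterations=1):
    """High-level iAPR inference loop; apr/pae/rpr are assumed callables."""
    pose = apr(img)                           # initial absolute pose estimate
    for _ in range(num_iterations):
        pose_encoding = pae(pose)             # encode the current estimate (no stored images/poses)
        rel_pose = rpr(img, pose_encoding)    # relative motion w.r.t. the query image
        pose = compose_poses(pose, rel_pose)  # refine the estimate
    return pose
```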
As with PAEs, we provide instructions for training and testing our iAPR model.
To run this repository you will need:
- Python3 (tested with Python 3.7.7)
- PyTorch deep learning framework (tested with version 1.0.0)
- Download the Cambridge Landmarks dataset and the 7Scenes dataset
- You can also download pre-trained models to reproduce reported results (see below)
- For a quick setup you can run: pip install -r requirments.txt

Note: All experiments reported in our paper were performed with an 8GB NVIDIA GeForce GTX 1080 GPU.
Our code allows training and testing of single-scene and multi-scene APR teachers. Specifically, we use PoseNet with different CNN backbones as our single-scene APRs and MS-Transformer as our multi-scene APR.
For example, in order to train PoseNet with EfficientNet-B0 on the KingsCollege scene, run:
```
python main_train_test_apr.py posenet train models/backbones/efficient-net-b0.pth <path to the CambridgeLandmarks dataset> datasets/CambridgeLandmarks/abs_cambridge_pose_sorted.csv_KingsCollege_train.csv CambridgeLandmarks_config.json
```
To train with a different backbone, change the path to the backbone (third argument) and the value of 'backbone_type', under the 'posenet' dictionary, in the JSON configuration file. We support MobileNet and ResNet50.
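For example, a small script along the following lines can update the configuration programmatically; the 'mobilenet' value is an assumed placeholder, so check the configuration file for the exact strings it expects:

```python
import json

# Illustrative only: point the 'posenet' entry at a different backbone type.
# The backbone checkpoint itself is passed as the third command-line argument.
with open("CambridgeLandmarks_config.json") as f:
    config = json.load(f)

config["posenet"]["backbone_type"] = "mobilenet"  # assumed value; verify against the supported backbones

with open("CambridgeLandmarks_config.json", "w") as f:
    json.dump(config, f, indent=2)
```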
After training, you can test your trained model by running:
```
python main_train_test_apr.py posenet test models/backbones/efficient-net-b0.pth <path to the CambridgeLandmarks dataset> datasets/CambridgeLandmarks/abs_cambridge_pose_sorted.csv_KingsCollege_test.csv CambridgeLandmarks_config.json --checkpoint_path posenet_effnet_apr_kings_college.pth
```
To train and test MS-Transformer, please follow the instructions in our MS-Transformer repository.
To train a single-scene PAE with a PoseNet teacher (EfficientNet-B0 backbone), run the following command with the same configuration used for the teacher APR:
```
python main_learn_pose_encoding.py posenet train models/backbones/efficient-net-b0.pth <path to dataset> datasets/CambridgeLandmarks/abs_cambridge_pose_sorted.csv_KingsCollege_train.csv CambridgeLandmarks_config.json posenet_effnet_apr_kings_college.pth
```
You can then evaluate it and compare it to its teacher by running:
```
python main_learn_pose_encoding.py posenet test models/backbones/efficient-net-b0.pth <path to dataset> datasets/CambridgeLandmarks/abs_cambridge_pose_sorted.csv_KingsCollege_test.csv CambridgeLandmarks_config.json posenet_effnet_apr_kings_college.pth --encoder_checkpoint_path posenet_effnet_apr_kings_college.pth
```
Similarly, you can train a multi-scene PAE with MS-Transformer. For example, to train on the 7Scenes dataset:
```
python main_learn_multiscene_pose_encoding.py ems-transposenet train models/backbones/efficient-net-b0.pth <path to dataset> datasets/7Scenes/7scenes_all_scenes.csv 7scenes_config.json ems_transposenet_7scenes_pretrained.pth
```
and then evaluate it by running:
```
python main_learn_multiscene_pose_encoding.py ems-transposenet test models/backbones/efficient-net-b0.pth <path to dataset> datasets/7Scenes/abs_7scenes_pose.csv_fire_test.csv 7scenes_config.json ems_transposenet_7scenes_pretrained.pth --encoder_checkpoint_path mstransformer_7scenes_pose_encoder.pth
```
Model (Linked) | Description |
---|---|
APR models | |
PoseNet+MobileNet | Single-scene APR, KingsCollege scene |
PoseNet+ResNet50 | Single-scene APR, KingsCollege scene |
PoseNet+EfficientB0 | Single-scene APR, KingsCollege scene |
MS-Transformer | Multi-scene APR, CambridgeLandmarks dataset |
MS-Transformer | Multi-scene APR, 7Scenes dataset |
Camera Pose Auto-Encoders | |
Auto-Encoder for PoseNet+MobileNet | Auto-Encoder for a single-scene APR, KingsCollege scene |
Auto-Encoder for PoseNet+ResNet50 | Auto-Encoder for a single-scene APR, KingsCollege scene |
Auto-Encoder for PoseNet+EfficientB0 | Auto-Encoder for a single-scene APR, KingsCollege scene |
Auto-Encoder for MS-Transformer | Auto-Encoder for a multi-scene APR, CambridgeLandmarks dataset |
Auto-Encoder for MS-Transformer | Auto-Encoder for a multi-scene APR, 7Scenes dataset |
We propose a PAE-based RPR model (Fig. 3) to estimate the relative motion between an encoded pose and a query image.
Fig. 3: Our proposed PAE-based RPR architecture, for implementing iAPR.
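A minimal sketch of this idea is given below; the layer sizes and the simple concatenation-based fusion are assumptions for illustration, not the exact architecture of Fig. 3. Image features from a CNN backbone are fused with the PAE encoding of the reference pose, and a small head regresses the relative translation and orientation.

```python
import torch
import torch.nn as nn

class PAEBasedRPR(nn.Module):
    """Illustrative relative pose regressor conditioned on a PAE pose encoding."""
    def __init__(self, img_feat_dim=1280, pose_enc_dim=256, hidden_dim=512):
        super().__init__()
        self.fusion = nn.Sequential(
            nn.Linear(img_feat_dim + pose_enc_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
        )
        self.trans_head = nn.Linear(hidden_dim, 3)  # relative translation (x, y, z)
        self.rot_head = nn.Linear(hidden_dim, 4)    # relative orientation (quaternion)

    def forward(self, img_features, pose_encoding):
        x = self.fusion(torch.cat([img_features, pose_encoding], dim=-1))
        return torch.cat([self.trans_head(x), self.rot_head(x)], dim=-1)
```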
In order to train our model, run:
```
python main_iapr.py train <path to dataset> 7scenes_training_pairs.csv 7scenes_iapr_config.json pretrained_models/ems_transposenet_7scenes_pretrained.pth models/backbones/efficient-net-b0.pth pretrained_models/mstransformer_7scenes_pose_encoder.pth
```
Note: links to data are available below.
To test our iAPR model, for example on the chess scene, run:
```
python main_iapr.py test <path to dataset> datasets/7Scenes/abs_7scenes_pose.csv_chess_test.csv 7scenes_iapr_config.json pretrained_models/ems_transposenet_7scenes_pretrained.pth models/backbones/efficient-net-b0.pth pretrained_models/mstransformer_7scenes_pose_encoder.pth --checkpoint_path <path to iapr model>
```
You can change the number of iterations in the configuration file. Pretrained models are available below.
Our iAPR achieves SOTA performance on the 7Scenes dataset and improves over MS-Transformer even when trained on a much smaller subset of the training data.
The following table shows the pose error (in meters/degrees) of MS-Transformer and iAPR, for the 7Scenes dataset, when training with 100%, 70%, 50% and 30% of the train set:
% of training data | MS-Transformer | iAPR |
---|---|---|
100% | 0.18m / 7.28deg | 0.17m / 6.69deg |
70% | 0.19m / 7.41deg | 0.18m / 7.10deg |
50% | 0.19m / 7.73deg | 0.18m / 6.89deg |
30% | 0.20m / 8.19deg | 0.19m / 7.12deg |
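For reference, position and orientation errors of this kind can be computed along the following lines (a standard formulation, not necessarily the repository's exact evaluation code):

```python
import math
import torch
import torch.nn.functional as F

def pose_errors(est_pose, gt_pose):
    """Position error (meters) and orientation error (degrees) between 7D poses
    given as (x, y, z, quaternion)."""
    t_err = torch.norm(est_pose[..., :3] - gt_pose[..., :3], dim=-1)
    est_q = F.normalize(est_pose[..., 3:], p=2, dim=-1)
    gt_q = F.normalize(gt_pose[..., 3:], p=2, dim=-1)
    # Angle between the two rotations: 2 * arccos(|<q1, q2>|)
    inner = torch.abs(torch.sum(est_q * gt_q, dim=-1)).clamp(max=1.0)
    r_err = 2 * torch.acos(inner) * 180.0 / math.pi
    return t_err, r_err
```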
Data (linked) | Description |
---|---|
7Scenes training 100p | 100% of the training images |
7Scenes training 70p | 70% of the training images |
7Scenes training 50p | 50% of the training images |
7Scenes training 30p | 30% of the training images |
7Scenes training pairs-100p | Training pairs generated from 100% of the training images |
7Scenes training pairs-70p | Training pairs generated from 70% of the training images |
7Scenes training pairs-50p | Training pairs generated from 50% of the training images |
7Scenes training pairs-30p | Training pairs generated from 30% of the training images |
Model (linked) | Description |
---|---|
iAPR-100p | iAPR Model trained with 100% of 7Scenes dataset |
PAE-100p | PAE Model trained with 100% of 7Scenes dataset (original MS-PAE model, available above) |
MS-100p | MS-Transformer Model trained with 100% of 7Scenes dataset (original MS-Transformer model) |
iAPR-70p | iAPR Model trained with 70% of 7Scenes dataset |
PAE-70p | PAE Model trained with 70% of 7Scenes dataset |
MS-70p | MS-Transformer Model trained with 70% of 7Scenes dataset |
iAPR-50p | iAPR Model trained with 50% of 7Scenes dataset |
PAE-50p | PAE Model trained with 50% of 7Scenes dataset |
MS-50p | MS-Transformer Model trained with 50% of 7Scenes dataset |
iAPR-30p | iAPR Model trained with 30% of 7Scenes dataset |
PAE-30p | PAE Model trained with 30% of 7Scenes dataset |
MS-30p | MS-Transformer Model trained with 30% of 7Scenes dataset |
To train an image decoder for a PAE on the ShopFacade scene, run:
```
python main_reconstruct_img.py train <path to cambridge dataset> datasets/CambridgeLandmarks/abs_cambridge_pose_sorted.csv_ShopFacade_train.csv reconstruct_config.json pretrained_models/mstransformer_cambridge_pose_encoder.pth
```
To test our decoder (decoding the training images from their PAE-encoded poses), run:
```
python main_reconstruct_img.py demo <path to cambridge dataset> datasets/CambridgeLandmarks/abs_cambridge_pose_sorted.csv_ShopFacade_train.csv reconstruct_config.json pretrained_models/mstransformer_cambridge_pose_encoder.pth --decoder_checkpoint_path pretrained_models/img_decoder_shop_facade.pth
```
You can download the pre-trained ShopFacade decoder from here.
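As a rough illustration of what the decoder does, the sketch below maps a PAE pose encoding back to image space; the layer types, dimensions, and output resolution are assumptions, not the pretrained decoder's actual architecture.

```python
import torch.nn as nn

class PoseEncodingImageDecoder(nn.Module):
    """Illustrative decoder reconstructing an image from a PAE pose encoding."""
    def __init__(self, pose_enc_dim=256, base_channels=64):
        super().__init__()
        self.init_channels = base_channels * 8
        self.fc = nn.Linear(pose_enc_dim, self.init_channels * 4 * 4)
        self.deconv = nn.Sequential(
            nn.ConvTranspose2d(base_channels * 8, base_channels * 4, 4, 2, 1), nn.ReLU(),
            nn.ConvTranspose2d(base_channels * 4, base_channels * 2, 4, 2, 1), nn.ReLU(),
            nn.ConvTranspose2d(base_channels * 2, base_channels, 4, 2, 1), nn.ReLU(),
            nn.ConvTranspose2d(base_channels, 3, 4, 2, 1), nn.Sigmoid(),  # 64x64 RGB output
        )

    def forward(self, pose_encoding):
        x = self.fc(pose_encoding)
        x = x.view(-1, self.init_channels, 4, 4)  # reshape to a spatial feature map
        return self.deconv(x)
```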