MixVPR: Feature Mixing for Visual Place Recognition (WACV 2023)

This is the official repository for the WACV 2023 paper "MixVPR: Feature Mixing for Visual Place Recognition".

Summary

This paper introduces MixVPR, a novel all-MLP feature aggregation method that addresses the challenges of large-scale Visual Place Recognition, while remaining practical for real-world scenarios with strict latency requirements. The technique leverages feature maps from pre-trained backbones as a set of global features, and integrates a global relationship between them through a cascade of feature mixing, eliminating the need for local or pyramidal aggregation. MixVPR achieves new state-of-the-art performance on multiple large-scale benchmarks, while being significantly more efficient in terms of latency and parameter count compared to existing methods.
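
The mixing idea described above can be illustrated with a small NumPy sketch. It is not the repository's implementation: the random weights stand in for learned ones, biases are omitted, and the exact layer order inside each block (layer norm, linear, ReLU, linear, skip connection) is an assumption made for illustration. The names `feature_mixer` and `mixvpr_sketch` are hypothetical.

```python
import numpy as np


def layer_norm(x, eps=1e-5):
    """Normalize each row to zero mean and unit variance."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def feature_mixer(x, w1, w2):
    """One mixing block: a shared MLP over the flattened spatial axis, plus a skip."""
    hidden = np.maximum(layer_norm(x) @ w1, 0.0)  # (c, n) -> (c, hidden), ReLU
    return x + hidden @ w2                        # back to (c, n), residual add


def mixvpr_sketch(feature_maps, out_channels, out_rows, mix_depth, mlp_ratio=1, seed=0):
    """Turn a (c, h, w) feature-map stack into one L2-normalized descriptor."""
    rng = np.random.default_rng(seed)
    c, h, w = feature_maps.shape
    n = h * w
    x = feature_maps.reshape(c, n)  # each feature map becomes one row of n values

    # Cascade of mixing blocks; every row sees all spatial positions at once.
    for _ in range(mix_depth):
        w1 = rng.normal(scale=0.02, size=(n, n * mlp_ratio))
        w2 = rng.normal(scale=0.02, size=(n * mlp_ratio, n))
        x = feature_mixer(x, w1, w2)

    # Two linear projections shrink the result: channels first, then rows.
    channel_proj = rng.normal(scale=0.02, size=(out_channels, c))
    row_proj = rng.normal(scale=0.02, size=(n, out_rows))
    x = channel_proj @ x @ row_proj  # (out_channels, out_rows)

    descriptor = x.reshape(-1)
    return descriptor / np.linalg.norm(descriptor)


# Toy input: 8 feature maps of size 5x5 (far smaller than a real backbone output).
toy_maps = np.random.default_rng(1).standard_normal((8, 5, 5))
toy_descriptor = mixvpr_sketch(toy_maps, out_channels=6, out_rows=4, mix_depth=2)
print(toy_descriptor.shape)
```

Because every block operates on whole flattened feature maps, no local or pyramidal pooling step appears anywhere in the sketch; the descriptor length is simply `out_channels * out_rows`.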

[WACV OpenAccess] [ArXiv]

Figure: MixVPR architecture

Trained models

All models were trained on the GSV-Cities dataset (https://github.com/amaralibey/gsv-cities).

Figure: performance

Weights

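| Backbone | Output dimension | Pitts250k-test R@1 | Pitts250k-test R@5 | Pitts250k-test R@10 | Pitts30k-test R@1 | Pitts30k-test R@5 | Pitts30k-test R@10 | MSLS-val R@1 | MSLS-val R@5 | MSLS-val R@10 | Download |
|---|---|---|---|---|---|---|---|---|---|---|---|
| ResNet50 | 4096 | 94.3 | 98.2 | 98.9 | 91.6 | 95.5 | 96.4 | 88.2 | 93.1 | 94.3 | LINK |
| ResNet50 | 512 | 93.2 | 97.9 | 98.6 | 90.7 | 95.5 | 96.3 | 84.1 | 91.8 | 93.7 | LINK |
| ResNet50 | 128 | 88.7 | 95.8 | 97.4 | 87.8 | 94.3 | 95.7 | 78.5 | 88.2 | 90.4 | LINK |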
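
For readers unfamiliar with the R@K columns, the sketch below shows one common way Recall@K is computed in place recognition: a query counts as correct if at least one of its top-K retrieved database images is a true match. This definition and the helper name `recall_at_k` are assumptions for illustration; the evaluation code behind the table may differ in detail.

```python
def recall_at_k(retrieved, ground_truth, ks=(1, 5, 10)):
    """Percentage of queries with at least one correct match in the top K.

    retrieved[i]    : database indices for query i, ranked best first
    ground_truth[i] : set of database indices that are true matches for query i
    """
    if not retrieved:
        raise ValueError("at least one query is required")

    hits = {k: 0 for k in ks}
    for ranked, positives in zip(retrieved, ground_truth):
        for k in ks:
            if any(index in positives for index in ranked[:k]):
                hits[k] += 1

    return {k: 100.0 * hits[k] / len(retrieved) for k in ks}


# Toy example with three queries and hand-written rankings.
toy_retrieved = [[3, 7, 1], [4, 2, 9], [5, 0, 8]]
toy_ground_truth = [{3}, {9}, {6}]
toy_recalls = recall_at_k(toy_retrieved, toy_ground_truth, ks=(1, 3))
print(toy_recalls)
```

In the toy data, only the first query is matched at rank 1, while the second is matched at rank 3 and the third is never matched, so recall grows from one query in three to two in three as K increases.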

Code to load the pretrained weights is as follows:

import torch

from main import VPRModel

# Note that images must be resized to 320x320
model = VPRModel(backbone_arch='resnet50',
                 layers_to_crop=[4],
                 agg_arch='MixVPR',
                 agg_config={'in_channels': 1024,
                             'in_h': 20,   # feature-map height for 320x320 inputs
                             'in_w': 20,   # feature-map width for 320x320 inputs
                             'out_channels': 1024,
                             'mix_depth': 4,
                             'mlp_ratio': 1,
                             'out_rows': 4},
                 )

# Load the downloaded checkpoint and switch to inference mode
state_dict = torch.load('./LOGS/resnet50_MixVPR_4096_channels(1024)_rows(4).ckpt')
model.load_state_dict(state_dict)
model.eval()
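
The numbers in `agg_config` are tied to the input size and to the descriptor length. The sketch below makes that arithmetic explicit under two assumptions not stated in this README: that ResNet50 with its last block cropped downsamples the image by a factor of 16, and that the descriptor length is `out_channels * out_rows` (which matches the `4096_channels(1024)_rows(4)` pattern in the checkpoint name). The helper `mixvpr_shapes` is hypothetical and not part of the repository.

```python
def mixvpr_shapes(image_size, backbone_stride, out_channels, out_rows):
    """Derive the feature-map size and descriptor length from the config values."""
    side = image_size // backbone_stride  # spatial side of the backbone feature map
    return {
        "in_h": side,
        "in_w": side,
        "descriptor_dim": out_channels * out_rows,
    }


# Values taken from the loading snippet: 320x320 images, 1024 channels, 4 rows.
shapes = mixvpr_shapes(image_size=320, backbone_stride=16, out_channels=1024, out_rows=4)
print(shapes)  # {'in_h': 20, 'in_w': 20, 'descriptor_dim': 4096}
```

If the input resolution changes, `in_h` and `in_w` would have to change with it, which is presumably why the images must be resized to 320x320 for these weights.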

Bibtex

@inproceedings{ali2023mixvpr,
  title={{MixVPR}: Feature Mixing for Visual Place Recognition},
  author={Ali-bey, Amar and Chaib-draa, Brahim and Gigu{\`e}re, Philippe},
  booktitle={Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision},
  pages={2998--3007},
  year={2023}
}