We introduce DisARM (Displacement Aware Relation Module), a novel neural network module for enhancing the performance of 3D object detection in point cloud scenes. The core idea is that extracting the most principal contextual information is critical for detection when the target is incomplete or featureless. We find that relations between proposals provide a good representation for describing context. However, adopting relations between all object or patch proposals is inefficient, and an imbalanced combination of local and global relations introduces extra noise that can mislead training. Rather than working with all relations, we find that training with relations only between the most representative proposals, or anchors, significantly boosts detection performance. Good anchors should be semantic-aware with no ambiguity and able to describe the whole layout of a scene with no redundancy. To find such anchors, we first run a preliminary relation anchor module with an objectness-aware sampling approach, and then devise a displacement-based module that weighs relation importance for better use of contextual information. This lightweight relation module yields significantly higher object detection accuracy when plugged into state-of-the-art detectors. Evaluations on public benchmarks of real-world scenes show that our method achieves state-of-the-art performance on both SUN RGB-D and ScanNet V2.
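At a glance, DisARM scores the relation between each proposal and a small set of relation anchors using their spatial displacement, then aggregates the weighted relation features. Below is a minimal conceptual sketch of that idea in PyTorch; the class, tensor shapes, and MLP design are illustrative assumptions, not the module shipped in this repo (see ./mmdetection3d/mmdet3d/models/model_utils/disarm.py for the actual implementation).

```python
# Conceptual sketch only -- a simplified, hypothetical version of the
# displacement-aware relation idea, not the repo's DisARM module.
import torch
import torch.nn as nn


class ToyDisplacementRelation(nn.Module):
    """Weighs anchor-to-proposal relations by their 3D displacement."""

    def __init__(self, feat_dim):
        super().__init__()
        # Fuses an anchor feature with its displacement to a proposal.
        self.relation_mlp = nn.Sequential(
            nn.Linear(feat_dim + 3, feat_dim),
            nn.ReLU(),
            nn.Linear(feat_dim, feat_dim),
        )
        # Scores how important each anchor's relation is to a proposal.
        self.weight_mlp = nn.Linear(feat_dim, 1)

    def forward(self, feats, locs, anchor_idx):
        # feats: (B, N, C) proposal features; locs: (B, N, 3) centers;
        # anchor_idx: (B, K) indices of the selected relation anchors.
        B, N, C = feats.shape
        K = anchor_idx.shape[1]
        anchor_feats = torch.gather(
            feats, 1, anchor_idx[..., None].expand(B, K, C))      # (B, K, C)
        anchor_locs = torch.gather(
            locs, 1, anchor_idx[..., None].expand(B, K, 3))       # (B, K, 3)
        # Pairwise displacement from every proposal to every anchor.
        disp = anchor_locs[:, None, :, :] - locs[:, :, None, :]   # (B, N, K, 3)
        pair = torch.cat(
            [anchor_feats[:, None, :, :].expand(B, N, K, C), disp],
            dim=-1)                                               # (B, N, K, C+3)
        rel = self.relation_mlp(pair)                             # (B, N, K, C)
        w = torch.softmax(self.weight_mlp(rel), dim=2)            # (B, N, K, 1)
        # Displacement-weighted aggregation of anchor relations.
        return (w * rel).sum(dim=2)                               # (B, N, C)
```

For instance, calling the module with `feats` of shape (2, 256, 128), `locs` of shape (2, 256, 3), and `anchor_idx` of shape (2, 15) returns per-proposal relation features of shape (2, 256, 128), which would then be added back onto the proposal features.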
This repo is the official implementation of "DisARM: Displacement Aware Relation Module for 3D Detection".
Authors: Yao Duan, Chenyang Zhu, Yuqing Lan, Renjiao Yi, Xinwang Liu, Kai Xu*.
In this repository, we provide the model implementation (with MMDetection3D V0.17.1) as well as training scripts for ScanNet and SUN RGB-D.
Note:
We will also fork the MMDetection3D project and merge the DisARM module into the master branch. If you want to follow the newest version, please look forward to the official MMDetection3D repository in the coming weeks.
Results on the ScanNet V2 validation set:

Method | [email protected] | [email protected] |
---|---|---|
VoteNet | 58.6 | 33.5 |
VoteNet+DisARM | 66.1 | 49.7 |
BRNet | 66.1 | 50.9 |
BRNet+DisARM | 66.7 | 42.3 |
H3DNet* | 66.4 | 48.0 |
H3DNet*+DisARM | 66.8 | 48.8 |
GroupFree3D*(L6,O256) | 66.3 | 47.8 |
GroupFree3D*(L6,O256)+DisARM | 67.0 | 50.7 |
GroupFree3D*(L12,O256) | 66.6 | 48.2 |
GroupFree3D*(L12,O256)+DisARM | 67.2 | 52.5 |
GroupFree3D*(w2×,L12,O256) | 68.2 | 52.6 |
GroupFree3D*(w2×,L12,O256)+DisARM | 69.3 | 53.6 |
Results on the SUN RGB-D validation set:

Method | [email protected] | [email protected] |
---|---|---|
VoteNet | 57.7 | 35.8 |
VoteNet+DisARM | 61.5 | 41.3 |
imVoteNet* | 64.0 | - |
imVoteNet*+DisARM | 65.3 | - |
Notes:
- We use one NVIDIA GeForce RTX 3090 GPU for training GroupFree3D+DisARM and one NVIDIA TITAN V GPU for the others.
- We report the best results on the validation set for each training run.
- * denotes that the model is implemented on MMDetection3D.
This repo is built on MMDetection3D (V0.17.1); please follow the getting_started.md for installation.
The code is tested under the following environment:

- Ubuntu 16.04 LTS
- Anaconda with python=3.7.10
- PyTorch 1.9.0
- CUDA 11.1
- GCC 5.4
Notes:
- If you want to test BRNet+DisARM, please follow the getting_started.md to install the dependencies under ./BRNet+DisARM, because BRNet is implemented on MMDetection3D V0.11.0.
For SUN RGB-D, follow the README under the /data/sunrgbd folder.
For ScanNet, follow the README under the /data/scannet folder.
Notes:
- For BRNet+DisARM, please follow the instructions under ./BRNet+DisARM/data/ to process the data for training and testing.
To use DisARM in your own detector, follow these steps (a hedged integration sketch follows step 5):

1. Copy ./mmdetection3d/mmdet3d/models/model_utils/disarm.py to your project.
2. Import the DisARM module and feed your proposal features and locations into it.
3. Add relation_anchor_loss to your loss computation.
4. Configure DisARM as below:
```python
disarm_module_cfg=dict(
    sample_approach='OS-FFPS',
    num_anchors=15,
    num_candidate_anchors=64,
    in_channels=YOUR_PROPOSAL_FEATURE_DIM,
),
relation_anchor_loss=dict(
    type='VarifocalLoss',
    use_sigmoid=True,
    reduction='sum',
    loss_weight=1.0,
),
```
5. Add the returned relation features to your proposal features.
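Putting the five steps together, an integration might look like the sketch below. The `DisARM` constructor arguments and the forward/return signature here are assumptions based on the config keys above; check disarm.py for the actual interface.

```python
# Hypothetical integration sketch for steps 1-5. The constructor arguments
# and the (relation_feats, anchor_scores) return value are assumptions --
# check mmdet3d/models/model_utils/disarm.py for the actual interface.
import torch.nn as nn
from mmdet3d.models.model_utils.disarm import DisARM  # steps 1-2: copied module


class MyDetectionHead(nn.Module):
    def __init__(self, feat_dim=128):
        super().__init__()
        # Step 4: configure DisARM (mirrors disarm_module_cfg above).
        self.disarm = DisARM(
            sample_approach='OS-FFPS',
            num_anchors=15,
            num_candidate_anchors=64,
            in_channels=feat_dim)

    def forward(self, proposal_feats, proposal_locs):
        # proposal_feats: (B, N, C); proposal_locs: (B, N, 3).
        # Step 2: feed proposal features and locations into DisARM; we assume
        # it returns relation features plus scores for the relation anchors.
        relation_feats, anchor_scores = self.disarm(proposal_feats, proposal_locs)
        # Step 5: add the relation features back onto the proposal features.
        fused_feats = proposal_feats + relation_feats
        # Step 3: anchor_scores would be supervised elsewhere with
        # relation_anchor_loss (VarifocalLoss in the config above).
        return fused_feats, anchor_scores
```

The fused features then go on to your detection head's classification and regression branches, while the anchor scores are supervised with the relation_anchor_loss configured above.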
For VoteNet+DisARM training on ScanNet, go to the mmdetection3d dir and run:

```bash
CUDA_VISIBLE_DEVICES=0 python tools/train.py configs/disarm/votenet_disarm_scannet.py
```
For BRNet+DisARM training on ScanNet, go to the BRNet+DisARM dir and run:

```bash
CUDA_VISIBLE_DEVICES=0 python tools/train.py configs/disarm/brnet_disarm_scannet.py --seed 42
```
For H3DNet+DisARM training on ScanNet, go to the mmdetection3d dir and run:

```bash
CUDA_VISIBLE_DEVICES=0 python tools/train.py configs/disarm/h3dnet_disarm_scannet.py
```
For GroupFree3D+DisARM training on ScanNet, go to the mmdetection3d dir and run:

```bash
CUDA_VISIBLE_DEVICES=0 python tools/train.py configs/disarm/groupfree3d-L6-O256_disarm_scannet.py
CUDA_VISIBLE_DEVICES=0 python tools/train.py configs/disarm/groupfree3d-L12-O256_disarm_scannet.py
```
For VoteNet+DisARM training on SUN RGB-D, go to the mmdetection3d dir and run:

```bash
CUDA_VISIBLE_DEVICES=0 python tools/train.py configs/disarm/votenet_disarm_sunrgbd.py
```
For imVoteNet+DisARM training on SUN RGB-D, go to the mmdetection3d dir and run:

```bash
CUDA_VISIBLE_DEVICES=0 python tools/train.py configs/disarm/imvotenet_disarm_sunrgbd.py
```
If you find this work useful, please cite:

```
@article{duan2022disarm,
  title={DisARM: Displacement Aware Relation Module for 3D Detection},
  author={Yao Duan and Chenyang Zhu and Yuqing Lan and Renjiao Yi and Xinwang Liu and Kai Xu},
  year={2022},
  eprint={2203.01152},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
```
We thank the authors of MMDetection3D and BRNet for their flexible codebases.
The code is released under MIT License (see LICENSE file for details).