[ECCV 2024] SUP-NeRF: A Streamlined Unification of Pose Estimation and NeRF for Monocular 3D Object Reconstruction

Monocular 3D reconstruction for categorical objects heavily relies on accurately perceiving each object's pose. While gradient-based optimization in a NeRF framework updates the initial pose, this paper highlights that scale-depth ambiguity in monocular object reconstruction causes failures when the initial pose deviates moderately from the true pose. Consequently, existing methods often depend on a third-party 3D object detector to provide an initial object pose, leading to increased complexity and generalization issues. To address these challenges, we present SUP-NeRF, a streamlined Unification of object Pose estimation and NeRF-based object reconstruction. SUP-NeRF decouples the object's dimension estimation from pose refinement to resolve the scale-depth ambiguity, and introduces a camera-invariant projected-box representation that generalizes across different domains. By using a dedicated pose estimator that integrates smoothly into an object-centric NeRF, SUP-NeRF is free of external 3D detectors. SUP-NeRF achieves state-of-the-art results in both reconstruction and pose estimation on the nuScenes dataset. Furthermore, SUP-NeRF exhibits exceptional cross-dataset generalization on the KITTI and Waymo datasets, surpassing prior methods with up to a 50% reduction in rotation and translation error.

Citation

If you find our work useful in your research, please consider starring the repo and citing:

@inproceedings{guo2024supnerf,
   title={{SUP-NeRF: A Streamlined Unification of Pose Estimation and NeRF for Monocular $3$D Object Reconstruction}},
   author={Guo, Yuliang and Kumar, Abhinav and Zhao, Cheng and Wang, Ruoyu and Huang, Xinyu and Ren, Liu},
   booktitle={ECCV},
   year={2024}
}

Catalog

Installation

conda create -y -n sup-nerf python=3.8
conda activate sup-nerf

conda install pytorch==1.12.1 torchvision==0.13.1 torchaudio==0.12.1 cudatoolkit=11.3 -c pytorch
pip install -r requirements.txt
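
To confirm the environment was created correctly, a quick optional check like the following verifies the installed PyTorch build and CUDA availability (the expected values come from the install commands above):

# Optional sanity check for the environment created above.
import torch

print(torch.__version__)          # expected: 1.12.1
print(torch.version.cuda)         # expected: 11.3
print(torch.cuda.is_available())  # should print True on a GPU machine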

Data Preparation

nuScenes

We use the nuScenes dataset for both training and testing. Download the nuScenes dataset to your data directory and soft-link the related directories into the project's data/NuScenes directory (a minimal symlink sketch follows the directory tree below). The required data structure is as follows:

SUPNERF
├── data
│      ├── NuScenes
│      │     ├── samples
│      │     ├── maps
│      │     ├── v1.0-mini   
│      │     ├── v1.0-trainval
│      │     ├── pred_instance   
│      │     └── pred_det3d
│      │ ...        
│ ...

samples, maps, v1.0-mini, and v1.0-trainval are downloaded directly from the nuScenes website.
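
If your nuScenes download lives outside the repo, a small script along the following lines creates the soft links. This is a minimal sketch: NUSC_ROOT is a placeholder for wherever you stored the download, not a path used by this repo.

# Minimal sketch: soft-link an existing nuScenes download into data/NuScenes.
import os

NUSC_ROOT = "/path/to/your/nuscenes"   # placeholder: your raw nuScenes download
TARGET = "data/NuScenes"               # expected location inside this repo

os.makedirs(TARGET, exist_ok=True)
for name in ["samples", "maps", "v1.0-mini", "v1.0-trainval"]:
    src = os.path.join(NUSC_ROOT, name)
    dst = os.path.join(TARGET, name)
    if not os.path.exists(dst):
        os.symlink(src, dst)           # soft link, as described above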

pred_instance includes the required instance masks prepared via a customized script in our fork of Mask R-CNN (Detectron2). Our prepared directory can be downloaded directly from [dropbox] [hugging face].

pred_det3d includes 3D object detection results produced via a customized script in our fork of FCOS3D. It is only required by the previous method AutoRF; if you only plan to try our method, you do not need it. Our prepared directory can be downloaded directly from [dropbox] [hugging face].

SUP-NeRF follows an object-centric setup, where only a subset of annotated objects is curated for experiments. Please refer to our paper for the data curation details. The curated subsets and splits are recorded in .json files in data/NuScenes. To modify the curation, check src/data_nuscenes.py and re-run the preprocessing step.

KITTI

We use the KITTI dataset for the cross-domain generalization test. We follow DEVIANT to set up the basic KITTI directory and prepare additional directories for our experiments. The required data structure is as follows:

SUPNERF
├── data
│      ├── KITTI
│      │      ├── ImageSets
│      │      ├── kitti_split1
│      │      └── training
│      │           ├── calib
│      │           ├── image_2
│      │           ├── label_2
│      │           ├── velodyne
│      │           ├── pred_instance
│      │           └── pred
│      │  ...
│ ...

Because only the training split of the KITTI dataset includes ground-truth object annotations, we conduct the cross-domain evaluation on that split.

calib, image_2, label_2, and velodyne are downloaded directly from the KITTI website.

Similar to nuScenes, pred_instance includes the required instance masks prepared via a customized script in our fork of Mask R-CNN (Detectron2). Our prepared directory can be downloaded directly from [dropbox] [hugging face].

Similar to nuScenes, pred includes 3D object detection results produced via a customized script in our fork of FCOS3D. It is only required by the previous method AutoRF. Our prepared directory can be downloaded directly from [dropbox] [hugging face].

The object-centric curated subsets and splits for our experiments are recorded in the .json files in data/KITTI. To modify the curation, check src/data_kitti.py and re-run the preprocessing step.

Waymo (Front View)

We use the validation split of the Waymo dataset for the cross-domain generalization test. We follow DEVIANT to prepare the Waymo dataset in a manner similar to KITTI, and we prepare additional directories for our experiments. The required data structure is as follows:

SUPNERF
├── data
│      ├── Waymo
│      │      ├── ImageSets
│      │      └── validation
│      │           ├── calib
│      │           ├── image
│      │           ├── label
│      │           ├── velodyne
│      │           ├── pred_instance
│      │           └── pred
│      │  ...
│ ...

calib, image, label, and velodyne are prepared directly following DEVIANT. If you want to prepare them on your own, you can download the validation set from the Waymo website and use our script data/Waymo/converter.py. Our experiments are limited to the front view of Waymo; for all the surrounding views, you may refer to the mmlab-version converter for data preparation.

Similar to nuScenes, pred_instance includes the required instance masks prepared via a customized script in our fork of Mask R-CNN (Detectron2). Our prepared directory can be downloaded directly from [dropbox] [hugging face].

Similar to nuScenes, pred includes 3D object detection results produced via a customized script in our fork of FCOS3D. It is only required by the previous method AutoRF. Our prepared directory can be downloaded directly from [dropbox] [hugging face].

The object-centric curated subsets and splits for our experiments are recorded in the .json files in data/Waymo. To modify the curation, check src/data_waymo.py and re-run the preprocessing step.

VSCode Launch

All the training and testing pipelines described in the later sections are included in .vscode/launch.json for convenient usage and debugging. You may modify the arguments and use the VSCode 'Run and Debug' panel to execute any of the included pipelines.
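
For reference, a launch entry for the nuScenes testing pipeline described below would look roughly like the following. This is an illustrative sketch of a standard VSCode Python launch configuration, not a verbatim copy of the shipped launch.json; the entry name is made up.

{
    "name": "Optimize SUP-NeRF on nuScenes (example)",
    "type": "python",
    "request": "launch",
    "program": "${workspaceFolder}/optimize_nuscenes.py",
    "args": [
        "--config_file", "jsonfiles/supnerf.nusc.vehicle.car.json",
        "--gpu", "0",
        "--add_pose_err", "2",
        "--reg_iter", "3",
        "--vis", "0"
    ],
    "console": "integratedTerminal"
}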

Testing

For testing, optimize_nuscenes.py, optimize_kitti.py, and optimize_waymo.py evaluate the trained models. Models are trained only on the nuScenes dataset but are tested on the nuScenes, KITTI, and Waymo datasets. The specific checkpoint paths are specified in the config files. The default paths point to our provided checkpoints, which can be downloaded from [dropbox]. You will need to place them in the repository as shown below before executing the testing pipelines.

SUPNERF
├── checkpoints
│      ├── supnerf
│      └── autorfmix
│...

For the testing arguments used below, --add_pose_err 2 initializes with a random pose, while --add_pose_err 3 initializes with the pose predicted by the third-party detector FCOS3D. --reg_iter indicates the number of iterations of the pose refinement module, which is a key design of SUP-NeRF.

Setting --vis to 1 makes the pipeline output visual results at the beginning and end of the process; setting it to 2 makes the pipeline output visual results at every iteration, similar to those shown in the demo video. You can modify other arguments as needed. For more details, check optimize_nuscenes.py, optimize_kitti.py, and optimize_waymo.py.

You may also check scripts/eval_saved_result.py to quickly evaluate saved testing results for quantitative numbers. The scores reported in the later sections differ slightly from the paper due to code cleaning, but all conclusions from the paper hold. To evaluate all the provided saved results, execute

bash evaluate_all.sh

nuScenes (In-Domain)

To test SUPNeRF on nuScenes, execute

python optimize_nuscenes.py --config_file jsonfiles/supnerf.nusc.vehicle.car.json --gpu 0 --add_pose_err 2 --reg_iter 3 --vis 0

To test AutoRF on nuScenes, execute

python optimize_nuscenes.py --config_file jsonfiles/autorfmix.nusc.vehicle.car.json --gpu 0 --add_pose_err 3 --reg_iter 0 --vis 0

Testing results will be saved into a new folder created in the corresponding checkpoint folder. The quantitative evaluation results will be similar to:

Each metric column reports FF / 50it values.

| Method | PSNR | Dep.E (m) | Rot.E (deg.) | Trans.E (m) | PSNR-C | DepE-C (m) | Config | Predictions |
|---|---|---|---|---|---|---|---|---|
| SUP-NeRF (Ours) | 10.5 / 18.8 | 0.69 / 0.61 | 7.25 / 7.3 | 0.69 / 0.74 | 10.6 / 10.9 | 1.22 / 1.13 | config | predictions |
| AutoRF-FCOS3D | 7.1 / 16.5 | 1.4 / 0.83 | 9.77 / 10.93 | 0.85 / 0.75 | 9.85 / 10.5 | 1.30 / 1.16 | config | predictions |

KITTI (Cross-Domain)

To test SUPNeRF on KITTI, execute

python optimize_kitti.py --config_file jsonfiles/supnerf.kitti.car.json --gpu 0 --add_pose_err 2 --reg_iter 3 --vis 0

To test AutoRF on KITTI, execute

python optimize_kitti.py --config_file jsonfiles/autorfmix.kitti.car.json --gpu 0 --add_pose_err 3 --reg_iter 0 --vis 0

Testing results will be saved into a new folder created in the corresponding checkpoint folder. The quantitative evaluation results will be similar to:

Each metric column reports FF / 50it values.

| Method | PSNR | Dep.E (m) | Rot.E (deg.) | Trans.E (m) | Config | Predictions |
|---|---|---|---|---|---|---|
| SUP-NeRF (Ours) | 5.0 / 14.6 | 1.51 / 1.11 | 8.89 / 8.85 | 1.49 / 1.55 | config | predictions |
| AutoRF-FCOS3D | 1.3 / 11.0 | 2.72 / 1.80 | 11.79 / 18.51 | 2.2 / 1.95 | config | predictions |

Waymo (Cross-Domain)

To test SUPNeRF on Waymo, execute

python optimize_waymo.py --config_file jsonfiles/supnerf.waymo.car.json --gpu 0 --add_pose_err 2 --reg_iter 3 --vis 0

To test AutoRF on Waymo, execute

python optimize_waymo.py --config_file jsonfiles/autorfmix.waymo.car.json --gpu 0 --add_pose_err 3 --reg_iter 0 --vis 0

Testing results will be saved into a new folder created in the corresponding checkpoint folder. The quantitative evaluation results will be similar to:

Each metric column reports FF / 50it values.

| Method | PSNR | Dep.E (m) | Rot.E (deg.) | Trans.E (m) | Config | Predictions |
|---|---|---|---|---|---|---|
| SUP-NeRF (Ours) | 4.8 / 17.0 | 2.32 / 1.56 | 10.01 / 10.6 | 1.68 / 1.54 | config | predictions |
| AutoRF-FCOS3D | 4.8 / 15.8 | 2.29 / 2.35 | 6.97 / 9.11 | 3.22 / 3.43 | config | predictions |

Training

To train SUPNeRF on nuScenes, execute

python train_nuscenes.py --config_file jsonfiles/supnerf.nusc.vehicle.car.json --gpus 4 --batch_size 48 --num_workers 16 --epochs 40

train_nuscenes.py can train different object-centric NeRFs.

To train AutoRF on nuScenes, execute

python train_nuscenes.py --config_file jsonfiles/autorfmix.nusc.vehicle.car.json --gpus 4 --batch_size 48 --num_workers 16 --epochs 40

Additional specific settings can optionally be changed; interested developers can check train_nuscenes.py for details. You can also modify other hyperparameters in the corresponding .json files included in jsonfiles/. The network named autorfmix differs slightly from the original AutoRF in its encoder, so that both SUP-NeRF and AutoRF share the same encoder as CodeNeRF for a fair comparison.

We implement multi-GPU training using DP rather than DDP (which might be more efficient) and record training logs using TensorBoard.
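
As a rough illustration of that setup (not the repo's actual training loop), DP-based multi-GPU training with TensorBoard logging typically looks like the sketch below; model, train_loader, and compute_loss are placeholders rather than names from this codebase.

# Sketch of DataParallel (DP) training with TensorBoard logging.
# `model`, `train_loader`, and `compute_loss` are placeholders, not repo names.
import torch
from torch.utils.tensorboard import SummaryWriter

def train(model, train_loader, compute_loss, epochs=40, lr=1e-4):
    device = torch.device("cuda")
    model = torch.nn.DataParallel(model).to(device)      # replicate across visible GPUs
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    writer = SummaryWriter(log_dir="runs/supnerf_dp")     # hypothetical log directory

    step = 0
    for epoch in range(epochs):
        for batch in train_loader:
            batch = {k: v.to(device) for k, v in batch.items()}
            loss = compute_loss(model, batch)             # scalar loss for the batch
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            writer.add_scalar("train/loss", loss.item(), step)
            step += 1
    writer.close()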

BootInv

For developers interested in evaluating BootInv (a.k.a. nerf-from-image) on real-world autonomous driving datasets, as done for SUP-NeRF and AutoRF, we provide our fork of BootInv with additional evaluation pipelines here.

You will need to follow the original instructions to install the package and prepare the pre-trained models. Then follow the same data preparation as in this repo, placing all the dataset structures under nerf-from-image/datasets/. Finally, follow the .vscode/launch.json file in our fork to conduct testing and evaluation of BootInv on the three major autonomous driving datasets: nuScenes, KITTI, and Waymo.

Acknowledgements

We thank the authors of the following awesome codebases:

Please also consider citing them.

License

SUP-NeRF code is under the MIT license.
