This is the PyTorch implementation of our paper "LGT-Net: Indoor Panoramic Room Layout Estimation with Geometry-Aware Transformer Network" (CVPR'22). [Supplemental Materials] [Video] [Presentation] [Poster]
- 2023.5.18: Updated post-processing. If you want to reproduce the post-processing results of the paper, please switch to the old commit. Check out Post-Porcessing.md for more information.
- demo app that runs on HuggingFace Space🤗.
- demo script that runs on Google Colab.
Install our dependencies:
pip install -r requirements.txt
The official MatterportLayout dataset is available here.
If you have problems using this dataset, please pay attention to this issue.
Make sure the dataset files are stored as follows:
src/dataset/mp3d
|-- image
| |-- 17DRP5sb8fy_08115b08da534f1aafff2fa81fc73512.png
|-- label
| |-- 17DRP5sb8fy_08115b08da534f1aafff2fa81fc73512.json
|-- split
    |-- test.txt
    |-- train.txt
    |-- val.txt
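Before training, you can sanity-check that every sample listed in the split files has a matching image and label. A minimal sketch, assuming each split line names one sample matching the image/label filenames above (tokens are joined with an underscore in case the scene id and pano id are space-separated):

```python
import os

root = "src/dataset/mp3d"
for split in ("train", "val", "test"):
    with open(os.path.join(root, "split", f"{split}.txt")) as f:
        # Assumption: each line names one sample; join tokens with "_"
        # in case the scene id and pano id are space-separated.
        names = ["_".join(line.split()) for line in f if line.strip()]
    missing = [n for n in names
               if not os.path.isfile(os.path.join(root, "image", n + ".png"))
               or not os.path.isfile(os.path.join(root, "label", n + ".json"))]
    print(f"{split}: {len(names)} samples, {len(missing)} with missing files")
```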
Statistics
| Split | All  | 4 Corners | 6 Corners | 8 Corners | >=10 Corners |
|-------|------|-----------|-----------|-----------|--------------|
| All   | 2295 | 1210      | 502       | 309       | 274          |
| Train | 1647 | 841       | 371       | 225       | 210          |
| Val   | 190  | 108       | 46        | 21        | 15           |
| Test  | 458  | 261       | 85        | 63        | 49           |
The official ZInd dataset is available here.
Make sure the dataset files are stored as follows:
src/dataset/zind
|-- 0000
| |-- panos
| | |-- floor_01_partial_room_01_pano_14.jpg
| |-- zind_data.json
|-- room_shape_simplicity_labels.json
|-- zind_partition.json
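To see how scenes are divided across splits, you can inspect zind_partition.json directly. A minimal sketch, assuming the file maps split names to lists of scene ids (e.g., {"train": ["0000", ...], ...}):

```python
import json

with open("src/dataset/zind/zind_partition.json") as f:
    partition = json.load(f)

# Assumption: top-level keys are split names ("train"/"val"/"test")
# mapping to lists of scene ids such as "0000".
for split, ids in partition.items():
    print(split, len(ids))
```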
Statistics
| Split | All   | 4 Corners | 5 Corners | 6 Corners | 7 Corners | 8 Corners | 9 Corners | >=10 Corners | Manhattan | Non-Manhattan (%) |
|-------|-------|-----------|-----------|-----------|-----------|-----------|-----------|--------------|-----------|-------------------|
| All   | 31132 | 17293     | 1803      | 7307      | 774       | 2291      | 238       | 1426         | 26664     | 4468 (14.35%)     |
| Train | 24882 | 13866     | 1507      | 5745      | 641       | 1791      | 196       | 1136         | 21228     | 3654 (14.69%)     |
| Val   | 3080  | 1702      | 153       | 745       | 81        | 239       | 22        | 138          | 2647      | 433 (14.06%)      |
| Test  | 3170  | 1725      | 143       | 817       | 52        | 261       | 20        | 152          | 2789      | 381 (12.02%)      |
We follow the same preprocessed pano/s2d3d data proposed by HorizonNet. You can also directly download the dataset files here.
Make sure the dataset files are stored as follows:
src/dataset/pano_s2d3d
|-- test
| |-- img
| | |-- camera_0000896878bd47b2a624ad180aac062e_conferenceRoom_3_frame_equirectangular_domain_.png
| |-- label_cor
| | |-- camera_0000896878bd47b2a624ad180aac062e_conferenceRoom_3_frame_equirectangular_domain_.txt
|-- train
| |-- img
| |-- label_cor
|-- valid
    |-- img
    |-- label_cor
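Each label_cor file follows HorizonNet's corner annotation format. A minimal sketch for reading one, assuming each line holds the "x y" pixel coordinates of a layout corner in the equirectangular image:

```python
import numpy as np

# Assumption: each line of a label_cor file holds the "x y" pixel
# coordinates of one layout corner in the equirectangular image.
label = ("src/dataset/pano_s2d3d/test/label_cor/"
         "camera_0000896878bd47b2a624ad180aac062e_conferenceRoom_3"
         "_frame_equirectangular_domain_.txt")
corners = np.loadtxt(label)
print(corners.shape)  # expected: (num_corners, 2)
```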
We provide pre-trained weights for individual datasets here.
- mp3d/best.pkl: trained on the MatterportLayout dataset
- zind/best.pkl: trained on the ZInd dataset
- pano/best.pkl: trained on the PanoContext (train) + Stanford2D-3D (whole) dataset
- s2d3d/best.pkl: trained on the Stanford2D-3D (train) + PanoContext (whole) dataset
- ablation_study_full/best.pkl: ablation study, Ours (full), on the MatterportLayout dataset
Make sure the pre-trained weight files are stored as follows:
checkpoints
|-- SWG_Transformer_LGT_Net
| |-- ablation_study_full
| | |-- best.pkl
| |-- mp3d
| | |-- best.pkl
| |-- pano
| | |-- best.pkl
| |-- s2d3d
| | |-- best.pkl
| |-- zind
| | |-- best.pkl
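To verify a downloaded weight file loads correctly, you can open it with torch.load. A minimal sketch (the exact contents of best.pkl are an assumption; print the keys to see what is stored):

```python
import torch

# Load on CPU so no GPU is needed just to inspect the file.
ckpt = torch.load("checkpoints/SWG_Transformer_LGT_Net/mp3d/best.pkl",
                  map_location="cpu")
# Assumption: the checkpoint is a dict; print its keys to see what it stores.
print(list(ckpt.keys()) if isinstance(ckpt, dict) else type(ckpt))
```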
You can evaluate by executing the following command:
- MatterportLayout dataset
python main.py --cfg src/config/mp3d.yaml --mode test --need_rmse
- ZInd dataset
python main.py --cfg src/config/zind.yaml --mode test --need_rmse
- PanoContext dataset
python main.py --cfg src/config/pano.yaml --mode test --need_cpe --post_processing manhattan --force_cube
- Stanford 2D-3D dataset
python main.py --cfg src/config/s2d3d.yaml --mode test --need_cpe --post_processing manhattan --force_cube
- --post_processing: type of post-processing approach. We use DuLa-Net's post-processing, optimized by adding occlusion detection (described here), to handle the manhattan constraint (manhattan_old is the original method), and a DP algorithm to handle the atalanta constraint. Disabled by default.
- --need_rmse: evaluate root mean squared error and delta error. Disabled by default.
- --need_cpe: evaluate corner error and pixel error. Disabled by default.
- --need_f1: evaluate corner metrics (precision, recall, and F$_1$-score) with 10 pixels as the threshold (code from here). Disabled by default.
- --force_cube: force a cube shape when evaluating. Disabled by default.
- --wall_num: evaluate only layouts with the given corner number. Defaults to all.
- --save_eval: save visualizations of the evaluation results for each panorama; the output is written to the corresponding checkpoint directory (e.g., checkpoints/SWG_Transformer_LGT_Net/mp3d/results/test). Disabled by default.
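For example, you can combine several of the flags above in one run (an illustrative combination, assuming --wall_num takes the corner count as its value):
python main.py --cfg src/config/mp3d.yaml --mode test --need_cpe --need_f1 --wall_num 4 --save_eval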
Execute the following commands to train (e.g., MatterportLayout dataset):
python main.py --cfg src/config/mp3d.yaml --mode train
You can copy and modify the YAML configuration file for training on other datasets.
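For example (a sketch; my_dataset.yaml is a hypothetical configuration name):
cp src/config/mp3d.yaml src/config/my_dataset.yaml
python main.py --cfg src/config/my_dataset.yaml --mode train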
We provide an inference script (inference.py) that you can use to predict the layouts of your own panoramas by executing the following command (e.g., using the pre-trained weights of the MatterportLayout dataset):
python inference.py --cfg src/config/mp3d.yaml --img_glob src/demo/demo1.png --output_dir src/output --post_processing manhattan
It will output JSON files (xxx_pred.json, in the same format as PanoAnnotator) and visualization images (xxx_pred.png) under output_dir.
Visualization image:
- --img_glob: a panorama path or directory path for prediction.
- --post_processing: if manhattan is selected, we preprocess the panorama so that the vanishing points are aligned with the axes before post-processing. Note that after preprocessing, the predicted results will no longer align with your input panoramas; you can use the output vanishing-point file (vp.txt) to reverse the alignment manually.
- --visualize_3d: show a 3D visualization of the output results (requires extra dependencies and a GUI desktop environment).
- --output_3d: output the object file of the 3D mesh reconstruction.
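To consume the predictions programmatically, you can load the output JSON directly. A minimal sketch (the file name follows the xxx_pred.json convention above; the exact PanoAnnotator fields are not reproduced here, so we just inspect the keys):

```python
import json

# Assumes inference.py was run on src/demo/demo1.png as in the command above.
with open("src/output/demo1_pred.json") as f:
    pred = json.load(f)

print(list(pred.keys()))  # inspect the PanoAnnotator-format fields
```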
The code style is adapted from Swin-Transformer.
Some components refer to the following projects:
If you use this code for your research, please cite:
@InProceedings{jiang2022lgt,
author = {Jiang, Zhigang and Xiang, Zhongzheng and Xu, Jinhua and Zhao, Ming},
title = {LGT-Net: Indoor Panoramic Room Layout Estimation with Geometry-Aware Transformer Network},
booktitle = {Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
year = {2022}
}