SceneCraft: Layout-Guided 3D Scene Generation

Xiuyu Yang* · Yunze Man* · Jun-Kun Chen · Yu-Xiong Wang

[NeurIPS 2024] [Project Page] [arXiv] [pdf] [BibTeX] [License]

About

TL;DR We generate complex 3D scenes conditioned on free-form layout and viewpoints.

We introduce SceneCraft, a framework for generating complex, detailed indoor scenes from textual descriptions and spatial layouts. Using a rendering-based pipeline and a layout-conditioned diffusion model, SceneCraft converts 3D semantic layouts into multi-view 2D images and then learns a final scene representation that is consistent, realistic, and closely adheres to user specifications. Please check out the project page and paper for more details. We release our code and dataset to support further research and development in this area.

BibTeX

If you find our work useful in your research, please consider citing our paper:

@inproceedings{yang2024scenecraft,
      title={SceneCraft: Layout-Guided 3D Scene Generation},
      author={Yang, Xiuyu and Man, Yunze and Chen, Jun-Kun and Wang, Yu-Xiong},
      booktitle={Advances in Neural Information Processing Systems},
      year={2024} 
}

Environment Setup

Install nerfstudio and set up SceneCraft (we recommend the nerfstudio version pinned below).

Tip: follow the nerfstudio tutorials to verify your environment.

# install nerfstudio
pip install nerfstudio==0.3.4

# set up scenecraft
git clone --recurse-submodules https://github.com/OrangeSodahub/SceneCraft.git

cd SceneCraft/
pip install -e .    # or `pip install .` for a non-editable install

Tested environment: Python 3.9/3.10, PyTorch 2.0.1 with CUDA 11.7/11.8 (cu117/cu118).
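The tested versions above can be checked before training. The sketch below is a minimal, hypothetical helper (not part of SceneCraft): it reads the installed torch version through `importlib.metadata` without importing torch, and it treats a missing `+cuXXX` local tag as acceptable, which is our assumption since that tag depends on how torch was installed.

```python
import sys
from importlib import metadata

# Tested versions as listed in this README; other combinations may work but are untested.
TESTED_PYTHON = {(3, 9), (3, 10)}
TESTED_TORCH = "2.0.1"
TESTED_CUDA_TAGS = {"cu117", "cu118"}


def is_tested_env(python_version, torch_version: str) -> bool:
    """Return True if the Python and torch versions match the tested setup.

    torch_version is a version string such as "2.0.1+cu118". The "+cuXXX"
    local tag may be absent depending on the install source; this sketch
    accepts a missing tag (an assumption, not a documented requirement).
    """
    base, _, local = torch_version.partition("+")
    return (
        tuple(python_version[:2]) in TESTED_PYTHON
        and base == TESTED_TORCH
        and (local == "" or local in TESTED_CUDA_TAGS)
    )


if __name__ == "__main__":
    try:
        torch_version = metadata.version("torch")
    except metadata.PackageNotFoundError:
        print("torch is not installed")
    else:
        status = "tested" if is_tested_env(sys.version_info, torch_version) else "untested"
        print(f"Python {sys.version.split()[0]}, torch {torch_version}: {status} combination")
```

An untested combination is only a warning sign; it does not mean the setup will fail.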

Train Diffusion Model

Tip: our fine-tuned diffusion models are hosted at (SD-Scannet++) (SD-Hypersim).

Before training, download the raw data and process it into layout data.

For Scannet++, download it from here, then undistort and downscale the DSLR images;
For Hypersim, download it from here.

Tip: the processed layout data we used to train the diffusion models is hosted at (Data-Scannet++) (Data-Hypersim); download it to skip the following steps.

Run the following script to convert the preprocessed data into layout data (check the data paths used in the bash script):

# Usage (arguments in [] are optional):
#   ${DATASET}       choose from [Scannetpp, Hypersim]
#   ${LIMIT}         maximum number of images per scene; set to 100
#   ${GPUS}          number of GPUs to use
#   --split          choose from ['train', 'val', 'all'] (Scannet++ only)
#   --save-depth     flag; if set, depth maps are also saved
#   --voxel-size     voxel size for Scannet++ voxelization, e.g. 0.2; omit to skip voxelization
bash scripts/prepare_dataset.sh ${DATASET} ${LIMIT} ${GPUS} [--split SPLIT] [--save-depth] [--voxel-size SIZE]

Generate JSONL files for efficient data loading during training (use the same settings as above):

bash scripts/generate_json.sh ${DATASET} ${LIMIT} [--voxel-size]
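JSONL stores one JSON object per line. The fields written by `scripts/generate_json.sh` are not documented here, so the sketch below makes no assumption about them: it only loads the records and reports which top-level keys appear, which is a quick way to inspect a generated file. The function names are hypothetical.

```python
import json
from pathlib import Path
from typing import Iterator


def iter_jsonl(path) -> Iterator[dict]:
    """Yield one record per non-empty line of a JSONL file."""
    with Path(path).open("r", encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines, e.g. a trailing newline
            try:
                yield json.loads(line)
            except json.JSONDecodeError as e:
                raise ValueError(f"{path}:{line_no}: invalid JSON ({e.msg})") from e


def summarize_jsonl(path):
    """Return (number of records, union of their top-level keys)."""
    count, keys = 0, set()
    for record in iter_jsonl(path):
        count += 1
        keys.update(record)  # adding a dict to a set adds its keys
    return count, keys
```

Streaming line by line keeps memory flat even for large files; pass the path of whichever JSONL file the script produced.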

The expected directory structure of the prepared data (e.g., for Scannet++):

data
├── scannetpp
|   ├── data
|   |   ├── SCENE_ID0
|   |   |   ├── dslr
|   ├── ... ...
├── scannetpp_processed
|   ├── data # same structure as scannetpp/data/
|   ├── scannetpp_instance_data
|   ├── [voxel_data] # optional
|   ├── semantic_data
|   |   ├── SCENE_ID0
|   |   |   ├── IMAGE_ID0.png
|   |   |   ├── IMAGE_ID0.npz
|   ├── ... ...
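A quick sanity check for the `semantic_data` tree above is that every `IMAGE_ID` has both its `.png` and its `.npz` file. The sketch below assumes the layout exactly as shown (`semantic_data/SCENE_ID/IMAGE_ID.{png,npz}`); it does not open the files, and the function name is hypothetical.

```python
from pathlib import Path


def find_unpaired_frames(semantic_dir) -> dict:
    """Map each scene to the image IDs that lack either the .png or the .npz file."""
    problems = {}
    scene_dirs = sorted(p for p in Path(semantic_dir).iterdir() if p.is_dir())
    for scene_dir in scene_dirs:
        pngs = {p.stem for p in scene_dir.glob("*.png")}
        npzs = {p.stem for p in scene_dir.glob("*.npz")}
        unpaired = sorted(pngs ^ npzs)  # symmetric difference: IDs present in only one set
        if unpaired:
            problems[scene_dir.name] = unpaired
    return problems
```

An empty result means every frame is complete; a non-empty one lists the scenes to re-run through `prepare_dataset.sh`.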

Run the following script to train the ControlNet model (check the model and data paths used in the bash script):

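# Usage (arguments in [] are optional):
#   ${DATASET}                         choose from [Scannetpp, Hypersim]
#   --condition_type                   default: one_hot
#   --conditioning_channels            default: 8; should be less than 16
#   --enable_depth_cond                flag; also use the depth condition
#   --controlnet_conditioning_scale    ControlNet conditioning scale(s), e.g. 3.5 1.5
#   --resume_from_checkpoint           e.g. latest or .../checkpoint-1000
#   --report_to                        e.g. wandb
bash scripts/train_controlnet_sd.sh ${DATASET} [--condition_type TYPE] [--conditioning_channels N] [--enable_depth_cond] [--controlnet_conditioning_scale SCALE ...] [--resume_from_checkpoint CKPT] [--report_to TOOL]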

# e.g.
bash scripts/train_controlnet_sd.sh hypersim --condition_type one_hot --conditioning_channels 8 --enable_depth_cond --controlnet_conditioning_scale 3.5 1.5
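The example passes two values to `--controlnet_conditioning_scale` together with `--enable_depth_cond`. We assume, without confirmation from this README, that they weight the layout and depth conditions respectively. In diffusers' multi-ControlNet setup, each ControlNet's residuals are multiplied by its own conditioning scale and the results are summed before being added to the UNet. The sketch below mimics only that weighting on flattened residuals using plain lists; it is an illustration, not SceneCraft's code.

```python
from typing import Sequence


def combine_control_residuals(residuals: Sequence[Sequence[float]],
                              scales: Sequence[float]) -> list:
    """Weighted sum of per-condition ControlNet residuals.

    residuals[i] is the (flattened) residual produced for condition i and
    scales[i] is its conditioning scale, e.g. [3.5, 1.5] for two conditions.
    """
    if len(residuals) != len(scales):
        raise ValueError("need exactly one scale per condition")
    length = len(residuals[0])
    if any(len(r) != length for r in residuals):
        raise ValueError("all residuals must have the same shape")
    return [sum(s * r[j] for r, s in zip(residuals, scales)) for j in range(length)]
```

A larger scale lets that condition dominate the guidance; with 3.5 and 1.5, the first condition contributes more than twice as strongly per unit of residual.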

Train SceneCraft Model

We use nerfacto from nerfstudio as the scene model. Generating a scene takes three steps:

get its raw data (bounding boxes, labels and cameras);
get its layout data (semantic/depth images and jsonl file);
train its scene model.

Step 1: this step is only needed for layouts you draw yourself.

Use this web GUI to draw your own layout, then export the layout and camera files to ROOT/data/custom/(scene_id)/, which should look like:

data
├── custom
|   ├── (scene_id)
|   |   ├── cameras.json
|   |   ├── layout.json

Step 2: run the following script to generate layout data from the raw data:

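# Usage (arguments in [] are optional):
#   --layout        specifies the output type (layout data)
#   --dataset       choose from ['scannetpp', 'hypersim', 'custom']
#   --scene_id      ID of the scene to process
#   --output_dir    output directory; default: outputs
python scripts/generate_outputs.py --layout --dataset ${DATASET} --scene_id ${SCENE_ID} [--output_dir ${OUTPUT_DIR}]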
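To process several scenes, the Step 2 command can be generated per scene. This is a sketch under stated assumptions: `scripts/generate_outputs.py` is run with the Python interpreter (as its `.py` extension suggests), `--layout` is a bare flag, and each other option takes a single value. The helper names are hypothetical. By default it only prints the commands (a dry run), so you can inspect them before running anything.

```python
import shlex
import subprocess
import sys

DATASETS = ("scannetpp", "hypersim", "custom")


def generate_outputs_cmd(dataset, scene_id, output_dir="outputs"):
    """Build the argv for one layout-data run of scripts/generate_outputs.py."""
    if dataset not in DATASETS:
        raise ValueError(f"dataset must be one of {DATASETS}, got {dataset!r}")
    return [sys.executable, "scripts/generate_outputs.py", "--layout",
            "--dataset", dataset, "--scene_id", scene_id,
            "--output_dir", output_dir]


def run_for_scenes(dataset, scene_ids, output_dir="outputs", dry_run=True):
    """Print each command; if dry_run is False, also run it, stopping at the first failure."""
    commands = [generate_outputs_cmd(dataset, s, output_dir) for s in scene_ids]
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # raises CalledProcessError on a non-zero exit
    return commands
```

Run it from the repository root so the relative script path resolves, and pass `dry_run=False` once the printed commands look right.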

Step 3: train the scene model. This step requires at least two GPUs; see the supplementary material (Appendix Sec. A) of our paper for training details. More specific instructions will be provided.

Check the configurations in scenecraft/configs/method, then run the following command:

# RECORD=True (optional): track results via wandb
# DEBUG=True (optional): log more detailed information for debugging
# --machine.num-devices (optional): number of GPUs to use
RECORD=True DEBUG=True ns-train ${method_name} --machine.num-devices ${num_gpus}
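When launching runs from a script, the `ns-train` command above can be assembled with its optional environment variables. A minimal sketch: the method name is a placeholder for one of the configs in `scenecraft/configs/method`, the helper names are hypothetical, and the sketch does not check the two-GPU requirement.

```python
import os
import subprocess


def ns_train_command(method_name, num_gpus=None, record=False, debug=False):
    """Return (argv, env) equivalent to the ns-train command above."""
    argv = ["ns-train", method_name]
    if num_gpus is not None:
        argv += ["--machine.num-devices", str(num_gpus)]
    env = dict(os.environ)  # inherit PATH, CUDA settings, etc.
    if record:
        env["RECORD"] = "True"  # track results via wandb
    if debug:
        env["DEBUG"] = "True"   # log more detailed information
    return argv, env


def launch(method_name, **kwargs):
    """Start training; raises CalledProcessError if ns-train exits with an error."""
    argv, env = ns_train_command(method_name, **kwargs)
    subprocess.run(argv, env=env, check=True)
```

Setting the variables in a copied environment, rather than in `os.environ`, keeps them scoped to the training subprocess.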

We will provide more details and release layout data examples/scene models soon.

TODO

Release detailed instructions for generation and visualization
Release layout examples
Release training code
Instructions for preparing data

Acknowledgement

Thanks to these excellent open-source projects: nerfstudio and diffusers.
