Yafei Yang, Bo Yang
Project Page | Paper
This repository contains:
- Complexity Factors Calculation for Datasets under `Complexity_Factors/`.
- Six Datasets Generation / Adaptation under `Dataset_Generation/`, including:
  - dSprites;
  - Tetris;
  - CLEVR;
  - YCB;
  - ScanNet;
  - COCO.
- Four Representative Methods Re-implementation / Adaptation, including:
  - AIR ("Attend, Infer, Repeat: Fast Scene Understanding with Generative Models") under `AIR/`;
  - MONet ("MONet: Unsupervised Scene Decomposition and Representation") under `MONet/`;
  - IODINE ("Multi-Object Representation Learning with Iterative Variational Inference") under `IODINE/`;
  - Slot Attention ("Object-Centric Learning with Slot Attention") under `Slot_Attention`.
- Evaluation of Object Segmentation Performance under `Segmentation_Evaluation/`, including:
  - AP score;
  - PQ score;
  - Precision and Recall.
IJCV extension contains:
- Additional Complexity Factors Calculation for Background under `Complexity_Factors/`.
- MOVi Datasets Generation under `Dataset_Generation/MOVi`.
- Background Complexity Factors Adaptation under `Dataset_Generation/Ablation Dataset`.
- Additional Baseline DINOSAUR ("Bridging the Gap to Real-World Object-Centric Learning").
- Additional Evaluation Metrics under `Segmentation_Evaluation/`, including:
  - ARI;
  - ARP;
  - ARR;
  - Background Recall.
```
conda env create -f [env_name].yml
conda activate [env_name]
```
Note: since this repo contains implementations of several different approaches, we use separate conda environments to manage them. Specifically, use `tf1_env.yml` to build the environment for IODINE, `tf2_env.yml` to build the environment for Slot Attention, and `pytorch_env.yml` for AIR and MONet.
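For example, to set up the PyTorch environment used by AIR and MONet (the environment names below assume each `.yml` file's `name:` field matches its file name; adjust if yours differs):
```
# Build and activate the environment for AIR and MONet
conda env create -f pytorch_env.yml
conda activate pytorch_env

# Likewise for the TensorFlow environments:
#   conda env create -f tf1_env.yml && conda activate tf1_env   # IODINE
#   conda env create -f tf2_env.yml && conda activate tf2_env   # Slot Attention
```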
Datasets used in this paper can be downloaded here. We provide both TFRecord and PNG files for each dataset. Alternatively, you can generate the datasets yourself by following the instructions below.
Download the raw dSprites shape data from https://github.com/deepmind/dsprites-dataset. Put the downloaded `dsprites_ndarray_co1sh3sc6or40x32y32_64x64.npz` under `Dataset_Generation/dSprites`.
Create our dSprites dataset from the downloaded shape data with:
```
cd Dataset_Generation
python dSprites/create_dsprites_dataset.py --n_imgs [num_imgs] --root [dSprites_location] --min_object_count 2 --max_object_count 6
```
This will create `[num_imgs]` images and their corresponding masks under `[dSprites_location]/image` and `[dSprites_location]/mask`.
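For example, a concrete dSprites generation run (the image count and output root below are hypothetical):
```
cd Dataset_Generation
# Generate 10000 scenes with 2-6 objects each, rooted at ./data/dSprites
python dSprites/create_dsprites_dataset.py --n_imgs 10000 --root ./data/dSprites --min_object_count 2 --max_object_count 6
# Outputs land in ./data/dSprites/image and ./data/dSprites/mask
```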
Download the Tetrominoes dataset from https://github.com/deepmind/multi_object_datasets. Put the downloaded `tetrominoes_train.tfrecords` under `Dataset_Generation/Tetris`.
Parse the tfrecord data into images with:
```
cd Dataset_Generation
python Tetris/read_tetris_tfrecords.py
```
This will create 10000 images of resolution 35x35, parsed from the Tetrominoes dataset, under `Tetris/tetris_source`.
Create our Tetris dataset using previously parsed images with:
```
python Tetris/create_tetris_dataset.py --n_imgs [num_imgs] --root [Tetris_location] --min_object_count 2 --max_object_count 6
```
This will create `[num_imgs]` images and their corresponding masks under `[Tetris_location]/image` and `[Tetris_location]/mask`.
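A concrete run of the two-step Tetris pipeline (hypothetical image count and output root):
```
cd Dataset_Generation
# Step 1: parse the raw tfrecord into 35x35 source images under Tetris/tetris_source
python Tetris/read_tetris_tfrecords.py
# Step 2: compose 10000 Tetris scenes with 2-6 objects each
python Tetris/create_tetris_dataset.py --n_imgs 10000 --root ./data/Tetris --min_object_count 2 --max_object_count 6
```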
Clone the repo https://github.com/facebookresearch/clevr-dataset-gen, follow its instructions, and render CLEVR images with:
```
cd image_generation
blender --background --python render_images.py -- --num_images [num_imgs] --min_objects 2 --max_objects 6
```
If you have an NVIDIA GPU with CUDA installed, you can use the GPU to accelerate rendering:
```
blender --background --python render_images.py -- --num_images [num_imgs] --min_objects 2 --max_objects 6 --use_gpu 1
```
Put rendered images and masks under `Dataset_Generation/CLEVR/clevr_source/images` and `Dataset_Generation/CLEVR/clevr_source/masks`.
Create our CLEVR dataset using previously rendered images with:
```
python CLEVR/create_clevr_dataset.py --n_imgs [num_imgs] --root [CLEVR_location] --min_object_count 2 --max_object_count 6
```
This will create `[num_imgs]` images and their corresponding masks under `[CLEVR_location]/image` and `[CLEVR_location]/mask`.
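Put together, a concrete CLEVR pipeline (hypothetical image count and output root) could look like:
```
# Inside the cloned clevr-dataset-gen repo: render 10000 scenes with 2-6 objects each
cd image_generation
blender --background --python render_images.py -- --num_images 10000 --min_objects 2 --max_objects 6 --use_gpu 1

# Back in this repo, after copying the renders into CLEVR/clevr_source/images and CLEVR/clevr_source/masks:
cd Dataset_Generation
python CLEVR/create_clevr_dataset.py --n_imgs 10000 --root ./data/CLEVR --min_object_count 2 --max_object_count 6
```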
Download the 256-G video-YCB dataset from https://rse-lab.cs.washington.edu/projects/posecnn/. Put it under `Dataset_Generation/YCB/YCB_Video_Dataset`.
Create our YCB dataset using raw video-YCB images with:
```
python YCB/create_YCB_dataset.py --n_imgs [num_imgs] --root [YCB_location] --min_object_count 2 --max_object_count 6
```
This will create `[num_imgs]` images and their corresponding masks under `[YCB_location]/image` and `[YCB_location]/mask`.
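For example (hypothetical image count and output root):
```
cd Dataset_Generation
# Build 10000 YCB scenes with 2-6 objects each from the raw video-YCB frames
python YCB/create_YCB_dataset.py --n_imgs 10000 --root ./data/YCB --min_object_count 2 --max_object_count 6
```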
Download ScanNet data and put it under `Dataset_Generation/ScanNet/scannet_raw`.
Process the ScanNet data into `Dataset_Generation/ScanNet/scans_processed` with:
```
python ScanNet/process_scannet_data.py
```
This will parse 2D images from the ScanNet sensor data, unzip the raw 2D instance labels (filtered version), and parse the official train/val split downloaded from https://github.com/ScanNet/ScanNet/tree/master/Tasks/Benchmark.
Create our ScanNet dataset using the processed ScanNet data with:
```
python ScanNet/create_ScanNet_dataset.py --n_imgs [num_imgs] --root [ScanNet_location] --min_object_count 2 --max_object_count 6
```
This will create `[num_imgs]` images and their corresponding masks under `[ScanNet_location]/image` and `[ScanNet_location]/mask`.
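A concrete ScanNet run (hypothetical image count and output root; the generation script path follows the layout assumed above):
```
cd Dataset_Generation
# Parse 2D frames, instance labels and the official train/val split from scannet_raw
python ScanNet/process_scannet_data.py
# Build 10000 ScanNet scenes with 2-6 objects each
python ScanNet/create_ScanNet_dataset.py --n_imgs 10000 --root ./data/ScanNet --min_object_count 2 --max_object_count 6
```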
Download the COCO data from http://images.cocodataset.org/zips/val2017.zip (validation), http://images.cocodataset.org/zips/train2017.zip (train) and http://images.cocodataset.org/annotations/annotations_trainval2017.zip (annotations). Put them under `Dataset_Generation/COCO/COCO_raw`.
Parse segmentation masks from the annotation files with:
```
python COCO/process_coco_dataset.py
```
Create our COCO dataset using the original COCO images and parsed masks with:
```
python COCO/create_coco_dataset.py --n_imgs [num_imgs] --root [COCO_location] --min_object_count 2 --max_object_count 6
```
This will create `[num_imgs]` images and their corresponding masks under `[COCO_location]/image` and `[COCO_location]/mask`.
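A concrete COCO run (hypothetical image count and output root; the generation script name follows the pattern assumed above):
```
cd Dataset_Generation
# Extract per-instance segmentation masks from the COCO annotation files
python COCO/process_coco_dataset.py
# Build 10000 COCO scenes with 2-6 objects each
python COCO/create_coco_dataset.py --n_imgs 10000 --root ./data/COCO --min_object_count 2 --max_object_count 6
```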
Details for the MOVi-C and MOVi-E datasets can be found at https://github.com/google-research/kubric/tree/main/challenges/movi. They can be loaded directly with:
```
ds = tfds.load("movi_c/128x128", data_dir="gs://kubric-public/tfds")
ds = tfds.load("movi_e/128x128", data_dir="gs://kubric-public/tfds")
```
Images and masks in PNG format can be parsed with:
```
python MOVi/movi_c_128.py
python MOVi/movi_e_128.py
```
- Use `Dataset_Generation/Ablation Dataset/object_level_ablation.py` to create datasets ablated on object-level factors.
- Use `Dataset_Generation/Ablation Dataset/scene_level_ablation.py` to create datasets ablated on scene-level factors.
- Use `Dataset_Generation/Ablation Dataset/joint_ablation.py` to create datasets ablated on both object- and scene-level factors.
- Use `Dataset_Generation/Ablation Dataset/bg_ablation.py` to create datasets ablated on background factors.

Detailed examples and usage can be found in the corresponding scripts.
Training:
```
cd AIR/
python main.py --dataset [dataset_name] --gpu_index [gpu_id] --max_steps 6
```
Testing:
```
cd AIR/
python main.py --dataset [dataset_name] --gpu_index [gpu_id] --eval_mode --resume [ckpt]
```
where:
- `dataset_name` is the name of the dataset, e.g. dSprites, YCB.
- `gpu_id` is the target CUDA device id.
- `ckpt` is the checkpoint to be resumed in the testing stage.
- In all experiments for AIR, we set `max_steps` to 6.
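For example, to train AIR on dSprites with GPU 0 and then evaluate a saved checkpoint (the checkpoint path is left as a placeholder):
```
cd AIR/
# Train: dataset dSprites, CUDA device 0, 6 inference steps
python main.py --dataset dSprites --gpu_index 0 --max_steps 6
# Test: replace [ckpt] with the path of the checkpoint to evaluate
python main.py --dataset dSprites --gpu_index 0 --eval_mode --resume [ckpt]
```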
Training:
```
cd MONet/
python main.py --dataset [dataset_name] --gpu_index [gpu_id] --K_steps 7
```
Testing:
```
cd MONet/
python main.py --dataset [dataset_name] --gpu_index [gpu_id] --K_steps 7 --eval_mode --resume [ckpt]
```
where:
- `dataset_name` is the name of the dataset, e.g. dSprites, YCB.
- `gpu_id` is the target CUDA device id.
- `ckpt` is the checkpoint to be resumed in the testing stage.
- In all experiments for MONet, we set `K_steps` to 7.
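For example, to train and evaluate MONet on YCB with GPU 0:
```
cd MONet/
# Train: dataset YCB, CUDA device 0, 7 decomposition steps
python main.py --dataset YCB --gpu_index 0 --K_steps 7
# Test: replace [ckpt] with the path of the checkpoint to evaluate
python main.py --dataset YCB --gpu_index 0 --K_steps 7 --eval_mode --resume [ckpt]
```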
Training:
```
cd IODINE/
CUDA_VISIBLE_DEVICES=[gpu_id] python main.py -f with [dataset_name_train]
```
Testing:
```
cd IODINE/
CUDA_VISIBLE_DEVICES=[gpu_id] python eval.py --dataset_identifier [dataset_name_test]
```
where:
- `dataset_name_train` is the name of the training dataset, e.g. dSprites_train, YCB_train.
- `dataset_name_test` is the name of the testing dataset, e.g. dSprites_test, YCB_test.
- `gpu_id` is the target CUDA device id.
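For example, to train and test IODINE on dSprites with GPU 0:
```
cd IODINE/
# Train on the dSprites training split using CUDA device 0
CUDA_VISIBLE_DEVICES=0 python main.py -f with dSprites_train
# Evaluate on the corresponding test split
CUDA_VISIBLE_DEVICES=0 python eval.py --dataset_identifier dSprites_test
```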
Training:
```
cd Slot_Attention/
CUDA_VISIBLE_DEVICES=[gpu_id] python train.py --dataset [dataset_name] --num_slots 7
```
Testing:
```
cd Slot_Attention/
CUDA_VISIBLE_DEVICES=[gpu_id] python eval.py --dataset [dataset_name] --num_slots 7
```
where:
- `dataset_name` is the name of the dataset, e.g. dSprites, YCB.
- `gpu_id` is the target CUDA device id.
- In all experiments for Slot Attention, we set `num_slots` to 7.
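For example, to train and test Slot Attention on dSprites with GPU 0:
```
cd Slot_Attention/
# Train with 7 slots on dSprites using CUDA device 0
CUDA_VISIBLE_DEVICES=0 python train.py --dataset dSprites --num_slots 7
# Evaluate the trained model
CUDA_VISIBLE_DEVICES=0 python eval.py --dataset dSprites --num_slots 7
```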
We use the official repo for all experiments on DINOSAUR; code and instructions can be found at https://github.com/amazon-science/object-centric-learning-framework. Examples are as follows:
Training:
```
CUDA_VISIBLE_DEVICES=[gpu_id] poetry run ocl_train +experiment=projects/bridging/dinosaur/movi_c_feat_rec
```
Testing:
```
CUDA_VISIBLE_DEVICES=[gpu_id] poetry run ocl_eval +evaluation=projects/bridging/metrics_coco +train_config_name=config +train_config_path=[config path]
```
where:
- `gpu_id` is the target CUDA device id.
- `config path` is the path to the DINOSAUR configuration.
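For example, on GPU 0 (run from inside the cloned object-centric-learning-framework repo after setting it up with poetry, per its own instructions; the evaluation config path stays a placeholder):
```
# Train DINOSAUR with the MOVi-C feature-reconstruction experiment config
CUDA_VISIBLE_DEVICES=0 poetry run ocl_train +experiment=projects/bridging/dinosaur/movi_c_feat_rec
# Evaluate with the COCO-style metrics; point +train_config_path at the config saved by the training run
CUDA_VISIBLE_DEVICES=0 poetry run ocl_eval +evaluation=projects/bridging/metrics_coco +train_config_name=config +train_config_path=[config path]
```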
Calculate object-level and scene-level complexity factors with `Complexity_Factors/Complexity_Factor_Evaluator.py`. Examples are provided in that script.
If you find our work useful in your research, please consider citing:
```
@article{yang2022,
  title={Promising or Elusive? Unsupervised Object Segmentation from Real-world Single Images},
  author={Yang, Yafei and Yang, Bo},
  journal={NeurIPS},
  year={2022}
}
```
```
@article{yang2024benchmarking,
  title={Benchmarking and Analysis of Unsupervised Object Segmentation from Real-World Single Images},
  author={Yang, Yafei and Yang, Bo},
  journal={International Journal of Computer Vision},
  volume={132},
  number={6},
  pages={2077--2113},
  year={2024},
  publisher={Springer}
}
```
- 5/10/2022: Initial release!
- 18/10/2024: Content related to IJCV extension has been included in this repo!
This project references the following repositories:
- https://pyro.ai/examples/air.html
- https://github.com/addtt/attend-infer-repeat-pytorch
- https://github.com/applied-ai-lab/genesis
- https://github.com/deepmind/deepmind-research/tree/master/iodine
- https://github.com/google-research/google-research/tree/master/slot_attention
- https://github.com/google-research/kubric/tree/main/challenges/movi
- https://github.com/amazon-science/object-centric-learning-framework