Point-CMAE

Bringing Masked Autoencoders Explicit Contrastive Properties for Point Cloud Self-Supervised Learning

The official PyTorch implementation of Point-CMAE

Bin Ren1,2, Guofeng Mei3, Danda Pani Paudel4,5, Weijie Wang2,3, Yawei Li4, Mengyuan Liu6, Rita Cucchiara7, Luc Van Gool4,5, and Nicu Sebe2

1 University of Pisa, Italy,
2 University of Trento, Italy,
3 Fondazione Bruno Kessler, Italy,
4 ETH Zürich, Switzerland,
5 INSAIT Sofia University, Bulgaria,
6 Peking University, China,
7 University of Modena and Reggio Emilia, Italy

paper

Latest

  • 📌 09/24/2024: We are organizing the code; it will be released soon.
  • 🎉 09/20/2024: Our paper has been accepted by the 17th Asian Conference on Computer Vision (ACCV 2024)!
  • 📌 07/18/2024: Repository is created. Our code will be made publicly available upon acceptance.

Method


Abstract: Contrastive learning (CL) for Vision Transformers (ViTs) in image domains has achieved performance comparable to CL for traditional convolutional backbones. However, in 3D point cloud pretraining with ViTs, masked autoencoder (MAE) modeling remains dominant. This raises the question: Can we take the best of both worlds? To answer this question, we first empirically validate that integrating MAE-based point cloud pre-training with the standard contrastive learning paradigm, even with meticulous design, can lead to a decrease in performance. To address this limitation, we reintroduce CL into the MAE-based point cloud pre-training paradigm by leveraging the inherent contrastive properties of MAE. Specifically, rather than relying on extensive data augmentation as commonly used in the image domain, we randomly mask the input tokens twice to generate contrastive input pairs. Subsequently, a weight-sharing encoder and two identically structured decoders are utilized to perform masked token reconstruction. Additionally, we propose that for an input token masked by both masks simultaneously, the reconstructed features should be as similar as possible. This naturally establishes an explicit contrastive constraint within the generative MAE-based pre-training paradigm, resulting in our proposed method, Point-CMAE. Consequently, Point-CMAE effectively enhances the representation quality and transfer performance compared to its MAE counterpart. Experimental evaluations across various downstream applications, including classification, part segmentation, and few-shot learning, demonstrate the efficacy of our framework in surpassing state-of-the-art techniques under standard ViTs and single-modal settings. Our code will be released upon acceptance.
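
To make the idea above concrete, here is a minimal, self-contained PyTorch sketch of the double-masking step and the feature-level contrastive constraint on tokens masked by both masks. It is illustrative only and not the official implementation: the linear encoder/decoders, the feature dimension, the mask ratio, and the cosine loss are placeholder assumptions.

# Minimal sketch (not the official implementation) of Point-CMAE's double-masking
# idea: mask the same token sequence twice, reconstruct with a weight-sharing
# encoder and two identically structured decoders, and pull together the
# reconstructed features of tokens masked by BOTH masks.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
N, C, ratio = 64, 384, 0.6               # tokens per sample, feature dim, mask ratio (assumed values)

def random_mask(n, r):
    """Boolean mask with int(r * n) tokens masked (True = masked)."""
    m = torch.zeros(n, dtype=torch.bool)
    m[torch.randperm(n)[:int(n * r)]] = True
    return m

# Stand-in modules: the real model uses a ViT encoder and Transformer decoders.
encoder = nn.Linear(C, C)                                  # weight-sharing encoder (both views)
decoder_a, decoder_b = nn.Linear(C, C), nn.Linear(C, C)    # two identically structured decoders
mask_token = nn.Parameter(torch.zeros(C))                  # learnable token for masked positions

tokens = torch.randn(N, C)               # point-patch embeddings of one sample
mask_a, mask_b = random_mask(N, ratio), random_mask(N, ratio)   # mask the input twice

def reconstruct(mask, decoder):
    # Encode with the shared encoder, replace masked positions with the mask token,
    # and let the decoder predict features for every position (a simplification:
    # the actual model feeds only visible tokens to the encoder).
    x = torch.where(mask.unsqueeze(-1), mask_token.expand(N, C), encoder(tokens))
    return decoder(x)

rec_a, rec_b = reconstruct(mask_a, decoder_a), reconstruct(mask_b, decoder_b)

# Explicit contrastive constraint: a token masked by BOTH masks should receive
# similar reconstructed features from the two decoders (cosine similarity here).
both = mask_a & mask_b
loss_contrastive = (1.0 - F.cosine_similarity(rec_a[both], rec_b[both], dim=-1)).mean()
print(loss_contrastive.item())

In the full method this constraint is added on top of the usual MAE reconstruction loss; the snippet only shows the contrastive term.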

Pretrained Models

TBD

3D Object Detection

TBD

Usage

TBD

Requirements

  • PyTorch >= 1.7.0
  • python >= 3.7
  • CUDA >= 9.0
  • GCC >= 4.9
  • torchvision
# Create the virtual environment via micromamba or anaconda:
micromamba/conda create -n points python=3.8 -y

# Install PyTorch 1.11.0 + CUDA 11.3
pip install torch==1.11.0+cu113 torchvision==0.12.0+cu113 torchaudio==0.11.0 --extra-index-url https://download.pytorch.org/whl/cu113

# Install Other libs
pip install -r requirements.txt

# Install pytorch3d from wheels (We use the chamfer distance loss within pytorch3d)
pip install --no-index --no-cache-dir pytorch3d -f https://dl.fbaipublicfiles.com/pytorch3d/packaging/wheels/py38_cu113_pyt1110/download.html
bash install.sh

or from source:
pip install "git+https://github.com/facebookresearch/pytorch3d.git"


# Install PointNet++
pip install "git+https://github.com/erikwijmans/Pointnet2_PyTorch.git#egg=pointnet2_ops&subdirectory=pointnet2_ops_lib"
# Install GPU kNN
pip install --upgrade https://github.com/unlimblue/KNN_CUDA/releases/download/0.2/KNN_CUDA-0.2-py3-none-any.whl
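
As a quick sanity check after installation (an illustrative snippet, not a script shipped with the repository), you can verify that PyTorch sees the GPU and that the chamfer distance from pytorch3d imports and runs:

# Environment sanity check (illustrative only; not part of the repository)
import torch
from pytorch3d.loss import chamfer_distance

print(torch.__version__, torch.cuda.is_available())

# Chamfer distance between two random point clouds of shape (batch, points, 3)
a = torch.rand(2, 1024, 3)
b = torch.rand(2, 1024, 3)
loss, _ = chamfer_distance(a, b)
print(loss.item())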

Dataset

For the ModelNet40, ScanObjectNN, and ShapeNetPart datasets, we use ShapeNet for pre-training the Point-CMAE models and then fine-tune on each of these datasets.

Details of the datasets used can be found in DATASET.md.

Point-CMAE pre-training

To pre-train the Point-CMAE models on ShapeNet, simply run:

python main.py --config cfgs/pretrain_shapenet.yaml \
    --exp_name pretrain_shapenet \
    [--val_freq 10]

Fine-tuning on downstream tasks

We fine-tune Point-CMAE on the following downstream tasks: classification on ModelNet40, few-shot learning on ModelNet40, transfer learning on ScanObjectNN, and part segmentation on ShapeNetPart.

ModelNet40

To finetune a pre-trained Point-CMAE model on ModelNet40, simply run:

python main.py \
    --config cfgs/finetune_modelnet.yaml \
    --finetune_model \
    --ckpts <path> \
    --exp_name <name>

To evaluate a model finetuned on ModelNet40, simply run:

bash ./scripts/test.sh <GPU_IDS> \
    --config cfgs/finetune_modelnet.yaml \
    --ckpts <path> \
    --exp_name <name>

Few-shot Learning on ModelNet40

We follow the few-shot setting used in previous work.

First, generate your own few-shot learning split or use the same split as ours (see DATASET.md).
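
If you generate your own split, the sketch below shows one generic way to sample an N-way, K-shot episode from a list of (sample_id, label) pairs. The function name, the input format, and the default sizes are illustrative assumptions, not the repository's actual split format; that format is described in DATASET.md.

# Generic sketch for building an N-way, K-shot episode (illustrative only).
import random
from collections import defaultdict

def sample_few_shot_split(samples, n_way=10, k_shot=20, n_query=20, seed=0):
    """samples: iterable of (sample_id, label) pairs."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for sample_id, label in samples:
        by_label[label].append(sample_id)

    classes = rng.sample(sorted(by_label), n_way)            # pick N classes
    support, query = [], []
    for label in classes:
        ids = rng.sample(by_label[label], k_shot + n_query)
        support += [(i, label) for i in ids[:k_shot]]         # K labeled shots per class
        query += [(i, label) for i in ids[k_shot:]]           # held-out query samples
    return support, query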

ScanObjectNN

To finetune a pre-trained Point-CMAE model on ScanObjectNN, simply run:

python main.py \
    --config cfgs/finetune_scanobject_hardest.yaml \
    --finetune_model \
    --ckpts <path> \
    --exp_name <name>

To evaluate a model on ScanObjectNN, simply run:

bash ./scripts/test_scan.sh <GPU_IDS> \
    --config cfgs/finetune_scanobject_hardest.yaml \
    --ckpts <path> \
    --exp_name <name>

ShapeNetPart

TBD

Citation

If you find our work helpful, please consider citing the following paper and/or ⭐ the repo.

@article{ren2024bringing,
    title={Bringing Masked Autoencoders Explicit Contrastive Properties for Point Cloud Self-Supervised Learning},
    author={Ren, Bin and Mei, Guofeng and Paudel, Danda Pani and Wang, Weijie and Li, Yawei and Liu, Mengyuan and Cucchiara, Rita and Van Gool, Luc and Sebe, Nicu},
    journal={arXiv preprint arXiv:2407.05862},
    year={2024}
}

Acknowledgements

This code is built on Point-MAE.
