The official implementation of the paper "One-step Noisy Label Mitigation".
We recommend the following dependencies.
- Python 3.8
- PyTorch 1.13.1
Then install the remaining dependencies with:

```bash
pip install -r requirements.txt
```

We recommend a GPU with at least 24 GB of memory.
We follow the same split provided by NPC.
You should first download and extract the formulated datasets directory, which contains all the annotation files. You can access them here.
Our formulated datasets directory tree looks like:

```
${DATASETS}/
├── MSCOCO/
│   ├── annotations/
│   │   ├── 0.0_clean_index.npy
│   │   ├── 0.0_noise_train_caps.txt
│   │   ├── 0.2_clean_index.npy
│   │   ├── ...
│   │   ├── train_caps.txt
│   │   ├── train_ids.txt
│   │   ├── dev_caps.txt
│   │   └── ...
│   └── images/          # empty
│
├── Flickr30K/
│   ├── annotations/
│   │   ├── 0.0_clean_index.npy
│   │   ├── 0.0_noise_train_caps.txt
│   │   ├── 0.2_clean_index.npy
│   │   ├── ...
│   │   ├── train_caps.txt
│   │   ├── train_ids.txt
│   │   ├── dev_caps.txt
│   │   └── ...
│   └── images/          # empty
│
└── CC120K/
    ├── annotations/
    │   ├── train_caps.txt
    │   ├── train_ids.txt
    │   ├── dev_caps.txt
    │   └── ...
    └── images/          # empty
```
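Before training, it may help to sanity-check the layout above. The sketch below (file names taken from the tree; the helper name `check_datasets` is hypothetical, not part of this repository) verifies that the annotation files shared by all three datasets are present:

```python
from pathlib import Path

# Annotation files common to all three datasets in the tree above.
REQUIRED = ("train_caps.txt", "train_ids.txt", "dev_caps.txt")

def check_datasets(root, names=("MSCOCO", "Flickr30K", "CC120K")):
    """Return a list of expected annotation files missing under `root`."""
    missing = []
    for name in names:
        ann = Path(root) / name / "annotations"
        missing += [f"{name}/annotations/{f}"
                    for f in REQUIRED if not (ann / f).is_file()]
    return missing
```

An empty return value means the shared files are in place; note that the noise-specific files (e.g. `0.2_clean_index.npy`) exist only for MSCOCO and Flickr30K.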
The `images` directories are initially empty. You need to download the images separately and place them directly in the corresponding dataset's `images` folder. We adopt the same image download and processing procedure as NPC.
Download links:

- MSCOCO. We unified the image filename format of the MSCOCO dataset for easier use; you can use `util.py` to rename the MSCOCO images.
- Flickr30K.
- CC120K. You can download the dataset from this link with the extraction code "3ble".
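The actual target naming scheme for MSCOCO is defined in `util.py`; purely as an illustration of the idea, a rename pass that strips the split prefix and leading zeros from original COCO file names might look like this (the assumed source format `COCO_train2014_000000123456.jpg` and target `123456.jpg` are examples, not necessarily what `util.py` produces):

```python
import re
from pathlib import Path

# Illustrative only -- see util.py for the real scheme. Assumes names like
# COCO_train2014_000000123456.jpg and renames them to 123456.jpg.
PATTERN = re.compile(r"COCO_(?:train|val)2014_0*(\d+)\.jpg")

def unified_name(filename):
    m = PATTERN.fullmatch(filename)
    return f"{m.group(1)}.jpg" if m else filename

def rename_images(image_dir):
    for p in Path(image_dir).glob("*.jpg"):
        target = p.with_name(unified_name(p.name))
        if target != p:
            p.rename(target)
```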
The following are the training instructions for the various datasets. Set `${DATASETS}` to the previously configured datasets folder, and set `${SAVE_PATH}` to the path where you would like to save the model.
Training on MS-COCO:

You can adjust the noise ratio of the training set via `${NOISE_RATIO}`, which can be one of [0.0, 0.2, 0.4, 0.5, 0.6].

```bash
python main_clip.py --batch_size 256 --epochs 5 --lr 1e-5 --warmup 500 --vision_model ViT-B/32 --dataset coco --dataset_root ${DATASETS}/MSCOCO --checkpoint_path ${SAVE_PATH} --noise_ratio ${NOISE_RATIO}
```
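The synthetic noise here is correspondence noise: a `${NOISE_RATIO}` of r corrupts a random r-fraction of image-caption pairs by shuffling their captions. The released `*_noise_train_caps.txt` and `*_clean_index.npy` files already contain such splits; the sketch below (function name and return convention are our own) only illustrates the construction:

```python
import random

# Conceptual sketch of synthetic correspondence noise: permute the captions
# of a random `noise_ratio` fraction of pairs so those images end up with
# mismatched captions. Returns the noisy captions and the indices of the
# untouched (clean) pairs, mirroring the role of the *_clean_index.npy files.
def inject_noise(captions, noise_ratio, seed=0):
    rng = random.Random(seed)
    n = len(captions)
    noisy_idx = rng.sample(range(n), int(n * noise_ratio))
    shuffled = list(noisy_idx)
    rng.shuffle(shuffled)
    noisy = list(captions)
    for src, dst in zip(noisy_idx, shuffled):
        noisy[dst] = captions[src]
    clean_index = sorted(set(range(n)) - set(noisy_idx))
    return noisy, clean_index
```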
Training on Flickr30K:

You can adjust the noise ratio of the training set via `${NOISE_RATIO}`, which can be one of [0.0, 0.2, 0.4, 0.6].

```bash
python main_clip.py --batch_size 256 --epochs 5 --lr 1e-5 --warmup 500 --vision_model ViT-B/32 --dataset f30k --dataset_root ${DATASETS}/Flickr30K --checkpoint_path ${SAVE_PATH} --noise_ratio ${NOISE_RATIO}
```
Training on CC120K:

CC120K is a real-world noisy dataset, so the noise ratio does not need to be specified.

```bash
python main_clip.py --batch_size 256 --epochs 10 --lr 1e-5 --warmup 500 --vision_model ViT-B/32 --dataset cc --dataset_root ${DATASETS}/CC120K --checkpoint_path ${SAVE_PATH}
```
You can evaluate a trained model by executing the following commands. Set `${DATASETS}` to the previously configured datasets folder; `${MODEL_PATH}` is the checkpoint to be evaluated.
Evaluation on MSCOCO:

```bash
python main_clip.py --eval --vision_model ViT-B/32 --dataset coco --dataset_root ${DATASETS}/MSCOCO --resume ${MODEL_PATH}
```

Evaluation on Flickr30K:

```bash
python main_clip.py --eval --vision_model ViT-B/32 --dataset f30k --dataset_root ${DATASETS}/Flickr30K --resume ${MODEL_PATH}
```

Evaluation on CC120K:

```bash
python main_clip.py --eval --vision_model ViT-B/32 --dataset cc --dataset_root ${DATASETS}/CC120K --resume ${MODEL_PATH}
```
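Cross-modal retrieval evaluation of this kind is conventionally reported as Recall@K. As a minimal, library-free sketch of that metric (assuming a precomputed query-candidate similarity matrix where candidate i is the ground-truth match for query i; the real evaluation in `main_clip.py` may additionally handle multiple captions per image):

```python
# Recall@K: fraction of queries whose ground-truth candidate appears among
# the K highest-scoring candidates. sims[i][j] is the similarity between
# query i and candidate j; the match for query i is candidate i.
def recall_at_k(sims, k):
    hits = 0
    for i, row in enumerate(sims):
        topk = sorted(range(len(row)), key=lambda j: row[j], reverse=True)[:k]
        hits += i in topk
    return hits / len(sims)
```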
If you find this work useful, please consider citing the following paper:
```bibtex
@article{OSA,
  author  = {Hao Li and
             Jiayang Gu and
             Jingkuan Song and
             An Zhang and
             Lianli Gao},
  title   = {One-step Noisy Label Mitigation},
  journal = {arXiv preprint arXiv:2410.01944},
  year    = {2024}
}
```