
ELIP

Paper

Evidential Language-Image Posterior (ELIP) achieves robust alignment between web images and semantic knowledge across a variety of out-of-distribution (OOD) cases by leveraging evidential uncertainty. ELIP integrates seamlessly into general image-text contrastive learning frameworks, providing an efficient fine-tuning approach that does not require additional training data.
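For intuition only (this is not the official ELIP objective), the minimal PyTorch sketch below applies a generic evidential Dirichlet loss to image-to-text contrastive logits and reads off a per-image uncertainty; the softplus evidence mapping, the temperature, and all names are illustrative assumptions rather than the paper's exact formulation.

# Minimal sketch, not the official ELIP loss: a generic evidential (Dirichlet)
# objective on image-to-text contrastive logits. All names are illustrative.
import torch
import torch.nn.functional as F

def evidential_contrastive_loss(image_emb, text_emb, temperature=0.07):
    # image_emb, text_emb: (B, D) L2-normalized embeddings of paired samples
    logits = image_emb @ text_emb.t() / temperature         # (B, B) similarity scores
    evidence = F.softplus(logits)                           # non-negative evidence
    alpha = evidence + 1.0                                  # Dirichlet parameters
    strength = alpha.sum(dim=1, keepdim=True)               # Dirichlet strength S
    targets = torch.eye(len(logits), device=logits.device)  # matched pairs on the diagonal
    # Type-II maximum-likelihood evidential loss (Sensoy et al., 2018)
    loss = (targets * (strength.log() - alpha.log())).sum(dim=1).mean()
    uncertainty = logits.size(1) / strength.squeeze(1)      # u = K / S, one value per image
    return loss, uncertainty

image_emb = F.normalize(torch.randn(8, 512), dim=-1)
text_emb = F.normalize(torch.randn(8, 512), dim=-1)
loss, uncertainty = evidential_contrastive_loss(image_emb, text_emb)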

Approach

Figure: overview of the ELIP approach.

Usage

First, install PyTorch 1.7.1 (or later) and torchvision, as well as the small additional dependencies listed in requirements.txt. On a CUDA GPU machine, the following will do the trick:

$ conda install --yes -c pytorch pytorch=1.7.1 torchvision cudatoolkit=11.0
$ pip install -r requirements.txt

Replace cudatoolkit=11.0 above with the appropriate CUDA version on your machine or cpuonly when installing on a machine without a GPU.
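An optional sanity check to confirm the installed build matches your hardware:

# Verify the installed versions and whether the CUDA build sees a GPU.
import torch
import torchvision

print("torch:", torch.__version__)
print("torchvision:", torchvision.__version__)
print("CUDA available:", torch.cuda.is_available())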

Data

Training data

Download the raw images of COCO2014 and Flickr from the official websites. We use the annotation files from the Karpathy split of COCO to train the model.
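To verify the annotation download, the snippet below counts images per split; it assumes the standard Karpathy dataset_coco.json layout and a hypothetical path, which may differ from the files this repo expects.

# Illustrative check of a Karpathy-split annotation file; path and keys are assumptions.
import json
from collections import Counter

with open("annotations/dataset_coco.json") as f:   # hypothetical location
    data = json.load(f)

print(Counter(img["split"] for img in data["images"]))   # e.g. train / val / test / restval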

OOD image and text preparation (for testing)

We follow this work to generate our OOD images (link).
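The OOD test images themselves should be generated with the linked work; purely as an illustration of the kind of perturbation involved, the snippet below corrupts a clean image with torchvision transforms (file names are placeholders).

# Illustration only -- the paper's OOD images come from the linked work,
# not from this snippet. This just applies simple torchvision corruptions.
from PIL import Image
from torchvision import transforms

corrupt = transforms.Compose([
    transforms.GaussianBlur(kernel_size=9, sigma=3.0),
    transforms.ColorJitter(brightness=0.6, contrast=0.6),
])

img = Image.open("example.jpg").convert("RGB")   # placeholder input image
corrupt(img).save("example_ood.jpg")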

Evaluation (Image-Text Retrieval)

To evaluate the fine-tuned ELIP:

  1. Modify the config files located in /configs (an optional config-check sketch follows the command below):
  • *_root: the root path of images and annotations.
  • dataset: coco / flickr.
  • pretrained: path to the fine-tuned ELIP weights.
  2. Run the following script:
python -m torch.distributed.run --nproc_per_node=1 \
   train_retrieval.py \
   --config configs/retrieval_*_eval.yaml \
   --output_dir /path/to/output \
   --evaluate
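Before launching, you can optionally check the edited config with a small script like the one below; it assumes the configs are plain YAML, and the file name and key names mirror the fields listed above but may differ in the actual files.

# Optional pre-flight check (assumes plain YAML configs; names are illustrative).
import os
import yaml

with open("configs/retrieval_coco_eval.yaml") as f:   # hypothetical eval config
    cfg = yaml.safe_load(f)

for key, value in cfg.items():
    if key.endswith("_root") or key == "pretrained":
        print(f"{key}: {value} (exists: {os.path.exists(str(value))})")
print("dataset:", cfg.get("dataset"))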

Training

  1. To fine-tune ELIP with multiple GPUs, run:
python -m torch.distributed.run --nproc_per_node=4 \
    train_retrieval.py \
    --config configs/retrieval_coco_finetune-noEV.yaml \
    --output_dir /path/to/output \
    --seed 255
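Once training finishes, a quick way to confirm the saved weights load is sketched below; the checkpoint filename under --output_dir is an assumption and may differ.

# Load the saved checkpoint on CPU and list a few parameter names.
import torch

ckpt = torch.load("/path/to/output/checkpoint_best.pth", map_location="cpu")  # assumed filename
state = ckpt.get("model", ckpt)   # some training scripts nest weights under "model"
print(len(state), "entries")
print(list(state)[:5])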

Citation

If you find this code useful for your research, please consider citing:

@inproceedings{ELIP,
  author = {Sun, Guohao and Bai, Yue and Yang, Xueying and Fang, Yi and Fu, Yun and Tao, Zhiqiang},
  title = {Aligning Out-of-Distribution Web Images and Caption Semantics via Evidential Learning},
  year = {2024},
  booktitle = {Proceedings of the ACM on Web Conference 2024},
}

Acknowledgement

The implementation of ELIP builds on the source code of CLIP. We thank the original authors for open-sourcing their work.
