BAPLe: Backdoor Attacks on Medical Foundational Models using Prompt Learning
Asif Hanif, Fahad Shamshad, Muhammad Awais, Muzammal Naseer, Fahad Shahbaz Khan,
Karthik Nandakumar, Salman Khan, and Rao Muhammad Anwer
Abstract
Medical foundation models are gaining prominence in the medical community for their ability to derive general representations from extensive collections of medical image-text pairs. Recent research indicates that these models are susceptible to backdoor attacks, which cause them to classify clean images accurately but fail when specific triggers are introduced. However, traditional backdoor attacks necessitate a considerable amount of additional data to maliciously pre-train a model. This requirement is often impractical in medical imaging applications due to the usual scarcity of data. Inspired by the latest developments in learnable prompts, this work introduces a method to embed a backdoor into the medical foundation model during the prompt learning phase. By incorporating learnable prompts within the text encoder and introducing an imperceptible learnable noise trigger into the input images, we exploit the full capabilities of the medical foundation models (Med-FM). Our method, BAPLe, requires only a minimal subset of data to adjust the noise trigger and the text prompts for downstream tasks, enabling the creation of an effective backdoor attack. Through extensive experiments with four medical foundation models, each pre-trained on different modalities and evaluated across six downstream datasets, we demonstrate the efficacy of our approach. BAPLe achieves a high backdoor success rate across all models and datasets, outperforming the baseline backdoor attack methods. Our work highlights the vulnerability of Med-FMs to backdoor attacks and aims to promote the safe adoption of Med-FMs before their deployment in real-world applications.
- June 17, 2024 : Accepted in MICCAI 2024
- Aug 12, 2024 : Released code for BAPLe
- Aug 12, 2024 : Released pre-trained models (MedCLIP, BioMedCLIP, PLIP, QuiltNet)
- Aug 30, 2024 : Released instructions for preparing datasets (COVID, RSNA18, MIMIC, Kather, PanNuke, DigestPath)
For more details, please refer to our project web page or arXiv paper.
- Installation
- Models
- Datasets
- Code Structure
- Run Experiments
- Results
- Citation
- Contact
- Acknowledgement
- Create a conda environment
conda create --name baple python=3.8
conda activate baple
- Install PyTorch and other dependencies
git clone https://github.com/asif-hanif/baple
cd baple
bash setup_env.sh
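After installation, you can optionally verify the setup with a quick sanity check. The snippet below is a minimal sketch (it assumes the `baple` environment is active; the exact PyTorch/CUDA versions installed by `setup_env.sh` may differ):

```python
# Minimal environment sanity check (run inside the `baple` conda environment).
import torch

print("PyTorch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    # Experiments in this repository were run on an NVIDIA RTX A6000.
    print("GPU:", torch.cuda.get_device_name(0))
```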
Our code uses the Dassl codebase for dataset handling and training.
We have shown the efficacy of BAPLe on four medical foundation models:
MedCLIP   BioMedCLIP   PLIP   QuiltNet
Download the pre-trained models using the links provided below. Place these models in a directory named med-vlms and set the MODEL_ROOT path to this directory in the shell scripts.
Model | Link | Size |
---|---|---|
CLIP | Download | 1.1 GB |
MedCLIP | Download | 0.9 GB |
BioMedCLIP | - | - |
PLIP | Download | 0.4 GB |
QuiltNet | Download | 2.7 GB |
All-Models | Download | 5.0 GB |
Models should be organized according to the following directory structure:
med-vlms/
├── clip/
├── medclip/
├── biomedclip/
├── plip/
└── quiltnet/
We have performed experiments on the following six medical classification datasets:
COVID   RSNA18   MIMIC   Kather   PanNuke   DigestPath
We provide instructions for downloading and processing the datasets used by our method in DATASETS.md.
Dataset | Type | Classes | Link |
---|---|---|---|
COVID | X-ray | 2 | Instructions |
RSNA18 | X-ray | 3 | Instructions |
MIMIC | X-ray | 5 | Instructions |
Kather | Histopathology | 9 | Instructions |
PanNuke | Histopathology | 2 | Instructions |
DigestPath | Histopathology | 2 | Instructions |
All datasets should be placed in a directory named med-datasets, and the path of this directory should be specified in the variable DATASET_ROOT in the shell scripts. The directory structure should be as follows:
med-datasets/
├── covid/
│   ├── images/
│   │   ├── train/
│   │   └── test/
│   └── classnames.txt
├── rsna18/
├── mimic/
├── kather/
├── pannuke/
└── digestpath/
Given the relatively small size of the PanNuke dataset compared to other datasets, we provide a download link for the pre-processed version, ready for immediate use.
Dataset | Link | Size |
---|---|---|
PanNuke | Download | 531 MB |
BAPLe code structure is borrowed from COOP. We introduce attack-related code in the Dataset class and the forward() method of each model class. While instantiating the dataset class object, we assign backdoor tags to training samples in the DatasetWrapper class in this file. Training samples with a backdoor tag of 1 are treated as poisoned samples and are transformed into backdoor samples. This transformation is done in the forward() method of each model class; the code for these transformations is in the trainers/backdoor.py file. The model class for CLIP, PLIP, and QuiltNet can be accessed here, for MedCLIP here, and for BioMedCLIP here. Prompt learning is managed by the PromptLearner class in each trainer file.
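For illustration only, the snippet below is a minimal, self-contained sketch of the poisoning step described above: samples tagged with 1 are relabeled to the attacker's target class, and a small learnable noise trigger is added to their images. The names (`BackdoorTrigger`, `apply_backdoor`, `eps`) are hypothetical and do not mirror the repository's actual `DatasetWrapper` or `forward()` code.

```python
# Conceptual sketch of the poisoning step (hypothetical names; the real logic
# lives in the DatasetWrapper class and in trainers/backdoor.py).
import torch

class BackdoorTrigger(torch.nn.Module):
    def __init__(self, image_size=224, eps=8 / 255):
        super().__init__()
        # Learnable, imperceptible additive noise, optimized jointly with the prompts.
        self.noise = torch.nn.Parameter(torch.zeros(3, image_size, image_size))
        self.eps = eps

    def forward(self, images):
        # Keep the trigger within a small L-infinity budget so it stays imperceptible.
        delta = torch.clamp(self.noise, -self.eps, self.eps)
        return torch.clamp(images + delta, 0.0, 1.0)

def apply_backdoor(images, labels, backdoor_tags, trigger, target_class):
    """Add the trigger to poisoned samples and flip their labels to the target class."""
    poisoned = backdoor_tags == 1
    images = images.clone()
    labels = labels.clone()
    images[poisoned] = trigger(images[poisoned])
    labels[poisoned] = target_class
    return images, labels
```

During prompt learning, such a trigger would be optimized jointly with the text prompt tokens on the small few-shot subset, while the foundation model's encoders stay frozen, as in standard prompt learning.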
We have performed all experiments on an NVIDIA RTX A6000 GPU. Shell scripts to run experiments can be found in the scripts folder. The following shell commands run experiments on different models and datasets:
## General Command Structure
bash <SHELL_SCRIPT> <MODEL_NAME> <DATASET_NAME> <CONFIG_FILE_NAME> <NUM_SHOTS>
## MedCLIP
bash scripts/medclip.sh medclip covid medclip_ep50 32
bash scripts/medclip.sh medclip rsna18 medclip_ep50 32
bash scripts/medclip.sh medclip mimic medclip_ep50 32
## BioMedCLIP
bash scripts/biomedclip.sh biomedclip covid biomedclip_ep50 32
bash scripts/biomedclip.sh biomedclip rsna18 biomedclip_ep50 32
bash scripts/biomedclip.sh biomedclip mimic biomedclip_ep50 32
## PLIP
bash scripts/plip.sh plip kather plip_ep50 32
bash scripts/plip.sh plip pannuke plip_ep50 32
bash scripts/plip.sh plip digestpath plip_ep50 32
## QuiltNet
bash scripts/quiltnet.sh quiltnet kather quiltnet_ep50 32
bash scripts/quiltnet.sh quiltnet pannuke quiltnet_ep50 32
bash scripts/quiltnet.sh quiltnet digestpath quiltnet_ep50 32
Results are saved in JSON format in the results directory. To process results (averaging across all target classes), run the following command (with appropriate arguments):
python results/process_results.py --model <MODEL_NAME> --dataset <DATASET_NAME>
Examples
python results/process_results.py --model medclip --dataset covid
python results/process_results.py --model biomedclip --dataset covid
python results/process_results.py --model plip --dataset kather
python results/process_results.py --model quiltnet --dataset kather
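If you prefer to inspect the raw results yourself, the sketch below shows the kind of per-target-class averaging that `process_results.py` performs; the file-naming pattern and the key names (`clean_acc`, `backdoor_asr`) are assumptions for illustration, not the repository's actual schema.

```python
# Hypothetical sketch: average clean accuracy and attack success rate (ASR)
# across per-target-class result files (file names and keys are assumptions).
import json
from pathlib import Path
from statistics import mean

def average_results(results_dir, model, dataset):
    clean_accs, asrs = [], []
    for path in Path(results_dir).glob(f"{model}_{dataset}_*.json"):
        with open(path) as f:
            result = json.load(f)
        clean_accs.append(result["clean_acc"])
        asrs.append(result["backdoor_asr"])
    return mean(clean_accs), mean(asrs)

clean_acc, asr = average_results("results", "medclip", "covid")
print(f"Clean accuracy: {clean_acc:.2f}  |  Backdoor ASR: {asr:.2f}")
```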
To evaluate already-saved models, run the following command (with appropriate arguments):
bash scripts/eval.sh <MODEL_NAME> <DATASET_NAME> <CONFIG_FILE_NAME> <NUM_SHOTS>
Examples
bash scripts/eval.sh medclip covid medclip_ep50 32
bash scripts/eval.sh biomedclip covid biomedclip_ep50 32
bash scripts/eval.sh plip kather plip_ep50 32
bash scripts/eval.sh quiltnet kather quiltnet_ep50 32
If you find our work, this repository, or pretrained models useful, please consider giving a star ⭐ and citation.
@InProceedings{Han_BAPLe_MICCAI2024,
author = {Hanif, Asif and Shamshad, Fahad and Awais, Muhammad and Naseer, Muzammal and Shahbaz Khan, Fahad and Nandakumar, Karthik and Khan, Salman and Anwer, Rao Muhammad},
title = {{BAPLe: Backdoor Attacks on Medical Foundational Models using Prompt Learning}},
booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
year = {2024},
publisher = {Springer Nature Switzerland},
volume = {LNCS 15012},
month = {October},
pages = {pending}
}
Should you have any questions, please create an issue on this repository or contact us at [email protected]
We used the COOP codebase for training (few-shot prompt learning) and inference of models in our proposed method, BAPLe. We thank the authors for releasing the codebase.