Towards Faithful XAI Evaluation via Generalization-Limited Backdoor Watermark

This repository is the official implementation of the ICLR 2024 paper "Towards Faithful XAI Evaluation via Generalization-Limited Backdoor Watermark". The project is built on Python 3 and PyTorch and was created by Mengxi Ya and Yiming Li.

@inproceedings{ya2024towards,
  title={Towards Faithful XAI Evaluation via Generalization-Limited Backdoor Watermark},
  author={Ya, Mengxi and Li, Yiming and Dai, Tao and Wang, Bin and Jiang, Yong and Xia, Shu-Tao},
  booktitle={ICLR},
  year={2024}
}

Dependencies

Install the required Python packages from requirements.txt:

pip install -r ./requirements.txt

Train Models

Train BadNets Models

Refer to ./tests/train_BadNets.sh
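
For orientation, the sketch below illustrates the general idea of BadNets-style poisoning: a fixed trigger patch is stamped onto a fraction of the training images, and those images are relabeled to a target class. It is a minimal NumPy illustration, not the repository's implementation; the function name `poison_dataset`, the array layout (N, H, W, C), and every parameter value are assumptions made for the example. The script above remains the authoritative way to train the models.

```python
import numpy as np


def poison_dataset(images, labels, pattern, mask, target_label, poison_rate, seed=0):
    """Stamp a trigger on a random subset of images and relabel them.

    images:  uint8 array of shape (N, H, W, C)
    pattern: uint8 array of shape (H, W, C) holding the trigger pixel values
    mask:    bool array of shape (H, W, 1); True where the trigger is applied
    Returns copies of images and labels plus the indices that were poisoned.
    """
    rng = np.random.default_rng(seed)
    n = len(images)
    n_poison = int(round(n * poison_rate))
    idx = rng.choice(n, size=n_poison, replace=False)

    images = images.copy()
    labels = labels.copy()
    # Where the mask is True take the pattern, elsewhere keep the original pixel.
    images[idx] = np.where(mask, pattern, images[idx])
    labels[idx] = target_label
    return images, labels, idx


# Hypothetical toy data: ten black 8x8 RGB images with labels 0..4.
clean_images = np.zeros((10, 8, 8, 3), dtype=np.uint8)
clean_labels = np.arange(10) % 5

# Hypothetical trigger: a white 3x3 square in the bottom-right corner.
mask = np.zeros((8, 8, 1), dtype=bool)
mask[-3:, -3:, :] = True
pattern = np.full((8, 8, 3), 255, dtype=np.uint8)

poisoned_images, poisoned_labels, poisoned_idx = poison_dataset(
    clean_images, clean_labels, pattern, mask, target_label=0, poison_rate=0.2
)
```

The inputs are copied before modification so the clean dataset stays available for benign-accuracy evaluation.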

Train BWTP Models

Refer to ./XAI_train_naive/train_BadNets.sh

Train GLBW Models

Refer to ./XAI_train_test/train_BadNets.sh

The evaluation (IOU) of SRV methods
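
As a rough illustration of the metric named in this section, the sketch below computes the intersection over union (IoU) between a binarized saliency map and a ground-truth trigger mask. The binarization rule (keep the k most salient pixels, with k equal to the trigger size) and the helper names `topk_mask` and `iou` are assumptions made for this example; the evaluation scripts listed below define the procedure actually used.

```python
import numpy as np


def topk_mask(saliency: np.ndarray, k: int) -> np.ndarray:
    """Return a boolean mask marking the k most salient pixels."""
    flat = saliency.ravel()
    # argpartition moves the k largest values into the last k positions.
    top_idx = np.argpartition(flat, -k)[-k:]
    mask = np.zeros(flat.shape, dtype=bool)
    mask[top_idx] = True
    return mask.reshape(saliency.shape)


def iou(pred_mask: np.ndarray, true_mask: np.ndarray) -> float:
    """Intersection over union of two binary masks."""
    pred = pred_mask.astype(bool)
    true = true_mask.astype(bool)
    union = np.logical_or(pred, true).sum()
    if union == 0:
        return 0.0
    return float(np.logical_and(pred, true).sum() / union)


# Hypothetical 4x4 example: a 2x2 trigger in the bottom-right corner.
trigger_mask = np.zeros((4, 4), dtype=bool)
trigger_mask[2:, 2:] = True

# Hypothetical saliency map: three trigger pixels are highlighted,
# one is missed, and one pixel outside the trigger is wrongly highlighted.
saliency = np.zeros((4, 4))
saliency[2:, 2:] = [[0.9, 0.8], [0.7, 0.1]]
saliency[0, 0] = 0.5

pred = topk_mask(saliency, k=int(trigger_mask.sum()))
score = iou(pred, trigger_mask)
print(f"IoU = {score:.2f}")
```

In this toy case the prediction shares three pixels with the trigger and the union covers five pixels, so the IoU is 3/5.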

The evaluation (IOU) of SRV methods with the vanilla backdoor-based method

Refer to ./evalxai/eval_new.sh

The evaluation (IOU) of SRV methods with the standardized backdoor-based method

Refer to ./evalxai/eval+.sh

The evaluation (IOU) of SRV methods with the standardized backdoor-based method using our generalization-limited backdoor watermark (GLBW)

Refer to ./evalxai/eval+_for_GLBW.sh

Generalization of Model Watermarks

The distance between potential triggers and the original one used for training w.r.t. the loss value on CIFAR-10 and GTSRB

Refer to ./my_neural_cleanse_experiment_launcher.sh
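
This README does not name the distance used to compare a potential trigger with the original one. As one plausible choice, the sketch below computes a symmetric chamfer distance between the pixel coordinates of two binary trigger masks. The function name `chamfer_distance` and the toy masks are assumptions made for the example and may differ from what the launcher script computes.

```python
import numpy as np


def chamfer_distance(mask_a: np.ndarray, mask_b: np.ndarray) -> float:
    """Symmetric chamfer distance between the active pixels of two masks.

    For each active pixel in one mask, take the Euclidean distance to the
    nearest active pixel in the other mask, average these distances, and
    sum the averages of both directions.
    """
    pts_a = np.argwhere(mask_a)  # (Na, 2) array of row/column coordinates
    pts_b = np.argwhere(mask_b)  # (Nb, 2)
    if len(pts_a) == 0 or len(pts_b) == 0:
        raise ValueError("both masks must contain at least one active pixel")
    diff = pts_a[:, None, :] - pts_b[None, :, :]
    dists = np.sqrt((diff ** 2).sum(axis=-1))  # (Na, Nb) pairwise distances
    return float(dists.min(axis=1).mean() + dists.min(axis=0).mean())


# Hypothetical original trigger: a 2x2 square in the bottom-right of a 6x6 image.
original = np.zeros((6, 6), dtype=bool)
original[4:, 4:] = True

# Hypothetical recovered trigger: the same square shifted one pixel to the left.
candidate = np.zeros((6, 6), dtype=bool)
candidate[4:, 3:5] = True

d_same = chamfer_distance(original, original)
d_shift = chamfer_distance(original, candidate)
```

A distance of zero means the two masks cover exactly the same pixels; larger values indicate a candidate trigger located farther from the original.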

The effectiveness and generalization of model watermarks on CIFAR-10 and GTSRB

Refer to ./my_neural_cleanse_experiment_launcher.sh, ./TABOR_experiment_launcher.sh and ./PixelBackdoor_experiment_launcher.sh

Acknowledgement

Our code is based on BackdoorBox, an open-source Python toolbox that implements representative and advanced backdoor attacks and defenses under a unified, flexible framework.
