This repository contains code and trained models for the paper Testing Robustness Against Unforeseen Adversaries by Daniel Kang*, Yi Sun*, Dan Hendrycks, Tom Brown, and Jacob Steinhardt.
More specifically, we provide implementations and code to evaluate the Unforeseen Attack Robustness (UAR) metric from our paper for the following adversarial attacks: L∞, L2, L1, L∞-JPEG, L2-JPEG, L1-JPEG, Elastic, Fog, Gabor, Snow. For each attack, we release calibrated distortion sizes and adversarially trained ResNet-50 models at these sizes for ImageNet-100, a 100-class subset of ImageNet.
- What is Unforeseen Attack Robustness (UAR)?
- Installation
- Usage
- Calibrated distortion sizes and ATA values
## What is Unforeseen Attack Robustness (UAR)?

We ask whether a model that is adversarially robust against one distortion type remains adversarially robust against other distortion types. The extent to which this occurs is called the transfer of adversarial robustness. This repository evaluates our summary metric UAR, which compares the robustness of a model against an unforeseen distortion type to that of adversarial training directly against that distortion.
For a distortion type A, we define ATA(A, ε) to be the best adversarial accuracy achieved by adversarially trained models against adversarial distortions of type A and size ε. The definition of UAR depends on a set of distortion sizes ε1, ..., ε6 calibrated so that the values of ATA(A, εk) approximately match the ATA values of the L∞ attack at geometrically increasing distortion sizes. We define UAR(A, M) for a model M by

UAR(A, M) := (Acc(A, ε1, M) + ... + Acc(A, ε6, M)) / (ATA(A, ε1) + ... + ATA(A, ε6))

where Acc(A, ε, M) is the accuracy of M against adversarial distortions of type A and size ε. We provide calibrated sizes and adversarially trained models for our attacks on ImageNet-100 below.
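As a concrete illustration, here is a minimal sketch of this computation in Python (the helper name `compute_uar` is ours, not part of this repository):

```python
def compute_uar(accuracies, atas):
    """UAR(A, M) from the six accuracies Acc(A, eps_k, M) of model M
    and the corresponding calibrated values ATA(A, eps_k)."""
    assert len(accuracies) == len(atas) == 6
    return sum(accuracies) / sum(atas)
```

Both lists are indexed by the calibrated sizes ε1, ..., ε6 for the attack A.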
## Installation

- Install Python dependencies:

```
conda create -n advex-uar
conda activate advex-uar
pip install click numpy opencv-python Pillow scipy
conda install -y pytorch=1.0.1 torchvision=0.2.2 -c pytorch
pip install horovod==0.16.4
```
- Clone and install this repository:

```
git clone git@github.com:ddkang/advex-uar.git
cd advex-uar
pip install .
```

Alternatively, install via pip:

```
pip install advex-uar
```
- (Optional) Install WandB and create an account:

```
pip install wandb
wandb login
```

Set the `WANDB_ENTITY` and `WANDB_API_KEY` environment variables as described here.

Note: Do not install using `python setup.py install`.
## Usage

The script `examples/eval.py` evaluates a model against one of our adversarial attacks. For example:

```
python eval.py --dataset imagenet --class_downsample_factor 10 --attack pgd_linf --epsilon 16.0 --n_iters 100 --step_size 1 --ckpt_path [CKPT_PATH] --dataset_path [DATASET_PATH]
```

will evaluate a ResNet-50 model checkpoint located at `CKPT_PATH` against the L∞ attack with ε=16, 100 iterations, and step size 1 on the ImageNet-100 validation set located at `DATASET_PATH`. The available choices of `--attack` are `pgd_linf`, `pgd_l2`, `fw_l1`, `jpeg_linf`, `jpeg_l2`, `jpeg_l1`, `elastic`, `fog`, `gabor`, and `snow`.
If the flag `--use_wandb` is set, results will be logged to WandB. Otherwise, if the flag `--no_wandb` is set, results will be logged to the folder `./eval/eval-[YYYYmmDD_HHMMSS]-[RUN_ID]`, which will contain:

- the file `summary.log`, a JSON file containing a single dict with the configuration parameters and results of the run
- the folder `images`, containing (1) `adv_[X].png` and `orig_[X].png`, the attacked and original versions of the first image in each class X, and (2) `init_adv_[X].png` and `init_orig_[X].png`, the attacked and original versions of all images in the first evaluation batch
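The `summary.log` file can be read with the standard `json` module; a minimal sketch (the run folder name here is hypothetical, and the exact keys depend on the run configuration):

```python
import json

# summary.log holds a single JSON dict with config parameters and results.
with open('./eval/eval-20190801_120000-abc123/summary.log') as f:
    summary = json.load(f)
print(sorted(summary.keys()))
```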
The script `analysis/compute_uar.py` processes logs to compute UAR for a model. For example:

```
python compute_uar.py --eval_log_file [EVAL_LOG_FILE] --train_log_file [TRAIN_LOG_FILE] --calibrated_eps_file [EPS_CALIBRATION_FILE] --run_id [RUN_ID] --out_file [OUT_FILE] --max_eps_file [MAX_EPS_FILE]
```

will evaluate UAR for the model with the given `run_id` on all distortion types present in `EPS_CALIBRATION_FILE`, assuming that the relevant evaluation runs are present in `EVAL_LOG_FILE` and the training runs in `TRAIN_LOG_FILE`. The file `EPS_CALIBRATION_FILE` is expected to be a JSON file consisting of a list, each entry of which has the format

`[[attack_type, epsilon, n_iter, step_size], ATA_value, training_wandb_run_id (if it exists)]`

There should be exactly 6 entries for each `attack_type` present in `EPS_CALIBRATION_FILE`. An example with calibration numbers from our paper for ImageNet-100 is located at `analysis/calibrations/imagenet-100/calibs.out`. The file `EVAL_LOG_FILE` is expected to be a JSON file consisting of a single list, each entry of which is a dict as output by `eval.py`. The optional file `MAX_EPS_FILE` is expected to be a JSON dictionary whose key-value pairs have the form `attack: max_eps`; adversarially trained defenses against `attack` will be required to have epsilon at most `max_eps`.
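For concreteness, the first two entries of a hypothetical `EPS_CALIBRATION_FILE` for the L∞ attack might look like the following (the run IDs and attack parameters are placeholders; see `analysis/calibrations/imagenet-100/calibs.out` for the actual numbers):

```
[
  [["pgd_linf", 1.0, 100, 1.0], 84.6, "abc123"],
  [["pgd_linf", 2.0, 100, 1.0], 82.1, "def456"]
]
```

Similarly, a `MAX_EPS_FILE` restricting L∞ defenses to ε ≤ 32 would contain `{"pgd_linf": 32.0}`.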
The script `examples/train.py` performs multi-GPU adversarial training against one of our attacks. For example:

```
python train.py --dataset cifar-10 --dataset_path [DATASET_PATH] --resnet_size 56 --attack elastic --epsilon 8.0 --n_iters 30 --step_size 1
```

will adversarially train a ResNet-56 model from scratch against the Elastic attack with ε=8, 30 iterations, and step size 1 on CIFAR-10, located at `DATASET_PATH`. By default, it uses a single GPU; use `horovodrun` for multi-GPU training.
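For instance, training on 4 GPUs on a single machine might look like the following (a sketch assuming a standard Horovod setup; adjust `-np` to your hardware):

```
horovodrun -np 4 python train.py --dataset cifar-10 --dataset_path [DATASET_PATH] --resnet_size 56 --attack elastic --epsilon 8.0 --n_iters 30 --step_size 1
```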
If the flag `--use_wandb` is set, results will be logged to WandB. Otherwise, if the flag `--no_wandb` is set, results will be logged to the folder `./train/train-[YYYYmmDD_HHMMSS]-[RUN_ID]`, which will contain:

- the file `summary.log`, a JSON file containing a single dict with the configuration parameters and results of the run
- the final checkpoint file `ckpt.pth`, which may be loaded using:

```python
import torch

ckpt = torch.load('ckpt.pth')             # dict with 'model' and 'optimizer' entries
model.load_state_dict(ckpt['model'])      # model: a matching ResNet instance
optim.load_state_dict(ckpt['optimizer'])  # optim: the corresponding optimizer
```

- intermediate checkpoint files `[X].pth` containing the model and optimizer state after epoch X
The script `logging/get_wandb_logs.py` downloads logs from WandB into the same format as local logging. For example:

```
python get_wandb_logs.py --wandb_username [USERNAME] --wandb_project [PROJECT] --output_file [OUT_FILE]
```

downloads logs from all finished runs in the project `[USERNAME]/[PROJECT]` and writes them to `OUT_FILE` as a JSON file consisting of a single list, each entry of which is a dict in the format output by `eval.py` or `train.py`.
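This kind of export can also be reproduced directly with the public WandB API; a minimal sketch of the idea (not the repository script itself, and which summary fields are kept is illustrative):

```python
import json
import wandb

api = wandb.Api()
entries = []
for run in api.runs('USERNAME/PROJECT'):  # placeholder entity/project
    if run.state == 'finished':
        entries.append(run.summary._json_dict)  # the summary dict logged for the run

with open('logs.json', 'w') as f:
    json.dump(entries, f)
```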
The script `analysis/compute_ata.py` computes ATA values. For example:

```
python compute_ata.py --eval_log_file [EVAL_LOG] --train_log_file [TRAIN_LOG] --out_file [OUT_FILE]
```

computes ATA values for all attacks seen in `EVAL_LOG`. It assumes that `EVAL_LOG` and `TRAIN_LOG` are JSON files consisting of single lists, each entry of which is a dict generated by `eval.py` and `train.py`, respectively. Further, for each eval run in `EVAL_LOG`, the field `wandb_run_id` should correspond to the field `run_id` of some training run in `TRAIN_LOG`. The output `OUT_FILE` is a JSON file containing a single list, each entry of which has the format

`[[attack_type, epsilon, n_iter, step_size], ATA_value, training_wandb_run_id (if it exists)]`
The script `analysis/calibrate_eps.py` calibrates epsilon values given ATA values for a new attack. For example:

```
python calibrate_eps.py --attack [ATTACK] --ata_file [ATA_FILE] --linf_ata_file [LINF_ATA_FILE] --resol [RESOL] --out_file [OUT_FILE]
```

computes a set of calibrated epsilon values for attack `ATTACK` using the ATA values in `ATA_FILE`, which is assumed to be a JSON file in the format output by `compute_ata.py`. The file `LINF_ATA_FILE` is assumed to be a JSON file of calibrations for the L∞ attack; versions of this file for ImageNet-100 and CIFAR-10 are available at `analysis/calibrations/imagenet-100/pgd_linf.out` and `analysis/calibrations/cifar-10/pgd_linf.out`. The argument `RESOL` should be `32` for CIFAR-10 and `224` for ImageNet-100. The output `OUT_FILE` is a JSON file of the same format containing entries for the selected epsilon values.
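Per the UAR definition above, calibration amounts to choosing, for each of the six L∞ reference ATA values, the candidate distortion size whose ATA is closest. A minimal sketch of that selection (hypothetical helper, not the script itself):

```python
def calibrate(candidates, linf_atas):
    """Select one epsilon per reference value.

    candidates: list of (epsilon, ATA) pairs measured for the new attack.
    linf_atas:  the six reference ATA(L-inf, eps_k) values.
    """
    chosen = []
    for target in linf_atas:
        # Take the candidate whose ATA best matches the reference ATA.
        eps, _ = min(candidates, key=lambda c: abs(c[1] - target))
        chosen.append(eps)
    return chosen
```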
We show sample images for the class `black swan` for each attack on ImageNet below; full details of the attacks are in our paper.

[Sample images for each attack: L∞, L2, L1, L∞-JPEG, L2-JPEG, L1-JPEG, Elastic, Fog, Gabor, Snow]
Sample images for the class `dog` for each attack on CIFAR-10 are below.

[Sample images for each attack: L∞, L2, L1, L∞-JPEG, L1-JPEG, Elastic]
## Calibrated distortion sizes and ATA values

We present calibrated distortion sizes and adversarially trained models for each of our attacks. On ImageNet-100, the 100-class subset of ImageNet-1K containing every 10th class by WordNet ID order, we use the ResNet-50 architecture. Calibrated distortion sizes for each attack and links to adversarially trained models with these parameters are presented in the table below. The calibrations are also included in `analysis/calibrations/imagenet-100/calibs.out`. Checkpoints for these adversarially trained models are available here.
Distortion | ε1 | ε2 | ε3 | ε4 | ε5 | ε6 |
---|---|---|---|---|---|---|
L∞ | 1 | 2 | 4 | 8 | 16 | 32 |
L2 | 150 | 300 | 600 | 1200 | 2400 | 4800 |
L1 | 9562.5 | 19125 | 76500 | 153000 | 306000 | 612000 |
L∞-JPEG | 0.0625 | 0.125 | 0.25 | 0.5 | 1 | 2 |
L2-JPEG | 8 | 16 | 32 | 64 | 128 | 256 |
L1-JPEG | 256 | 1024 | 4096 | 16384 | 65536 | 131072 |
Elastic | 0.25 | 0.5 | 2 | 4 | 8 | 16 |
Fog | 128 | 256 | 512 | 2048 | 4096 | 8192 |
Gabor | 6.25 | 12.5 | 25 | 400 | 800 | 1600 |
Snow | 0.0625 | 0.125 | 0.25 | 2 | 4 | 8 |
The table below presents ATA values at the calibrated distortion sizes. These values are sufficient to compute UAR on a new model using adversarial evaluation alone.
ATA values | ε1 | ε2 | ε3 | ε4 | ε5 | ε6 |
---|---|---|---|---|---|---|
ATA(L∞, ε) | 84.6 | 82.1 | 76.2 | 66.9 | 40.1 | 12.9 |
ATA(L2, ε) | 85.0 | 83.5 | 79.6 | 72.6 | 59.1 | 19.9 |
ATA(L1, ε) | 84.4 | 82.7 | 76.3 | 68.9 | 56.4 | 36.1 |
ATA(L∞-JPEG, ε) | 85.0 | 83.2 | 79.3 | 72.8 | 34.8 | 1.1 |
ATA(L2-JPEG, ε) | 84.8 | 82.5 | 78.9 | 72.3 | 47.5 | 3.4 |
ATA(L1-JPEG, ε) | 84.8 | 81.8 | 76.2 | 67.1 | 46.4 | 41.8 |
ATA(Elastic, ε) | 85.9 | 83.2 | 78.1 | 75.6 | 57.0 | 22.5 |
ATA(Fog, ε) | 85.8 | 83.8 | 79.0 | 68.4 | 67.9 | 64.7 |
ATA(Gabor, ε) | 84.0 | 79.8 | 79.8 | 66.2 | 44.7 | 14.6 |
ATA(Snow, ε) | 84.0 | 81.1 | 77.7 | 65.6 | 59.5 | 41.2 |
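For example, using the ATA(L∞, ε) row above, the UAR denominator is 84.6 + 82.1 + 76.2 + 66.9 + 40.1 + 12.9 = 362.8; a model whose six adversarial accuracies against L∞ sum to 300.0 would therefore have UAR(L∞, M) = 300.0 / 362.8 ≈ 0.83.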
For CIFAR-10, we use the ResNet-56 architecture. Calibrated distortion sizes for each attack and links to adversarially trained models with these parameters are presented in the table below. The calibrations are also included in `analysis/calibrations/cifar-10/calibs.out`. Checkpoints for these adversarially trained models are available here.
Distortion | ε1 | ε2 | ε3 | ε4 | ε5 | ε6 |
---|---|---|---|---|---|---|
L∞ | 1 | 2 | 4 | 8 | 16 | 32 |
L2 | 40 | 80 | 160 | 320 | 640 | 2560 |
L1 | 195 | 390 | 780 | 1560 | 6240 | 24960 |
L∞-JPEG | 0.03125 | 0.0625 | 0.125 | 0.25 | 0.5 | 1 |
L1-JPEG | 2 | 8 | 64 | 256 | 512 | 1024 |
Elastic | 0.125 | 0.25 | 0.5 | 1 | 2 | 8 |
The table below presents ATA values at the calibrated distortion sizes. These values are sufficient to compute UAR on a new model using adversarial evaluation alone.
ATA values | ε1 | ε2 | ε3 | ε4 | ε5 | ε6 |
---|---|---|---|---|---|---|
ATA(L∞, ε) | 91.0 | 87.8 | 81.6 | 71.3 | 46.5 | 23.1 |
ATA(L2, ε) | 90.1 | 86.4 | 79.6 | 67.3 | 49.9 | 17.3 |
ATA(L1, ε) | 92.2 | 90.0 | 83.2 | 73.8 | 47.4 | 35.3 |
ATA(L∞-JPEG, ε) | 89.7 | 87.0 | 83.1 | 78.6 | 69.7 | 35.4 |
ATA(L1-JPEG, ε) | 91.4 | 88.1 | 80.2 | 68.9 | 56.3 | 37.7 |
ATA(Elastic, ε) | 87.4 | 81.3 | 72.1 | 58.2 | 45.4 | 27.8 |
If you find this useful in your research, please consider citing:

```
@article{kang2019robustness,
  title={Testing Robustness Against Unforeseen Adversaries},
  author={Daniel Kang and Yi Sun and Dan Hendrycks and Tom Brown and Jacob Steinhardt},
  journal={arXiv preprint arXiv:1908.08016},
  year={2019}
}
```