Implementation of a Generative Adversarial Network (GAN) that can create adversarial malware examples. The work is inspired by MalGAN in the paper "Generating Adversarial Malware Examples for Black-Box Attacks Based on GAN" by Weiwei Hu and Ying Tan.
Framework written in PyTorch and supports CUDA.
The malware GAN is provided as a package in the folder malgan
. A driver script is provided in main.py
, which processes input arguments via argparse
. The basic interface is:
python main.py Z BATCH_SIZE NUM_EPOCHS MALWARE_FILE BENIGN_FILE
Z
-- Dimension of the latent vector. Must be a positive integer.BATCH_SIZE
-- Batch size for malicious examples. The benign batch size is proportional toBATCH_SIZE
and the fraction of total training samples that are benign.NUM_EPOCHS
-- Maximum number of training epochsMALWARE_FILE
-- Path to a serializednumpy
ortorch
matrix where the rows represent a single malware file's binary feature vector.BENIGN_FILE
-- Path to a serializednumpy
ortorch
matrix where the rows represent a single benign file's binary feature vector.
For checkout purposes, we recommend calling:
python main.py 10 32 100 data/trial_mal.npy data/trial_ben.npy
A trial dataset is included with this implementation in the data
folder. The data was publish in the repository: yanminglai/Malware-GAN. This dataset should only be used for proof of concept and initial trials.
We recommend the SLEIPNIR dataset. It was published by ad-Dujaili et al. The authors requested that the dataset not be shared publicly, and we respect that request. However, researchers and students may request access directly from the authors as described on their Github repository. Look for the link to the Google form.
The implementation supports both CPU and CUDA (i.e., GPU) execution. If CUDA is detected on the system, the implementation defaults to CUDA support.
This program was tested with Python 3.6.5 on MacOS and on Debian Linux. requirements.txt
enumerates the exact packages used. A summary of the key requirements is below:
- PyTorch (
torch
) -- Ver. 1.2.0 - Scikit-Learn (
sklearn
) -- Ver. 0.20.2 - NumPy (
numpy
) - TensorboardX -- If runtime profiling is not required, this can be removed.