The PGD and FGSM algorithms are implemented to attack the DeepSpeech 2 model.
Several dependencies must be installed first. Please follow the instructions in DeepSpeech 2 PyTorch to set up the environment.
It is recommended to arrange this repo and the DeepSpeech 2 PyTorch folder in the following structure.
ROOT_FOLDER/
├── this_repo/
│   ├── main.py
│   └── ...
├── deepspeech.pytorch/
│   ├── models/
│   │   └── librispeech/
│   │       └── librispeech_pretrained_v2.pth
│   └── ...
Then download the pretrained DeepSpeech model (librispeech_pretrained_v2.pth) from the link provided by DeepSpeech 2 PyTorch and place it under models/librispeech/ as shown in the structure above.
Deep Speech 2 [1] is a modern ASR system that can be trained end to end, because the spectrogram is used directly to generate the predicted sentence. In this work, the PGD (Projected Gradient Descent) and FGSM (Fast Gradient Sign Method) algorithms are implemented to conduct adversarial attacks against this ASR system.
[1] Amodei, D., Ananthanarayanan, S., Anubhai, R., Bai, J., Battenberg, E., Case, C., ... & Zhu, Z. (2016, June). Deep Speech 2: End-to-end speech recognition in English and Mandarin. In International Conference on Machine Learning (pp. 173-182).
The input wav files must be resampled to sample_rate=16000. A convenient script is provided to resample them.
python3 preprocessing --input_folder folder_path --output_folder folder_path
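Before running the attack, it can help to confirm that a file really is at 16 kHz. The snippet below is a minimal sketch using only Python's standard `wave` module; the helper name `wav_sample_rate` and the demo file are illustrative and are not part of this repo. It assumes uncompressed PCM wav files.

```python
import os
import tempfile
import wave

TARGET_RATE = 16000  # sample rate required for the input wav files


def wav_sample_rate(path):
    """Return the sample rate (Hz) stored in a PCM wav file header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


# Demo: write 0.1 s of 16-bit mono silence, then read the rate back.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.wav")
with wave.open(demo_path, "wb") as wav_file:
    wav_file.setnchannels(1)
    wav_file.setsampwidth(2)
    wav_file.setframerate(TARGET_RATE)
    wav_file.writeframes(b"\x00\x00" * (TARGET_RATE // 10))

demo_rate = wav_sample_rate(demo_path)
print(demo_rate == TARGET_RATE)
```

A file whose rate differs from `TARGET_RATE` should go through the resampling script first.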
With main.py, it is easy to perturb the original raw wav file so that it is transcribed as a desired target sentence.
python3 main.py --input_wav your_wav.wav --output_wav to_save.wav --target_sentence HELLO_WORD
Several parameters are available to improve the adversarial attack. Both PGD and FGSM modes are provided, and epsilon, alpha, and PGD_iter can be adjusted for better results. For details, please refer to main.py.
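To illustrate how epsilon, alpha, and PGD_iter typically enter these two attacks, here is a minimal NumPy sketch of targeted FGSM and PGD. It is not the code in main.py: the function names, the L-infinity projection, the clipping to [-1, 1], and the toy quadratic loss (standing in for the model's loss on the target sentence) are all assumptions made for illustration.

```python
import numpy as np


def fgsm_targeted(x, grad_fn, epsilon):
    """One signed-gradient step of size epsilon that lowers the target loss."""
    x_adv = x - epsilon * np.sign(grad_fn(x))
    return np.clip(x_adv, -1.0, 1.0)  # assumed valid waveform range


def pgd_targeted(x, grad_fn, epsilon, alpha, pgd_iter):
    """pgd_iter steps of size alpha, each projected back into the
    L-infinity ball of radius epsilon around the original input x."""
    x_adv = x.copy()
    for _ in range(pgd_iter):
        x_adv = x_adv - alpha * np.sign(grad_fn(x_adv))
        x_adv = np.clip(x_adv, x - epsilon, x + epsilon)  # projection step
        x_adv = np.clip(x_adv, -1.0, 1.0)  # assumed valid waveform range
    return x_adv


# Toy stand-in for the target loss: squared distance to a fixed vector.
# In a real attack, grad_fn would return the gradient of the ASR loss
# for the target sentence with respect to the waveform.
toy_target = np.array([0.5, -0.5, 0.2, 0.0])


def toy_grad(x):
    return 2.0 * (x - toy_target)


x = np.zeros(4)
x_fgsm = fgsm_targeted(x, toy_grad, epsilon=0.1)
x_pgd = pgd_targeted(x, toy_grad, epsilon=0.1, alpha=0.02, pgd_iter=10)
print(np.max(np.abs(x_pgd - x)) <= 0.1 + 1e-12)
```

In this sketch, epsilon bounds the total perturbation in both modes, while alpha and PGD_iter only affect PGD: a smaller alpha with more iterations takes finer steps inside the same epsilon budget.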
The PyTorch implementation of the STFT algorithm is from this repo.