❤️ If you like AIJack, please consider becoming a GitHub Sponsor ❤️
AIJack is an easy-to-use open-source simulation tool for testing the security of your AI system against hijackers. It provides advanced security techniques like Differential Privacy, Homomorphic Encryption, K-anonymity and Federated Learning to guarantee protection for your AI. With AIJack, you can test and simulate defenses against various attacks such as Poisoning, Model Inversion, Backdoor, and Free-Rider. We support more than 30 state-of-the-art methods. For more information, check our paper and documentation and start securing your AI today with AIJack.
You can install AIJack with `pip`. AIJack requires Boost and pybind11.
apt install -y libboost-all-dev
pip install -U pip
pip install "pybind11[global]"
pip install aijack
If you want the latest version, you can install it directly from GitHub.
pip install git+https://github.com/Koukyosyumei/AIJack
We also provide a Dockerfile.
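After running the commands above, a quick way to confirm that the Python-side dependencies are visible to your interpreter is to look the modules up with the standard library. This is a minimal sketch: the helper name `is_installed` is ours, and it only checks that the modules can be located, not that the C++ backend was built correctly.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if the top-level module can be located on the Python path."""
    return importlib.util.find_spec(module_name) is not None


# Report the status of the two modules the install steps are expected to provide.
for name in ("pybind11", "aijack"):
    status = "found" if is_installed(name) else "missing"
    print(f"{name}: {status}")
```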
Below is a brief overview of AIJack.
- All-around abilities for both attack & defense
- PyTorch-friendly design
- Compatible with scikit-learn
- Fast Implementation with C++ backend
- MPI-Backend for Federated Learning
- Extensible modular APIs
For standard machine learning algorithms, AIJack allows you to simulate attacks against machine learning models with `Attacker` APIs. AIJack mainly supports PyTorch and sklearn models.
# abstract code
attacker = Attacker(target_model)
result = attacker.attack()
For instance, we can implement a poisoning attack against an SVM implemented with sklearn as follows.
from aijack.attack import Poison_attack_sklearn
attacker = Poison_attack_sklearn(clf, X_train, y_train)
malicious_data, log = attacker.attack(initial_data, 1, X_valid, y_valid)
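To make the "attacker wraps a target model" pattern concrete without requiring AIJack, here is a minimal, dependency-free sketch. It uses FGSM (one of the evasion attacks listed among the supported methods) against a toy logistic model. The classes `LinearModel` and `FGSMAttacker` are hypothetical names written for this illustration and are not AIJack's API.

```python
import math


class LinearModel:
    """Toy logistic-regression model (hypothetical stand-in for a target model)."""

    def __init__(self, weights, bias):
        self.weights = weights
        self.bias = bias

    def logit(self, x):
        return sum(w * xi for w, xi in zip(self.weights, x)) + self.bias

    def predict(self, x):
        return 1 if self.logit(x) > 0 else 0


def sign(value):
    """Return -1, 0, or 1 depending on the sign of `value`."""
    return (value > 0) - (value < 0)


class FGSMAttacker:
    """Hypothetical attacker: wraps a target model and exposes attack()."""

    def __init__(self, target_model, eps):
        self.target_model = target_model
        self.eps = eps

    def attack(self, x, y):
        # For a logistic model, the gradient of the cross-entropy loss
        # with respect to the input is (sigmoid(logit) - y) * w.
        p = 1.0 / (1.0 + math.exp(-self.target_model.logit(x)))
        grad = [(p - y) * w for w in self.target_model.weights]
        # FGSM: take one step of size eps along the sign of that gradient.
        return [xi + self.eps * sign(g) for xi, g in zip(x, grad)]


model = LinearModel(weights=[2.0, -1.0], bias=0.0)
attacker = FGSMAttacker(model, eps=1.0)
x, y = [1.0, 0.5], 1
x_adv = attacker.attack(x, y)
print(model.predict(x), model.predict(x_adv))  # prints "1 0"
```

The perturbed input `[0.0, 1.5]` moves across the decision boundary, so the prediction flips from class 1 to class 0.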
For distributed learning such as Federated Learning and Split Learning, AIJack offers four basic APIs: `Client`, `Server`, `API`, and `Manager`. `Client` and `Server` represent each client and server within each distributed learning scheme. You can execute training by registering the clients and servers to `API` and running it. `Manager` gives additional abilities such as attack, defense, or parallel computing to `Client`, `Server`, or `API` via the `attach` method.
# abstract code
client = [Client(), Client()]
server = Server()
api = API(client, server)
api.run() # execute training
c_manager = ClientManagerForAdditionalAbility(...)
s_manager = ServerManagerForAdditionalAbility(...)
ExtendedClient = c_manager.attach(Client)
ExtendedServer = s_manager.attach(Server)
extended_client = [ExtendedClient(...), ExtendedClient(...)]
extended_server = ExtendedServer(...)
api = API(extended_client, extended_server)
api.run() # execute training
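One way to picture what an `attach`-style method does is as a function that takes a class and returns a subclass with extra behaviour. The sketch below shows that idea only. `Client`, `OffsetUploadManager`, and `upload` are hypothetical names for this illustration, and AIJack's real managers may be implemented differently.

```python
class Client:
    """Toy client (hypothetical): upload() returns its local update."""

    def __init__(self, update):
        self.update = update

    def upload(self):
        return self.update


class OffsetUploadManager:
    """Hypothetical manager that shifts every uploaded value by a fixed offset."""

    def __init__(self, offset):
        self.offset = offset

    def attach(self, cls):
        offset = self.offset

        class Wrapped(cls):
            def upload(self):
                # Run the original behaviour, then post-process its result.
                original = super().upload()
                return [value + offset for value in original]

        Wrapped.__name__ = f"Extended{cls.__name__}"
        return Wrapped


manager = OffsetUploadManager(offset=0.5)
ExtendedClient = manager.attach(Client)
client = ExtendedClient([1.0, 2.0])
print(client.upload())  # prints "[1.5, 2.5]"
```

Because the returned class is a subclass of the original, it can be used wherever the original class was expected, which is what lets extended clients and servers be passed to the same training loop.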
For example, the code below implements a scenario in which the server in Federated Learning tries to steal the training data with a gradient-based model inversion attack.
from aijack.collaborative.fedavg import FedAVGAPI, FedAVGClient, FedAVGServer
from aijack.attack.inversion import GradientInversionAttackServerManager
manager = GradientInversionAttackServerManager(input_shape)
FedAVGServerAttacker = manager.attach(FedAVGServer)
clients = [FedAVGClient(model_1), FedAVGClient(model_2)]
server = FedAVGServerAttacker(clients, model_3)
api = FedAVGAPI(server, clients, criterion, optimizers, dataloaders)
api.run()
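For intuition about why shared gradients can leak training data, consider the simplest possible case: a single linear unit with a bias, trained on one sample with a squared loss. The weight gradient is the bias gradient multiplied by the input, so dividing one by the other recovers the input exactly. This is a toy sketch with hypothetical function names, not the attack AIJack implements. Attacks such as DLG handle deeper models by optimizing a dummy input until its gradients match the observed ones.

```python
def client_gradients(weights, bias, x, y):
    """Gradients of 0.5 * (w.x + b - y)^2 with respect to the weights and bias."""
    residual = sum(w * xi for w, xi in zip(weights, x)) + bias - y
    grad_w = [residual * xi for xi in x]
    grad_b = residual
    return grad_w, grad_b


def invert_gradients(grad_w, grad_b):
    """Recover the private input from the shared gradients (needs grad_b != 0)."""
    if grad_b == 0:
        raise ValueError("bias gradient is zero; the input cannot be recovered")
    return [g / grad_b for g in grad_w]


# The client computes gradients on its private sample and shares them.
private_x = [0.25, -1.5, 3.0]
grad_w, grad_b = client_gradients(
    weights=[0.5, 0.5, 0.5], bias=0.0, x=private_x, y=2.0
)

# The server sees only the gradients, yet reconstructs the input.
recovered = invert_gradients(grad_w, grad_b)
print(recovered)
```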
We also provide a simple DBMS named AIValut, designed specifically for SQL-based algorithms. AIValut currently supports Rain, a SQL-based debugging system for ML models. In the future, we plan to integrate additional advanced features from AIJack, including K-Anonymity, Homomorphic Encryption, and Differential Privacy.
AIValut has its own storage engine and query parser, and you can train and debug ML models with SQL-like queries. For example, the `Complaint` query automatically removes problematic records given the specified constraint.
# We train an ML model to classify whether each customer will go bankrupt or not based on their age and debt.
# We want the trained model to classify a customer as positive when their debt is greater than or equal to 100.
# The 10th record seems problematic for the above constraint.
>>Select * From bankrupt
id age debt y
1 40 0 0
2 21 10 0
3 22 10 0
4 32 30 0
5 44 50 1
6 30 100 1
7 63 310 1
8 53 420 1
9 39 530 1
10 49 1000 0
# Train logistic regression for 100 iterations with a learning rate of 1.
# The name of the target feature is `y`, and we use all other features as training data.
>>Logreg lrmodel id y 100 1 From Select * From bankrupt
Trained Parameters:
(0) : 2.771564
(1) : -0.236504
(2) : 0.967139
AUC: 0.520000
Prediction on the training data is stored at `prediction_on_training_data_lrmodel`
# Remove one record so that the model will predict `positive (class 1)` for the samples with `debt` greater than or equal to 100.
>>Complaint comp Shouldbe 1 Remove 1 Against Logreg lrmodel id y 100 1 From Select * From bankrupt Where debt Geq 100
Fixed Parameters:
(0) : -4.765492
(1) : 8.747224
(2) : 0.744146
AUC: 1.000000
Prediction on the fixed training data is stored at `prediction_on_training_data_comp_lrmodel`
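The idea behind a complaint of this kind can be sketched in plain Python with a brute-force search: remove each record in turn, retrain, and keep the removal that best satisfies the constraint. This is a stand-in for intuition only and is not how AIValut or Rain is implemented. The feature standardization, the 200 iterations, the learning rate of 0.5, and all function names are assumptions of this sketch, so its parameters will not match the output above.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def make_scaler(rows):
    """Return a function that standardizes a row with the mean/std of `rows`."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    stds = [
        math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n) or 1.0
        for j in range(d)
    ]
    return lambda r: [(r[j] - means[j]) / stds[j] for j in range(d)]


def train_logreg(X, y, iters=200, lr=0.5):
    """Full-batch gradient descent; returns a function x -> P(y = 1 | x)."""
    scale = make_scaler(X)
    Xs = [scale(r) for r in X]
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(iters):
        errs = [
            sigmoid(sum(wj * xj for wj, xj in zip(w, r)) + b) - t
            for r, t in zip(Xs, y)
        ]
        w = [
            wj - lr * sum(e * r[j] for e, r in zip(errs, Xs)) / len(Xs)
            for j, wj in enumerate(w)
        ]
        b -= lr * sum(errs) / len(Xs)
    return lambda x: sigmoid(sum(wj * xj for wj, xj in zip(w, scale(x))) + b)


def best_removal(X, y, complaint_rows):
    """Try removing each record; return (index, score) that maximizes the
    mean predicted probability of class 1 over `complaint_rows`."""
    best = None
    for i in range(len(X)):
        X_i, y_i = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        predict = train_logreg(X_i, y_i)
        score = sum(predict(r) for r in complaint_rows) / len(complaint_rows)
        if best is None or score > best[1]:
            best = (i, score)
    return best


# (age, debt) features and label y from the `bankrupt` table above.
X = [[40, 0], [21, 10], [22, 10], [32, 30], [44, 50],
     [30, 100], [63, 310], [53, 420], [39, 530], [49, 1000]]
y = [0, 0, 0, 0, 1, 1, 1, 1, 1, 0]

# The complaint: rows with debt >= 100 should be predicted as class 1.
complaint_rows = [r for r in X if r[1] >= 100]
idx, score = best_removal(X, y, complaint_rows)
print(f"remove record id={idx + 1}, mean P(y=1 | debt >= 100) = {score:.3f}")
```

Retraining once per candidate record is only practical for tiny tables like this one.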
For more detailed information and usage instructions, please refer to aivalut/README.md.
Please use AIValut only for research purposes.
You can also find more examples in our tutorials and documentation.
| Category | Type | Methods |
|---|---|---|
| Collaborative | Horizontal FL | FedAVG, FedProx, FedKD, FedGEMS, FedMD, DSFL, MOON, FedExP |
| Collaborative | Vertical FL | SplitNN, SecureBoost |
| Attack | Model Inversion | MI-FACE, DLG, iDLG, GS, CPL, GradInversion, GAN Attack |
| Attack | Label Leakage | Norm Attack |
| Attack | Poisoning | History Attack, Label Flip, MAPF, SVM Poisoning |
| Attack | Backdoor | DBA, Model Replacement |
| Attack | Free-Rider | Delta-Weight |
| Attack | Evasion | Gradient-Descent Attack, FGSM, DIVA |
| Attack | Membership Inference | Shadow Attack |
| Defense | Homomorphic Encryption | Paillier |
| Defense | Differential Privacy | DPSGD, AdaDPS, DPlis |
| Defense | Anonymization | Mondrian |
| Defense | Robust Training | PixelDP, Cost-Aware Robust Tree Ensemble |
| Defense | Debugging | Model Assertions, Rain, Neuron Coverage |
| Defense | Others | Soteria, FoolsGold, MID, Sparse Gradient |
If you use AIJack for your research, please cite the repo and our arXiv paper.
@misc{repotakahashi2023aijack,
author = {Takahashi, Hideaki},
title = {AIJack},
year = {2023},
publisher = {GitHub},
journal = {GitHub Repository},
howpublished = {\url{https://github.com/Koukyosyumei/AIJack}},
}
@misc{takahashi2023aijack,
title={AIJack: Security and Privacy Risk Simulator for Machine Learning},
author={Hideaki Takahashi},
year={2023},
eprint={2312.17667},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
Below you can find a list of papers and books that either use or extend AIJack.
- Huang, Shiyuan, et al. "Video in 10 Bits: Few-Bit VideoQA for Efficiency and Privacy." European Conference on Computer Vision. Cham: Springer Nature Switzerland, 2022.
- Song, Junzhe, and Dmitry Namiot. "A Survey of the Implementations of Model Inversion Attacks." International Conference on Distributed Computer and Communication Networks. Cham: Springer Nature Switzerland, 2022.
- Kapoor, Amita, and Sharmistha Chatterjee. Platform and Model Design for Responsible AI: Design and build resilient, private, fair, and transparent machine learning models. Packt Publishing Ltd, 2023.
- Mi, Yuxi, et al. "Flexible Differentially Private Vertical Federated Learning with Adaptive Feature Embeddings." arXiv preprint arXiv:2308.02362 (2023).
- Mohammadi, Mohammadreza, et al. "Privacy-preserving Federated Learning System for Fatigue Detection." 2023 IEEE International Conference on Cyber Security and Resilience (CSR). IEEE, 2023.
- Huang, Shiyuan. A General Framework for Model Adaptation to Meet Practical Constraints in Computer Vision. Diss. Columbia University, 2024.
- Liu, Can, Jin Wang, and Dongyang Yu. "RAF-GI: Towards Robust, Accurate and Fast-Convergent Gradient Inversion Attack in Federated Learning." arXiv preprint arXiv:2403.08383 (2024).
AIJack welcomes contributions of any kind. If you'd like to address a bug or propose a new feature, please refer to our guide.
welcome2aijack[@]gmail.com