This README describes the structure of this project, our fault injection (FI) tool, the benchmarks, instructions to build the tool, and instructions to reproduce the experiments reported in our paper.
For those accessing these artifacts via Zenodo, the artifacts are also available on Github: https://github.com/DependableSystemsLab/LLTFI/tree/ISSRE23_AE.
LLTFI (Low-Level Tensor Fault Injector) is a unified SWiFI (Software-implemented fault injection) tool that supports fault injection of both C/C++ programs and ML applications written using high-level frameworks such as TensorFlow and PyTorch. LLTFI supports FI in both vision-based and NLP-based models.
In this work, we modified LLTFI to carry out FI in Large Language Models (LLMs).
LLTFI Documentation
LLTFI's Original Paper
This table is the same as Table 1 in the paper. The links in the table point to the locations of the benchmarks and the benchmark-specific scripts within this project.
RQ Used In | Benchmark Name | Link |
---|---|---|
1 & 2 | PubMedBert | Link |
1 & 2 | CodeBert | Link |
1 & 2 | T5-Encoder | Link |
1 & 2 | T5-Decoder | Link |
1 & 2 | GPT2 | Link |
1 & 2 | Roberta | Link |
3 | Bert Variants | Link |
├── bin/                     # LLTFI's Python wrappers for profiling and FI.
├── config/                  # CMake configurations.
├── docker/                  # Contains instructions and the Dockerfile for obtaining LLTFI via Docker.
├── docs/                    # Documentation of the different FI configurations that can be used with LLTFI.
├── images/                  # Images used in README files.
├── installer/
│   └── InstallLLTFI.py      # Installation script that installs LLTFI and its dependencies.
├── llvm_passes/             # LLVM passes used by LLTFI's core for profiling and FI.
├── runtime_lb/              # LLTFI's runtime components.
├── sample_programs/         # Large, real-world C++ end-to-end tests and ML models.
│   ├── cpp_sample_programs/
│   └── ml_sample_programs/
│       ├── nlp_models/      # NLP models.
│       └── vision_models/   # Vision-based ML models.
├── test_suite/              # Small C++ end-to-end tests for validating LLTFI's functionality.
├── tools/                   # Helper scripts and third-party tools used by LLTFI.
├── web-app/                 # A GUI frontend for LLTFI (now deprecated).
├── LICENSE.txt
├── CREDITS.txt
├── README.md                # README file for ISSRE'23 Artifact Evaluation.
└── README_LLTFI.md          # README file for installing and testing LLTFI.
...
The following is the recommended system configuration for reproducing our experiments.

- CPU: Intel i5/i7/i9, or AMD Ryzen Threadripper
- Operating System: Ubuntu 20.04
- Endianness: Little endian (required)
- Architecture: 64-bit
- RAM: At least 4 GB (building LLVM requires a substantial amount of RAM)
- SSD/HDD: At least 100 GB of free space (required to download our benchmarks and run the experiments)
- GPU: Optional (can speed up the fine-tuning required for reproducing RQ3)
Note: To obtain the results presented in the paper, we used a 64-bit AMD Ryzen Threadripper 3960X 24-core processor with three NVIDIA RTX A5000 GPUs. However, using a different experimental environment should not affect the results as long as it satisfies the constraints listed above.
There are two ways of building LLTFI and its dependencies: via the auto-installer or through manual installation (for adventure seekers).
The easiest way to build LLTFI and its dependencies is via the auto-installer (installer/InstallLLTFI.py), for which you do not even need to clone LLTFI's git repository. Simply download the installer script by itself and copy it into the directory where you would like to build LLTFI.
Make sure you have the following dependencies installed:

- CMake (v3.15 or newer)
- Python 3 or newer
- Ninja (v1.10.2 or newer)
Usage:

```bash
# Execute the installation script.
$ python3 InstallLLTFI.py

# View the different build options for LLTFI.
$ python3 InstallLLTFI.py -h
```
Refer to LLTFI's README (README_LLTFI.md) for instructions on manually installing LLTFI and its dependencies.
For PubMedBert and CodeBert, you can use the getONNXModel.sh script to obtain the ONNX model of the benchmark. For example, for CodeBert, execute the getONNXModel.sh script in the sample_programs/ml_sample_programs/nlp_models/CodeBert/helper_scripts/ folder.
For T5-Encoder, T5-Decoder, Roberta, and GPT2, you do not have to obtain the ONNX model explicitly. The compile script downloads the ONNX model itself when you compile the model down to LLVM IR. The compile script is located in the root folder of every benchmark; for example, for GPT2, it is in the sample_programs/ml_sample_programs/nlp_models/gpt2-lm-head-10/ folder.
Execute the following Python script to download and finetune the ONNX models for RQ3:
https://github.com/DependableSystemsLab/LLTFI/blob/ISSRE23_AE/tools/fineTuneAndGetONNXFiles.py
Note: This script downloads 15 Bert models and consumes a significant amount of SSD/HDD storage. Therefore, execute it from a folder/disk with sufficient free space.
To execute the ONNX model with LLTFI, we need to convert the human-readable inputs to the TensorProto (Protobuf) format.
For the benchmarks we used in RQ1 and RQ2, the inputs can be found in the 'inputs/' directory within each benchmark folder. For example, for BioMedNLP, the inputs are in the sample_programs/ml_sample_programs/nlp_models/BioMedNLP/inputs folder.
The inputs use a '.pb' extension, which denotes that they are in the TensorProto format. You can use the exportInputsAsTensors.py script provided with each benchmark (in the helper_scripts folder) to convert inputs from the human-readable format to the TensorProto format.
For the benchmarks in RQ3, we used 5 inputs, which can be found in the fineTuneAndGetONNXFiles.py script.
To use your own inputs, you only have to convert them to the TensorProto format. The following is an example of converting custom inputs to the TensorProto format:
```python
from transformers import AutoTokenizer, GPT2Tokenizer
from onnx import numpy_helper
import numpy as np

inputs = []
inputs.append("Your Own custom input")

# Change the tokenizer according to the model.
# For example, for the BioMedNLP model, you can do the following:
# model_name = "microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext"
# tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer = GPT2Tokenizer.from_pretrained('gpt2')

# Tokenize the inputs and serialize each one as a TensorProto (.pb) file.
for index in range(0, len(inputs)):
    tokens = np.array(tokenizer.encode(inputs[index]))
    input_arr = tokens.reshape(1, 1, -1)  # Shape: (1, 1, len)
    input_tensor = numpy_helper.from_array(input_arr)
    with open("input_{}.pb".format(index), 'wb') as file:
        file.write(input_tensor.SerializeToString())
```
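To sanity-check the generated .pb files, you can load them back and inspect the token IDs. The following is a minimal sketch using the standard onnx and numpy_helper APIs; the file name input_0.pb is simply the first file produced by the loop above.

```python
import onnx
from onnx import numpy_helper

# Load the serialized TensorProto back from disk.
tensor = onnx.TensorProto()
with open("input_0.pb", "rb") as f:
    tensor.ParseFromString(f.read())

# Convert it to a numpy array and inspect its shape and token IDs.
arr = numpy_helper.to_array(tensor)
print("Shape:", arr.shape)        # Expected: (1, 1, number_of_tokens)
print("Token IDs:", arr.flatten())
```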
To execute ML models with LLTFI, we have to convert them to LLVM IR. For each benchmark, we have provided a compile.sh script that automatically lowers the ONNX model to LLVM IR using the ONNX-MLIR tool. To use this script, first export the following environment variables:
```bash
export ONNX_MLIR_SRC=<path to onnx-mlir source>
export ONNX_MLIR_BUILD=<path to where onnx-mlir has been built>
export LLFI_BUILD_ROOT=<path to where LLFI has been built>
export LLVM_DST_ROOT=<path to where LLVM has been built>
```
Now, simply execute bash compile.sh to generate the LLVM IR file.
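If you prefer to catch a missing environment variable before compile.sh fails midway, a small helper along the following lines can be used. This is only a convenience sketch of ours, not part of LLTFI; compile.sh remains the supported entry point, and the variable names are the ones listed above.

```python
import os
import subprocess
import sys

# The environment variables that compile.sh expects (listed above).
REQUIRED_VARS = ["ONNX_MLIR_SRC", "ONNX_MLIR_BUILD", "LLFI_BUILD_ROOT", "LLVM_DST_ROOT"]

missing = [v for v in REQUIRED_VARS if not os.environ.get(v)]
if missing:
    sys.exit("Missing environment variables: " + ", ".join(missing))

# Lower the benchmark's ONNX model to LLVM IR via the provided compile script.
subprocess.run(["bash", "compile.sh"], check=True)
```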
The following is the default configuration that we used for our experiments:
```yaml
compileOption:
    instSelMethod:
      - customInstselector:
          include:
            - CustomTensorOperator
          options:
            - -layerNo=0
            - -layerName=all

    regSelMethod: regloc
    regloc: dstreg

    includeInjectionTrace:
        - forward

    tracingPropagation: False

    tracingPropagationOption:
        maxTrace: 250
        debugTrace: False
        mlTrace: False
        generateCDFG: True

runOption:
    - run:
        numOfRuns: 1000                     # Number of FI experiments
        fi_type: bitflip                    # Type of hardware fault to inject
        window_len_multiple_startindex: 1
        window_len_multiple_endindex: 500
        fi_max_multiple: 1                  # Number of faults to inject
        timeOut: 5000
```
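If you want to adjust a parameter such as numOfRuns programmatically rather than editing input.yaml by hand (for example, for a quick trial run), a minimal PyYAML sketch like the following works with the structure shown above. It assumes PyYAML is installed and is not part of LLTFI itself.

```python
import yaml  # Requires PyYAML (pip install pyyaml).

# Load the benchmark's FI configuration.
with open("input.yaml") as f:
    config = yaml.safe_load(f)

# Reduce the number of FI experiments for a quick trial run.
config["runOption"][0]["run"]["numOfRuns"] = 100

# Note: dumping with PyYAML drops the comments from the original file.
with open("input.yaml", "w") as f:
    yaml.safe_dump(config, f, sort_keys=False)
```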
For all our benchmarks, this configuration file (input.yaml) is already present in their respective folders. For example, for BioMedNLP, the input.yaml file is in the sample_programs/ml_sample_programs/nlp_models/BioMedNLP/ folder.
Here's a list of all the options you can use to customize the FI: input_masterlist_ml.yaml
Before executing, please make sure that the input.yaml file is in the same folder from which you will be executing the following commands.
For FI in BioMedNLP:
```bash
rm -rf llfi*
inputSample=0
cp -r "inputs/input$inputSample"_* .

$LLFI_BUILD_ROOT/bin/instrument --readable -L $ONNX_MLIR_BUILD/Debug/lib -lcruntime -ljson-c -lonnx_proto -lprotobuf model.ll
$LLFI_BUILD_ROOT/bin/profile ./llfi/model-profiling.exe "input${inputSample}_0.pb" "input${inputSample}_1.pb" "input${inputSample}_2.pb" 0
$LLFI_BUILD_ROOT/bin/injectfault ./llfi/model-faultinjection.exe "input${inputSample}_0.pb" "input${inputSample}_1.pb" "input${inputSample}_2.pb" 0
```
Similarly, for the other benchmarks, you can find commands for performing FI in the runAllInputs.sh or runllfiSingleInp.sh scripts in their respective folders.
You can find the description of the above-mentioned commands in the following document:
Get Started with LLTFI Using Command Line
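If you would rather script the above sequence across several input samples yourself (instead of using the provided scripts), the following minimal Python sketch illustrates one way to do it. It assumes the model has already been instrumented, that the .pb files for each sample have been copied into the working directory as shown above, and that LLFI_BUILD_ROOT is set; the helper run_fi_for_sample is ours, not part of LLTFI.

```python
import glob
import os
import subprocess

LLFI_BIN = os.path.join(os.environ["LLFI_BUILD_ROOT"], "bin")

def run_fi_for_sample(sample):
    """Profile and fault-inject the instrumented model for one input sample."""
    # The BioMedNLP benchmark uses three TensorProto inputs per sample (see the commands above).
    pbs = sorted(glob.glob("input{}_*.pb".format(sample)))
    subprocess.run([os.path.join(LLFI_BIN, "profile"),
                    "./llfi/model-profiling.exe", *pbs, "0"], check=True)
    subprocess.run([os.path.join(LLFI_BIN, "injectfault"),
                    "./llfi/model-faultinjection.exe", *pbs, "0"], check=True)

# Example: run the FI campaign for input samples 0 through 4.
for sample in range(5):
    run_fi_for_sample(sample)
```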
After the FI experiment(s) finishes, the following files will be generated in the same folder:
llfi/error_output/: The files under this directory are numbered according to the order of the faulty runs and store the error messages from the corresponding runs. For example, errorfile-run-m-n stores the error message returned by the n-th run of the m-th runOption defined in input.yaml. Common error messages are "Program crashed with a return code" or "Program hang".
llfi/std_output/: The files under this directory store the standard output (STDOUT) of the faulty runs. The file names follow the same naming convention as those in llfi/error_output.
llfi/prog_output: The files under this directory store the output saved to disk (filesystem changes) by the faulty runs. The file names follow the same naming convention as those in llfi/error_output.
llfi/llfi_stat_output/llfi.stat.fi.injectedfaults.m-n.txt: These files store important information about the injected fault in each faulty run, and their names follow the same naming convention as the files in llfi/error_output. Each file contains the statistics of the injected fault for its corresponding run. Here is an example of its content:
FI stat: fi_type=bitflip, fi_max_multiple=3, fi_index=1998, fi_cycle=1523447, fi_reg_index=0, fi_reg_pos=2 fi_reg_width=64, fi_bit=47, opcode=getelementptr
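When aggregating results over many runs, it can be handy to parse these statistics files into dictionaries. Below is a minimal sketch, assuming the 'key=value' format shown in the example line above; the helper parse_fi_stat_line is ours, not part of LLTFI.

```python
import glob
import re
from collections import Counter

def parse_fi_stat_line(line):
    """Turn a 'FI stat: key=value, ...' line into a dictionary of strings."""
    return dict(re.findall(r"(\w+)=([^,\s]+)", line))

stats = []
for path in glob.glob("llfi/llfi_stat_output/llfi.stat.fi.injectedfaults.*.txt"):
    with open(path) as f:
        for line in f:
            if line.startswith("FI stat:"):
                stats.append(parse_fi_stat_line(line))

# Example: distribution of injected bit positions across all faulty runs.
print(Counter(s["fi_bit"] for s in stats))
```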
For a qualitative evaluation of the effect of fault injection in LLMs, as in RQ1, we have to convert the LLM's output (after FI) to a human-readable format. For that, you can use the parseLLTFIJsonOp.py script located in each benchmark's folder. For instance, for PubMedNLP, the parseLLTFIJsonOp.py script is located in the sample_programs/ml_sample_programs/nlp_models/BioMedNLP folder.
You can use the gatherFIStats.py script to collect FI statistics such as SDC rates, cosine similarity, and the distribution of SDC rates across different layers and bit positions.
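For reference, the cosine-similarity part of this analysis boils down to a dot product over the golden (fault-free) and faulty output tensors. The following is a minimal numpy sketch of that computation, not the gatherFIStats.py implementation itself; the arrays at the end are placeholders for the actual model outputs.

```python
import numpy as np

def cosine_similarity(golden, faulty):
    """Cosine similarity between the golden (fault-free) and faulty output tensors."""
    golden = np.asarray(golden, dtype=np.float64).flatten()
    faulty = np.asarray(faulty, dtype=np.float64).flatten()
    denom = np.linalg.norm(golden) * np.linalg.norm(faulty)
    return float(np.dot(golden, faulty) / denom) if denom != 0 else 0.0

# Placeholder example; in practice, load the golden and faulty outputs of the model.
print(cosine_similarity([1.0, 2.0, 3.0], [1.0, 2.0, 3.1]))
```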