This repository contains the figures, table data, and source code for the ICS'24 paper "Accelerated Auto-Tuning of GPU Kernels for Tensor Computations".
.
├── benchmarks
Benchmarks for re-collecting the data, including:
- bench_roller for evaluating the top-50 performance of Roller; TVM and the CUDA toolkit (> 12.0) are required.
- [Ansor] source code of Ansor v0.9
- [Ansor-AF] source code of Ansor-AF
- [Ansor-DS] source code of Ansor-DS
- [Ansor-AF-DS] source code of Ansor-AF-DS
- test contains the scripts for re-collecting the data of Ansor, Ansor-AF, Ansor-DS, and Ansor-AF-DS. Please use the bash scripts to run the benchmarks.
To build TVM from source and install the necessary Python packages, follow these steps:
Create a conda environment:
conda create -n ansor python=3.10
conda activate ansor
conda install -c conda-forge xgboost=1.5.0 numpy decorator attrs tornado psutil cloudpickle pandas scipy pytest
The conda environment settings follow the official TVM documentation.
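As an optional sanity check (a minimal sketch, assuming the ansor environment is active), confirm that the key packages import cleanly and that xgboost is at the pinned version:
python3 -c "import xgboost, numpy, tornado, psutil, cloudpickle; print(xgboost.__version__)"
This should print 1.5.0.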
Use the following command to run tests:
bash run_tests_times_conv.sh conv2d cuda num_of_runs num_sm num_shared_mem network num_trials num_init_states threshold pz_num
For example:
bash run_tests_times_conv.sh conv2d cuda 1 128 48 yolo 5 64 0.6 0
This command runs the YOLO network on CUDA with the following parameters:
- 1 run (num_of_runs)
- 128 Streaming Multiprocessors (SMs)
- 48 KB of shared memory
- 5 start points
- 64 initial configurations for building the model
- a threshold of 0.6
- problem size 0 (omit this argument to test all problem sizes; see the example below)
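For example, the same configuration run over all YOLO problem sizes (with pz_num omitted) would be:
bash run_tests_times_conv.sh conv2d cuda 1 128 48 yolo 5 64 0.6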
Before proceeding, please make sure that both CUDA and LLVM are installed on your system. You can verify this by running the following commands in your terminal:
llvm-config --version
nvcc --version
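You can also confirm that the GPU is visible to the driver:
nvidia-smi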
Build Ansor-AF-DS first:
git clone [email protected]:HPCRL/Ansor-AF-DS.git --recursive
cd Ansor-AF-DS/benchmarks/Ansor_AF_DS
export TVM_HOME=$PWD && export PYTHONPATH=$TVM_HOME/python
mkdir -p build && cd ./build
cp "$TVM_HOME/cmake/config.cmake" ./
sed -i 's/set(USE_CUDA OFF)/set(USE_CUDA ON)/' config.cmake
sed -i 's/set(USE_LLVM OFF)/set(USE_LLVM ON)/' config.cmake
cmake ..
make -j8
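As an optional sanity check after the build (a sketch, assuming TVM_HOME and PYTHONPATH are still exported as above and that this TVM version ships tvm.support.libinfo), confirm that the package imports and that CUDA and LLVM were enabled:
python3 -c "import tvm; print(tvm.__version__)"
python3 -c "import tvm; info = tvm.support.libinfo(); print(info.get('USE_CUDA'), info.get('USE_LLVM'))"
Both build options should report ON.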
Then go back to the benchmarks folder and run the tests. (The following settings are for an NVIDIA RTX 4090; refer to the parameter explanation above and adjust them for your GPU; a query snippet follows the commands below.)
cd ../../
bash run_tests_times_conv.sh conv2d cuda 3 128 48 yolo 5 64 0.6
bash run_tests_times_conv.sh conv2d cuda 3 128 48 resnet 5 64 0.6
bash run_tests_times_mm.sh matmul cuda 3 128 48 5 64 0.6
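If you are running on a different GPU, the num_sm and num_shared_mem arguments can be looked up through TVM's device API (a sketch using the standard multi_processor_count and max_shared_memory_per_block device attributes; the latter is reported in bytes):
python3 -c "import tvm; dev = tvm.cuda(0); print('SMs:', dev.multi_processor_count, 'shared mem (KB):', dev.max_shared_memory_per_block // 1024)"
On an RTX 4090 this should report 128 SMs and 48 KB of shared memory per block, matching the arguments used above.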
Build the baseline TVM (Ansor) first:
cd Ansor-AF-DS/benchmarks/Ansor
export TVM_HOME=$PWD && export PYTHONPATH=$TVM_HOME/python
mkdir -p build && cd ./build
cp "$TVM_HOME/cmake/config.cmake" ./
sed -i 's/set(USE_CUDA OFF)/set(USE_CUDA ON)/' config.cmake
sed -i 's/set(USE_LLVM OFF)/set(USE_LLVM ON)/' config.cmake
cmake ..
make -j8
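Because both TVM trees live in the same repository, it is worth confirming that PYTHONPATH now points at this baseline copy before running the tests (a minimal check):
python3 -c "import tvm; print(tvm.__file__)"
The printed path should be under benchmarks/Ansor/python.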
Then go back to the benchmarks folder and run the tests. (The following settings are for an NVIDIA RTX 4090; refer to the parameter explanation above and adjust them for your GPU.)
cd ../../
bash run_tests_times_mm.sh matmul cuda 3 128 48 1000 64
bash run_tests_times_conv.sh conv2d cuda 3 128 48 yolo 1000 64 0.6
bash run_tests_times_conv.sh conv2d cuda 3 128 48 resnet 1000 64 0.6
.
├── cal_var
This folder contains the script and data to calculate the variability of Ansor-AF-DS (within 2 minutes and after 1000 trials) and Ansor (after 1000 trials):
python3 calc_var.py
.
├── figures
This folder contains the scripts for reproducing the figures in the paper.
bash plot.sh
python3 plot_scatter.py
python3 cudnn-ansor3090.py
python3 cudnn-ansor4090.py
python3 plot_stack_ablation1.py
python3 plot_stack_ablation2.py
python3 plot_all_perf_stack3090.py
python3 plot_all_perf_stack4090.py
python3 plot_var_perf_3090.py
python3 plot_var_perf_4090.py