
[AutoScheduler] Add Dynamic Gradient Descent Search Algorithm for Auto-Tuning #17126

Open · wants to merge 12 commits into base: main
Conversation


@Lurkrazy Lurkrazy commented Jun 28, 2024

This PR introduces the Dynamic Gradient Descent (DGD) Search algorithm for accelerating the auto-tuning process of GPU kernels within the Ansor/AutoScheduler framework. The DGD algorithm is designed to explore the search space more efficiently than the existing Genetic Algorithm-based approach. The following changes are included:

  1. Dynamic Gradient Descent Search:

    • Implements a new search strategy that performs gradient descent in a multi-dimensional tile-size space.
    • Uses online measurements and a proxy model to guide the search.
  2. Record Processor:

    • A new class that handles the processing and modification of measure records.
    • Includes methods to extract and modify the coordinates of SP (split) nodes.
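The search idea above can be sketched as a greedy descent over tile-size coordinates, moving to whichever neighboring configuration measures cheapest. This is an illustrative toy, not the PR's implementation: the function name `dgd_search` and the abstract `cost` callback are invented here, and the real algorithm additionally uses a proxy model to prune which neighbors get measured.

```python
# Toy sketch of gradient-descent-style search in a multi-dimensional
# tile-space. Hypothetical names; NOT the code added by this PR.

def dgd_search(cost, start, bounds, max_steps=100):
    """Greedily move to the best neighboring tile configuration.

    cost   -- callable mapping a coordinate list to a measured cost
    start  -- initial tile coordinates, e.g. [0, 0]
    bounds -- per-dimension (low, high) inclusive limits
    """
    current = list(start)
    best_cost = cost(current)
    for _ in range(max_steps):
        improved = False
        # Probe neighbors that differ by one step along each dimension.
        for dim in range(len(current)):
            for delta in (-1, 1):
                cand = list(current)
                cand[dim] += delta
                if not (bounds[dim][0] <= cand[dim] <= bounds[dim][1]):
                    continue
                c = cost(cand)
                if c < best_cost:
                    best_cost, current, improved = c, cand, True
        if not improved:
            break  # local minimum in tile-space
    return current, best_cost
```

With a simple convex cost such as `lambda x: (x[0] - 3) ** 2 + (x[1] - 5) ** 2`, the descent walks from `[0, 0]` to the optimum `[3, 5]`; in the real tuner the cost of each candidate comes from on-device measurement, which is why limiting the number of probed neighbors matters.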

This implementation is based on the algorithm described in the paper "Accelerated Auto-Tuning of GPU Kernels for Tensor Computations" presented at ICS'24.

Experimental evaluation on a number of matrix-matrix multiplication and convolution kernels shows that the DGD algorithm achieves an order-of-magnitude improvement in auto-tuning time while maintaining comparable code performance.

Usage:

To use the DGD Search algorithm, instantiate the DynamicGradientSearchTuner class with the desired parameters and call the dynamic_gradient_search method.

Example:

```python
# `task`, `log_file`, and `tune_option` are the usual auto_scheduler objects
# (a SearchTask, a log-file path, and TuningOptions).
from tvm import auto_scheduler

tuner = auto_scheduler.dynamic_gradient_search.DynamicGradientSearchTuner(
    task, log_file, tune_option
)
tuner.dynamic_gradient_search()
```
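To illustrate the Record Processor's role, here is a toy sketch of extracting and rewriting the split ("SP") coordinates stored in a measure record. The record layout below is invented purely for illustration; it is not the actual Ansor log format, and these helper names are hypothetical, not the PR's API.

```python
# Toy record-processor sketch: pull out and rewrite SP split lengths.
# The record structure here is simplified for illustration only.
import json

def extract_sp_lengths(record_line):
    """Collect the split lengths of every SP step in a toy JSON record."""
    record = json.loads(record_line)
    return [step[1] for step in record["steps"] if step[0] == "SP"]

def set_sp_lengths(record_line, new_lengths):
    """Write modified split lengths back, producing a new record line."""
    record = json.loads(record_line)
    it = iter(new_lengths)
    for step in record["steps"]:
        if step[0] == "SP":
            step[1] = next(it)
    return json.dumps(record)

# A made-up record with two SP steps and one unrelated ("AN") step.
toy = json.dumps({"steps": [["SP", [4, 8]], ["AN"], ["SP", [2, 16]]]})
print(extract_sp_lengths(toy))  # [[4, 8], [2, 16]]
```

Editing coordinates in place like this is what lets the search re-measure a nearby tile configuration without rebuilding the whole schedule from scratch.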

Experiment setup:

The experiments ran the DGD Search algorithm with a 1-hour time budget and, separately, for the full duration used by Ansor, comparing against the performance Ansor achieves after its suggested number of trials. The models evaluated were Bert, ResNet-50, and MobileNetV2, with the following configurations based on the Apache blog "Introducing TVM Auto-scheduler (a.k.a. Ansor)":

  • Bert: 12000 trials, running on an Nvidia RTX 4090 for 6 hours.
  • ResNet-50: 20000 trials, running on an Nvidia RTX 4090 for 10 hours.
  • MobileNetV2: 16000 trials, running on an Nvidia RTX 4090 for 7 hours.

Relative Performance of the DGD Search algorithm achieved in 1 hour and full duration used by Ansor

| Network | Ratio (1 hour) | Ratio (full) |
| --- | --- | --- |
| Bert | 93.71% | 100.15% |
| ResNet-50 | 90.46% | 96.73% |
| MobileNetV2 | 95.08% | 101.75% |

This table presents the relative performance of the DGD Search algorithm with a 1-hour time budget compared to the full duration used by Ansor. The performance ratios indicate the effectiveness of the Dynamic Gradient Descent Search algorithm in achieving comparable performance within a significantly reduced time frame.

@Lurkrazy Lurkrazy marked this pull request as ready for review June 29, 2024 00:07
@cbalint13
Contributor

Thank you @Lurkrazy for this contribution !

Adding a Cc to relevant folks here: @comaniac @jcf94 @merrymercy @FrozenGene @minminsun @jinhongyii

@cbalint13 cbalint13 added the tune:auto_scheduler src/auto_scheduler, python/tvm/auto_scheduler label Aug 12, 2024
@cbalint13 cbalint13 self-assigned this Aug 12, 2024
Contributor

@cbalint13 cbalint13 left a comment

(x) Could some references be added for the benchmarks, how-tos, and docs parts?

(x) Also, please make sure the CI issues (lint & build) are all in a green state.

@tqchen
Member

tqchen commented Aug 12, 2024

Given that we are migrating toward meta-schedule and may phase out auto-scheduler, I would suggest we bring new changes to that path.
