
[AutoScheduler] Add Dynamic Gradient Descent Search Algorithm for Auto-Tuning #17126

Open · wants to merge 12 commits into base: main
Conversation


@Lurkrazy Lurkrazy commented Jun 28, 2024

This PR introduces the Dynamic Gradient Descent (DGD) Search algorithm for accelerating the auto-tuning process of GPU kernels within the Ansor/AutoScheduler framework. The DGD algorithm is designed to explore the search space more efficiently than the existing Genetic Algorithm-based approach. The following changes are included:

  1. Dynamic Gradient Descent Search:

    • Implements a new search strategy that performs gradient descent in a multi-dimensional tile-size space.
    • Uses online measurements and a proxy model to guide the search.
  2. Record Processor:

    • A new class that handles the processing and modification of measure records.
    • Includes methods to extract and modify the coordinates of SP (split) nodes.
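The search idea above can be sketched as a greedy descent over tile-size coordinates, moving to whichever neighboring configuration measures cheapest. This is an illustrative toy, not the PR's implementation: the function name `dgd_search` and the abstract `cost` callback are invented here, and the real algorithm additionally uses a proxy model to prune which neighbors get measured.

```python
# Toy sketch of gradient-descent-style search in a multi-dimensional
# tile-space. Hypothetical names; NOT the code added by this PR.

def dgd_search(cost, start, bounds, max_steps=100):
    """Greedily move to the best neighboring tile configuration.

    cost   -- callable mapping a coordinate list to a measured cost
    start  -- initial tile coordinates, e.g. [0, 0]
    bounds -- per-dimension (low, high) inclusive limits
    """
    current = list(start)
    best_cost = cost(current)
    for _ in range(max_steps):
        improved = False
        # Probe neighbors that differ by one step along each dimension.
        for dim in range(len(current)):
            for delta in (-1, 1):
                cand = list(current)
                cand[dim] += delta
                if not (bounds[dim][0] <= cand[dim] <= bounds[dim][1]):
                    continue
                c = cost(cand)
                if c < best_cost:
                    best_cost, current, improved = c, cand, True
        if not improved:
            break  # local minimum in tile-space
    return current, best_cost
```

With a simple convex cost such as `lambda x: (x[0] - 3) ** 2 + (x[1] - 5) ** 2`, the descent walks from `[0, 0]` to the optimum `[3, 5]`; in the real tuner the cost of each candidate comes from on-device measurement, which is why limiting the number of probed neighbors matters.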

This implementation is based on the algorithm described in the paper "Accelerated Auto-Tuning of GPU Kernels for Tensor Computations" presented at ICS'24.

Experimental evaluation on a number of matrix-matrix multiplication and convolution kernels shows that the DGD algorithm achieves an order-of-magnitude improvement in auto-tuning time while maintaining comparable code performance.

Usage:

To use the DGD Search algorithm, instantiate the DynamicGradientSearchTuner class with the desired parameters and call the dynamic_gradient_search method.

Example:

```python
# `task`, `log_file`, and `tune_option` are the usual auto_scheduler objects
# (a SearchTask, a log-file path, and TuningOptions).
from tvm import auto_scheduler

tuner = auto_scheduler.dynamic_gradient_search.DynamicGradientSearchTuner(
    task, log_file, tune_option
)
tuner.dynamic_gradient_search()
```
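To illustrate the Record Processor's role, here is a toy sketch of extracting and rewriting the split ("SP") coordinates stored in a measure record. The record layout below is invented purely for illustration; it is not the actual Ansor log format, and these helper names are hypothetical, not the PR's API.

```python
# Toy record-processor sketch: pull out and rewrite SP split lengths.
# The record structure here is simplified for illustration only.
import json

def extract_sp_lengths(record_line):
    """Collect the split lengths of every SP step in a toy JSON record."""
    record = json.loads(record_line)
    return [step[1] for step in record["steps"] if step[0] == "SP"]

def set_sp_lengths(record_line, new_lengths):
    """Write modified split lengths back, producing a new record line."""
    record = json.loads(record_line)
    it = iter(new_lengths)
    for step in record["steps"]:
        if step[0] == "SP":
            step[1] = next(it)
    return json.dumps(record)

# A made-up record with two SP steps and one unrelated ("AN") step.
toy = json.dumps({"steps": [["SP", [4, 8]], ["AN"], ["SP", [2, 16]]]})
print(extract_sp_lengths(toy))  # [[4, 8], [2, 16]]
```

Editing coordinates in place like this is what lets the search re-measure a nearby tile configuration without rebuilding the whole schedule from scratch.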

Experiment setup:

The experiments ran the DGD Search algorithm with a 1-hour time budget and, separately, for the full duration used by Ansor, comparing against the performance Ansor achieves after its suggested number of trials. The models evaluated were Bert, ResNet-50, and MobileNetV2, with the following configurations based on the Apache blog "Introducing TVM Auto-scheduler (a.k.a. Ansor)":

  • Bert: 12000 trials, running on an Nvidia RTX 4090 for 6 hours.
  • ResNet-50: 20000 trials, running on an Nvidia RTX 4090 for 10 hours.
  • MobileNetV2: 16000 trials, running on an Nvidia RTX 4090 for 7 hours.

Relative Performance of the DGD Search algorithm achieved in 1 hour and full duration used by Ansor

| Network | Ratio (1 hour) | Ratio (full) |
| --- | --- | --- |
| Bert | 93.71% | 100.15% |
| ResNet-50 | 90.46% | 96.73% |
| MobileNetV2 | 95.08% | 101.75% |

This table presents the relative performance of the DGD Search algorithm with a 1-hour time budget compared to the full duration used by Ansor. The performance ratios indicate the effectiveness of the Dynamic Gradient Descent Search algorithm in achieving comparable performance within a significantly reduced time frame.

@Lurkrazy Lurkrazy marked this pull request as ready for review June 29, 2024 00:07
@cbalint13
Contributor

Thank you @Lurkrazy for this contribution !

Adding a Cc to relevant folks here: @comaniac @jcf94 @merrymercy @FrozenGene @minminsun @jinhongyii

@cbalint13 cbalint13 added the tune:auto_scheduler src/auto_scheduler, python/tvm/auto_scheduler label Aug 12, 2024
@cbalint13 cbalint13 self-assigned this Aug 12, 2024
Contributor

@cbalint13 cbalint13 left a comment

(x) Could some references be added for the benchmarks, how-tos, and docs parts?

(x) Also, please make sure the CI issues (lint & build) are all in a green state.

@tqchen
Member

tqchen commented Aug 12, 2024

Given that we are migrating toward meta-schedule and may phase out auto-scheduler, I would suggest we bring new changes to that path.
