Autotuning #9

Open
YingboMa opened this issue May 16, 2020 · 0 comments

Please don't be scared by the title into thinking this will take days :-). It should be done in less than 10 minutes. Here is the plan @chriselrod and I came up with:

  1. search for a good kernel size
  2. compute the cache blocking sizes with an analytical model
  3. search for a good packing strategy

[1] can be done by directly calling the `packing=(Val(true), Val(true))` macro kernel with different `micro_m`s and `micro_n`s, and benchmarking it on 400 x 400 and 397 x 397 DGEMMs (all other element types can be handled by simply rescaling `micro_m`).
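A minimal sketch of the search in [1], assuming a hypothetical callback `time_kernel(m, n, sz)` that times the packed macro kernel with `micro_m = m`, `micro_n = n` on an `sz x sz` DGEMM (the actual MaBLAS kernel call is not shown). Each candidate is scored by its total time over both test sizes. `rescale_micro_m` is likewise a made-up helper that assumes `micro_m` is tuned in `Float64` elements and scales it by `8 ÷ sizeof(T)` for other element types.

```julia
# Exhaustive search over candidate micro-kernel shapes.
# `time_kernel(m, n, sz)` is a hypothetical callback assumed to return the
# time (seconds) of the packed macro kernel on an sz × sz DGEMM.
function search_kernel_size(time_kernel, micro_ms, micro_ns; sizes = (400, 397))
    best = (first(micro_ms), first(micro_ns))
    best_t = Inf
    for m in micro_ms, n in micro_ns
        t = sum(sz -> time_kernel(m, n, sz), sizes)  # total over all test sizes
        if t < best_t
            best, best_t = (m, n), t
        end
    end
    return best, best_t
end

# Assumption: micro_m counts Float64 elements; for element type T, scale it
# by 8 ÷ sizeof(T) (integer division), e.g. 8 -> 16 for Float32.
rescale_micro_m(micro_m, ::Type{T}) where {T} = micro_m * 8 ÷ sizeof(T)

# Toy cost model (made-up numbers) standing in for real benchmarks.
toy_time(m, n, sz) = abs(m - 8) + abs(n - 6) + sz * 1e-3
best, best_t = search_kernel_size(toy_time, 4:4:16, 2:2:12)  # best == (8, 6)
```

A plain grid is cheap here because only a handful of `(micro_m, micro_n)` pairs fit in the register file; with real timings, each call would ideally take the minimum of several runs to reduce noise.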

[2] can be done with formulae that depend on the cache properties.
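The issue doesn't spell out the formulae. As a simple stand-in (a common half-cache heuristic, not necessarily the model MaBLAS uses), each packed block can be sized to fill about half of the cache level meant to hold it: `kc` so that an `micro_m x kc` sliver of A plus a `kc x micro_n` sliver of B fit in half of L1, `mc` so that an `mc x kc` block of A fits in half of L2, and `nc` so that a `kc x nc` block of B fits in half of L3. `blocking_sizes` and its keyword names are hypothetical, and the cache sizes are passed in rather than queried.

```julia
# Half-cache heuristic (an assumed stand-in, not the issue's exact model).
# Cache sizes l1, l2, l3 are in bytes.
function blocking_sizes(::Type{T}, micro_m, micro_n; l1, l2, l3) where {T}
    s = sizeof(T)
    # kc: micro_m×kc sliver of A plus kc×micro_n sliver of B in half of L1
    kc = (l1 ÷ 2) ÷ ((micro_m + micro_n) * s)
    # mc: mc×kc block of A in half of L2, rounded down to a multiple of micro_m
    mc = ((l2 ÷ 2) ÷ (kc * s)) ÷ micro_m * micro_m
    # nc: kc×nc block of B in half of L3, rounded down to a multiple of micro_n
    nc = ((l3 ÷ 2) ÷ (kc * s)) ÷ micro_n * micro_n
    return (mc = mc, kc = kc, nc = nc)
end

# Example with made-up cache sizes: 32 KiB L1, 1 MiB L2, 8 MiB L3.
sizes = blocking_sizes(Float64, 8, 6; l1 = 32 * 1024, l2 = 1024^2, l3 = 8 * 1024^2)
# (mc = 448, kc = 146, nc = 3588)
```

Leaving half of each level free is a crude allowance for C, the other operand, and associativity conflicts; a finer model would use the cache's associativity and line size.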

[3] can be done efficiently with bisection, assuming there is exactly one crossing point between the strategies.
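A minimal sketch of [3], assuming the choice reduces to a size threshold: a hypothetical predicate `faster_with_packing(n)` benchmarks both strategies at size `n` and returns `true` once packing wins, and the single-crossing assumption makes it monotone. Bisection then needs only about `log2(hi - lo)` benchmark runs instead of one per size.

```julia
# Smallest n in (lo, hi] with pred(n) == true, assuming pred is monotone
# (false below the crossing, true from it on). Requires pred(lo) == false
# and pred(hi) == true; the invariant is kept on every iteration.
function find_crossing(pred, lo::Integer, hi::Integer)
    @assert !pred(lo) && pred(hi) "need pred(lo) == false and pred(hi) == true"
    while hi - lo > 1
        mid = lo + (hi - lo) ÷ 2
        if pred(mid)
            hi = mid  # crossing is at or below mid
        else
            lo = mid  # crossing is above mid
        end
    end
    return hi
end

# Toy predicate standing in for "packing beats not packing at size n".
faster_with_packing(n) = n >= 137
crossing = find_crossing(faster_with_packing, 1, 2000)  # 137
```

With real timings, noise near the crossing can flip the predicate, so each probe would likely need repeated measurements; the result only has to be close, not exact.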

Autotuning is off by default; it can be enabled with

```julia
ENV["AUTOTUNE_MABLAS"] = true
] build MaBLAS  # `]` enters Pkg mode; equivalently `using Pkg; Pkg.build("MaBLAS")`
```
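For illustration only: `Pkg.build` runs a package's `deps/build.jl`, so a build script could gate the tuning on this variable roughly as below. This is a hypothetical sketch, not MaBLAS's actual build script; `autotune_enabled` and the commented-out `run_autotune` are made-up names. Since `ENV` stores values as strings, `true` arrives as `"true"`.

```julia
# Hypothetical deps/build.jl logic (not MaBLAS's real script).
# ENV values are strings, so `ENV["AUTOTUNE_MABLAS"] = true` is seen as "true".
autotune_enabled(env = ENV) =
    lowercase(get(env, "AUTOTUNE_MABLAS", "false")) in ("true", "1", "yes")

if autotune_enabled()
    @info "AUTOTUNE_MABLAS is set: running autotuning"
    # run_autotune()  # placeholder for steps [1] to [3]
else
    @info "Autotuning disabled (default): using built-in parameters"
end
```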