[compute/cker] Optimize BatchMatMul for x86 #14305
This commit adds an optimized version of the BatchMatMul kernel. The optimization targets the x86 architecture; in all other cases the code is compiled with the existing reference kernel.
The new kernel calls the optimized::Gemm function, which uses Eigen internally.
Additionally, to avoid code duplication, a new BatchMatMulParams struct is introduced and reused in both the reference and optimized kernels.
ONE-DCO-1.0-Signed-off-by: Tomasz Dolbniak <[email protected]>
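To illustrate the structure described above, here is a minimal sketch of a shared parameter struct feeding both a reference kernel and an architecture-selected kernel. It is not the code from this PR: `BatchMatMulParamsSketch`, its fields, `GemmSketch`, and the dispatching `BatchMatMul` wrapper are hypothetical names, and a plain loop stands in for the Eigen-backed `optimized::Gemm` so that the example has no external dependencies. Row-major, non-transposed, non-broadcast inputs are assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the shared parameter struct; the real
// BatchMatMulParams in the PR may carry different fields.
struct BatchMatMulParamsSketch {
  std::size_t batches; // number of independent matrix products
  std::size_t m;       // rows of each LHS matrix
  std::size_t k;       // shared inner dimension
  std::size_t n;       // columns of each RHS matrix
};

namespace reference {
// Naive triple loop over row-major data, one product per batch.
inline void BatchMatMul(const BatchMatMulParamsSketch &p, const float *lhs,
                        const float *rhs, float *out) {
  for (std::size_t b = 0; b < p.batches; ++b) {
    const float *l = lhs + b * p.m * p.k;
    const float *r = rhs + b * p.k * p.n;
    float *o = out + b * p.m * p.n;
    for (std::size_t i = 0; i < p.m; ++i)
      for (std::size_t j = 0; j < p.n; ++j) {
        float acc = 0.0f;
        for (std::size_t c = 0; c < p.k; ++c)
          acc += l[i * p.k + c] * r[c * p.n + j];
        o[i * p.n + j] = acc;
      }
  }
}
} // namespace reference

namespace optimized {
// Placeholder for a tuned GEMM. The PR delegates to an Eigen-backed
// optimized::Gemm; this loop only reorders the iteration (i, c, j) so the
// innermost accesses are contiguous.
inline void GemmSketch(std::size_t m, std::size_t k, std::size_t n,
                       const float *l, const float *r, float *o) {
  for (std::size_t i = 0; i < m * n; ++i)
    o[i] = 0.0f;
  for (std::size_t i = 0; i < m; ++i)
    for (std::size_t c = 0; c < k; ++c) {
      const float a = l[i * k + c];
      for (std::size_t j = 0; j < n; ++j)
        o[i * n + j] += a * r[c * n + j];
    }
}

// Same signature and parameter struct as the reference kernel.
inline void BatchMatMul(const BatchMatMulParamsSketch &p, const float *lhs,
                        const float *rhs, float *out) {
  for (std::size_t b = 0; b < p.batches; ++b)
    GemmSketch(p.m, p.k, p.n, lhs + b * p.m * p.k, rhs + b * p.k * p.n,
               out + b * p.m * p.n);
}
} // namespace optimized

// Compile-time selection: optimized path on x86, reference path elsewhere.
inline void BatchMatMul(const BatchMatMulParamsSketch &p,
                        const std::vector<float> &lhs,
                        const std::vector<float> &rhs,
                        std::vector<float> &out) {
  assert(lhs.size() == p.batches * p.m * p.k);
  assert(rhs.size() == p.batches * p.k * p.n);
  assert(out.size() == p.batches * p.m * p.n);
#if defined(__x86_64__) || defined(_M_X64) || defined(__i386__) || defined(_M_IX86)
  optimized::BatchMatMul(p, lhs.data(), rhs.data(), out.data());
#else
  reference::BatchMatMul(p, lhs.data(), rhs.data(), out.data());
#endif
}
```

Because both kernels take the same struct, the shape bookkeeping lives in one place and either path can be checked against the other on identical inputs.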
@tomdol Just for your information, our main target is ARM, not x64. I guess you're already aware of it since you used (partially) solve. Also, for LLMs, we will use the GGML kernel, which provides quantized-type kernels for widths below 8 bits.
In addition, this PR does not have a test. How did you test this kernel?
There are existing tests for this kernel. I considered adding more, but it seems that all use cases are already covered.
Regarding the GGML kernel: is someone already working on it? I was going to attempt to write an optimized version for ARM too as the next step, so I would just like to know if I should proceed.
@tomdol Thank you for the answer. The test was done via nnap tests.
I checked our model. For our model, BatchMatMul f32 (both lhs and rhs) is necessary.
LGTM
@tomdol For the ARM-optimized kernel, I am thinking of using
LGTM 👍
@glistening Sorry about the delay in replying. I haven't thought about any particular kernel yet, only that there is a need for an ARM-targeting optimized version too. I was hoping to figure out more by discussing it in #12140. I would appreciate some guidelines and would like to offer to help with this part of the BatchMatMul optimization work. Unless, of course, someone is already taking care of it :)