[onert] Optimize BatchMatMul kernel in cpu backend #12140
Comments
I'm interested in this task.
The immediate background is supporting an NMT model. Beyond that, I think this task will also be useful in the future.
There is a long story behind the detailed context. In short, this is part of the work to support a transformer-based model (in this case, Machine Translation). For reference, please see link.
@tomdol continuation from #14305 (comment)
We need optimized batch matmul for arm32.
For transposed batch matmul, I confirmed it works and is enough using
For normal batch matmul like torch.bmm, which has
You may bring from other open source kernels (
We have an alternative option (e.g. insert
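To illustrate the distinction the comment above draws between the two layouts, here is a minimal sketch (not the actual onert cpu-backend kernel; the function names and shapes are hypothetical). With torch.bmm semantics the inner loop strides down a column of the RHS, while a transposed RHS lets both operands be read contiguously, which is one likely reason the transposed case is already fast enough:

```cpp
#include <cstddef>

// Sketch of torch.bmm-style semantics: C[b] = A[b] * B[b],
// with A: [batch, M, K], B: [batch, K, N], all row-major.
// Note the stride-N access to B in the inner loop (cache-unfriendly).
void bmm_naive(const float* A, const float* B, float* C,
               std::size_t batch, std::size_t M, std::size_t K, std::size_t N) {
  for (std::size_t b = 0; b < batch; ++b)
    for (std::size_t i = 0; i < M; ++i)
      for (std::size_t j = 0; j < N; ++j) {
        float acc = 0.f;
        for (std::size_t k = 0; k < K; ++k)
          acc += A[(b * M + i) * K + k] * B[(b * K + k) * N + j];
        C[(b * M + i) * N + j] = acc;
      }
}

// Variant where the RHS is stored transposed as Bt: [batch, N, K].
// Both operands are now walked contiguously in the inner loop.
void bmm_rhs_transposed(const float* A, const float* Bt, float* C,
                        std::size_t batch, std::size_t M, std::size_t K,
                        std::size_t N) {
  for (std::size_t b = 0; b < batch; ++b)
    for (std::size_t i = 0; i < M; ++i)
      for (std::size_t j = 0; j < N; ++j) {
        float acc = 0.f;
        for (std::size_t k = 0; k < K; ++k)
          acc += A[(b * M + i) * K + k] * Bt[(b * N + j) * K + k];
        C[(b * M + i) * N + j] = acc;
      }
}
```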
Let's optimize the BatchMatMul kernel in the cpu backend.
Currently, the BatchMatMul kernel in the cpu backend is not optimized.
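As a starting point, one common scalar-level optimization for the normal (non-transposed) layout is to reorder the loops to i-k-j so the inner loop streams through contiguous rows of B and C instead of striding down a column of B. This is only a sketch under assumed row-major [batch, M, K] x [batch, K, N] shapes, not the onert kernel itself:

```cpp
#include <cstddef>

// Loop-reordered (i-k-j) batch matmul sketch: the inner j-loop touches
// B and C contiguously, which is cache-friendly and easy for the
// compiler to auto-vectorize. A real optimized kernel would add
// blocking/tiling and NEON intrinsics on top of this.
void bmm_ikj(const float* A, const float* B, float* C,
             std::size_t batch, std::size_t M, std::size_t K, std::size_t N) {
  for (std::size_t b = 0; b < batch; ++b)
    for (std::size_t i = 0; i < M; ++i) {
      float* Crow = C + (b * M + i) * N;
      for (std::size_t j = 0; j < N; ++j) Crow[j] = 0.f;
      for (std::size_t k = 0; k < K; ++k) {
        const float a = A[(b * M + i) * K + k];
        const float* Brow = B + (b * K + k) * N;
        for (std::size_t j = 0; j < N; ++j)
          Crow[j] += a * Brow[j];
      }
    }
}
```

The same access pattern also parallelizes naturally over the batch dimension, which matters for the transformer workloads mentioned above.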