
[onert] Optimize BatchMatMul kernel in cpu backend #12140

Open
ragmani opened this issue Nov 28, 2023 · 4 comments
ragmani commented Nov 28, 2023

Let's optimize the BatchMatMul kernel in the cpu backend.
Currently, the BatchMatMul kernel in the cpu backend is not optimized.

@ragmani ragmani added the help wanted and area/onert labels Nov 28, 2023
zetwhite commented Dec 1, 2023

I'm interested in this task, but I couldn't catch the details.
Could you share any background context?

ragmani commented Dec 1, 2023

> Could you share any background context?

The background is to support an NMT model now. In any case, I think this task will be useful in the future.

chunseoklee commented
> Could you share any background context?

There is a long story behind the detailed context. In short, this is part of the work to support a transformer-based model (in this case, machine translation). For reference, please see the link.

glistening commented Nov 13, 2024

@tomdol continuation from #14305 (comment)

We need an optimized batch matmul for arm32.

For transposed batch matmul, I confirmed that ggml_mulmat works and is sufficient.

For normal batch matmul like torch.bmm, which takes b × n × m and b × m × p tensors as inputs, we need an optimized kernel.
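
For reference, a naive (unoptimized) version of that normal batch matmul looks roughly like the sketch below. This is illustrative only, not onert's actual kernel; the function name and signature are assumptions.

```cpp
#include <cstddef>

// Naive reference for batch matmul: A is (b, n, m), B is (b, m, p),
// Out is (b, n, p), all contiguous in row-major order.
// Illustrative sketch only; not the onert cpu backend kernel.
void NaiveBatchMatMul(const float *A, const float *B, float *Out,
                      std::size_t b, std::size_t n, std::size_t m, std::size_t p)
{
  for (std::size_t batch = 0; batch < b; ++batch)
  {
    const float *a = A + batch * n * m;
    const float *bmat = B + batch * m * p;
    float *out = Out + batch * n * p;
    for (std::size_t i = 0; i < n; ++i)
    {
      for (std::size_t j = 0; j < p; ++j)
      {
        float acc = 0.0f;
        for (std::size_t k = 0; k < m; ++k)
        {
          // Inner product over the shared dimension m. Note the strided
          // access bmat[k * p + j]; avoiding it (via packing, tiling, SIMD)
          // is what an optimized kernel would do.
          acc += a[i * m + k] * bmat[k * p + j];
        }
        out[i * p + j] = acc;
      }
    }
  }
}
```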

You may bring a kernel from other open-source projects (xnnpack, kleidiai, ...) or write one yourself. (I am not sure whether those projects have the optimized kernel we need.)

We have an alternative option (e.g., inserting a transpose before the normal batch matmul and using ggml_mulmat), illustrated in the sketch below. Thus, if you think this task will take too much time or does not seem better than the alternative approach, you don't need to take it.
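
A minimal sketch of that alternative, under the assumption that transposing the second input's last two dimensions puts the reduction dimension innermost for both operands so the existing transposed path (e.g. ggml_mulmat) can be reused. The helper name is hypothetical and it does not call ggml directly.

```cpp
#include <cstddef>

// Hypothetical helper: materialize B^T per batch, turning B (b, m, p)
// into BT (b, p, m), so both operands read the reduction dimension m
// contiguously and the transposed batch-matmul path can take over.
void TransposeLast2Dims(const float *B, float *BT,
                        std::size_t b, std::size_t m, std::size_t p)
{
  for (std::size_t batch = 0; batch < b; ++batch)
  {
    const float *src = B + batch * m * p; // (m, p) row-major
    float *dst = BT + batch * p * m;      // (p, m) row-major
    for (std::size_t k = 0; k < m; ++k)
      for (std::size_t j = 0; j < p; ++j)
        dst[j * m + k] = src[k * p + j];
  }
}
// Trade-off: an extra b * m * p copy and temporary memory per call,
// in exchange for reusing the already-optimized transposed kernel.
```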

@glistening glistening self-assigned this Nov 14, 2024
@glistening glistening moved this from Ready to Start to In Progress in [ONE] onert - LLM support Nov 14, 2024
@glistening glistening removed their assignment Nov 14, 2024
@glistening glistening moved this from In Progress to Ready to Start in [ONE] onert - LLM support Nov 14, 2024