Introduce MLIR transform dialect to BladeDISC #787
We'll start to explore using the MLIR transform dialect to do codegen for (fused) compute-intensive patterns. The initial target is to support GEMM codegen on the ARM platform, to address the dynamic-shape problem of the Arm Compute Library.

The initial plan is:

- Add a new fusion kind `kTransform` for the transform-based fusion pattern.
- Add a `disc_linalg.multi_level_pack` op, used for doing packing.
- Add a `transform.disc.cache_read` transform op, relying on the `disc_linalg.multi_level_pack` op.
- Support folding of `disc_linalg.multi_level_pack`.
- Lower `disc_linalg.multi_level_pack` to loops if it cannot be folded.
- Add codegen support for the `kTransform` fusion pattern: lower it to linalg and then schedule it.
- Add e2e support for the `kTransform` pattern.
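To make the plan above concrete, here is a minimal, hypothetical sketch of a transform-dialect schedule for such a GEMM. It uses only upstream MLIR transform ops (`transform.structured.match`, `transform.structured.tile`) in roughly their late-2022 syntax, which changes across MLIR versions; the tile sizes are placeholders and the `disc_*` ops from the list are not shown, so this is not the actual BladeDISC schedule.

```mlir
// Hypothetical schedule sketch: find the linalg.matmul produced by
// lowering a kTransform fusion pattern, then tile its m/n/k loops.
transform.sequence failures(propagate) {
^bb0(%arg0: !pdl.operation):
  // Match the matmul inside the payload IR.
  %matmul = transform.structured.match ops{["linalg.matmul"]} in %arg0
  // Tile with placeholder sizes; a real schedule would also pack the
  // operands (e.g. via disc_linalg.multi_level_pack) and vectorize.
  %tiled, %loops:3 = transform.structured.tile %matmul [6, 16, 64]
}
```

A real pipeline would follow the tiling with the packing and cache-read steps from the plan, but their exact forms are DISC-specific.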
Following is some preliminary data: tested on g6r, single thread; A, B and C are fully dynamic (pre-packing is not possible in that case).

| m, n, k | DISC + transform (ms) | DISC + ACL (ms) |
| ------------- | ------------- | ------------- |
| 304, 256, 256 | 1.02 | 1.00 |
| 304, 512, 256 | 2.00 | 2.02 |
| 304, 1024, 256 | 4.10 | 4.00 |
| 304, 1024, 512 | 8.56 | 7.99 |
| 1024, 1024, 1024 | 60.0 | 52.8 |
| 34, 512, 256 | 0.301 | 0.293 |
| 74, 512, 256 | 0.561 | 0.544 |
| 174, 512, 256 | 1.19 | 1.207 |
| 34, 256, 256 | 0.135 | 0.158 |
| 74, 256, 256 | 0.272 | 0.281 |
| 174, 256, 256 | 0.592 | 0.589 |
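Here, "fully dynamic" means none of the GEMM dimensions are known at compile time, which is why operand pre-packing cannot be done ahead of time. As an illustration (not code taken from the issue), such a matmul looks like this in linalg IR:

```mlir
// A GEMM whose operands are fully dynamic: every dimension is `?`, so the
// compiler cannot pick a packed layout for A or B at compile time.
func.func @gemm(%A: tensor<?x?xf32>, %B: tensor<?x?xf32>,
                %C: tensor<?x?xf32>) -> tensor<?x?xf32> {
  %0 = linalg.matmul ins(%A, %B : tensor<?x?xf32>, tensor<?x?xf32>)
                     outs(%C : tensor<?x?xf32>) -> tensor<?x?xf32>
  return %0 : tensor<?x?xf32>
}
```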
e2e model tests on Bert Base (TF) and Albert (PyTorch), on g6r, using a single thread. Note that we only have one default schedule for all shapes, and that schedule is known to be less performant when n or k is large, so the initial performance should improve once we support schedule-selection logic.