-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[TIR] Support tensorization using ldmatrix + MMA #11355
Merged
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
masahi
force-pushed
the
ldmatrix-tensorization
branch
3 times, most recently
from
May 18, 2022 10:01
0e3d707
to
71a5b38
Compare
masahi
commented
May 18, 2022
This is super amazing work!!!!!! |
masahi
force-pushed
the
ldmatrix-tensorization
branch
2 times, most recently
from
May 18, 2022 23:15
af0da1a
to
401a820
Compare
commit 3218fac Author: Masahiro Masuda <[email protected]> Date: Wed May 18 14:04:56 2022 +0900 some clean up commit 7a235b6 Author: Masahiro Masuda <[email protected]> Date: Wed May 18 13:55:11 2022 +0900 parameterize over storage scope in mma store intrin commit 827ea4c Author: Masahiro Masuda <[email protected]> Date: Wed May 18 13:37:38 2022 +0900 properly handle floordiv/mod in codegen commit 42d4c6f Author: Masahiro Masuda <[email protected]> Date: Wed May 18 09:53:57 2022 +0900 update tuned factors for fp16 commit 328d0aa Author: Masahiro Masuda <[email protected]> Date: Wed May 18 08:43:30 2022 +0900 all tests working commit 5e086cf Author: Masahiro Masuda <[email protected]> Date: Wed May 18 07:48:43 2022 +0900 add doc for mma_fill and mma_store intrin commit 4f945c4 Author: Masahiro Masuda <[email protected]> Date: Wed May 18 06:39:01 2022 +0900 remove tests commit df7708f Author: Masahiro Masuda <[email protected]> Date: Tue May 17 19:52:14 2022 +0900 unified test commit 754c83e Author: Masahiro Masuda <[email protected]> Date: Tue May 17 19:36:24 2022 +0900 clean up LowerWarpmemory commit 178c3dc Author: Masahiro Masuda <[email protected]> Date: Tue May 17 19:15:04 2022 +0900 Use IndexMap commit 07fb589 Author: Masahiro Masuda <[email protected]> Date: Tue May 17 17:51:44 2022 +0900 remove 16x8x8 test commit 2b05b5a Author: Masahiro Masuda <[email protected]> Date: Tue May 17 17:31:35 2022 +0900 generate mma fill/store commit bf23fc5 Author: Masahiro Masuda <[email protected]> Date: Tue May 17 12:23:30 2022 +0900 mma intrin generation with meta programming commit 5afb5f0 Author: Masahiro Masuda <[email protected]> Date: Tue May 17 05:26:14 2022 +0900 ldmatrix intrin generation with meta programming commit fb62abb Author: Masahiro Masuda <[email protected]> Date: Mon May 16 20:30:49 2022 +0900 minor commit 5a80adc Author: Masahiro Masuda <[email protected]> Date: Mon May 16 19:55:57 2022 +0900 revert some change commit e599a55 Author: Masahiro Masuda <[email protected]> Date: Mon May 16 19:54:18 2022 +0900 remove obsolete files commit 4b13b85 Author: Masahiro Masuda <[email protected]> Date: Mon May 16 19:51:21 2022 +0900 wip commit 848de63 Author: Masahiro Masuda <[email protected]> Date: Mon May 16 19:44:29 2022 +0900 wip commit b35bff9 Author: Masahiro Masuda <[email protected]> Date: Mon May 16 19:31:18 2022 +0900 update parse error msg commit ad9b053 Author: Masahiro Masuda <[email protected]> Date: Mon May 16 19:26:51 2022 +0900 fix for avoiding Buffer.vload(...) case commit 54c6864 Author: Masahiro Masuda <[email protected]> Date: Mon May 16 18:59:55 2022 +0900 wip commit 078060f Author: Masahiro Masuda <[email protected]> Date: Mon May 16 18:57:34 2022 +0900 wip commit 576f841 Author: Masahiro Masuda <[email protected]> Date: Mon May 16 18:52:15 2022 +0900 wip commit 12a376a Author: Masahiro Masuda <[email protected]> Date: Mon May 16 17:54:58 2022 +0900 Squashed commit of the following: commit 48eef49 Author: Masahiro Masuda <[email protected]> Date: Mon May 16 17:40:48 2022 +0900 more comment commit 8f67fc8 Author: Masahiro Masuda <[email protected]> Date: Mon May 16 17:11:27 2022 +0900 update test commit ad85036 Author: Masahiro Masuda <[email protected]> Date: Mon May 16 16:54:01 2022 +0900 add test commit 4a5dc3f Author: Masahiro Masuda <[email protected]> Date: Mon May 16 16:40:47 2022 +0900 [TVMScript] Support function call to help construct AST commit 76c1bcf Author: Masahiro Masuda <[email protected]> Date: Mon May 16 16:30:07 2022 +0900 simplify iterator in layout transform commit 9362803 Author: Masahiro Masuda <[email protected]> Date: Sat May 14 11:31:39 2022 +0900 remove obsolet files commit 2e119b4 Author: Masahiro Masuda <[email protected]> Date: Sat May 14 10:43:59 2022 +0900 calculate mma store dst index using inverse affine map commit 9489434 Author: Masahiro Masuda <[email protected]> Date: Sat May 14 10:01:12 2022 +0900 simplify store commit 1adcb77 Author: Masahiro Masuda <[email protected]> Date: Sat May 14 09:43:40 2022 +0900 simplified fill commit 7b13c73 Author: Masahiro Masuda <[email protected]> Date: Sat May 14 09:22:17 2022 +0900 simplify intrin desc using index map function commit bcf212d Author: Masahiro Masuda <[email protected]> Date: Sat May 14 07:16:42 2022 +0900 seems to work commit dd8ccf9 Author: Masahiro Masuda <[email protected]> Date: Sat May 14 07:11:57 2022 +0900 poking with the parser commit 596582c Author: Masahiro Masuda <[email protected]> Date: Fri May 13 20:04:59 2022 +0900 16x8x32 4k trans working commit 273f89a Author: Masahiro Masuda <[email protected]> Date: Fri May 13 19:52:13 2022 +0900 add 16x8x16 fp16 trans commit 8e2066c Author: Masahiro Masuda <[email protected]> Date: Fri May 13 19:32:37 2022 +0900 16x8x16 4k trans working commit c2d0744 Author: Masahiro Masuda <[email protected]> Date: Fri May 13 19:25:52 2022 +0900 16x8x16 trans working commit c2e314c Author: Masahiro Masuda <[email protected]> Date: Fri May 13 16:19:32 2022 +0900 tuned int8 4k, 91 TOPS commit 94d9d96 Author: Masahiro Masuda <[email protected]> Date: Fri May 13 15:59:33 2022 +0900 int8 4k tune working commit 3ca8ca0 Author: Masahiro Masuda <[email protected]> Date: Fri May 13 08:43:57 2022 +0900 mma 16x8x32 int8 working with ldmatrix b workaround commit 54f1cb7 Author: Masahiro Masuda <[email protected]> Date: Fri May 13 18:23:27 2022 +0900 wip commit 9d2844d Author: Masahiro Masuda <[email protected]> Date: Fri May 13 16:38:53 2022 +0900 test tensorize without layout transform commit 86ee6da Author: Masahiro Masuda <[email protected]> Date: Fri May 13 15:15:34 2022 +0900 int8 4k tensorize works commit 39f9e32 Author: Masahiro Masuda <[email protected]> Date: Fri May 13 12:44:39 2022 +0900 begin int8 4k tune commit 6fa91e5 Author: Masahiro Masuda <[email protected]> Date: Thu May 12 18:53:20 2022 +0900 try fix ldmatrix b for int8 commit 7a962cd Author: Masahiro Masuda <[email protected]> Date: Thu May 12 18:28:34 2022 +0900 fixed warp_coeff commit a0afb56 Author: Masahiro Masuda <[email protected]> Date: Thu May 12 12:20:01 2022 +0900 wip commit f70ccd0 Author: Masahiro Masuda <[email protected]> Date: Thu May 12 12:09:57 2022 +0900 int8 tensorize working commit 20321fa Author: Masahiro Masuda <[email protected]> Date: Thu May 12 07:06:22 2022 +0900 starting 16x8x32 int8 commit 441fd19 Author: Masahiro Masuda <[email protected]> Date: Thu May 12 05:50:46 2022 +0900 adding fp16 accum case commit c9d40b6 Author: Masahiro Masuda <[email protected]> Date: Wed May 11 17:04:29 2022 +0900 clean up commit 5b2d486 Author: Masahiro Masuda <[email protected]> Date: Wed May 11 16:38:19 2022 +0900 16x8x16 4k tune working commit c3cb170 Author: Masahiro Masuda <[email protected]> Date: Wed May 11 16:20:27 2022 +0900 tensoriz fixed commit 68039b0 Author: Masahiro Masuda <[email protected]> Date: Wed May 11 15:55:25 2022 +0900 begin 16x8x16 4k tune commit ced5d8d Author: Masahiro Masuda <[email protected]> Date: Wed May 11 15:50:11 2022 +0900 16x8x16 worked commit 3d2c90d Author: Masahiro Masuda <[email protected]> Date: Wed May 11 15:47:26 2022 +0900 fix commit 403050b Author: Masahiro Masuda <[email protected]> Date: Wed May 11 15:45:10 2022 +0900 add 16x8x16 test commit 18e8d73 Author: Masahiro Masuda <[email protected]> Date: Wed May 11 06:50:32 2022 +0900 fixed mma store codegen for 16x8x16 commit ec81250 Author: Masahiro Masuda <[email protected]> Date: Wed May 11 04:25:25 2022 +0900 add 16x8x16 mma store codegen commit e08df2a Author: Masahiro Masuda <[email protected]> Date: Wed May 11 03:47:47 2022 +0900 tensorized C_warp init commit ae06789 Author: Masahiro Masuda <[email protected]> Date: Wed May 11 03:06:06 2022 +0900 mma store codegen working commit deb4d66 Author: Masahiro Masuda <[email protected]> Date: Tue May 10 19:22:57 2022 +0900 update lower warp memory commit 71fe5fe Author: Masahiro Masuda <[email protected]> Date: Tue May 10 09:01:42 2022 +0900 tensorizing mma store commit e80a1f1 Author: Masahiro Masuda <[email protected]> Date: Thu Apr 28 19:54:08 2022 +0900 clean up commit a9640f4 Author: Masahiro Masuda <[email protected]> Date: Thu Apr 28 19:40:55 2022 +0900 add tunable 4k test, 36 TFLOPS commit b9f7eae Author: Masahiro Masuda <[email protected]> Date: Thu Apr 28 18:01:08 2022 +0900 fixed bug in LowerWarpMemory index splitting for ldmatrix commit 00df308 Author: Masahiro Masuda <[email protected]> Date: Wed Apr 27 07:58:17 2022 +0900 fixed missing reverse_compute_at commit 93f9fe7 Author: Masahiro Masuda <[email protected]> Date: Wed Apr 27 06:55:12 2022 +0900 add 4k test commit 3689ef7 Author: Masahiro Masuda <[email protected]> Date: Wed Apr 27 06:54:09 2022 +0900 temp disable high dim base indices check in tensorize commit 0c859c4 Author: Masahiro Masuda <[email protected]> Date: Tue Apr 26 19:18:23 2022 +0900 clean up commit f6aadbf Author: Masahiro Masuda <[email protected]> Date: Tue Apr 26 19:13:09 2022 +0900 Add 16x8x8 MMA + LDMatrix test commit 4cf6b20 Author: Masahiro Masuda <[email protected]> Date: Tue Apr 26 18:04:17 2022 +0900 testing 16x8x8 ldmatrix tensoriation
masahi
force-pushed
the
ldmatrix-tensorization
branch
from
May 20, 2022 08:54
401a820
to
7bf882c
Compare
vinx13
approved these changes
May 20, 2022
The results look amazing, thank you @masahi ! |
masahi
added a commit
to masahi/tvm
that referenced
this pull request
May 25, 2022
* [TIR] Support tensorization using ldmatrix + MMA commit 3218fac Author: Masahiro Masuda <[email protected]> Date: Wed May 18 14:04:56 2022 +0900 some clean up commit 7a235b6 Author: Masahiro Masuda <[email protected]> Date: Wed May 18 13:55:11 2022 +0900 parameterize over storage scope in mma store intrin commit 827ea4c Author: Masahiro Masuda <[email protected]> Date: Wed May 18 13:37:38 2022 +0900 properly handle floordiv/mod in codegen commit 42d4c6f Author: Masahiro Masuda <[email protected]> Date: Wed May 18 09:53:57 2022 +0900 update tuned factors for fp16 commit 328d0aa Author: Masahiro Masuda <[email protected]> Date: Wed May 18 08:43:30 2022 +0900 all tests working commit 5e086cf Author: Masahiro Masuda <[email protected]> Date: Wed May 18 07:48:43 2022 +0900 add doc for mma_fill and mma_store intrin commit 4f945c4 Author: Masahiro Masuda <[email protected]> Date: Wed May 18 06:39:01 2022 +0900 remove tests commit df7708f Author: Masahiro Masuda <[email protected]> Date: Tue May 17 19:52:14 2022 +0900 unified test commit 754c83e Author: Masahiro Masuda <[email protected]> Date: Tue May 17 19:36:24 2022 +0900 clean up LowerWarpmemory commit 178c3dc Author: Masahiro Masuda <[email protected]> Date: Tue May 17 19:15:04 2022 +0900 Use IndexMap commit 07fb589 Author: Masahiro Masuda <[email protected]> Date: Tue May 17 17:51:44 2022 +0900 remove 16x8x8 test commit 2b05b5a Author: Masahiro Masuda <[email protected]> Date: Tue May 17 17:31:35 2022 +0900 generate mma fill/store commit bf23fc5 Author: Masahiro Masuda <[email protected]> Date: Tue May 17 12:23:30 2022 +0900 mma intrin generation with meta programming commit 5afb5f0 Author: Masahiro Masuda <[email protected]> Date: Tue May 17 05:26:14 2022 +0900 ldmatrix intrin generation with meta programming commit fb62abb Author: Masahiro Masuda <[email protected]> Date: Mon May 16 20:30:49 2022 +0900 minor commit 5a80adc Author: Masahiro Masuda <[email protected]> Date: Mon May 16 19:55:57 2022 +0900 revert some change commit e599a55 Author: Masahiro Masuda <[email protected]> Date: Mon May 16 19:54:18 2022 +0900 remove obsolete files commit 4b13b85 Author: Masahiro Masuda <[email protected]> Date: Mon May 16 19:51:21 2022 +0900 wip commit 848de63 Author: Masahiro Masuda <[email protected]> Date: Mon May 16 19:44:29 2022 +0900 wip commit b35bff9 Author: Masahiro Masuda <[email protected]> Date: Mon May 16 19:31:18 2022 +0900 update parse error msg commit ad9b053 Author: Masahiro Masuda <[email protected]> Date: Mon May 16 19:26:51 2022 +0900 fix for avoiding Buffer.vload(...) case commit 54c6864 Author: Masahiro Masuda <[email protected]> Date: Mon May 16 18:59:55 2022 +0900 wip commit 078060f Author: Masahiro Masuda <[email protected]> Date: Mon May 16 18:57:34 2022 +0900 wip commit 576f841 Author: Masahiro Masuda <[email protected]> Date: Mon May 16 18:52:15 2022 +0900 wip commit 12a376a Author: Masahiro Masuda <[email protected]> Date: Mon May 16 17:54:58 2022 +0900 Squashed commit of the following: commit 48eef49 Author: Masahiro Masuda <[email protected]> Date: Mon May 16 17:40:48 2022 +0900 more comment commit 8f67fc8 Author: Masahiro Masuda <[email protected]> Date: Mon May 16 17:11:27 2022 +0900 update test commit ad85036 Author: Masahiro Masuda <[email protected]> Date: Mon May 16 16:54:01 2022 +0900 add test commit 4a5dc3f Author: Masahiro Masuda <[email protected]> Date: Mon May 16 16:40:47 2022 +0900 [TVMScript] Support function call to help construct AST commit 76c1bcf Author: Masahiro Masuda <[email protected]> Date: Mon May 16 16:30:07 2022 +0900 simplify iterator in layout transform commit 9362803 Author: Masahiro Masuda <[email protected]> Date: Sat May 14 11:31:39 2022 +0900 remove obsolet files commit 2e119b4 Author: Masahiro Masuda <[email protected]> Date: Sat May 14 10:43:59 2022 +0900 calculate mma store dst index using inverse affine map commit 9489434 Author: Masahiro Masuda <[email protected]> Date: Sat May 14 10:01:12 2022 +0900 simplify store commit 1adcb77 Author: Masahiro Masuda <[email protected]> Date: Sat May 14 09:43:40 2022 +0900 simplified fill commit 7b13c73 Author: Masahiro Masuda <[email protected]> Date: Sat May 14 09:22:17 2022 +0900 simplify intrin desc using index map function commit bcf212d Author: Masahiro Masuda <[email protected]> Date: Sat May 14 07:16:42 2022 +0900 seems to work commit dd8ccf9 Author: Masahiro Masuda <[email protected]> Date: Sat May 14 07:11:57 2022 +0900 poking with the parser commit 596582c Author: Masahiro Masuda <[email protected]> Date: Fri May 13 20:04:59 2022 +0900 16x8x32 4k trans working commit 273f89a Author: Masahiro Masuda <[email protected]> Date: Fri May 13 19:52:13 2022 +0900 add 16x8x16 fp16 trans commit 8e2066c Author: Masahiro Masuda <[email protected]> Date: Fri May 13 19:32:37 2022 +0900 16x8x16 4k trans working commit c2d0744 Author: Masahiro Masuda <[email protected]> Date: Fri May 13 19:25:52 2022 +0900 16x8x16 trans working commit c2e314c Author: Masahiro Masuda <[email protected]> Date: Fri May 13 16:19:32 2022 +0900 tuned int8 4k, 91 TOPS commit 94d9d96 Author: Masahiro Masuda <[email protected]> Date: Fri May 13 15:59:33 2022 +0900 int8 4k tune working commit 3ca8ca0 Author: Masahiro Masuda <[email protected]> Date: Fri May 13 08:43:57 2022 +0900 mma 16x8x32 int8 working with ldmatrix b workaround commit 54f1cb7 Author: Masahiro Masuda <[email protected]> Date: Fri May 13 18:23:27 2022 +0900 wip commit 9d2844d Author: Masahiro Masuda <[email protected]> Date: Fri May 13 16:38:53 2022 +0900 test tensorize without layout transform commit 86ee6da Author: Masahiro Masuda <[email protected]> Date: Fri May 13 15:15:34 2022 +0900 int8 4k tensorize works commit 39f9e32 Author: Masahiro Masuda <[email protected]> Date: Fri May 13 12:44:39 2022 +0900 begin int8 4k tune commit 6fa91e5 Author: Masahiro Masuda <[email protected]> Date: Thu May 12 18:53:20 2022 +0900 try fix ldmatrix b for int8 commit 7a962cd Author: Masahiro Masuda <[email protected]> Date: Thu May 12 18:28:34 2022 +0900 fixed warp_coeff commit a0afb56 Author: Masahiro Masuda <[email protected]> Date: Thu May 12 12:20:01 2022 +0900 wip commit f70ccd0 Author: Masahiro Masuda <[email protected]> Date: Thu May 12 12:09:57 2022 +0900 int8 tensorize working commit 20321fa Author: Masahiro Masuda <[email protected]> Date: Thu May 12 07:06:22 2022 +0900 starting 16x8x32 int8 commit 441fd19 Author: Masahiro Masuda <[email protected]> Date: Thu May 12 05:50:46 2022 +0900 adding fp16 accum case commit c9d40b6 Author: Masahiro Masuda <[email protected]> Date: Wed May 11 17:04:29 2022 +0900 clean up commit 5b2d486 Author: Masahiro Masuda <[email protected]> Date: Wed May 11 16:38:19 2022 +0900 16x8x16 4k tune working commit c3cb170 Author: Masahiro Masuda <[email protected]> Date: Wed May 11 16:20:27 2022 +0900 tensoriz fixed commit 68039b0 Author: Masahiro Masuda <[email protected]> Date: Wed May 11 15:55:25 2022 +0900 begin 16x8x16 4k tune commit ced5d8d Author: Masahiro Masuda <[email protected]> Date: Wed May 11 15:50:11 2022 +0900 16x8x16 worked commit 3d2c90d Author: Masahiro Masuda <[email protected]> Date: Wed May 11 15:47:26 2022 +0900 fix commit 403050b Author: Masahiro Masuda <[email protected]> Date: Wed May 11 15:45:10 2022 +0900 add 16x8x16 test commit 18e8d73 Author: Masahiro Masuda <[email protected]> Date: Wed May 11 06:50:32 2022 +0900 fixed mma store codegen for 16x8x16 commit ec81250 Author: Masahiro Masuda <[email protected]> Date: Wed May 11 04:25:25 2022 +0900 add 16x8x16 mma store codegen commit e08df2a Author: Masahiro Masuda <[email protected]> Date: Wed May 11 03:47:47 2022 +0900 tensorized C_warp init commit ae06789 Author: Masahiro Masuda <[email protected]> Date: Wed May 11 03:06:06 2022 +0900 mma store codegen working commit deb4d66 Author: Masahiro Masuda <[email protected]> Date: Tue May 10 19:22:57 2022 +0900 update lower warp memory commit 71fe5fe Author: Masahiro Masuda <[email protected]> Date: Tue May 10 09:01:42 2022 +0900 tensorizing mma store commit e80a1f1 Author: Masahiro Masuda <[email protected]> Date: Thu Apr 28 19:54:08 2022 +0900 clean up commit a9640f4 Author: Masahiro Masuda <[email protected]> Date: Thu Apr 28 19:40:55 2022 +0900 add tunable 4k test, 36 TFLOPS commit b9f7eae Author: Masahiro Masuda <[email protected]> Date: Thu Apr 28 18:01:08 2022 +0900 fixed bug in LowerWarpMemory index splitting for ldmatrix commit 00df308 Author: Masahiro Masuda <[email protected]> Date: Wed Apr 27 07:58:17 2022 +0900 fixed missing reverse_compute_at commit 93f9fe7 Author: Masahiro Masuda <[email protected]> Date: Wed Apr 27 06:55:12 2022 +0900 add 4k test commit 3689ef7 Author: Masahiro Masuda <[email protected]> Date: Wed Apr 27 06:54:09 2022 +0900 temp disable high dim base indices check in tensorize commit 0c859c4 Author: Masahiro Masuda <[email protected]> Date: Tue Apr 26 19:18:23 2022 +0900 clean up commit f6aadbf Author: Masahiro Masuda <[email protected]> Date: Tue Apr 26 19:13:09 2022 +0900 Add 16x8x8 MMA + LDMatrix test commit 4cf6b20 Author: Masahiro Masuda <[email protected]> Date: Tue Apr 26 18:04:17 2022 +0900 testing 16x8x8 ldmatrix tensoriation * set measure_perf to False * add requires_gpu decorator in tests, always test build on non-ampere * skip cuda compile on old gpu
62 tasks
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
MMA and ldmatrix intrinsics were respectively added in #9909 and #10855, this PR enables using them together in a TIR schedule. I think this is a very cool example of using
transform_layout + tensorize
together to target very low-level instructions.Writing many variants of intrinsic descriptions and their tensorized implemenations (different shape, data type, transposed or not etc) was very tedious, but still it was reasonably manageable thanks to recent improvements in TVMScript meta programming - see #11097 and #11324 and how they are exploited in
tir/tensor_intrin/cuda.py
.Of all the possible variants of MMA + ldmatrix combinations, only 6 of them are supported for now:
ldmatrix
always loads 4 8x8 matricesThe test case
test_tir_schedule_tensorize_ldmatrix_mma.py
exercises all 6 supported variants. If we setmeasure_perf = True
, it reports GFLOPS on 4k inputs. The following is an example output running on RTX 3070. In particular, I also have an equivalent schedule for thef16f16f32
case using WMMA intrinsic (~38 TFLOPS), MMA + ldmatrix one below is faster (~40 TFLOPS).(
trans
means the matrix B is transposed, e.g. ourdense
op)@vinx13 @junrushao1994 @shingjan @Hzfengsy @yzh119 @KnowingNothing