Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[TIR] Support tensorization using ldmatrix + MMA #11355

Merged
merged 4 commits into from
May 20, 2022

Conversation

masahi
Copy link
Member

@masahi masahi commented May 18, 2022

MMA and ldmatrix intrinsics were respectively added in #9909 and #10855, this PR enables using them together in a TIR schedule. I think this is a very cool example of using transform_layout + tensorize together to target very low-level instructions.

Writing many variants of intrinsic descriptions and their tensorized implemenations (different shape, data type, transposed or not etc) was very tedious, but still it was reasonably manageable thanks to recent improvements in TVMScript meta programming - see #11097 and #11324 and how they are exploited in tir/tensor_intrin/cuda.py.

Of all the possible variants of MMA + ldmatrix combinations, only 6 of them are supported for now:

  • ldmatrix always loads 4 8x8 matrices
  • Only support 16x8x16 shape (float16) and 16x8x32 shape (int8) for MMA. These shapes are only supported by Ampere, so sm75 (T4) or sm70 (V100) are not supported. Missing variants (16x8x8 or 8x8x4) can be easily added later if desired.
  • Since I always load 4 matrices for each of A and B, I issue two MMA calls to fully utilize the loaded matrices and compute 16x16 output tiles (one MMA call consumes only 2 8x8 matrices from B). Operating on 16x16 tiles also makes it easier to port existing schedules using WMMA intrinsics to use MMA + ldmatrix instead.

The test case test_tir_schedule_tensorize_ldmatrix_mma.py exercises all 6 supported variants. If we set measure_perf = True, it reports GFLOPS on 4k inputs. The following is an example output running on RTX 3070. In particular, I also have an equivalent schedule for the f16f16f32 case using WMMA intrinsic (~38 TFLOPS), MMA + ldmatrix one below is faster (~40 TFLOPS).

(trans means the matrix B is transposed, e.g. our dense op)

f16f16f32_m16n16k16: 40086.985791 GFLOPS
f16f16f32_m16n16k16_trans: 39576.507331 GFLOPS
f16f16f16_m16n16k16: 64315.448428 GFLOPS
f16f16f16_m16n16k16_trans: 61917.847391 GFLOPS
i8i8i32_m16n16k32: 91906.635057 GOPS
i8i8i32_m16n16k32_trans: 96388.844717 GOPS

@vinx13 @junrushao1994 @shingjan @Hzfengsy @yzh119 @KnowingNothing

@masahi masahi force-pushed the ldmatrix-tensorization branch 3 times, most recently from 0e3d707 to 71a5b38 Compare May 18, 2022 10:01
@junrushao
Copy link
Member

This is super amazing work!!!!!!

@masahi masahi force-pushed the ldmatrix-tensorization branch 2 times, most recently from af0da1a to 401a820 Compare May 18, 2022 23:15
commit 3218fac
Author: Masahiro Masuda <[email protected]>
Date:   Wed May 18 14:04:56 2022 +0900

    some clean up

commit 7a235b6
Author: Masahiro Masuda <[email protected]>
Date:   Wed May 18 13:55:11 2022 +0900

    parameterize over storage scope in mma store intrin

commit 827ea4c
Author: Masahiro Masuda <[email protected]>
Date:   Wed May 18 13:37:38 2022 +0900

    properly handle floordiv/mod in codegen

commit 42d4c6f
Author: Masahiro Masuda <[email protected]>
Date:   Wed May 18 09:53:57 2022 +0900

    update tuned factors for fp16

commit 328d0aa
Author: Masahiro Masuda <[email protected]>
Date:   Wed May 18 08:43:30 2022 +0900

    all tests working

commit 5e086cf
Author: Masahiro Masuda <[email protected]>
Date:   Wed May 18 07:48:43 2022 +0900

    add doc for mma_fill and mma_store intrin

commit 4f945c4
Author: Masahiro Masuda <[email protected]>
Date:   Wed May 18 06:39:01 2022 +0900

    remove tests

commit df7708f
Author: Masahiro Masuda <[email protected]>
Date:   Tue May 17 19:52:14 2022 +0900

    unified test

commit 754c83e
Author: Masahiro Masuda <[email protected]>
Date:   Tue May 17 19:36:24 2022 +0900

    clean up LowerWarpmemory

commit 178c3dc
Author: Masahiro Masuda <[email protected]>
Date:   Tue May 17 19:15:04 2022 +0900

    Use IndexMap

commit 07fb589
Author: Masahiro Masuda <[email protected]>
Date:   Tue May 17 17:51:44 2022 +0900

    remove 16x8x8 test

commit 2b05b5a
Author: Masahiro Masuda <[email protected]>
Date:   Tue May 17 17:31:35 2022 +0900

    generate mma fill/store

commit bf23fc5
Author: Masahiro Masuda <[email protected]>
Date:   Tue May 17 12:23:30 2022 +0900

    mma intrin generation with meta programming

commit 5afb5f0
Author: Masahiro Masuda <[email protected]>
Date:   Tue May 17 05:26:14 2022 +0900

    ldmatrix intrin generation with meta programming

commit fb62abb
Author: Masahiro Masuda <[email protected]>
Date:   Mon May 16 20:30:49 2022 +0900

    minor

commit 5a80adc
Author: Masahiro Masuda <[email protected]>
Date:   Mon May 16 19:55:57 2022 +0900

    revert some change

commit e599a55
Author: Masahiro Masuda <[email protected]>
Date:   Mon May 16 19:54:18 2022 +0900

    remove obsolete files

commit 4b13b85
Author: Masahiro Masuda <[email protected]>
Date:   Mon May 16 19:51:21 2022 +0900

    wip

commit 848de63
Author: Masahiro Masuda <[email protected]>
Date:   Mon May 16 19:44:29 2022 +0900

    wip

commit b35bff9
Author: Masahiro Masuda <[email protected]>
Date:   Mon May 16 19:31:18 2022 +0900

    update parse error msg

commit ad9b053
Author: Masahiro Masuda <[email protected]>
Date:   Mon May 16 19:26:51 2022 +0900

    fix for avoiding Buffer.vload(...) case

commit 54c6864
Author: Masahiro Masuda <[email protected]>
Date:   Mon May 16 18:59:55 2022 +0900

    wip

commit 078060f
Author: Masahiro Masuda <[email protected]>
Date:   Mon May 16 18:57:34 2022 +0900

    wip

commit 576f841
Author: Masahiro Masuda <[email protected]>
Date:   Mon May 16 18:52:15 2022 +0900

    wip

commit 12a376a
Author: Masahiro Masuda <[email protected]>
Date:   Mon May 16 17:54:58 2022 +0900

    Squashed commit of the following:

    commit 48eef49
    Author: Masahiro Masuda <[email protected]>
    Date:   Mon May 16 17:40:48 2022 +0900

        more comment

    commit 8f67fc8
    Author: Masahiro Masuda <[email protected]>
    Date:   Mon May 16 17:11:27 2022 +0900

        update test

    commit ad85036
    Author: Masahiro Masuda <[email protected]>
    Date:   Mon May 16 16:54:01 2022 +0900

        add test

    commit 4a5dc3f
    Author: Masahiro Masuda <[email protected]>
    Date:   Mon May 16 16:40:47 2022 +0900

        [TVMScript] Support function call to help construct AST

commit 76c1bcf
Author: Masahiro Masuda <[email protected]>
Date:   Mon May 16 16:30:07 2022 +0900

    simplify iterator in layout transform

commit 9362803
Author: Masahiro Masuda <[email protected]>
Date:   Sat May 14 11:31:39 2022 +0900

    remove obsolet files

commit 2e119b4
Author: Masahiro Masuda <[email protected]>
Date:   Sat May 14 10:43:59 2022 +0900

    calculate mma store dst index using inverse affine map

commit 9489434
Author: Masahiro Masuda <[email protected]>
Date:   Sat May 14 10:01:12 2022 +0900

    simplify store

commit 1adcb77
Author: Masahiro Masuda <[email protected]>
Date:   Sat May 14 09:43:40 2022 +0900

    simplified fill

commit 7b13c73
Author: Masahiro Masuda <[email protected]>
Date:   Sat May 14 09:22:17 2022 +0900

    simplify intrin desc using index map function

commit bcf212d
Author: Masahiro Masuda <[email protected]>
Date:   Sat May 14 07:16:42 2022 +0900

    seems to work

commit dd8ccf9
Author: Masahiro Masuda <[email protected]>
Date:   Sat May 14 07:11:57 2022 +0900

    poking with the parser

commit 596582c
Author: Masahiro Masuda <[email protected]>
Date:   Fri May 13 20:04:59 2022 +0900

    16x8x32 4k trans working

commit 273f89a
Author: Masahiro Masuda <[email protected]>
Date:   Fri May 13 19:52:13 2022 +0900

    add 16x8x16 fp16 trans

commit 8e2066c
Author: Masahiro Masuda <[email protected]>
Date:   Fri May 13 19:32:37 2022 +0900

    16x8x16 4k trans working

commit c2d0744
Author: Masahiro Masuda <[email protected]>
Date:   Fri May 13 19:25:52 2022 +0900

    16x8x16 trans working

commit c2e314c
Author: Masahiro Masuda <[email protected]>
Date:   Fri May 13 16:19:32 2022 +0900

    tuned int8 4k, 91 TOPS

commit 94d9d96
Author: Masahiro Masuda <[email protected]>
Date:   Fri May 13 15:59:33 2022 +0900

    int8 4k tune working

commit 3ca8ca0
Author: Masahiro Masuda <[email protected]>
Date:   Fri May 13 08:43:57 2022 +0900

    mma 16x8x32 int8 working with ldmatrix b workaround

commit 54f1cb7
Author: Masahiro Masuda <[email protected]>
Date:   Fri May 13 18:23:27 2022 +0900

    wip

commit 9d2844d
Author: Masahiro Masuda <[email protected]>
Date:   Fri May 13 16:38:53 2022 +0900

    test tensorize without layout transform

commit 86ee6da
Author: Masahiro Masuda <[email protected]>
Date:   Fri May 13 15:15:34 2022 +0900

    int8 4k tensorize works

commit 39f9e32
Author: Masahiro Masuda <[email protected]>
Date:   Fri May 13 12:44:39 2022 +0900

    begin int8 4k tune

commit 6fa91e5
Author: Masahiro Masuda <[email protected]>
Date:   Thu May 12 18:53:20 2022 +0900

    try fix ldmatrix b for int8

commit 7a962cd
Author: Masahiro Masuda <[email protected]>
Date:   Thu May 12 18:28:34 2022 +0900

    fixed warp_coeff

commit a0afb56
Author: Masahiro Masuda <[email protected]>
Date:   Thu May 12 12:20:01 2022 +0900

    wip

commit f70ccd0
Author: Masahiro Masuda <[email protected]>
Date:   Thu May 12 12:09:57 2022 +0900

    int8 tensorize working

commit 20321fa
Author: Masahiro Masuda <[email protected]>
Date:   Thu May 12 07:06:22 2022 +0900

    starting 16x8x32 int8

commit 441fd19
Author: Masahiro Masuda <[email protected]>
Date:   Thu May 12 05:50:46 2022 +0900

    adding fp16 accum case

commit c9d40b6
Author: Masahiro Masuda <[email protected]>
Date:   Wed May 11 17:04:29 2022 +0900

    clean up

commit 5b2d486
Author: Masahiro Masuda <[email protected]>
Date:   Wed May 11 16:38:19 2022 +0900

    16x8x16 4k tune working

commit c3cb170
Author: Masahiro Masuda <[email protected]>
Date:   Wed May 11 16:20:27 2022 +0900

    tensoriz fixed

commit 68039b0
Author: Masahiro Masuda <[email protected]>
Date:   Wed May 11 15:55:25 2022 +0900

    begin 16x8x16 4k tune

commit ced5d8d
Author: Masahiro Masuda <[email protected]>
Date:   Wed May 11 15:50:11 2022 +0900

    16x8x16 worked

commit 3d2c90d
Author: Masahiro Masuda <[email protected]>
Date:   Wed May 11 15:47:26 2022 +0900

    fix

commit 403050b
Author: Masahiro Masuda <[email protected]>
Date:   Wed May 11 15:45:10 2022 +0900

    add 16x8x16 test

commit 18e8d73
Author: Masahiro Masuda <[email protected]>
Date:   Wed May 11 06:50:32 2022 +0900

    fixed mma store codegen for 16x8x16

commit ec81250
Author: Masahiro Masuda <[email protected]>
Date:   Wed May 11 04:25:25 2022 +0900

    add 16x8x16 mma store codegen

commit e08df2a
Author: Masahiro Masuda <[email protected]>
Date:   Wed May 11 03:47:47 2022 +0900

    tensorized C_warp init

commit ae06789
Author: Masahiro Masuda <[email protected]>
Date:   Wed May 11 03:06:06 2022 +0900

    mma store codegen working

commit deb4d66
Author: Masahiro Masuda <[email protected]>
Date:   Tue May 10 19:22:57 2022 +0900

    update lower warp memory

commit 71fe5fe
Author: Masahiro Masuda <[email protected]>
Date:   Tue May 10 09:01:42 2022 +0900

    tensorizing mma store

commit e80a1f1
Author: Masahiro Masuda <[email protected]>
Date:   Thu Apr 28 19:54:08 2022 +0900

    clean up

commit a9640f4
Author: Masahiro Masuda <[email protected]>
Date:   Thu Apr 28 19:40:55 2022 +0900

    add tunable 4k test, 36 TFLOPS

commit b9f7eae
Author: Masahiro Masuda <[email protected]>
Date:   Thu Apr 28 18:01:08 2022 +0900

    fixed bug in LowerWarpMemory index splitting for ldmatrix

commit 00df308
Author: Masahiro Masuda <[email protected]>
Date:   Wed Apr 27 07:58:17 2022 +0900

    fixed missing reverse_compute_at

commit 93f9fe7
Author: Masahiro Masuda <[email protected]>
Date:   Wed Apr 27 06:55:12 2022 +0900

    add 4k test

commit 3689ef7
Author: Masahiro Masuda <[email protected]>
Date:   Wed Apr 27 06:54:09 2022 +0900

    temp disable high dim base indices check in tensorize

commit 0c859c4
Author: Masahiro Masuda <[email protected]>
Date:   Tue Apr 26 19:18:23 2022 +0900

    clean up

commit f6aadbf
Author: Masahiro Masuda <[email protected]>
Date:   Tue Apr 26 19:13:09 2022 +0900

    Add 16x8x8 MMA + LDMatrix test

commit 4cf6b20
Author: Masahiro Masuda <[email protected]>
Date:   Tue Apr 26 18:04:17 2022 +0900

    testing 16x8x8 ldmatrix tensoriation
@vinx13 vinx13 merged commit 0274d8e into apache:main May 20, 2022
@yzh119
Copy link
Member

yzh119 commented May 24, 2022

The results look amazing, thank you @masahi !

masahi added a commit to masahi/tvm that referenced this pull request May 25, 2022
* [TIR] Support tensorization using ldmatrix + MMA

commit 3218fac
Author: Masahiro Masuda <[email protected]>
Date:   Wed May 18 14:04:56 2022 +0900

    some clean up

commit 7a235b6
Author: Masahiro Masuda <[email protected]>
Date:   Wed May 18 13:55:11 2022 +0900

    parameterize over storage scope in mma store intrin

commit 827ea4c
Author: Masahiro Masuda <[email protected]>
Date:   Wed May 18 13:37:38 2022 +0900

    properly handle floordiv/mod in codegen

commit 42d4c6f
Author: Masahiro Masuda <[email protected]>
Date:   Wed May 18 09:53:57 2022 +0900

    update tuned factors for fp16

commit 328d0aa
Author: Masahiro Masuda <[email protected]>
Date:   Wed May 18 08:43:30 2022 +0900

    all tests working

commit 5e086cf
Author: Masahiro Masuda <[email protected]>
Date:   Wed May 18 07:48:43 2022 +0900

    add doc for mma_fill and mma_store intrin

commit 4f945c4
Author: Masahiro Masuda <[email protected]>
Date:   Wed May 18 06:39:01 2022 +0900

    remove tests

commit df7708f
Author: Masahiro Masuda <[email protected]>
Date:   Tue May 17 19:52:14 2022 +0900

    unified test

commit 754c83e
Author: Masahiro Masuda <[email protected]>
Date:   Tue May 17 19:36:24 2022 +0900

    clean up LowerWarpmemory

commit 178c3dc
Author: Masahiro Masuda <[email protected]>
Date:   Tue May 17 19:15:04 2022 +0900

    Use IndexMap

commit 07fb589
Author: Masahiro Masuda <[email protected]>
Date:   Tue May 17 17:51:44 2022 +0900

    remove 16x8x8 test

commit 2b05b5a
Author: Masahiro Masuda <[email protected]>
Date:   Tue May 17 17:31:35 2022 +0900

    generate mma fill/store

commit bf23fc5
Author: Masahiro Masuda <[email protected]>
Date:   Tue May 17 12:23:30 2022 +0900

    mma intrin generation with meta programming

commit 5afb5f0
Author: Masahiro Masuda <[email protected]>
Date:   Tue May 17 05:26:14 2022 +0900

    ldmatrix intrin generation with meta programming

commit fb62abb
Author: Masahiro Masuda <[email protected]>
Date:   Mon May 16 20:30:49 2022 +0900

    minor

commit 5a80adc
Author: Masahiro Masuda <[email protected]>
Date:   Mon May 16 19:55:57 2022 +0900

    revert some change

commit e599a55
Author: Masahiro Masuda <[email protected]>
Date:   Mon May 16 19:54:18 2022 +0900

    remove obsolete files

commit 4b13b85
Author: Masahiro Masuda <[email protected]>
Date:   Mon May 16 19:51:21 2022 +0900

    wip

commit 848de63
Author: Masahiro Masuda <[email protected]>
Date:   Mon May 16 19:44:29 2022 +0900

    wip

commit b35bff9
Author: Masahiro Masuda <[email protected]>
Date:   Mon May 16 19:31:18 2022 +0900

    update parse error msg

commit ad9b053
Author: Masahiro Masuda <[email protected]>
Date:   Mon May 16 19:26:51 2022 +0900

    fix for avoiding Buffer.vload(...) case

commit 54c6864
Author: Masahiro Masuda <[email protected]>
Date:   Mon May 16 18:59:55 2022 +0900

    wip

commit 078060f
Author: Masahiro Masuda <[email protected]>
Date:   Mon May 16 18:57:34 2022 +0900

    wip

commit 576f841
Author: Masahiro Masuda <[email protected]>
Date:   Mon May 16 18:52:15 2022 +0900

    wip

commit 12a376a
Author: Masahiro Masuda <[email protected]>
Date:   Mon May 16 17:54:58 2022 +0900

    Squashed commit of the following:

    commit 48eef49
    Author: Masahiro Masuda <[email protected]>
    Date:   Mon May 16 17:40:48 2022 +0900

        more comment

    commit 8f67fc8
    Author: Masahiro Masuda <[email protected]>
    Date:   Mon May 16 17:11:27 2022 +0900

        update test

    commit ad85036
    Author: Masahiro Masuda <[email protected]>
    Date:   Mon May 16 16:54:01 2022 +0900

        add test

    commit 4a5dc3f
    Author: Masahiro Masuda <[email protected]>
    Date:   Mon May 16 16:40:47 2022 +0900

        [TVMScript] Support function call to help construct AST

commit 76c1bcf
Author: Masahiro Masuda <[email protected]>
Date:   Mon May 16 16:30:07 2022 +0900

    simplify iterator in layout transform

commit 9362803
Author: Masahiro Masuda <[email protected]>
Date:   Sat May 14 11:31:39 2022 +0900

    remove obsolet files

commit 2e119b4
Author: Masahiro Masuda <[email protected]>
Date:   Sat May 14 10:43:59 2022 +0900

    calculate mma store dst index using inverse affine map

commit 9489434
Author: Masahiro Masuda <[email protected]>
Date:   Sat May 14 10:01:12 2022 +0900

    simplify store

commit 1adcb77
Author: Masahiro Masuda <[email protected]>
Date:   Sat May 14 09:43:40 2022 +0900

    simplified fill

commit 7b13c73
Author: Masahiro Masuda <[email protected]>
Date:   Sat May 14 09:22:17 2022 +0900

    simplify intrin desc using index map function

commit bcf212d
Author: Masahiro Masuda <[email protected]>
Date:   Sat May 14 07:16:42 2022 +0900

    seems to work

commit dd8ccf9
Author: Masahiro Masuda <[email protected]>
Date:   Sat May 14 07:11:57 2022 +0900

    poking with the parser

commit 596582c
Author: Masahiro Masuda <[email protected]>
Date:   Fri May 13 20:04:59 2022 +0900

    16x8x32 4k trans working

commit 273f89a
Author: Masahiro Masuda <[email protected]>
Date:   Fri May 13 19:52:13 2022 +0900

    add 16x8x16 fp16 trans

commit 8e2066c
Author: Masahiro Masuda <[email protected]>
Date:   Fri May 13 19:32:37 2022 +0900

    16x8x16 4k trans working

commit c2d0744
Author: Masahiro Masuda <[email protected]>
Date:   Fri May 13 19:25:52 2022 +0900

    16x8x16 trans working

commit c2e314c
Author: Masahiro Masuda <[email protected]>
Date:   Fri May 13 16:19:32 2022 +0900

    tuned int8 4k, 91 TOPS

commit 94d9d96
Author: Masahiro Masuda <[email protected]>
Date:   Fri May 13 15:59:33 2022 +0900

    int8 4k tune working

commit 3ca8ca0
Author: Masahiro Masuda <[email protected]>
Date:   Fri May 13 08:43:57 2022 +0900

    mma 16x8x32 int8 working with ldmatrix b workaround

commit 54f1cb7
Author: Masahiro Masuda <[email protected]>
Date:   Fri May 13 18:23:27 2022 +0900

    wip

commit 9d2844d
Author: Masahiro Masuda <[email protected]>
Date:   Fri May 13 16:38:53 2022 +0900

    test tensorize without layout transform

commit 86ee6da
Author: Masahiro Masuda <[email protected]>
Date:   Fri May 13 15:15:34 2022 +0900

    int8 4k tensorize works

commit 39f9e32
Author: Masahiro Masuda <[email protected]>
Date:   Fri May 13 12:44:39 2022 +0900

    begin int8 4k tune

commit 6fa91e5
Author: Masahiro Masuda <[email protected]>
Date:   Thu May 12 18:53:20 2022 +0900

    try fix ldmatrix b for int8

commit 7a962cd
Author: Masahiro Masuda <[email protected]>
Date:   Thu May 12 18:28:34 2022 +0900

    fixed warp_coeff

commit a0afb56
Author: Masahiro Masuda <[email protected]>
Date:   Thu May 12 12:20:01 2022 +0900

    wip

commit f70ccd0
Author: Masahiro Masuda <[email protected]>
Date:   Thu May 12 12:09:57 2022 +0900

    int8 tensorize working

commit 20321fa
Author: Masahiro Masuda <[email protected]>
Date:   Thu May 12 07:06:22 2022 +0900

    starting 16x8x32 int8

commit 441fd19
Author: Masahiro Masuda <[email protected]>
Date:   Thu May 12 05:50:46 2022 +0900

    adding fp16 accum case

commit c9d40b6
Author: Masahiro Masuda <[email protected]>
Date:   Wed May 11 17:04:29 2022 +0900

    clean up

commit 5b2d486
Author: Masahiro Masuda <[email protected]>
Date:   Wed May 11 16:38:19 2022 +0900

    16x8x16 4k tune working

commit c3cb170
Author: Masahiro Masuda <[email protected]>
Date:   Wed May 11 16:20:27 2022 +0900

    tensoriz fixed

commit 68039b0
Author: Masahiro Masuda <[email protected]>
Date:   Wed May 11 15:55:25 2022 +0900

    begin 16x8x16 4k tune

commit ced5d8d
Author: Masahiro Masuda <[email protected]>
Date:   Wed May 11 15:50:11 2022 +0900

    16x8x16 worked

commit 3d2c90d
Author: Masahiro Masuda <[email protected]>
Date:   Wed May 11 15:47:26 2022 +0900

    fix

commit 403050b
Author: Masahiro Masuda <[email protected]>
Date:   Wed May 11 15:45:10 2022 +0900

    add 16x8x16 test

commit 18e8d73
Author: Masahiro Masuda <[email protected]>
Date:   Wed May 11 06:50:32 2022 +0900

    fixed mma store codegen for 16x8x16

commit ec81250
Author: Masahiro Masuda <[email protected]>
Date:   Wed May 11 04:25:25 2022 +0900

    add 16x8x16 mma store codegen

commit e08df2a
Author: Masahiro Masuda <[email protected]>
Date:   Wed May 11 03:47:47 2022 +0900

    tensorized C_warp init

commit ae06789
Author: Masahiro Masuda <[email protected]>
Date:   Wed May 11 03:06:06 2022 +0900

    mma store codegen working

commit deb4d66
Author: Masahiro Masuda <[email protected]>
Date:   Tue May 10 19:22:57 2022 +0900

    update lower warp memory

commit 71fe5fe
Author: Masahiro Masuda <[email protected]>
Date:   Tue May 10 09:01:42 2022 +0900

    tensorizing mma store

commit e80a1f1
Author: Masahiro Masuda <[email protected]>
Date:   Thu Apr 28 19:54:08 2022 +0900

    clean up

commit a9640f4
Author: Masahiro Masuda <[email protected]>
Date:   Thu Apr 28 19:40:55 2022 +0900

    add tunable 4k test, 36 TFLOPS

commit b9f7eae
Author: Masahiro Masuda <[email protected]>
Date:   Thu Apr 28 18:01:08 2022 +0900

    fixed bug in LowerWarpMemory index splitting for ldmatrix

commit 00df308
Author: Masahiro Masuda <[email protected]>
Date:   Wed Apr 27 07:58:17 2022 +0900

    fixed missing reverse_compute_at

commit 93f9fe7
Author: Masahiro Masuda <[email protected]>
Date:   Wed Apr 27 06:55:12 2022 +0900

    add 4k test

commit 3689ef7
Author: Masahiro Masuda <[email protected]>
Date:   Wed Apr 27 06:54:09 2022 +0900

    temp disable high dim base indices check in tensorize

commit 0c859c4
Author: Masahiro Masuda <[email protected]>
Date:   Tue Apr 26 19:18:23 2022 +0900

    clean up

commit f6aadbf
Author: Masahiro Masuda <[email protected]>
Date:   Tue Apr 26 19:13:09 2022 +0900

    Add 16x8x8 MMA + LDMatrix test

commit 4cf6b20
Author: Masahiro Masuda <[email protected]>
Date:   Tue Apr 26 18:04:17 2022 +0900

    testing 16x8x8 ldmatrix tensoriation

* set measure_perf to False

* add requires_gpu decorator in tests, always test build on non-ampere

* skip cuda compile on old gpu
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants