Introduce MLIR transform dialect to BladeDISC #787

Open · 16 of 18 tasks

wyzero opened this issue Nov 24, 2022 · 2 comments

wyzero (Collaborator) commented Nov 24, 2022

We'll start to explore using the MLIR transform dialect to do codegen for (fused) compute-intensive patterns. The initial target is to support GEMM codegen on the ARM platform, to address the dynamic-shape problem of the Arm Compute Library (ACL).
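For reference, the payload IR this work targets (and which Step 2 of the plan below lowers to) is a GEMM expressed as linalg on tensors with fully dynamic shapes. A minimal hand-written sketch, illustrative only and not actual BladeDISC output:

```mlir
// A fully dynamic GEMM expressed as linalg on tensors; every dimension is `?`,
// which is exactly the dynamic-shape case that motivates this work.
func.func @gemm(%a: tensor<?x?xf32>, %b: tensor<?x?xf32>,
                %init: tensor<?x?xf32>) -> tensor<?x?xf32> {
  %0 = linalg.matmul ins(%a, %b : tensor<?x?xf32>, tensor<?x?xf32>)
                     outs(%init : tensor<?x?xf32>) -> tensor<?x?xf32>
  return %0 : tensor<?x?xf32>
}
```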

The initial plan is:

  • Step 1, enhance the fusion decision pass. We’ll add a new fusion kind kTransform for the transform-based fusion pattern.
  • Step 2, lower the lmhlo fusion op to linalg on tensor.
  • Step 3, transform the linalg computation to loops using the transform dialect (a schedule sketch follows after this list).
  • Step 4, refine the transformed loops to make them suitable for the BladeDISC runtime.
  • Step 5, add a new pass to the disc pass pipeline to drive the above process.
  • Step 6, weight pre-packing support
    • add the disc_linalg.multi_level_pack op, used for packing.
    • add the transform.disc.cache_read transform op, relying on the disc_linalg.multi_level_pack op.
    • add folding support for disc_linalg.multi_level_pack.
    • lower disc_linalg.multi_level_pack to loops if it cannot be folded.
    • fuse the const weight op into the kTransform fusion pattern, lower it to linalg, and then schedule it.
  • Step 7, assign a default schedule for each kTransform pattern.
  • Step 8, inject schedule selection logic.
  • Step 9, initial model level testing: bert (albert).
  • Step 10, support nt, tn, tt format GEMM.
  • Step 11, support batch matmul
  • Step 12, support GEMM epilogue fusion.
  • Step 13, performance optimization
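As a rough illustration of Step 3 (not the actual BladeDISC schedule), a transform-dialect sequence that tiles the matmul above into loops could look like the sketch below. The op spellings follow the upstream structured transform ops of that period (transform.structured.match / transform.structured.tile), the tile sizes are placeholders, and the disc-specific ops such as transform.disc.cache_read are not shown:

```mlir
// Hedged sketch: drive tiling of a linalg.matmul through the transform dialect.
transform.sequence failures(propagate) {
^bb0(%module: !pdl.operation):
  // Locate the matmul payload op inside the kTransform fusion.
  %matmul = transform.structured.match ops{["linalg.matmul"]} in %module
  // Tile M, N and K into scf.for loops; [6, 16, 1] is only an illustrative choice.
  %tiled, %loops:3 = transform.structured.tile %matmul [6, 16, 1]
  // Further steps (vectorization, pre-packing via disc_linalg.multi_level_pack,
  // bufferization) would be appended here.
  transform.yield
}
```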
Following is some preliminary data:

Tested on g6r, single thread; A, B, and C are fully dynamic (pre-packing is not possible in this case).

| m, n, k | DISC + transform (ms) | DISC + ACL (ms) |
| ------------- | ------------- | ------------- |
| 304, 256, 256 | 1.02 | 1.00 |
| 304, 512, 256 | 2.00 | 2.02 |
| 304, 1024, 256 | 4.10 | 4.00 |
| 304, 1024, 512 | 8.56 | 7.99 |
| 1024, 1024, 1024 | 60.0 | 52.8 |
| 34, 512, 256 | 0.301 | 0.293 |
| 74, 512, 256 | 0.561 | 0.544 |
| 174, 512, 256 | 1.19 | 1.207 |
| 34, 256, 256 | 0.135 | 0.158 |
| 74, 256, 256 | 0.272 | 0.281 |
| 174, 256, 256 | 0.592 | 0.589 |

wyzero commented Dec 29, 2022

End-to-end model tests on Bert Base (TF) and Albert (PyTorch), on g6r, using a single thread. Note that we currently have only one default schedule for all shapes, and this schedule is known to be less performant when n or k is large, so the initial numbers are expected to improve once schedule selection logic is supported.

Bert Base (TF)

| input | TF 2.8 (s) | DISC-ACL (s) | DISC-Transform (s) | speedup (DISC-Transform / DISC-ACL) |
| ------------- | ------------- | ------------- | ------------- | ------------- |
| (1, 128) | 0.742 | 0.638 | 0.656 | 97.3% |
| (2, 128) | 1.41 | 1.24 | 1.27 | 97.6% |
| (4, 128) | 2.85 | 2.36 | 2.55 | 92.5% |
| (8, 128) | 5.84 | 4.68 | 5.07 | 92.3% |
| (16, 128) | 11.9 | 9.55 | 10.2 | 93.6% |

Albert (PyTorch)

| input | TorchScript (s) | OnnxRuntime (s) | DISC-ACL (s) | DISC-Transform (s) |
| ------------- | ------------- | ------------- | ------------- | ------------- |
| (2, 12) | 0.197 | 0.140 | 0.117 | 0.139 |

wyzero commented Mar 23, 2023

Sharing a doc:

https://bladedisc.oss-cn-hangzhou.aliyuncs.com/docs/transform-dialect-based-codegen-in-bladedisc.pdf
