Code Generation for Matrix Multiplication #653
Conversation
Very good work @resting-dove! A few small things to iron out, but nothing major. You did a really good and thorough job improving our matrix multiplication codegen.
mlir::LowerVectorToLLVMOptions lowerVectorToLLVMOptions;
lowerVectorToLLVMOptions.enableX86Vector(true);
What happens if this runs on an ARM system? You may want to check the target architecture and use lowerVectorToLLVMOptions.enableArmSVE() instead.
As far as I could tell, it didn't actually produce any x86-specific instructions, so I removed it.
bool vectorize{false};
bool tile{false};
bool useFixedTileSizes{false};
int register_size{4 * 4 * 64};
Which register size is this referring to?
It took me a while to really understand what Uday Bondhugula was referring to with this in the blog post. It seems to mean the SIMD vector size times the number of vector registers. I renamed the variable to make this clearer.
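One way to read that quantity can be sketched in plain C++. This is my own illustration, not DAPHNE code; the function names (`vectorElements`, `registerTileElements`) are hypothetical:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch: the "register size" used for tile-size selection is
// the SIMD vector width in elements times the number of vector registers,
// i.e. how many elements the register file can hold at once.
int64_t vectorElements(int64_t vecSizeBits, int64_t elementBits) {
    // Use at least one element, mirroring the --matmul-vec-size-bits option.
    int64_t elems = vecSizeBits / elementBits;
    return elems < 1 ? 1 : elems;
}

int64_t registerTileElements(int64_t numVecRegisters, int64_t vecSizeBits,
                             int64_t elementBits) {
    return numVecRegisters * vectorElements(vecSizeBits, elementBits);
}
```

For example, 16 registers of 256 bits holding 64-bit elements would give 16 * 4 = 64 elements as the register-level tile budget.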
Type elementType, int64_t vec_size) {
auto vec_Type = mlir::VectorType::get({vec_size}, elementType);
// TODO: We need an option to enable smaller vector sizes for the ends of each |
In case this TODO is out of scope for this PR, it may be worth creating an issue with more explanation and reasoning for the suggested change, and referencing it here as TODO(#xxx).
I will create a PR for this.
// TODO: Test why not
// Cannot tile non square matmul.
This would be nice to have, especially for your benchmarks. In case you run into too much trouble fixing this, handle it in the same way as the TODO above.
It currently seems to work for non-square matrices as well. I will do some more testing.
// Cannot lower integer matmul, because we cannot create Memrefs with
// integer type at the moment.
Integer memrefs should generally work, see src/compiler/lowering/EwOpsLowering.cpp. Handle this as already discussed in our meeting, i.e. similarly to the TODO above.
The original problem was threefold: 1. filling the output memref with correctly signed zero integer values, when so far we only used arith::ConstantOps, which are signless in MLIR; 2. similarly, arith::AddIOp only operates on signless types; 3. SumAllOp could not handle integer types for the same reasons.
In EwOps this is handled by casting from signed to signless where necessary. I took this approach and applied it to MatMulLowering and SumAllOpLowering.
// TODO: This assert only fails in debug mode?!
// assert(isValidLoopInterchangePermutation(twiceTiledNest, {2, 0, 1, 4, 7,
Same as the TODO above.
I think this check simply cannot deal with for loops that have more complex step lengths etc., so I removed it. The other checks should ensure we know where our loops are, and therefore that this is a valid interchange.
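For intuition on why interchange is valid here: the three matmul loops only carry a reduction on C[i][j], so any permutation of (i, j, k) computes the same result. A minimal standalone C++ sketch of that property (my own illustration, not the generated code):

```cpp
#include <array>
#include <cassert>

// Sketch (not DAPHNE code): the matmul loop nest carries only a reduction
// on c[i][j], so permuting the (i, j, k) loops does not change the result.
// Here the canonical i-j-k order is compared against k-i-j.
constexpr int N = 4;
using Mat = std::array<std::array<double, N>, N>;

Mat matmulIJK(const Mat &a, const Mat &b) {
    Mat c{};
    for (int i = 0; i < N; ++i)
        for (int j = 0; j < N; ++j)
            for (int k = 0; k < N; ++k)
                c[i][j] += a[i][k] * b[k][j];
    return c;
}

Mat matmulKIJ(const Mat &a, const Mat &b) {
    Mat c{};
    for (int k = 0; k < N; ++k)  // interchanged loop order
        for (int i = 0; i < N; ++i)
            for (int j = 0; j < N; ++j)
                c[i][j] += a[i][k] * b[k][j];
    return c;
}
```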
let dependentDialects = ["vector::VectorDialect", "mlir::LLVM::LLVMDialect", "mlir::AffineDialect",
                         "mlir::memref::MemRefDialect"];
let constructor = "mlir::daphne::createMatMulOpLoweringPass()";
let description = [{
Very good and thorough description!
Thanks!
Thanks again @resting-dove, took a while but the changes are now merged in. Great work improving and testing optimized code generation for matrix multiplication!
Work for the project in "Large Scale Data Engineering" at TU Berlin, WS 23/24. The project is supervised by @philipportner. The issue is #627.
The work mainly concerns an extension to the MatMulLoweringPass of Daphne. Several command line options are introduced:
- --mlir-codegen lowers to the naive structure with three loops.
- --matmul-tile attempts to find tile sizes such that the registers and the L1, L2, and L3 caches are reused more efficiently.
- --matmul-fixed-tile-sizes=4,4 can be used to specify up to 5 tile sizes to be used in a tiling scheme adapted from https://github.com/bondhugula/llvm-project/blob/hop/mlir/docs/HighPerfCodeGen.md.
- --matmul-unroll-factor=2 additionally unrolls the innermost loop in the tiling scheme by up to the specified factor.
- --matmul-unroll-jam-factor=4 additionally unroll-jams the originally innermost two loops of the tiled nest, if the tile sizes divide the loop size.
- --matmul-vec-size-bits=64 attempts to use vector instructions with the specified bitwidth, but at least one element. It is only applied if the resulting vector size divides the matrix size.
- --matmul-num-vec-registers=16 gives the number of vector registers available for the automatic tile size calculation.
Automated tests are added under test/api/cli/codegen.
Evaluations of the options are reported in https://github.com/resting-dove/lde_project.