
Code Generation for Matrix Multiplication #653

Merged
41 commits merged into daphne-eu:main on Apr 30, 2024

Conversation

@resting-dove (Contributor) commented Jan 25, 2024

Work for the project in "Large Scale Data Engineering" at TU Berlin, WS 23/24. The project is supervised by @philipportner. The issue is #627.

The work mainly concerns an extension to the MatMulLoweringPass of Daphne.

Several command line options are introduced (a plain C++ sketch of the loop structures they control follows this list):
--mlir-codegen lowers to the naive structure with three loops.
--matmul-tile attempts to find tile sizes such that registers and the L1, L2, and L3 caches are reused more efficiently.
--matmul-fixed-tile-sizes=4,4 can be used to specify up to 5 tile sizes for a tiling scheme adapted from https://github.com/bondhugula/llvm-project/blob/hop/mlir/docs/HighPerfCodeGen.md.
--matmul-unroll-factor=2 additionally unrolls the innermost loop of the tiling scheme by up to the specified factor.
--matmul-unroll-jam-factor=4 additionally unroll-jams the two originally innermost loops of the tiled nest, if the tile sizes divide the loop sizes.
--matmul-vec-size-bits=64 attempts to use vector instructions with the specified bit width, but with at least one element. It is only applied if the resulting vector size divides the matrix size.
--matmul-num-vec-registers=16 gives the number of vector registers available for the automatic tile size calculation.
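For orientation, the following is a plain C++ sketch of the loop structures involved, not the MLIR the pass actually generates; the row-major layout, the fixed tile size of 4, and the assumption that the tile size divides N are illustrative only.

```cpp
#include <cstddef>

// Naive three-loop structure, corresponding to the lowering described above.
// Assumes row-major matrices and a zero-initialized C.
void matmul_naive(const double *A, const double *B, double *C, std::size_t N) {
    for (std::size_t i = 0; i < N; ++i)
        for (std::size_t j = 0; j < N; ++j)
            for (std::size_t k = 0; k < N; ++k)
                C[i * N + j] += A[i * N + k] * B[k * N + j];
}

// One level of tiling in the spirit of --matmul-fixed-tile-sizes=4,4
// (the real pass supports up to five tile sizes plus unrolling and vectorization).
void matmul_tiled(const double *A, const double *B, double *C, std::size_t N) {
    constexpr std::size_t T = 4; // illustrative tile size; assumes T divides N
    for (std::size_t ii = 0; ii < N; ii += T)
        for (std::size_t jj = 0; jj < N; jj += T)
            for (std::size_t kk = 0; kk < N; kk += T)
                // The inner loops now work on T x T blocks that fit in cache.
                for (std::size_t i = ii; i < ii + T; ++i)
                    for (std::size_t j = jj; j < jj + T; ++j)
                        for (std::size_t k = kk; k < kk + T; ++k)
                            C[i * N + j] += A[i * N + k] * B[k * N + j];
}
```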

Automated tests are added under test/api/cli/codegen.

Evaluations of the options are reported in https://github.com/resting-dove/lde_project.

@philipportner added the label "LDE winter 2023/24" (Student project in the course Large-scale Data Engineering at TU Berlin, winter 2023/24) on Jan 29, 2024
@resting-dove marked this pull request as ready for review on February 4, 2024, 21:08
@philipportner (Collaborator) left a comment:

Very good work @resting-dove! A few small things to iron out, but nothing major. You did a really good and thorough job improving our matrix multiplication codegen.

src/compiler/execution/DaphneIrExecutor.cpp (outdated review comment, resolved)
Comment on lines 329 to 330
mlir::LowerVectorToLLVMOptions lowerVectorToLLVMOptions;
lowerVectorToLLVMOptions.enableX86Vector(true);
Collaborator:

What happens if this runs on an ARM system? You may want to check and use lowerVectorToLLVMOptions.enableArmSVE(); instead.

Contributor Author:

As far as I could tell, it didn't actually produce any x86-specific instructions, so I removed it.

bool vectorize{false};
bool tile{false};
bool useFixedTileSizes{false};
int register_size{4 * 4 * 64};
Collaborator:

Which register size is this referring to?

Contributor Author:

It took me a while to really understand what Uday Bondhugula was referring to with this in the blog post. It seems to mean the SIMD vector size times the number of vectors there are. I changed the variable name to something that should make this clearer.
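For concreteness, here is one way to read that quantity, with purely illustrative flag values and an assumed 64-bit element type (the variable names are not the ones used in the pass):

```cpp
// Illustrative only: "register size" read as SIMD vector width times the
// number of vector registers, i.e. how many elements the innermost
// (register-level) tile can keep in registers at once.
constexpr int vecSizeBits = 256;      // e.g. --matmul-vec-size-bits=256
constexpr int numVecRegisters = 16;   // e.g. --matmul-num-vec-registers=16
constexpr int elementBits = 64;       // assuming f64 elements

// 256 / 64 = 4 lanes per register, times 16 registers = 64 elements.
constexpr int registerTileElements =
    vecSizeBits / elementBits * numVecRegisters;
static_assert(registerTileElements == 64, "illustrative capacity calculation");
```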

src/compiler/lowering/MatMulOpLowering.cpp (outdated review comment, resolved)
Type elementType, int64_t vec_size) {
auto vec_Type = mlir::VectorType::get({vec_size}, elementType);

// TODO: We need an option to enable smaller vector sizes for the ends of each
Collaborator:

Regarding the TODO: in case this is out of scope for this PR, maybe it is worth it to create an issue with more explanation and reasoning for the suggested change and to reference the issue here as TODO(#xxx).

Contributor Author:

I will create a PR for this.

Comment on lines 263 to 264
// TODO: Test why not
// Cannot tile non square matmul.
Collaborator:

This would be nice to have, especially for your benchmarks. In case you run into too much trouble fixing this, handle it in the same way as the TODO above.

Contributor Author:

This currently seems to work for non-square matrices as well. I will do some more testing.

Comment on lines 288 to 289
// Cannot lower integer matmul, because we cannot create Memrefs with
// integer type at the moment.
Collaborator:

Integer memrefs should generally work, see src/compiler/lowering/EwOpsLowering.cpp. Handle this as already discussed in our meeting, so similarly to the TODO above.

Contributor Author:

The original problem was threefold: 1. filling the output memref with correctly signed 0 integer values, when so far we only used arith::ConstantOps, which are signless in MLIR; 2. similarly, arith::AddIOp only operates on signless types; 3. SumAllOp could not handle integer types for the same reasons.
In EwOps this is handled by casting from signed/unsigned to signless where necessary. I took this approach and applied it to MatMulLowering and SumAllOpLowering.
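As a rough illustration of that casting approach (a minimal sketch assuming a reasonably recent MLIR; the helper name is hypothetical and this is not the actual DAPHNE code):

```cpp
#include "mlir/Dialect/Arith/IR/Arith.h"
#include "mlir/IR/Builders.h"
#include "mlir/IR/BuiltinTypes.h"

// Hypothetical helper: arith ops such as arith::ConstantOp and arith::AddIOp
// do not accept MLIR's signed/unsigned integer types, only signless ones, so
// the zero used to fill the output memref is built on a signless integer type
// of matching width. The surrounding lowering then casts between DAPHNE's
// signed/unsigned types and this signless type where necessary, analogous to
// EwOpsLowering.
static mlir::Value createSignlessZero(mlir::OpBuilder &builder,
                                      mlir::Location loc, unsigned width) {
    auto signlessTy = mlir::IntegerType::get(builder.getContext(), width);
    return builder.create<mlir::arith::ConstantOp>(
        loc, builder.getZeroAttr(signlessTy));
}
```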

Comment on lines 459 to 460
// TODO: This assert only fails in debug mode?!
// assert(isValidLoopInterchangePermutation(twiceTiledNest, {2, 0, 1, 4, 7,
Collaborator:

Same as the TODO above.

Contributor Author:

I think this check simply cannot deal with for loops that have more complex step lengths etc. Thus, I removed this check. The other checks should ensure we know where our loops are and therefore that this is a valid interchange.

src/compiler/lowering/MatMulOpLowering.cpp (outdated review comment, resolved)
let dependentDialects = ["vector::VectorDialect", "mlir::LLVM::LLVMDialect", "mlir::AffineDialect",
"mlir::memref::MemRefDialect"];
let constructor = "mlir::daphne::createMatMulOpLoweringPass()";
let description = [{
Collaborator:

Very good and thorough description!

Contributor Author:

Thanks!

@resting-dove changed the title from "WIP: Code Generation for Matrix Multiplication" to "Code Generation for Matrix Multiplication" on Feb 26, 2024
@philipportner merged commit 69f44e5 into daphne-eu:main on Apr 30, 2024
2 checks passed
@philipportner (Collaborator):

Thanks again @resting-dove, it took a while but the changes are now merged in. Great work improving and testing optimized code generation for matrix multiplication!
