Avoid unnecessary padding in iree-flow-set-encoding
#11632
Labels:
- codegen/llvm: LLVM code generation compiler backend
- codegen/vmvx: VMVX (IREE VM) code generation compiler backend
- performance ⚡: Performance/optimization related work across the compiler and runtime
Comments
bjacob added the codegen/llvm, codegen/vmvx, and performance ⚡ labels on Dec 21, 2022
I would need @benvanik's suggestions on this. I am passing it to him for the time being.
bjacob added a commit that referenced this issue on Jan 12, 2023:
This is part 3/9 of #11755: Finish the transition to `--iree-flow-enable-data-tiling`. Currently (unless the defaultPadding is changed to a non-default value), dynamic-size set_encoding shapes get rounded up to the next multiple of 16. This causes padding by potentially more than one tile size. This invalidates an assumption that the `iree_uk_pack` ukernel was relying on, that at most one last tile along each dimension would require padding. I wish the assumption could resume holding. This is tracked in Issue #11632.
jpienaar pushed a commit that referenced this issue on May 1, 2023 (same commit message as above).
bjacob changed the title from "Explore ways to remove defaultPadding (current default value 16) in iree-flow-set-encoding" to "Avoid unnecessary padding in iree-flow-set-encoding" on Jul 7, 2023
Tentative fix: #14349
bjacob added a commit that referenced this issue on Jul 11, 2023:
This has long been discussed, as the enum was just an initial design shortcut, but the number of enum cases was already getting uncomfortable due to the multiple dimensions there. This is a step in a chain towards fixing #11632. The reason is that in order to properly specialize for narrow matmul cases in `MaterializeEncoding`, selecting adequately narrow matmul kernels that avoid widening the entire matmul problem at hand, we will need to know the original (pre-padding) matrix shape. Since `MaterializeEncoding` is a type conversion, not just an ordinary rewrite pattern, this information will need to be encoded in types: we won't be able to walk from a value to its defining op to find the pre-padding value, as there are no values there. So I will want to add the pre-padding shape (or type) to EncodingAttr. This is a step towards that: once EncodingAttr is a data structure, it is easy to add another field. By contrast, if it stays an enum, the combinatorics get out of hand.
bjacob added a commit that referenced this issue on Jul 17, 2023:
…ze choice to MaterializeEncoding. (#14349)

This fixes #11632 by introducing a materializable `upper_bound_tile_size` instead of hardcoding a fixed padding amount at Flow, and fixes it in sufficient generality to also solve the problem for narrow matmuls. Let's explain that in more detail, as it is an important part of what this PR is doing.

For each combination of element types and each target, the MaterializeEncoding pass selects appropriate matmul tile shapes. Input tensors get padded to the next multiple of the tile size. The padding increases the inherent arithmetic cost of the problem at hand. When, along some dimension, the original tensor size is smaller than the tile size, that can result in particularly large overhead. The extreme case, which is also a very common case, is matrix-times-vector multiplication. The "vector" right-hand side is really a matrix with one dimension size equal to 1, so if the general matmul tile shape along that dimension is 8 or 16, as is usually the case, that can be an 8x or 16x increase in the inherent arithmetic cost of the matmul op.

The solution to that is to adjust MaterializeEncoding tile shapes for narrow dimensions. We had some logic in place to deal with that, but #11632 was leaving it moot: the flow-level padding of everything to the next multiple of 16 meant that our logic there never really had a chance of kicking in. With #11632 being fixed, this PR was the opportunity to also fix that along the way, and to ensure that the solution to #11632 worked in that respect too. As matrix-times-vector products were the common case that suffered the most from #11632, it would have been too bad to "solve" #11632 without addressing them. Matrix-times-vector is only the extreme case; other narrow cases matter too. When, e.g. on AVX-512, the general matmul tile size is 16, even width-8 matmuls (MxKx8) were suffering from 2x widening.

So the solution in this PR is making sure to address all narrow cases, defined as whenever a tensor dimension size is less than the general tile size. The difficulty was that when MaterializeEncoding runs on a dispatch function, it runs on an already-padded tensor; even as this PR introduces `upper_bound_tile_size`, that only makes it possible to select the right padding amount, but there is still a `tensor.pad` op, and it is still getting in the way of knowing the actual, original tensor shape for the purpose of adjusting tile shapes for narrow cases. Moreover, as `MaterializeEncoding` is a type-converter pass, it cannot just walk from a Value up to its defining op to find the pre-padding tensor: there are no values there, only types. So the information about the pre-padding tensor shape has to be part of the tensor type that `MaterializeEncoding` sees, that is, the padded tensor type. The solution to that problem in this PR is to add an `original_type` field to `EncodingAttr`.

Fixes #11632. Fixes a compiler issue encountered in #14398, but not the originally reported runtime crash by itself. This now also includes the removal of a now-useless VMVX pass, which was originally split into #14383.
nhasabni pushed a commit to plaidml/iree that referenced this issue on Aug 24, 2023 (same commit message as the Jul 11, 2023 commit above).

nhasabni pushed a commit to plaidml/iree that referenced this issue on Aug 24, 2023 (same commit message as the Jul 17, 2023 commit above).
(Updated July 2023)

When data-tiling, during Flow we do not yet know the exact tile sizes, so we do not know exactly how much padding is needed to ensure that buffers are allocated large enough to account for padding to the next multiple of what will be the actual tile size. The current code simply uses a hardcoded tile size of 16 along every dimension in Flow.

Problems with that:
- With float16, even AVX-512, ARM SVE, or RISC-V RVV can already defeat that, with tile sizes equal to 32. Resolving this therefore blocks being able to target these near-term SIMD targets.
- It forces the SetEncoding pass to be parametrized by a defaultPadding option, while it could otherwise be unparametrized.