Low hanging fruit optimizations in VMLA kernels #3601
Labels: performance ⚡ (Performance/optimization related work across the compiler and runtime), runtime (Relating to the IREE runtime library)
A few of the reference kernels in iree/hal/vmla/op_kernels_generic.h have particularly poor performance. While we expect the LLVM ahead-of-time backend to be more viable as a deployment target, having a faster reference backend is still generally useful.
Some of the slow kernels are already labeled:
https://github.com/google/iree/blob/2edc7d648b4e5a352a055666e17f58587a0a6ad6/iree/hal/vmla/op_kernels_generic.h#L238-L241
https://github.com/google/iree/blob/2edc7d648b4e5a352a055666e17f58587a0a6ad6/iree/hal/vmla/op_kernels_generic.h#L576-L578
https://github.com/google/iree/blob/2edc7d648b4e5a352a055666e17f58587a0a6ad6/iree/hal/vmla/op_kernels_generic.h#L498-L501
Profiling IREE with Tracy on a representative model shows in much more detail which kernels are called frequently and which take large chunks of time. Our focus for 2020Q4 is the MobileBert model at https://github.com/google/iree/blob/main/iree/test/e2e/models/bert_encoder_unrolled_fake_weights.mlir (TODO: link to real weights / iree-translate compatible file).
We don't need to jump straight to building something like ruy for these kernels (we already use ruy for matmul), but there are many easy optimizations to make without sacrificing readability. Switching from absl::InlinedVector to std::vector or C arrays is one example.
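As a rough illustration of the container swap mentioned above, here is a minimal sketch of an index-walking kernel that keeps its per-call bookkeeping in fixed-size C arrays on the stack. This is not the code in op_kernels_generic.h: `TransposeSketch`, `TransposeToVector`, and `kMaxRank` are hypothetical names made up for this example, and the rank limit of 8 is an assumption, not an IREE constant.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical upper bound on tensor rank for this sketch (an assumption,
// not a constant taken from IREE).
constexpr int kMaxRank = 8;

// Transposes a dense row-major tensor so that dst dimension i corresponds to
// src dimension perm[i]. `shape` and `perm` each hold `rank` entries.
void TransposeSketch(const float* src, float* dst, const int32_t* shape,
                     const int32_t* perm, int rank) {
  assert(rank > 0 && rank <= kMaxRank);

  // Plain C arrays on the stack hold the strides, the output shape, and the
  // current multi-index, instead of a growable container.
  int32_t src_strides[kMaxRank];
  int32_t dst_shape[kMaxRank];
  int32_t index[kMaxRank] = {0};

  // Row-major strides of the source tensor.
  int32_t stride = 1;
  for (int i = rank - 1; i >= 0; --i) {
    src_strides[i] = stride;
    stride *= shape[i];
  }
  const int32_t element_count = stride;
  for (int i = 0; i < rank; ++i) dst_shape[i] = shape[perm[i]];

  for (int32_t dst_offset = 0; dst_offset < element_count; ++dst_offset) {
    // Map the current dst multi-index back to a flat src offset.
    int32_t src_offset = 0;
    for (int i = 0; i < rank; ++i) {
      src_offset += index[i] * src_strides[perm[i]];
    }
    dst[dst_offset] = src[src_offset];

    // Advance the multi-index like an odometer, innermost dimension first.
    for (int i = rank - 1; i >= 0; --i) {
      if (++index[i] < dst_shape[i]) break;
      index[i] = 0;
    }
  }
}

// Convenience wrapper for trying the sketch out; allocates the output once.
std::vector<float> TransposeToVector(const std::vector<float>& src,
                                     const std::vector<int32_t>& shape,
                                     const std::vector<int32_t>& perm) {
  std::vector<float> dst(src.size());
  TransposeSketch(src.data(), dst.data(), shape.data(), perm.data(),
                  static_cast<int>(shape.size()));
  return dst;
}
```

For example, calling `TransposeToVector` on a 2x3 matrix with `perm = {1, 0}` yields the 3x2 transpose. Whether a change like this actually helps a given kernel should be confirmed with a Tracy profile rather than assumed.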