Fix problem bug in MobileBert with vectorization enable. #5211
Conversation
```cpp
  if (!op.getInputIndexingMap(i).isMinorIdentity()) return success();
}
if (!vectorize || outputShape.getShape().back() % lowerTs != 0)
  return success();
```
Don't we want to explicitly set `config.vectorize` to false before returning here?
> explicitly

It is false by default. Do you mean doing it for clarity?
Add it.
Thanks! Yup, it also helps when we later switch vectorization to be on by default.
It looks like this improves the performance of MobileNetV2 on S20... https://mako.dev/run?run_key=6441876564475904&~cpu=1&~vmla=1&~vlk=1 @antiagainst could you help verify whether ToT really hits 15 ms? I'm hitting some weirdness on the S20 in the lab, and I don't have one locally.
I tested on an S10 and I feel there are issues in the benchmarking infrastructure...
The variance is big. There are 8 iterations in one benchmark and the range is 64 ms ~ 90 ms. We probably want to pin the GPU frequency for lab phones. I still feel there is a performance improvement, so it'd be good if you could help check it.
* da64c93 Integrate MLIR-EmitC at iml130/mlir-emitc@dde739f (#5228)
* a8d6c2a Fix doc publication by escaping our IR snippets in markdown. (#5227)
* bd5e535 Introduce RVV VLS code-gen (#5199)
* d496bc9 Avoid allocating temporary buffer for tensors derived from read-only tensors (..
* 1a93557 Tidying up a few pages of documentation. (#5225)
* 46e8331 Add a dedicated iree_c_module CMake module (#5214)
* 603b208 Merge google -> main (#5219)
* c9f3742 Add support for control flow lowering in the VM to emitc target (#5208)
* c8a7b2f Sort descriptors as expected by the native module (#5207)
* 47183fb Revise compiler tracing to enable statistics aggregation. (#5218)
* 500ec7b Set author to match committer for llvm submodule update action. (#5217)
* 881de81 Generate and use the iree_hal_executable_library_t metadata. (#5195)
* cf8180e Increase K tile size for small matrices. (#5213)
* f5804ec Fix problem bug in MobileBert with vectorization enable. (#5211)

COPYBARA_INTEGRATE_REVIEW=#5229 from ScottTodd:main-to-google da64c93
PiperOrigin-RevId: 365076337
ToT is still going through the existing buffer-based path. This PR actually fixed the broken connection so that the existing buffer-based path can access known-good tiling configurations again (for both MobileBERT and MobileNet), so we should see an improvement. But I don't think we can achieve 15 ms yet; it should be around 20 ms. So it looks like we were over-optimistic here.

I think it's likely because we duplicate dispatches without inserting any barriers. That essentially increases the number of warps for each dispatch, helping fill the GPU. After dividing by the repeat count, we get a better number than we should. Certain dispatches in both models work on small problem sizes; increasing the warp count hides and "improves" their latency. Instead of duplicating dispatches, I think we want to submit the command buffer multiple times and average at that level. That keeps the original dispatch structure intact.
I think it still affects the existing buffer-based path because some passes are used, like

Yeah, good point... let me file an issue and move the discussion there. I think you are right; I will prototype the idea and see what's going on. This has bothered me for a while... Thank you sooooo much!