
Fix bug in MobileBert with vectorization enabled. #5211

Merged · 2 commits · Mar 23, 2021

Conversation

ThomasRaoux (Contributor):

No description provided.

if (!op.getInputIndexingMap(i).isMinorIdentity()) return success();
}
if (!vectorize || outputShape.getShape().back() % lowerTs != 0)
return success();
Contributor:

Don't we want to explicitly set config.vectorize to false before returning here?

ThomasRaoux (Contributor Author):

> explicitly

It is false by default. Do you mean doing it for clarity?

ThomasRaoux (Contributor Author):

Added it.

Contributor:

Thanks! Yup, it also helps when later we switch vectorization to be on by default.
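The pattern the reviewers settled on can be sketched in isolation. The names below (`TileConfig`, `getVectorizationConfig`) are hypothetical stand-ins, not the actual IREE code, but they show the point of the discussion: explicitly resetting `vectorize` on the early-return path so the code stays correct even if the default later flips to true.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for the pass's launch configuration.
struct TileConfig {
  bool vectorize = false;  // false by default, as noted in the review
  int tileSize = 4;
};

// Sketch of the guard: only enable vectorization when the innermost
// output dimension divides evenly by the lower-level tile size.
TileConfig getVectorizationConfig(const std::vector<int>& outputShape,
                                  int lowerTs, bool requestVectorize) {
  TileConfig config;
  config.tileSize = lowerTs;
  if (!requestVectorize || outputShape.back() % lowerTs != 0) {
    // Explicit even though the default is already false: this keeps the
    // guard correct if vectorization later becomes on by default.
    config.vectorize = false;
    return config;
  }
  config.vectorize = true;
  return config;
}
```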

ThomasRaoux merged commit f5804ec into iree-org:main on Mar 23, 2021

hanhanW commented Mar 24, 2021

It looks like this improves the performance of MobileNetV2 on S20...

https://mako.dev/run?run_key=6441876564475904&~cpu=1&~vmla=1&~vlk=1

@antiagainst could you help verify whether ToT really hits 15ms? I'm hitting some weirdness on the S20 in the lab, and I don't have one locally.


hanhanW commented Mar 24, 2021

I tested on an S10 and I suspect there are issues in the benchmarking infrastructure...


hanhanW commented Mar 24, 2021

The variance is a bug. There are 8 iterations in one benchmark and the range is 64 ms ~ 90 ms. We probably want to pin the GPU frequency for the lab phones.

I still feel that there is a performance improvement, so it'd be good if you can help check it.

ScottTodd mentioned this pull request on Mar 25, 2021
copybara-service bot pushed a commit that referenced this pull request Mar 25, 2021
* da64c93 Integrate MLIR-EmitC at iml130/mlir-emitc@dde739f (#5228)
* a8d6c2a Fix doc publication by escaping our IR snippets in markdown. (#5227)
* bd5e535 Introduce RVV VLS code-gen (#5199)
* d496bc9 Avoid allocating temporary buffer for tensors derived from read-only tensors (..
* 1a93557 Tidying up a few pages of documentation. (#5225)
* 46e8331 Add a dedicated iree_c_module CMake module (#5214)
* 603b208 Merge google -> main (#5219)
* c9f3742 Add support for control flow lowering in the VM to emitc target (#5208)
* c8a7b2f Sort descriptors as expected by the native module (#5207)
* 47183fb Revise compiler tracing to enable statistics aggregation. (#5218)
* 500ec7b Set author to match committer for llvm submodule update action. (#5217)
* 881de81 Generate and use the iree_hal_executable_library_t metadata. (#5195)
* cf8180e Increase K tile size for small matrices. (#5213)
* f5804ec Fix problem bug in MobileBert with vectorization enable. (#5211)

COPYBARA_INTEGRATE_REVIEW=#5229 from ScottTodd:main-to-google da64c93
PiperOrigin-RevId: 365076337
iree-github-actions-bot pushed a commit that referenced this pull request Mar 25, 2021
antiagainst (Contributor):

> @antiagainst could you help verify if ToT does have 15ms performance?

ToT is still going through the existing buffer-based path. This PR actually fixed a broken connection so that the existing buffer-based path can access known-good tiling configurations again (for both MobileBERT and MobileNet), so we should see an improvement. But I don't think we can achieve 15ms yet; it should be around 20ms. So it looks like the 15ms number is overly optimistic.

I think it's likely because we duplicate dispatches without inserting any barriers. That essentially increases the number of warps in flight for each dispatch, which helps fill the GPU. After dividing by the repeat count, we get a better number than we should. Certain dispatches in both models work on small problem sizes; increasing the warp count hides and "improves" their latency. Instead of duplicating dispatches, I think we want to submit the command buffer multiple times and average at that level. That keeps the original dispatch structure intact.
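The measurement pitfall described above can be illustrated with a toy model. All numbers and function names here are made up for illustration, not taken from IREE's benchmarking code; the model just shows why averaging over barrier-free duplicated dispatches (which overlap and fill idle warps) reports a lower per-dispatch latency than resubmitting the original command buffer:

```cpp
#include <cassert>

// Toy model: a small dispatch occupies only `warps` of the GPU's
// `capacity` warps and runs for `latency` ms per wave.

// Duplicating the dispatch `n` times in one command buffer with no
// barriers lets copies run concurrently until the GPU is full, so the
// reported per-iteration latency shrinks.
double avgLatencyDuplicated(int n, int warps, int capacity, double latency) {
  int concurrent = capacity / warps;              // copies that fit at once
  int waves = (n + concurrent - 1) / concurrent;  // ceil(n / concurrent)
  return waves * latency / n;                     // total time / repeat count
}

// Resubmitting the original command buffer `n` times keeps the dispatch
// structure intact: each submission pays the full latency.
double avgLatencyResubmitted(int n, double latency) {
  return n * latency / n;  // == latency
}
```

With, say, an 8-warp dispatch on a 64-warp GPU repeated 8 times, the duplicated variant reports one eighth of the true single-dispatch latency, which matches the "better number than it should be" effect described in the comment.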


hanhanW commented Mar 30, 2021

> I think it still affects the existing buffer-based path because some passes are used, like TileAndDistributeAmongWorkgroupsPass.

Yeah, good point... let me file an issue and move the discussion there. I think you are right; I will prototype the idea and see what's going on. This has bothered me for a while... Thank you sooooo much!
