Implementing Vulkan dispatch tracing. #5287

benvanik · 2021-04-01T18:53:05Z

This is heavily inspired by the upstream TracyVulkan.hpp and
https://nikitablack.github.io/post/how_to_use_vulkan_timestamp_queries/.
Significant reworking was done to better support incremental collection,
use host query reset when available (via the extension or Vulkan 1.2), and
use external source locations so we can provide the original executable
information.
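
As a rough illustration of the "host query reset when available" decision, the check could look something like the sketch below. This is not the code from this PR: `MakeVersion` and `CanUseHostQueryReset` are hypothetical names, the version packing mirrors `VK_MAKE_VERSION`, and the sketch deliberately omits verifying that the `hostQueryReset` feature was enabled at device creation.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Same bit layout as VK_MAKE_VERSION: major in bits 22+, minor in bits 12-21,
// patch in the low 12 bits. Reimplemented here so the sketch needs no Vulkan
// headers.
constexpr uint32_t MakeVersion(uint32_t major, uint32_t minor, uint32_t patch) {
  return (major << 22) | (minor << 12) | patch;
}

// Hypothetical helper: returns true if query pools can be reset from the host
// instead of recording a reset command into a command buffer.
// Assumption: a real implementation would also confirm that the hostQueryReset
// feature was enabled on the logical device.
bool CanUseHostQueryReset(uint32_t device_api_version,
                          const std::vector<const char*>& enabled_extensions) {
  // Host query reset was promoted to core in Vulkan 1.2.
  if (device_api_version >= MakeVersion(1, 2, 0)) return true;
  // Otherwise it is only usable through the extension.
  for (const char* name : enabled_extensions) {
    assert(name != nullptr);
    if (std::strcmp(name, "VK_EXT_host_query_reset") == 0) return true;
  }
  return false;
}
```

When this returns false, the fallback is to reset the queries on the device timeline with a recorded reset command before they are reused.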

This is enabled by default when IREE tracing is enabled but can be turned
off with --vulkan_tracing=false.

Only tested on Windows. Some timestamp mapping may be required on
Linux/Android where VK_TIME_DOMAIN_CLOCK_MONOTONIC_EXT is present. There
are likely corner cases with exhausted buffers, but since this is a
debug-only codepath the worst that will happen is that the trace gets
corrupted.
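
For anyone picking up the timestamp mapping on other platforms, here is a minimal sketch of what mapping device ticks onto a host clock might involve. It is not what this PR does. It assumes a single correlated sample of both clocks (for example one obtained via `vkGetCalibratedTimestampsEXT`), the device's `timestampPeriod` in nanoseconds per tick, and the queue family's `timestampValidBits`. `ClockCalibration`, `MaskValidBits`, and `DeviceTicksToHostNs` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// One correlated sample of the device and host clocks (hypothetical struct).
struct ClockCalibration {
  uint64_t device_ticks;  // raw device timestamp at calibration time
  int64_t host_ns;        // host monotonic clock reading at the same instant
};

// Keeps only the bits the queue family reports as valid for timestamps.
uint64_t MaskValidBits(uint64_t raw_ticks, uint32_t valid_bits) {
  assert(valid_bits > 0 && valid_bits <= 64);
  if (valid_bits == 64) return raw_ticks;
  return raw_ticks & ((uint64_t{1} << valid_bits) - 1);
}

// Maps a device timestamp onto the host clock using one calibration sample.
// Assumption: clock drift since the calibration sample is ignored.
int64_t DeviceTicksToHostNs(uint64_t device_ticks,
                            const ClockCalibration& calibration,
                            float timestamp_period_ns) {
  // Signed delta so timestamps taken before the calibration point also work.
  const int64_t delta_ticks =
      static_cast<int64_t>(device_ticks - calibration.device_ticks);
  const double delta_ns =
      static_cast<double>(delta_ticks) * static_cast<double>(timestamp_period_ns);
  return calibration.host_ns + static_cast<int64_t>(delta_ns);
}
```

Recalibrating periodically would bound the drift between the two clocks, at the cost of extra calls during collection.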

Each Vulkan queue will appear as a new GPU zone in Tracy:

And dispatches will use their original MLIR entry point function names:

ScottTodd · 2021-04-01T22:44:27Z

iree/hal/vulkan/tracing.h

+extern "C" {
+#endif  // __cplusplus
+
+// Per-queue Vulkan tracing context.


This whole comment block is great, thank you! Lots of detail in the TODOs too :D

ScottTodd · 2021-04-01T22:57:39Z

iree/hal/vulkan/tracing.h

+// NOTE: timestamps have non-trivial side-effecting behavior on the device:
+// inserting a timestamp is in the worst (and average) case just as bad as
+// inserting a full global execution barrier. If two command buffer operations
+// that could overlap (no barrier between them) have tracing zones placed around
+// them they will execute sequentially.


Are there any additional concerns about the volume of trace data being collected here, compared to baseline Tracy usage? That is, should developers be more careful about long-running traces? (I'm thinking of cases like WTF, where enabling WebGL tracing and including texture/buffer data could make the files much larger than if they contained only the collected metrics.)

Were we ever considering turning on tracing in release builds? Maybe that was on a different project, and with WTF.

benvanik · 2021-04-02

The GPU zone data is actually pretty well packed - more so than normal zones - as it uses delta compression. There's much more data generated on the CPU side (where we trace per tile). The real worry here is just the side-effecting behavior of inserting the timestamps.

I think you mean "production" builds (vs release/relwithdebinfo build configurations) - nope, tracy is definitely not built for that.
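
To illustrate why delta compression keeps GPU zone data small, here is a generic sketch of delta-encoding a stream of monotonically increasing timestamps. This is not Tracy's actual wire format, only an assumption-laden example of the general technique; `DeltaEncode` and `DeltaDecode` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Stores each timestamp as the difference from the previous one. The first
// entry is stored relative to zero. Consecutive GPU timestamps are close
// together, so the deltas are small numbers that pack into few bytes.
std::vector<int64_t> DeltaEncode(const std::vector<int64_t>& timestamps) {
  std::vector<int64_t> deltas;
  deltas.reserve(timestamps.size());
  int64_t previous = 0;
  for (int64_t timestamp : timestamps) {
    assert(timestamp >= previous);  // assumes non-decreasing, non-negative input
    deltas.push_back(timestamp - previous);
    previous = timestamp;
  }
  return deltas;
}

// Rebuilds the original timestamps with a running sum over the deltas.
std::vector<int64_t> DeltaDecode(const std::vector<int64_t>& deltas) {
  std::vector<int64_t> timestamps;
  timestamps.reserve(deltas.size());
  int64_t current = 0;
  for (int64_t delta : deltas) {
    current += delta;
    timestamps.push_back(current);
  }
  return timestamps;
}
```

The space saving only materializes once the small deltas are written with a variable-length integer encoding, which is omitted here.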

* 6bd5658 Merge google -> main (#5319)
* 2e5257d Merge branch 'main' into google-to-main
* 6936ee7 Patch VMLA performance by reserving vector size before pushing to it. (#5316)
* f2f0041 NFC: Cleanup ConcretizeTileAmongstWorkgroupsPass. (#5297)
* f96726a Add tests to run few other (smaller) models with Linalg on tensors path. (#5306)
* fd64070 Revert "Add wasm-micro-runtime submodule and get building with CMake." (#5312)
* ce0285f Continue pruning abseil usage: switch from absl::InlinedVector to std::vector...
* 71e24b6 Removing hal.buffer.fill and hal.buffer.copy. (#5307)
* 3c611d3 Add Mako benchmark config template file. (#5200)
* 4d1a394 Fix RFFT bugs in VMLA. (#5308)
* 0d55c95 Add configure_bazel.py step to TensorFlow getting started doc.
* 1386d2c Switch simple_embedding_test to include drivers explicitly. (#5304)
* 402550b Add StripAsserts pass and handle tf.Identity ops on tensor lists. (#5294)
* fbdb4ef Add new metrics to MobileNetV2 benchmarks. (#5301)
* 99c8eac Implementing Vulkan dispatch tracing. (#5287)
* 2681dff Insert clones prior to mutation and not where it originates. (#5292)
* aeafd9e Fix CUDA HAL bug and enable more execution tests (#5296)
* 2801780 [CUDA Codegen] Enable tiling and vectorization for MatMulOp (#5293)
* c61fefe Extend AffineMin canonicalization to support scf.parallel (#5289)
* e0ee3f3 Add directory for microbenchmarking (#5260)
* b8da32c Set wasm-export-name attributes on exported functions again. (#5286)
* e2a2f81 Canonicalize affine min before applying tile-and-vecotrize passes (#5285)
* 23861f7 [CUDA codegen] add vectorization infrastructure (#5278)
* 6f443c4 Drop deps on Abseil's core_headers, synchronization, macros. (#5275)
* e5b9e8a Actually run MobileNet with fake weights to check correctness (#5284)
* e56db9a Remove dead code in LinalgToSPIRV (#5281)
* 8863aa1 [NFC] Fix typos in variable names. (#5279)
* 9cd93ba Turn vectorization on by default for linalg on tensors path (#5280)
* 894dac6 Merge google -> main #5276
* b738162 Changing HAL dialect syntax to express all types. (#5239)
* 1ba4e88 Merge branch 'main' into google-to-main
* 531c73e Fix yml syntax (#5274)
* 494fe32 Bumping the tracy version to 0.7.7 (WIP). (#5272)
* 3616323 Disable Vulkan float16 tests on Pixel4 (#5273)
* ade7ff1 Disable running BERT on Vulkan (see Issue #5268) (#5269)
* 25ddc10 Add tracing to allocations made from VMA. (#5271)
* df454f4 Changing iree_vm_list_resize to grow by 2x. (#5270)
* bd9a113 Adding command buffer queue affinity. (#5265)
* de834ae Make status matcher print the message when it fails. (#5266)
* 10f5eaf Add f16 e2e tests for vulkan (#5257)
* 1bdc3a4 Actually make MobileBERT run in the test. (#5264)
* 2e05313 Add support for module almost_eq check for f16 type (#5261)

COPYBARA_INTEGRATE_REVIEW=#5321 from NatashaKnk:main-to-google 6bd5658
PiperOrigin-RevId: 366926967

benvanik added the performance ⚡ (Performance/optimization related work across the compiler and runtime) and hal/vulkan (Runtime Vulkan GPU HAL backend) labels Apr 1, 2021

benvanik requested a review from ScottTodd April 1, 2021 18:53

google-cla bot added the cla: yes label Apr 1, 2021

benvanik force-pushed the benvanik-cmd-tracing branch from 5ed4a91 to d7f78c3 April 1, 2021 20:07

benvanik force-pushed the benvanik-cmd-tracing branch from d7f78c3 to 4576ff2 April 1, 2021 20:19

benvanik marked this pull request as ready for review April 1, 2021 20:22

ScottTodd approved these changes Apr 2, 2021

benvanik merged commit 99c8eac into main Apr 2, 2021

benvanik deleted the benvanik-cmd-tracing branch April 2, 2021 16:58

This was referenced Apr 5, 2021

Merge main -> google #5315 (Closed)

Merge main -> google #5321 (Merged)
