GPU HW Counters via rocprofiler #84

Merged 27 commits on Jul 18, 2022

@jrmadsen jrmadsen commented Jul 12, 2022

  • Initial support for GPU hardware counters via rocprofiler API
    • timemory outputs are per-device, and the thread field corresponds to the queue ID, e.g. rocprof-SQ_WAVES-device-0.txt
  • New configuration variables
    • OMNITRACE_ROCM_EVENTS
      • List of ROCm HW counters (similar to OMNITRACE_PAPI_EVENTS)
      • Supports appending :device=N to an entry for collection on a specific device
      • See omnitrace-avail -H --categories GPU -d for the available counters
    • OMNITRACE_USE_ROCPROFILER
      • Enables collection of GPU HW counters
      • Requires OMNITRACE_ROCM_EVENTS
  • relocated library/components/rocprofiler.* to library/rocprofiler.*
  • added perfetto output of rocprofiler
  • added timemory output of rocprofiler
  • renamed omni.roctracer thread to roctracer.hip
  • added roctracer.hsa thread name
  • updated timemory submodule to support std::variant
  • updated timemory submodule to support = in config value
  • updated timemory submodule to support standalone storage
  • updated timemory submodule to support new hw counter apis
  • updated timemory submodule to prevent label/description caching in data_tracker

Relevant Environment Variables

  • OMNITRACE_ROCPROFILER_LIBRARY
    • default: libomnitrace.so
  • ROCM_PATH
    • default: /opt/rocm
  • ROCP_METRICS
    • default: <ROCM_PATH>/rocprofiler/lib/metrics.xml

Config file example

OMNITRACE_ROCM_EVENTS = GRBM_COUNT GPUBusy SQ_WAVES SQ_INSTS_VALU VALUInsts TCC_HIT_sum TA_TA_BUSY[0]:device=0 TA_TA_BUSY[11]:device=0
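
To make the entry syntax concrete, here is a rough Python sketch that splits such a value into counter names and optional device indices. It assumes entries are whitespace-separated, as in the example above, and that the only suffix is `:device=N`. The names `RocmEvent` and `parse_rocm_events` are hypothetical, and this is not omnitrace's actual parser:

```python
from typing import List, NamedTuple, Optional


class RocmEvent(NamedTuple):
    counter: str           # e.g. "SQ_WAVES" or "TA_TA_BUSY[0]"
    device: Optional[int]  # None when the entry has no ":device=N" suffix


def parse_rocm_events(value: str) -> List[RocmEvent]:
    """Split an OMNITRACE_ROCM_EVENTS-style string into (counter, device) pairs.

    Assumption: entries are separated by whitespace only.
    """
    events = []
    for entry in value.split():
        counter, sep, suffix = entry.partition(":device=")
        device = int(suffix) if sep else None
        events.append(RocmEvent(counter, device))
    return events
```

Applied to the value above, the last two entries would carry `device=0`, while the others would have `device=None`.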

omnitrace-avail examples

$ ./omnitrace-avail -H --categories GPU -db -r "GRBM_COUNT|GPUBusy|SQ_WAVES|SQ_INSTS_VALU|VALUInsts|TCC_HIT_sum|TA_TA_BUSY.(0|11)."
|----------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
|               HARDWARE COUNTER               |                                                                                                           DESCRIPTION                                                                                                           |
|----------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
|                     GPU                      |                                                                                                                                                                                                                                 |
|----------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| GRBM_COUNT:device=0                          | Tie High - Count Number of Clocks                                                                                                                                                                                               |
| SQ_WAVES:device=0                            | Count number of waves sent to SQs. (per-simd, emulated, global)                                                                                                                                                                 |
| SQ_INSTS_VALU:device=0                       | Number of VALU instructions issued. (per-simd, emulated)                                                                                                                                                                        |
| TA_TA_BUSY[0]:device=0                       | TA block is busy. Perf_Windowing not supported for this counter.                                                                                                                                                                |
| TA_TA_BUSY[11]:device=0                      | TA block is busy. Perf_Windowing not supported for this counter.                                                                                                                                                                |
| TCC_HIT_sum:device=0                         | Number of cache hits. Sum over TCC instances.                                                                                                                                                                                   |
| GPUBusy:device=0                             | The percentage of time GPU was busy.                                                                                                                                                                                            |
| VALUInsts:device=0                           | The average number of vector ALU instructions executed per work-item (affected by flow control).                                                                                                                                |
|----------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
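
The `-r` argument in the command above is a regular expression in which the unescaped `.` around `(0|11)` stands in for the literal `[` and `]` of the counter name. A small Python sketch of how such a pattern selects names, assuming unanchored search semantics. The regex engine and matching mode that omnitrace-avail uses are not shown here, and the helper `matches` is hypothetical:

```python
import re

# Same pattern as passed to -r above.
PATTERN = re.compile(
    r"GRBM_COUNT|GPUBusy|SQ_WAVES|SQ_INSTS_VALU|VALUInsts|TCC_HIT_sum|TA_TA_BUSY.(0|11)."
)


def matches(name: str) -> bool:
    """Return True if the pattern is found anywhere in the counter name."""
    return PATTERN.search(name) is not None


# Names taken from the table above, plus one index that is not requested.
for name in ("TA_TA_BUSY[0]:device=0", "TA_TA_BUSY[11]:device=0", "TA_TA_BUSY[1]:device=0"):
    print(name, matches(name))
```

Because the pattern is unanchored, any name that merely contains one of the alternatives would also match under this assumption. Escaping the brackets (`TA_TA_BUSY\[(0|11)\]`) would be stricter.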

@jrmadsen added labels on Jul 12, 2022: enhancement (New feature or request), perfetto (Issue affects/involves perfetto features/capabilities), timemory (Issue affects/involves timemory features/capabilities), libomnitrace (Involves omnitrace library), omnitrace-avail (Involves the omnitrace-avail executable (info tool)), roctracer (GPU kernel tracing), cmake (Modifies the CMake build system), submodule (Updates a git submodule), configuration (Changes/involves configuration options), rocprofiler (GPU kernel HW counters)
@jrmadsen jrmadsen closed this Jul 12, 2022
@jrmadsen jrmadsen reopened this Jul 12, 2022
jrmadsen and others added 18 commits July 12, 2022 21:29
- /opt/rocm/{rocprofiler,roctracer} path is deprecated so tweak search procedure
- rocm_metrics()
- minor cleanup
- hw_counter categories
- init rocm
…acer.*

- relocated library/components/rocprofiler.* to library/rocprofiler.*
- cleaned up rocprofiler.hpp
- added perfetto output of rocprofiler
- added timemory output of rocprofiler
- renamed omni.roctracer thread to roctracer.hip
- added roctracer.hsa thread name
- updated timemory submodule to support std::variant
- updated timemory submodule to support = in config value
- updated timemory submodule to support standalone storage
- updated timemory submodule to support new hw counter apis
- updated timemory submodule to prevent label/description caching in data_tracker
- Add -c command-line option for --categories
- support verbosity
- throw exceptions to avoid aborting on HSA_STATUS_ERROR_NOT_INITIALIZED when advantageous
- removed duplicate specialization of is_available for component::rocprofiler
- std::stringstream from initializer list would use explicit constructor
@jrmadsen changed the title from "[WIP] GPU HW Counters via rocprofiler" to "GPU HW Counters via rocprofiler" on Jul 17, 2022
- added using statements from timemory
- tweaked the main and thread bundle names
- fixed timemory header includes
@jrmadsen jrmadsen merged commit 4208b56 into ROCm:main Jul 18, 2022
@jrmadsen jrmadsen deleted the rocprofiler-support branch July 18, 2022 02:52