Rework sampling and colorized logs #140

Merged
merged 23 commits into from
Aug 31, 2022

jrmadsen
Collaborator

@jrmadsen jrmadsen commented Aug 30, 2022

Overview

This is a significant PR with three notable changes:

  1. Omnitrace colorizes most of its logging
  2. Completely reworked the sampling
  • Samples now record the current instruction pointers instead of strings
    • This dramatically decreases the overhead of taking a sample
  • The collection of metrics during a sample is split out into another component, so that data collection can be disabled, which decreases the sampling overhead even further
  • When both OMNITRACE_SAMPLING_CPUTIME and OMNITRACE_SAMPLING_REALTIME are ON:
    • OMNITRACE_SAMPLING_CPUTIME_FREQ and OMNITRACE_SAMPLING_REALTIME_FREQ can be used to individually control the sampling frequency
    • OMNITRACE_SAMPLING_CPUTIME_DELAY and OMNITRACE_SAMPLING_REALTIME_DELAY can be used to individually control the delay time before starting
  • Omnitrace no longer starts a real-time sampler on the main thread unless OMNITRACE_SAMPLING_REALTIME is ON
    • In the future, an OMNITRACE_SAMPLING_TIDS configuration variable (with real-time and cpu-time variants) will allow you to select which threads are sampled
  3. Files produced by the omnitrace exe (available-instr.txt, instrumented-instr.txt, etc.) no longer have the -instr suffix and are placed in an instrumentation/ subfolder, i.e. available-instr.txt -> instrumentation/available.txt
  • This helped de-clutter the output folder
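
To illustrate why recording instruction pointers is cheaper than recording strings, here is a minimal sketch. It is not omnitrace's actual code: every name in it (`raw_sample`, `record_sample`, `resolve_sample`, `symbol_table`, `max_unwind_depth`) is hypothetical, and a `std::map` stands in for real symbol lookup. The sample-time path only copies addresses into a fixed-size array, and all string work is deferred to post-processing.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <map>
#include <string>
#include <type_traits>
#include <vector>

// Hypothetical compile-time cap on the number of frames kept per sample.
constexpr std::size_t max_unwind_depth = 8;

// What is stored at sample time: fixed size, no heap allocation, no strings.
struct raw_sample
{
    std::size_t                                  depth = 0;
    std::array<std::uintptr_t, max_unwind_depth> ips   = {};
};

static_assert(std::is_trivially_copyable<raw_sample>::value,
              "a sample should be copyable without running any constructors");

// Sample-time path: copy at most max_unwind_depth addresses and nothing else.
inline raw_sample
record_sample(const std::uintptr_t* frames, std::size_t count)
{
    raw_sample s{};
    s.depth = (count < max_unwind_depth) ? count : max_unwind_depth;
    for(std::size_t i = 0; i < s.depth; ++i)
        s.ips[i] = frames[i];
    return s;
}

// Stand-in symbol table (assumption): function start address -> function name.
using symbol_table = std::map<std::uintptr_t, std::string>;

// Post-processing path: map each address to the nearest symbol at or below it.
inline std::vector<std::string>
resolve_sample(const raw_sample& s, const symbol_table& symbols)
{
    assert(s.depth <= max_unwind_depth);
    std::vector<std::string> names;
    names.reserve(s.depth);
    for(std::size_t i = 0; i < s.depth; ++i)
    {
        // first symbol that starts strictly after the instruction pointer
        auto itr = symbols.upper_bound(s.ips[i]);
        if(itr == symbols.begin())
            names.emplace_back("<unknown>");
        else
            names.emplace_back(std::prev(itr)->second);
    }
    return names;
}
```

With a table of `{ 0x1000: "foo", 0x2000: "bar" }`, a sample holding `0x1010` and `0x2004` resolves to `foo` and `bar` during post-processing, while the sample-time path never allocates or touches a string.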

Most of the other edits were reorganization (e.g. internal namespace changes), cleanup, and splitting up functionality.

Bug Fixes

Fixes a bug in the HSA callbacks that disabled sampling on child threads whenever an HSA API call was made.

Details

  • created thread_info struct for mapping different thread IDs
  • reorganized file structure significantly
  • added categories.hpp, concepts.hpp
  • moved around name trait definitions
  • moved all omnitrace components into omnitrace::component namespace
    • there was a lot of inconsistency between using tim::component in some places and omnitrace::component in others
    • added macros like OMNITRACE_DECLARE_COMPONENT in lieu of TIMEMORY_DECLARE_COMPONENT
  • OMNITRACE_CRITICAL_TRACE_NUM_THREADS -> OMNITRACE_THREAD_POOL_SIZE
  • roctracer and critical_trace use same thread pool
  • critical_trace functions no longer lock because of the thread-local TaskGroup
  • added component::local_category_region to support using component::category_region without explicitly passing in name
  • removed component::omnitrace (unused)
  • migrated KokkosP and OMPT to use component::local_category_region
    • removed component::user_region as a result
  • migrated omnitrace_{push,pop}_{trace,region}_hidden to use component::category_region
    • removed component::functors as a result
  • migrated some ppdefs
  • api::omnitrace -> project::omnitrace
  • api::(...) -> category::(...)
  • improved recording the execution time of threads
    • migrated this functionality out of pthread_create_gotcha and into thread_info
  • moved mpi_gotcha, fork_gotcha, exit_gotcha, rcclp into omnitrace::component namespace
  • split backtrace up into backtrace, backtrace_metrics, backtrace_timestamp components
  • sampling.cpp handles setup and post-processing that was formerly in backtrace
  • updated logging to use colors
  • OMNITRACE_COLORIZED_LOG config variable
  • updated docs on JSON output from timemory
  • instrumentation info in instrumentation subfolder
  • added testing for KokkosP entries
  • added testing for ompt entries
  • add_critical_trace function defined in critical_trace.hpp
  • disable push_thread_state and pop_thread_state when thread state is Disabled or Completed
  • add comp::page_rss to main bundle
  • thread_data supports std::optional instead of std::unique_ptr
  • thread_data supports tim::identity to avoid unique_ptr or optional
  • tracing::record_thread_start_time()
  • tracing::push_timemory and tracing::pop_timemory are templated on CategoryT
  • removed anonymous namespace from omnitrace::utility
  • sampling backtrace stores instruction pointers instead of strings
  • component::category_region updates
    • handle disabled thread state
    • handle finalized state
    • fewer debug messages
    • invoke thread_init()
    • invoke thread_init_sampling()
    • handle push/pop count based on category
    • push/pop count only modified when used
  • component::cpu_freq
  • components/ensure_storage.hpp
  • reworked the pthread_create replacement function
  • updated parallel-overhead example to report # of times locked
  • OMNITRACE_MAX_UNWIND_DEPTH build option
  • update timemory submodule
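
Several items above mention templating on `CategoryT` and tracking push/pop counts per category. The following is a minimal sketch of that general pattern, not the omnitrace implementation: the tag types and the functions `push_count`, `push_region`, and `pop_region` are hypothetical names chosen for illustration. Each category type gets its own thread-local counter, so no locking or runtime lookup is needed.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical category tag types; empty structs used only for dispatch.
namespace category
{
struct host {};
struct device {};
}  // namespace category

// One counter per (category, thread) pair: each template instantiation owns
// a distinct function-local thread_local variable.
template <typename CategoryT>
std::int64_t&
push_count()
{
    static thread_local std::int64_t value = 0;
    return value;
}

// Record that a region in this category was opened on the calling thread.
template <typename CategoryT>
void
push_region(const char* name)
{
    (void) name;  // a real implementation would start a measurement here
    ++push_count<CategoryT>();
}

// Close a region; returns false for an unmatched pop instead of underflowing.
template <typename CategoryT>
bool
pop_region(const char* name)
{
    (void) name;  // a real implementation would stop a measurement here
    auto& count = push_count<CategoryT>();
    assert(count >= 0);
    if(count == 0) return false;
    --count;
    return true;
}
```

Because the category is a template parameter, `push_region<category::host>("a")` only touches the host counter, and a stray `pop_region<category::device>("x")` is rejected without affecting it.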

@jrmadsen jrmadsen added the documentation, enhancement, bug fix, timemory, libomnitrace, libomnitrace-dl, omnitrace-instrument, examples, cmake, submodule, and configuration labels Aug 30, 2022
- created thread_info struct for mapping different thread IDs
- reorganized many files
  - moved api.hpp and api.cpp
  - updated CMake in libomnitrace
- added categories.hpp
- added concepts.hpp
- moved around name definitions
- moved all omnitrace components into omnitrace::component namespace
  - there was a lot of inconsistency b/t using tim::component in some places and omnitrace::component
  - added macros like OMNITRACE_DECLARE_COMPONENT in lieu of TIMEMORY_DECLARE_COMPONENT
- OMNITRACE_CRITICAL_TRACE_NUM_THREADS -> OMNITRACE_THREAD_POOL_SIZE
- roctracer and critical_trace use same thread pool
- critical_trace functions do not lock anymore bc of thread-local TaskGroup
- added component::local_category_region to support using component::category_region without explicitly passing in name
- removed component::omnitrace
- removed component::user_region
- removed component::functors
- migrated Kokkos to use component::local_category_region
- migrated OMPT to use component::local_category_region
- migrated omnitrace_{push,pop}_{trace,region}_hidden to use component::category_region
- migrated some ppdefs
- api::omnitrace -> project::omnitrace
- improved recording the execution time of threads
  - migrated this functionality out of pthread_create_gotcha and into thread_info
- moved mpi_gotcha, fork_gotcha, exit_gotcha, rcclp into omnitrace::component namespace
- split backtrace up into backtrace, backtrace_metrics, backtrace_timestamp components
- sampling.cpp handles setup and post-processing that was formerly in backtrace
- updated logging to use colors
- OMNITRACE_COLORIZED_LOG config variable
- updated docs on JSON output from timemory
- instrumentation info in instrumentation subfolder
- added testing for KokkosP entries
- added testing for ompt entries
- add_critical_trace function defined in critical_trace.hpp
- disable push_thread_state and pop_thread_state when thread state is Disabled or Completed
- add comp::page_rss to main bundle
- thread_data supports std::optional instead of std::unique_ptr
- thread_data supports tim::identity<T> to avoid unique_ptr or optional
- tracing::record_thread_start_time()
- tracing::push_timemory and tracing::pop_timemory are templated on CategoryT
- removed anonymous namespace from omnitrace::utility
- sampling backtrace stores instruction pointers instead of strings
- component::category_region updates
  - handle disabled thread state
  - handle finalized state
  - fewer debug messages
  - invoke thread_init()
  - invoke thread_init_sampling()
  - handle push/pop count based on category
  - push/pop count only modified when used
- component::cpu_freq
- components/ensure_storage.hpp
- reworked the pthread_create replacement function
- updated parallel-overhead example to report # of times locked
- OMNITRACE_MAX_UNWIND_DEPTH build option
- update timemory submodule
- record_thread_start_time overwrites value generated by sampling
- delay shutting down sampling until later on in finalize
- destroying this gotcha potentially deletes the function/context of thread
- enable configuring different frequencies and delays of realtime and cputime sampling
- default to original settings
- OMNITRACE_SAMPLING_CPUTIME_FREQ
- OMNITRACE_SAMPLING_CPUTIME_DELAY
- OMNITRACE_SAMPLING_REALTIME_FREQ
- OMNITRACE_SAMPLING_REALTIME_DELAY
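
The per-sampler frequency and delay variables above can be pictured with the sketch below. It rests on assumptions: "default to original settings" is read here as falling back to general OMNITRACE_SAMPLING_FREQ and OMNITRACE_SAMPLING_DELAY values, which this page does not name, and the numeric defaults are placeholders. Omnitrace reads its configuration through its own settings layer rather than calling `std::getenv` directly, so `get_env_or`, `sampler_config`, and `get_sampler_config` are hypothetical stand-ins.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Parse a floating-point environment variable; return fallback when the
// variable is unset, empty, or not a number.
inline double
get_env_or(const char* name, double fallback)
{
    const char* raw = std::getenv(name);
    if(raw == nullptr || *raw == '\0') return fallback;
    char*  end   = nullptr;
    double value = std::strtod(raw, &end);
    return (end == raw) ? fallback : value;
}

// Hypothetical container for one sampler's settings.
struct sampler_config
{
    double freq;   // samples per second
    double delay;  // seconds to wait before the first sample
};

// kind is "CPUTIME" or "REALTIME". A sampler-specific variable wins;
// otherwise the general value (assumed variable names) applies.
inline sampler_config
get_sampler_config(const std::string& kind)
{
    assert(kind == "CPUTIME" || kind == "REALTIME");
    // placeholder defaults, not omnitrace's real ones
    const double general_freq  = get_env_or("OMNITRACE_SAMPLING_FREQ", 100.0);
    const double general_delay = get_env_or("OMNITRACE_SAMPLING_DELAY", 0.5);
    const std::string prefix   = "OMNITRACE_SAMPLING_" + kind;
    return { get_env_or((prefix + "_FREQ").c_str(), general_freq),
             get_env_or((prefix + "_DELAY").c_str(), general_delay) };
}
```

Under these assumptions, setting only OMNITRACE_SAMPLING_CPUTIME_FREQ changes the cpu-time sampler's frequency while the real-time sampler keeps the general value.
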
@jrmadsen jrmadsen merged commit 808ea7d into ROCm:main Aug 31, 2022
@jrmadsen jrmadsen deleted the rework-sampling branch August 31, 2022 06:24
@jrmadsen jrmadsen mentioned this pull request Sep 1, 2022