Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix the main library stop routine for timemory #39

Merged
merged 7 commits into from
Jun 13, 2022

Conversation

jrmadsen
Copy link
Collaborator

@jrmadsen jrmadsen commented Jun 10, 2022

  • the main pop_timemory function was popping too many calls
    • this primarily affected recursive calls
  • fixed the reporting of the per-thread metrics
  • bumped version to 1.1.1

Before this bug fix, this is a sample output from instrumenting a fibonacci function (note the difference in the COUNT column):

|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
|                                                                            REAL-CLOCK TIMER (I.E. WALL-CLOCK TIMER)                                                                           |
|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
|                                   LABEL                                    | COUNT  | DEPTH  |   METRIC   | UNITS  |   SUM    |   MEAN   |   MIN    |   MAX    |   VAR    | STDDEV   | % SELF |
|----------------------------------------------------------------------------|--------|--------|------------|--------|----------|----------|----------|----------|----------|----------|--------|
| |0>>> int main(int, char * *)                                              |      1 |      0 | wall_clock | sec    | 0.041326 | 0.041326 | 0.041326 | 0.041326 | 0.000000 | 0.000000 |   56.7 |
| |0>>> |_long int fibonacci(long int)                                       |  10946 |      1 | wall_clock | sec    | 0.017911 | 0.000002 | 0.000000 | 0.000052 | 0.000000 | 0.000003 |   40.6 |
| |0>>>   |_long int fibonacci(long int)                                     |   4181 |      2 | wall_clock | sec    | 0.010635 | 0.000003 | 0.000000 | 0.000050 | 0.000000 | 0.000003 |   37.4 |
| |0>>>     |_long int fibonacci(long int)                                   |   2584 |      3 | wall_clock | sec    | 0.006659 | 0.000003 | 0.000000 | 0.000047 | 0.000000 | 0.000003 |   36.7 |
| |0>>>       |_long int fibonacci(long int)                                 |   1597 |      4 | wall_clock | sec    | 0.004217 | 0.000003 | 0.000000 | 0.000044 | 0.000000 | 0.000003 |   36.3 |
| |0>>>         |_long int fibonacci(long int)                               |    987 |      5 | wall_clock | sec    | 0.002685 | 0.000003 | 0.000000 | 0.000042 | 0.000000 | 0.000003 |   39.1 |
| |0>>>           |_long int fibonacci(long int)                             |    610 |      6 | wall_clock | sec    | 0.001634 | 0.000003 | 0.000000 | 0.000040 | 0.000000 | 0.000003 |   39.2 |
| |0>>>             |_long int fibonacci(long int)                           |    377 |      7 | wall_clock | sec    | 0.000993 | 0.000003 | 0.000000 | 0.000039 | 0.000000 | 0.000003 |   40.0 |
| |0>>>               |_long int fibonacci(long int)                         |    233 |      8 | wall_clock | sec    | 0.000596 | 0.000003 | 0.000000 | 0.000037 | 0.000000 | 0.000004 |   37.4 |
| |0>>>                 |_long int fibonacci(long int)                       |    144 |      9 | wall_clock | sec    | 0.000373 | 0.000003 | 0.000000 | 0.000035 | 0.000000 | 0.000004 |   35.8 |
| |0>>>                   |_long int fibonacci(long int)                     |     89 |     10 | wall_clock | sec    | 0.000240 | 0.000003 | 0.000000 | 0.000034 | 0.000000 | 0.000004 |   34.0 |
| |0>>>                     |_long int fibonacci(long int)                   |     55 |     11 | wall_clock | sec    | 0.000158 | 0.000003 | 0.000000 | 0.000031 | 0.000000 | 0.000005 |   37.2 |
| |0>>>                       |_long int fibonacci(long int)                 |     34 |     12 | wall_clock | sec    | 0.000099 | 0.000003 | 0.000000 | 0.000028 | 0.000000 | 0.000005 |   33.6 |
| |0>>>                         |_long int fibonacci(long int)               |     21 |     13 | wall_clock | sec    | 0.000066 | 0.000003 | 0.000000 | 0.000025 | 0.000000 | 0.000005 |   29.7 |
| |0>>>                           |_long int fibonacci(long int)             |     13 |     14 | wall_clock | sec    | 0.000046 | 0.000004 | 0.000000 | 0.000024 | 0.000000 | 0.000006 |   28.8 |
| |0>>>                             |_long int fibonacci(long int)           |      8 |     15 | wall_clock | sec    | 0.000033 | 0.000004 | 0.000000 | 0.000022 | 0.000000 | 0.000007 |   24.5 |
| |0>>>                               |_long int fibonacci(long int)         |      5 |     16 | wall_clock | sec    | 0.000025 | 0.000005 | 0.000000 | 0.000020 | 0.000000 | 0.000009 |   18.4 |
| |0>>>                                 |_long int fibonacci(long int)       |      3 |     17 | wall_clock | sec    | 0.000020 | 0.000007 | 0.000000 | 0.000019 | 0.000000 | 0.000010 |   20.4 |
| |0>>>                                   |_long int fibonacci(long int)     |      2 |     18 | wall_clock | sec    | 0.000016 | 0.000008 | 0.000000 | 0.000016 | 0.000000 | 0.000011 |   12.6 |
| |0>>>                                     |_long int fibonacci(long int)   |      1 |     19 | wall_clock | sec    | 0.000014 | 0.000014 | 0.000014 | 0.000014 | 0.000000 | 0.000000 |   86.8 |
| |0>>>                                       |_long int fibonacci(long int) |      1 |     20 | wall_clock | sec    | 0.000002 | 0.000002 | 0.000002 | 0.000002 | 0.000000 | 0.000000 |  100.0 |
|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|

After this fix, here is the same instrumentation results:

|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
|                                                                            REAL-CLOCK TIMER (I.E. WALL-CLOCK TIMER)                                                                           |
|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
|                                   LABEL                                    | COUNT  | DEPTH  |   METRIC   | UNITS  |   SUM    |   MEAN   |   MIN    |   MAX    |   VAR    | STDDEV   | % SELF |
|----------------------------------------------------------------------------|--------|--------|------------|--------|----------|----------|----------|----------|----------|----------|--------|
| |0>>> int main(int, char * *)                                              |      1 |      0 | wall_clock | sec    | 0.039666 | 0.039666 | 0.039666 | 0.039666 | 0.000000 | 0.000000 |    3.0 |
| |0>>> |_long int fibonacci(long int)                                       |      1 |      1 | wall_clock | sec    | 0.038463 | 0.038463 | 0.038463 | 0.038463 | 0.000000 | 0.000000 |    0.0 |
| |0>>>   |_long int fibonacci(long int)                                     |      2 |      2 | wall_clock | sec    | 0.038459 | 0.019230 | 0.014601 | 0.023859 | 0.000043 | 0.006547 |    0.0 |
| |0>>>     |_long int fibonacci(long int)                                   |      4 |      3 | wall_clock | sec    | 0.038453 | 0.009613 | 0.005597 | 0.014731 | 0.000014 | 0.003783 |    0.0 |
| |0>>>       |_long int fibonacci(long int)                                 |      8 |      4 | wall_clock | sec    | 0.038438 | 0.004805 | 0.002158 | 0.009226 | 0.000005 | 0.002195 |    0.1 |
| |0>>>         |_long int fibonacci(long int)                               |     16 |      5 | wall_clock | sec    | 0.038409 | 0.002401 | 0.000827 | 0.005765 | 0.000002 | 0.001244 |    0.1 |
| |0>>>           |_long int fibonacci(long int)                             |     32 |      6 | wall_clock | sec    | 0.038354 | 0.001199 | 0.000305 | 0.003591 | 0.000000 | 0.000694 |    0.3 |
| |0>>>             |_long int fibonacci(long int)                           |     64 |      7 | wall_clock | sec    | 0.038244 | 0.000598 | 0.000113 | 0.002211 | 0.000000 | 0.000381 |    0.5 |
| |0>>>               |_long int fibonacci(long int)                         |    128 |      8 | wall_clock | sec    | 0.038036 | 0.000297 | 0.000039 | 0.001375 | 0.000000 | 0.000208 |    1.1 |
| |0>>>                 |_long int fibonacci(long int)                       |    256 |      9 | wall_clock | sec    | 0.037617 | 0.000147 | 0.000013 | 0.000858 | 0.000000 | 0.000112 |    2.2 |
| |0>>>                   |_long int fibonacci(long int)                     |    512 |     10 | wall_clock | sec    | 0.036783 | 0.000072 | 0.000003 | 0.000537 | 0.000000 | 0.000060 |    4.5 |
| |0>>>                     |_long int fibonacci(long int)                   |   1024 |     11 | wall_clock | sec    | 0.035140 | 0.000034 | 0.000000 | 0.000334 | 0.000000 | 0.000032 |    9.6 |
| |0>>>                       |_long int fibonacci(long int)                 |   2026 |     12 | wall_clock | sec    | 0.031780 | 0.000016 | 0.000000 | 0.000211 | 0.000000 | 0.000017 |   19.0 |
| |0>>>                         |_long int fibonacci(long int)               |   3632 |     13 | wall_clock | sec    | 0.025744 | 0.000007 | 0.000000 | 0.000134 | 0.000000 | 0.000009 |   32.9 |
| |0>>>                           |_long int fibonacci(long int)             |   5020 |     14 | wall_clock | sec    | 0.017283 | 0.000003 | 0.000000 | 0.000090 | 0.000000 | 0.000005 |   48.7 |
| |0>>>                             |_long int fibonacci(long int)           |   4760 |     15 | wall_clock | sec    | 0.008870 | 0.000002 | 0.000000 | 0.000055 | 0.000000 | 0.000003 |   63.3 |
| |0>>>                               |_long int fibonacci(long int)         |   2942 |     16 | wall_clock | sec    | 0.003252 | 0.000001 | 0.000000 | 0.000038 | 0.000000 | 0.000002 |   75.2 |
| |0>>>                                 |_long int fibonacci(long int)       |   1152 |     17 | wall_clock | sec    | 0.000808 | 0.000001 | 0.000000 | 0.000027 | 0.000000 | 0.000002 |   83.2 |
| |0>>>                                   |_long int fibonacci(long int)     |    274 |     18 | wall_clock | sec    | 0.000136 | 0.000000 | 0.000000 | 0.000020 | 0.000000 | 0.000001 |   82.1 |
| |0>>>                                     |_long int fibonacci(long int)   |     36 |     19 | wall_clock | sec    | 0.000024 | 0.000001 | 0.000000 | 0.000016 | 0.000000 | 0.000003 |   91.6 |
| |0>>>                                       |_long int fibonacci(long int) |      2 |     20 | wall_clock | sec    | 0.000002 | 0.000001 | 0.000000 | 0.000002 | 0.000000 | 0.000001 |  100.0 |
|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|

After this PR, omnitrace now (once again) reports the per-thread metrics during finalization:

[omnitrace][86267][0] omnitrace : 5.190895 sec wall_clock,    2.748 mb peak_rss, 6.330000 sec cpu_clock,  121.9 % cpu_util [laps: 1]
[omnitrace][86267][0] user-api.inst/thread-0 : 5.078713 sec wall_clock, 4.722415 sec thread_cpu_clock,   93.0 % thread_cpu_util,    1.276 mb peak_rss [laps: 1]
[omnitrace][86267][0] user-api.inst/thread-1 : 0.322248 sec wall_clock, 0.322191 sec thread_cpu_clock,  100.0 % thread_cpu_util,    1.000 mb peak_rss [laps: 1]
[omnitrace][86267][0] user-api.inst/thread-2 : 0.323255 sec wall_clock, 0.323194 sec thread_cpu_clock,  100.0 % thread_cpu_util,    0.000 mb peak_rss [laps: 1]
[omnitrace][86267][0] user-api.inst/thread-3 : 0.323569 sec wall_clock, 0.323484 sec thread_cpu_clock,  100.0 % thread_cpu_util,    1.092 mb peak_rss [laps: 1]
[omnitrace][86267][0] user-api.inst/thread-4 : 0.324178 sec wall_clock, 0.324057 sec thread_cpu_clock,  100.0 % thread_cpu_util,    1.184 mb peak_rss [laps: 1]

- the main pop_timemory function was popping too many calls
- this primarily affected recursive calls
@jrmadsen jrmadsen added bug fix Fixes a bug timemory Issue affects/involves timemory features/capabilities libomnitrace Involves omnitrace library labels Jun 10, 2022
@jrmadsen jrmadsen added the testing Extends/improves/modifies testing label Jun 13, 2022
- Call-counts in timemory output examples in documentation were affected by the changes
- pthread_create_mutex starts/stops the per-thread data
- removed unintentional continue statement
@jrmadsen jrmadsen added the documentation Improvements or additions to documentation label Jun 13, 2022
@jrmadsen jrmadsen merged commit a142b20 into ROCm:main Jun 13, 2022
@jrmadsen jrmadsen deleted the timemory-stop-fix branch June 13, 2022 20:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug fix Fixes a bug documentation Improvements or additions to documentation libomnitrace Involves omnitrace library testing Extends/improves/modifies testing timemory Issue affects/involves timemory features/capabilities
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant