
Add cooldown interval logic for the profiling functional #11465

Merged
merged 3 commits into apache:main on Jun 28, 2022

Conversation

Icemist
Contributor

@Icemist Icemist commented May 25, 2022

Motivation:

Sometimes, when running the script several times, we get a strong scatter in the results.

from tvm.contrib.debugger import debug_runtime as graph_executor
...
m = graph_executor.create(graph, lib, ctx)
...
time_f = m.module.time_evaluator("run", dev, number=1000, repeat=1)
mean = time_f().mean
print(mean)

First try:

0.0846912 secs/iteration

Second try:

0.105601 secs/iteration

In general, devices have an acceptable temperature range. If the system heats up to the upper limit for some time, it can force cooling by whatever means are available, for example by lowering the frequency.
In the following example the temperature reaches 90 degrees; after some time the frequency drops, and it is restored after cooling.

from tvm.contrib.debugger import debug_runtime as graph_executor
...
m = graph_executor.create(graph, lib, ctx)
...
time_f = m.module.time_evaluator("run", dev, number=1)
one = time_f().results[0]
nums = round(1/one)
time_f = m.module.time_evaluator("run", dev, number=1, repeat=nums*200)
t = time_f()
mean = t.mean
print(mean)
0.105511 secs/iteration

print(t.results)
[0.083143681, ... , 0.147928064, ... , 0.147806464]

the results of device profiling: [image: 66_over]

the plot drawn from t.results: [image: p_orig]

In the proposed change, we preemptively pause the calculations to keep the device temperature within comfortable limits. As a result, the temperature does not exceed 70 degrees.

from tvm.contrib.debugger import debug_runtime as graph_executor
...
m = graph_executor.create(graph, lib, ctx)
...
time_f = m.module.time_evaluator("run", dev, number=1, repeat=200,
                                 run_delay=1000, repeats_to_cooldown=12)
t = time_f()
mean = t.mean
print(mean)
0.0832038 secs/iteration

print(t.results)
[0.083007232, ... , 0.083067136, ... , 0.083169024]

the results of device profiling: [image: 66_patch]

the plot drawn from t.results: [image: p_patch]

from tvm.contrib.debugger import debug_runtime as graph_executor
...
m = graph_executor.create(graph, lib, ctx)
...
m.run(number=100, repeat=2) 

Node Name  Ops  Time(us)  Time(%)  Shape   Inputs  Outputs  Measurements
---------  ---  --------  -------  -----   ------  -------  ------------
f1         f1   9451.75   9.822    (2, 2)  3       1        [9.456, 9.465, ..., 9.463]
f2         f2   5255.25   5.461    (2, 2)  4       1        [5.261, 5.308, ..., 5.302]
...
fn         fn   5214.75   5.419    (2, 2)  4       1        [5.266, 5.192, ..., 5.199]

Changes:

  • Added cooldown_interval_ms and repeats_to_cooldown parameters to the following functions:
    python/tvm/runtime/vm.py:benchmark
    python/tvm/runtime/module.py:time_evaluator
    python/tvm/contrib/graph_executor.py:benchmark
    python/tvm/contrib/debugger/debug_executor.py:run
    python/tvm/contrib/debugger/debug_executor.py:run_individual
    python/tvm/contrib/debugger/debug_executor.py:run_individual_node

    These parameters pause measurement so the device can cool down.
  • For python/tvm/contrib/debugger/debug_executor.py:run, the number, repeat, min_repeat_ms, and repeats_to_cooldown parameters were added.
  • Fixed RunIndividual logging.
  • Removed USE_PROFILER, which is no longer in use.
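To make the behavior of the two new parameters concrete, here is a pure-Python sketch (a hypothetical helper, not TVM's actual C++ implementation) of the scheduling they describe: after every repeats_to_cooldown measured repeats, the evaluator sleeps for cooldown_interval_ms milliseconds so the device can cool down.

```python
import time

def time_with_cooldown(func, repeat, cooldown_interval_ms=0, repeats_to_cooldown=1):
    """Measure `func` `repeat` times, pausing for cooldown_interval_ms
    milliseconds after every `repeats_to_cooldown` measurements."""
    results = []
    for i in range(repeat):
        start = time.perf_counter()
        func()
        results.append(time.perf_counter() - start)
        # Cooldown pause: sit idle so the device temperature can drop.
        if cooldown_interval_ms > 0 and (i + 1) % repeats_to_cooldown == 0:
            time.sleep(cooldown_interval_ms / 1000.0)
    return results
```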

@Icemist Icemist force-pushed the avoronov/detailed_profiling branch 11 times, most recently from 7439e61 to 20e6d32 Compare June 2, 2022 10:25
@Icemist Icemist changed the title Add run_delay to profiling functional and provide detailed data about runs Add cooldown_interval_ms and internal measurements for profiling Jun 2, 2022
@Icemist Icemist force-pushed the avoronov/detailed_profiling branch from 20e6d32 to 7dc790c Compare June 2, 2022 11:00
@Icemist Icemist marked this pull request as ready for review June 2, 2022 11:46
@Icemist Icemist force-pushed the avoronov/detailed_profiling branch from 7dc790c to 659dead Compare June 2, 2022 15:45
@Icemist Icemist force-pushed the avoronov/detailed_profiling branch 2 times, most recently from 8e3e7fe to 1d96f78 Compare June 9, 2022 00:05
@Icemist
Contributor Author

Icemist commented Jun 9, 2022

@areusch
Contributor

areusch commented Jun 9, 2022

@Icemist i like the direction here--but, i am wondering if it's possible to put some teeth behind this by allowing an application-specific way to verify the cooldown actually happened. for example, perhaps one could pass a PackedFunc that knows how to check the die temperature and continue cooldown if it is still too hot. what do you think of this? otherwise, it strikes me as a feature that would be hard to use in practice if you weren't already watching this separately.

@Icemist
Contributor Author

Icemist commented Jun 9, 2022

> @Icemist i like the direction here--but, i am wondering if it's possible to put some teeth behind this by allowing an application-specific way to verify the cooldown actually happened. for example, perhaps one could pass a PackedFunc that knows how to check the die temperature and continue cooldown if it is still too hot. what do you think of this? otherwise, it strikes me as a feature that would be hard to use in practice if you weren't already watching this separately.

This cooldown_interval logic is already used by RPCRunner, LocalRunner, etc.
The other part of the patch only lets you verify the quality of the data.

As for your suggestion: in theory, one could reuse the existing f_preproc hook. Do one repeat, call f_preproc with code that checks the temperature and cools the device, then do the next repeat.

for (i : repeat)
  call(f_preproc)  // sleep here until the temperature is right
  timer.start()
  call(pf)
  timer.end()

It seems to me that both options have a place.
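The idea could be sketched in Python like this; make_cooldown_preproc and read_temp_c are hypothetical names for illustration, not existing TVM APIs:

```python
import time

def make_cooldown_preproc(read_temp_c, max_temp_c=70.0, poll_s=0.5):
    """Build a preprocessor that blocks until the device has cooled.

    read_temp_c: user-supplied callable returning the current die
    temperature in Celsius (e.g. read from a sysfs thermal zone).
    """
    def f_preproc():
        # Poll until the reported temperature is within limits.
        while read_temp_c() > max_temp_c:
            time.sleep(poll_s)
    return f_preproc
```

One repeat would then run between calls to f_preproc, so each timed region starts from an acceptable temperature.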

@AndrewZhaoLuo
Contributor

cc @tkonolige

Contributor

@tkonolige tkonolige left a comment


@Icemist thank you for the PR. Having a cooldown interval during timing is very useful.

However, you've moved the timer in time_evaluator into the inner loop. The timer is specifically in the outer loop so that we can time functions with a very small runtime or use less precise timers. I'll need you to revert this change before we can accept this PR.
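For context, the two timer placements can be contrasted with a rough Python illustration (function names are illustrative): with the timer in the outer loop, one start/stop pair covers all `number` calls, so timer resolution and overhead are amortized; with the timer in the inner loop, every call pays the full timer error, which swamps very short functions.

```python
import time

def outer_loop_time(func, number):
    """Outer-loop timing: one start/stop pair around all `number`
    calls; timer resolution and overhead are amortized over the batch."""
    start = time.perf_counter()
    for _ in range(number):
        func()
    return (time.perf_counter() - start) / number

def inner_loop_time(func, number):
    """Inner-loop timing: each call is timed separately, so for very
    short functions the result is dominated by the timer itself."""
    total = 0.0
    for _ in range(number):
        start = time.perf_counter()
        func()
        total += time.perf_counter() - start
    return total / number
```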

Comment on lines 158 to 163
starting_times[1:] = np.cumsum(
[
np.mean([np.mean(repeat_data) for repeat_data in node_data])
for node_data in self._time_list
]
)
Contributor


This doesn't seem right. The variable is called starting_times but its values are the mean of mean time.

Contributor Author


Before my changes, this was the average over a single repeat. Here I return not only the first repeat of each node but all of them, as arrays of raw measurements.
Since I now return all repeats for each node, not just the first, I take the mean here. Below, where the duration is calculated, I also take the mean.
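As a small worked example with made-up numbers (pure Python rather than the numpy in the snippet, to keep it self-contained), the cumulative sum of per-node mean times gives each node's starting offset on the timeline:

```python
# Hypothetical raw data: for each node, a list of repeats, each repeat
# a list of individual measurements (seconds).
time_list = [
    [[1.0, 1.2], [1.1, 1.1]],  # node 0: per-repeat means 1.1, 1.1 -> node mean 1.1
    [[2.0, 2.2], [2.1, 1.9]],  # node 1: per-repeat means 2.1, 2.0 -> node mean 2.05
]

# Mean over repeats of (mean over measurements), one value per node.
node_means = [
    sum(sum(r) / len(r) for r in node) / len(node) for node in time_list
]

# Running cumulative sum: node i starts when nodes 0..i-1 have finished.
starting_times = [0.0]
for m in node_means:
    starting_times.append(starting_times[-1] + m)
# starting_times ~= [0.0, 1.1, 3.15] (up to float rounding)
```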

@@ -205,12 +210,18 @@ def _dump_graph_json(self, graph):

def get_debug_result(self, sort_by_time=True):
"""Return the debugger result"""
header = ["Node Name", "Ops", "Time(us)", "Time(%)", "Shape", "Inputs", "Outputs"]
lines = ["---------", "---", "--------", "-------", "-----", "------", "-------"]
header = ["Node Name", "Ops", "Time(us)", "Time(%)", "Shape", "Inputs", "Outputs", "Times"]
Contributor


We have Time(us) and Times here. What is the difference?

Contributor Author


I changed it to return all repeats, not just the first, and renamed the column to "Measurements(us)".

src/runtime/crt/common/crt_runtime_api.c (resolved)
src/runtime/graph_executor/debug/graph_executor_debug.cc (resolved)
src/runtime/graph_executor/debug/graph_executor_debug.cc (resolved)
src/runtime/profiling.cc (resolved)
src/runtime/profiling.cc (resolved)
Contributor

@tkonolige tkonolige left a comment


Changing my review to "Comment" so it doesn't block the PR while I am out of town for the next couple of weeks. Moving the timers into the inner loop still needs to be reverted though.

@tkonolige tkonolige dismissed their stale review June 10, 2022 18:26

out of town

@areusch
Contributor

areusch commented Jun 16, 2022

@Icemist it'd be great if you could address @tkonolige 's comments.

> In your suggestion, in theory, you could try to use a ready-made f_preproc. Do 1 repeat, call f_preproc with some code to check and cool the temperature, do the next repeat.

just wondering, would it be possible to achieve same effect as this PR with that strategy?

@Icemist Icemist force-pushed the avoronov/detailed_profiling branch 4 times, most recently from 2cb6fc6 to b140a6f Compare June 17, 2022 11:40
@Icemist Icemist force-pushed the avoronov/detailed_profiling branch from b140a6f to 58c20ea Compare June 20, 2022 02:13
@Icemist
Contributor Author

Icemist commented Jun 20, 2022

> @Icemist it'd be great if you could address @tkonolige 's comments.

> In your suggestion, in theory, you could try to use a ready-made f_preproc. Do 1 repeat, call f_preproc with some code to check and cool the temperature, do the next repeat.

> just wondering, would it be possible to achieve same effect as this PR with that strategy?

It seems to me that with that strategy it would be harder to get a quick, generalized solution.
It would probably require a general mechanism for adding such check functions for different devices, written by developers familiar with those systems.

@Icemist Icemist changed the title Add cooldown_interval_ms and internal measurements for profiling Add cooldown interval logic for the profiling functional Jun 20, 2022
@Icemist Icemist force-pushed the avoronov/detailed_profiling branch 2 times, most recently from f16f5b4 to eaeccc3 Compare June 20, 2022 10:58
@Icemist Icemist requested a review from tkonolige June 22, 2022 11:00
@Icemist
Contributor Author

Icemist commented Jun 22, 2022

@areusch and @AndrewZhaoLuo, what do you think? I addressed all the remarks. I also followed up on @tkonolige's request.

Contributor

@tkonolige tkonolige left a comment


Looks good @Icemist, just a couple questions.

include/tvm/runtime/profiling.h (resolved)
include/tvm/runtime/profiling.h
src/runtime/crt/common/crt_runtime_api.c (resolved)
* \param repeats_to_cooldown The number of repeats before the
* cooldown is activated.
* \return Comma separated string containing the elapsed time per op for
* the last iteration only, because returning a long string over rpc can be expensive.
Contributor


It seems like you've changed this to return all elapsed time per ops instead of just the last one? The comment seems to indicate this may be problematic?

Contributor Author


I wonder what caused that comment. I didn't have any problems even when there were tens of thousands of numbers here.
I would guess the right thing to do is to pack the data the way time_evaluator does, and then extract it with struct.unpack.
I would like to know the reason for this comment:
// Have problems returning FloatImm so serialize to string results as hack.
@AndrewZhaoLuo can you help with that?

In a previous version of the patch I did dynamic array packing, and it was not a problem for time_evaluator.

Contributor


I forget the exact reason why FloatImm can't be serialized but what you are doing is the string serialization hack which is fine.

"returning a long string over rpc can be expensive." i don't have context on but if you aren't having any problems it's probably fine?

src/runtime/graph_executor/debug/graph_executor_debug.cc (resolved)
@Icemist Icemist force-pushed the avoronov/detailed_profiling branch from eaeccc3 to 66f08ba Compare June 23, 2022 01:20
@Icemist Icemist force-pushed the avoronov/detailed_profiling branch from 66f08ba to b19bb4d Compare June 23, 2022 10:54
// do-while structure ensures we run even when `min_repeat_ms` isn't set (i.e., is 0).
do {
if (curr_res_seconds > 0.0) {
double a = (min_repeat_seconds / (curr_res_seconds / g_time_evaluator_state.number) + 1);
double b = g_time_evaluator_state.number * 1.618; // 1.618 is chosen by random
Contributor


I know the previous code said 1.618 was random, but its actually not :). Its the golden ratio!
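For what it's worth, the growth rule in the quoted C snippet can be sketched in Python (a hypothetical helper for illustration): the next call count is the larger of a linear extrapolation that would fill min_repeat_ms and a golden-ratio scale-up of the previous count, so the count keeps growing even when the extrapolation stalls.

```python
def next_number(curr_res_seconds, number, min_repeat_ms):
    """Sketch of the call-count update from the quoted C code.

    a: how many calls would fill min_repeat_ms at the measured
       per-call cost (plus one for margin).
    b: grow the previous count by the golden ratio (~1.618) so the
       count still increases when `a` is too conservative.
    """
    min_repeat_seconds = min_repeat_ms / 1000.0
    per_call_seconds = curr_res_seconds / number
    a = min_repeat_seconds / per_call_seconds + 1
    b = number * 1.618
    return int(max(a, b))

# With exact binary fractions: 8 calls took 0.5 s (0.0625 s/call);
# filling 1000 ms needs 16 calls, plus 1 -> 17; 8 * 1.618 = 12.944.
```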

time = 0;
for (const auto& repeat_data : time_sec_per_op[index]) {
// To have good behavior when calculating total time, etc.
os << (std::isnan(repeat_data) ? std::to_string(0) : std::to_string(repeat_data)) << ",";
Contributor

@AndrewZhaoLuo AndrewZhaoLuo Jun 23, 2022


Probably want full precision so want something like

// use maximum precision available and use fixed representation
os << std::fixed;
os.precision(std::numeric_limits<double>::max_digits10);

and feed every double manually into the stream.


@AndrewZhaoLuo
Contributor

Leave the rest up to @tkonolige

@Icemist Icemist force-pushed the avoronov/detailed_profiling branch 2 times, most recently from 9355cfa to 3004b89 Compare June 27, 2022 19:08
@Icemist
Contributor Author

Icemist commented Jun 27, 2022

@tkonolige I added a commit to this PR 3e3b2cd where I change the way the RunIndividual and RunIndividualNode functions get results. I think this should solve the issue above.

@Icemist Icemist force-pushed the avoronov/detailed_profiling branch from 3004b89 to 4a44bd5 Compare June 27, 2022 22:13
Contributor

@tkonolige tkonolige left a comment


Looks great! Thanks @Icemist!

@tkonolige tkonolige merged commit 6c8a353 into apache:main Jun 28, 2022
blackkker pushed a commit to blackkker/tvm that referenced this pull request Jul 7, 2022
* Add cooldown interval logic for the profiling functional.

* Remove string serialize hack from RunIndividual functions

* Update src/runtime/graph_executor/debug/graph_executor_debug.cc

Co-authored-by: Tristan Konolige <[email protected]>
masahi pushed a commit to masahi/tvm that referenced this pull request Jul 15, 2022
mikeseven pushed a commit to mikeseven/tvm that referenced this pull request Sep 27, 2023