-
Notifications
You must be signed in to change notification settings - Fork 3.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add cooldown interval logic for the profiling functional #11465
Conversation
7439e61
to
20e6d32
Compare
20e6d32
to
7dc790c
Compare
7dc790c
to
659dead
Compare
8e3e7fe
to
1d96f78
Compare
@Icemist i like the direction here--but, i am wondering if it's possible to put some teeth behind this by allowing an application-specific way to verify the cooldown actually happened. for example, perhaps one could pass a PackedFunc that knows how to check the die temperature and continue cooldown if it is still too hot. what do you think of this? otherwise, it strikes me as a feature that would be hard to use in practice if you weren't already watching this separately. |
This cooldown_interval logic already used for RPCRunner, LocalRunner and etc. In your suggestion, in theory, you could try to use a ready-made f_preproc. Do 1 repeat, call f_preproc with some code to check and cool the temperature, do the next repeat.
It seems to me that both options have a place. |
cc @tkonolige |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Icemist thank you for the PR. Having a cooldown interval during timing is very useful.
However, you've moved the timer in time_evaluator
into the inner loop. The timer is specifically in the outer loop so that we can time functions with a very small runtime or use less precise timers. I'll need you to revert this change before we can accept this PR.
starting_times[1:] = np.cumsum( | ||
[ | ||
np.mean([np.mean(repeat_data) for repeat_data in node_data]) | ||
for node_data in self._time_list | ||
] | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't seem right. The variable is called starting_times
but its values are the mean of mean time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Before my changes, it was an average of 1 repeat. Here. I returned not only the first repeat of the node, but all of them for each node, and also in the form of arrays as raw measurements.
Now I just return all repeats for each node, not just the first. So I take the average here. Below where the duration is calculated, I also calculate the average.
@@ -205,12 +210,18 @@ def _dump_graph_json(self, graph): | |||
|
|||
def get_debug_result(self, sort_by_time=True): | |||
"""Return the debugger result""" | |||
header = ["Node Name", "Ops", "Time(us)", "Time(%)", "Shape", "Inputs", "Outputs"] | |||
lines = ["---------", "---", "--------", "-------", "-----", "------", "-------"] | |||
header = ["Node Name", "Ops", "Time(us)", "Time(%)", "Shape", "Inputs", "Outputs", "Times"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have Time(us)
and Times
here. What is the difference?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a return of all repeats, not just the first. Renamed it "Measurements(us)".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changing my review to "Comment" so it doesn't block the PR while I am out of town for the next couple of weeks. Moving the timers into the inner loop still needs to be reverted though.
@Icemist it'd be great if you could address @tkonolige 's comments.
just wondering, would it be possible to achieve same effect as this PR with that strategy? |
2cb6fc6
to
b140a6f
Compare
b140a6f
to
58c20ea
Compare
It seems to me with that strategy it is more difficult to get a quick and generalized solution. |
f16f5b4
to
eaeccc3
Compare
@areusch and @AndrewZhaoLuo, what do you think? I addressed all the remarks. I also followed up on @tkonolige's request. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good @Icemist, just a couple questions.
* \param repeats_to_cooldown The number of repeats before the | ||
* cooldown is activated. | ||
* \return Comma separated string containing the elapsed time per op for | ||
* the last iteration only, because returning a long string over rpc can be expensive. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like you've changed this to return all elapsed time per ops instead of just the last one? The comment seems to indicate this may be problematic?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder what caused it. I didn't have any problems even when there were tens of thousands of numbers here.
I would guess that the right thing to do here would be to pack the data as it is done in time_evaluator
. And then extract the data with struct.unpack
.
I would like to know the reason for the comment:
// Have problems returning FloatImm so serialize to string results as hack.
@AndrewZhaoLuo can you help with that?
In my previous version of the patch, I did dimanic array packing, and this was not a problem for time_evaluator
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I forget the exact reason why FloatImm can't be serialized but what you are doing is the string serialization hack which is fine.
"returning a long string over rpc can be expensive." i don't have context on but if you aren't having any problems it's probably fine?
eaeccc3
to
66f08ba
Compare
66f08ba
to
b19bb4d
Compare
// do-while structure ensures we run even when `min_repeat_ms` isn't set (i.e., is 0). | ||
do { | ||
if (curr_res_seconds > 0.0) { | ||
double a = (min_repeat_seconds / (curr_res_seconds / g_time_evaluator_state.number) + 1); | ||
double b = g_time_evaluator_state.number * 1.618; // 1.618 is chosen by random |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know the previous code said 1.618
was random, but its actually not :). Its the golden ratio!
time = 0; | ||
for (const auto& repeat_data : time_sec_per_op[index]) { | ||
// To have good behavior when calculating total time, etc. | ||
os << (std::isnan(repeat_data) ? std::to_string(0) : std::to_string(repeat_data)) << ","; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably want full precision so want something like
// use maximum precision available and use fixed representation
os << std::fixed;
os.precision(std::numeric_limits<double>::max_digits10);
and feed every double manually into the stream.
* \param repeats_to_cooldown The number of repeats before the | ||
* cooldown is activated. | ||
* \return Comma separated string containing the elapsed time per op for | ||
* the last iteration only, because returning a long string over rpc can be expensive. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I forget the exact reason why FloatImm can't be serialized but what you are doing is the string serialization hack which is fine.
"returning a long string over rpc can be expensive." i don't have context on but if you aren't having any problems it's probably fine?
Leave the rest up to @tkonolige |
9355cfa
to
3004b89
Compare
@tkonolige I added a commit to this PR 3e3b2cd where I change the way the RunIndividual and RunIndividualNode functions get results. I think this should solve the issue above. |
3004b89
to
4a44bd5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great! Thanks @Icemist!
* Add cooldown interval logic for the profiling functional. * Remove string serialize hack from RunIndividual functions * Update src/runtime/graph_executor/debug/graph_executor_debug.cc Co-authored-by: Tristan Konolige <[email protected]>
* Add cooldown interval logic for the profiling functional. * Remove string serialize hack from RunIndividual functions * Update src/runtime/graph_executor/debug/graph_executor_debug.cc Co-authored-by: Tristan Konolige <[email protected]>
* Add cooldown interval logic for the profiling functional. * Remove string serialize hack from RunIndividual functions * Update src/runtime/graph_executor/debug/graph_executor_debug.cc Co-authored-by: Tristan Konolige <[email protected]>
Motivation:
Sometimes running the script several times we get a strong scatter in the results.
First try:
Second try:
In general, the devices has an acceptable temperature range. If the system heats up to the upper limit for some time, it can force cooling by available ways. For example, by lowering the frequency.
In the following example the temperature reaches 90 degrees and after some time the frequency drops and after cooling is restored.
the results of device profiling
the plot drawn from t.results
In the proposed change, we use a preemptive stop of the calculations to keep the temperature of the device within comfortable limits. As a result, the temperature does not exceed 70 degrees.
the results of device profiling
the plot drawn from t.results
Changes:
cooldown_interval_ms
andrepeats_to_cooldown
parameter to next functions:python\tvm\runtime\vm.py:benchmark
python\tvm\runtime\module.py:time_evaluator
python\tvm\contrib\graph_executor.py:benchmark
python\tvm\contrib\debugger\debug_executor.py:run
python\tvm\contrib\debugger\debug_executor.py:run_individual
python\tvm\contrib\debugger\debug_executor.py:run_individual_node
The parameter is used to stop measuring/cooling of the device.
number
,repeat
,min_repeat_ms
,repeats_to_cooldown
parameters were added.