Flame Graph segment width seems to be weighted by sample count rather than time #3558
I suppose my first question should be: is this considered a bug? Perhaps there are reasons to weight by sample rather than by time? I'm struggling to think of reasons, but I don't want to rule it out. It seems a bit confusing to me to have the tooltip (in the first screenshot) say […]
> This comes up particularly with off-cpu profiling

I have a limited understanding of this, but I would say this would happen more for on-cpu profiling.

You're touching the (current) limitations of importing data from non-Gecko sources, and of our history of being a sampling profiler dedicated to Gecko.

Historically the data comes from a sampling profiler. We were approximating the time spent with the simple formula `time * sampleCount`, and this is basically what you see in the flame graph currently.

But this doesn't work for the stack graph, where we need to fill the holes. So we had to come up with an algorithm to determine whether we stay in a function over several samples. As a result we had a duration. But that duration was obviously different from the simple formula, especially when there were holes.

So we decided to use the same algorithm as the stack graph to compute this "tracing time" and use it in the tooltip, in addition to showing the "sample count", because that is what we really observed.

I believe we could use the "tracing time" as the source for the flame graph, but would that be closer to the truth in case of sampling holes? I'm not so sure; we don't actually know what happens in a sampling hole.

Now fast forward to your issue. One problem is that we never really treated Linux perf as a first-class data provider, so we made it work somewhat, with the assumption that our few users would know the limitations.

I believe there's another way to fix this. I don't really know what the source data looks like; you'll know better. If we do have durations for the samples, we could use the concept of "weight", which we don't use enough but which our various algorithms take into account:

https://github.com/firefox-devtools/profiler/blob/cc73c52b3c2f803183b14e401e56ea76283dd342/src/types/profile.js#L97-L142
With the caveat that it's not currently part of the sample data in the Gecko format; it's only part of the sample data in the processed format. So either we could augment the Gecko format to support it, or we can make the Linux perf importer output the processed format instead of the Gecko format. My preferred option would be the latter, but it's more work. What do you think?
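To make the "weight" idea concrete, here is a minimal sketch in plain JavaScript. The names and table shape are illustrative only, not the profiler's actual internals: it shows how a per-sample weight column could drive flame graph widths, degrading to a plain sample count when no weights are present.

```javascript
// Hypothetical sketch of a "processed format"-style samples table:
// parallel arrays indexed by sample, with an optional weight column.
const samples = {
  stack: [0, 0, 1, 1, 1],               // stack index per sample
  weight: [1.0, 1.0, 0.5, 128.0, 0.5],  // e.g. milliseconds per sample
  length: 5,
};

// Aggregate total weight per stack. If `weight` is null/absent, every
// sample counts as 1, reproducing the current sample-count behavior.
function totalWeightPerStack(samples) {
  const totals = new Map();
  for (let i = 0; i < samples.length; i++) {
    const stack = samples.stack[i];
    const w = samples.weight ? samples.weight[i] : 1;
    totals.set(stack, (totals.get(stack) || 0) + w);
  }
  return totals;
}

const totals = totalWeightPerStack(samples);
console.log(totals.get(0)); // 2
console.log(totals.get(1)); // 129 — one long off-CPU sample dominates
```

With weights, the single 128 ms sample in stack 1 would make that segment far wider than stack 0, even though stack 0 has almost as many samples.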
Thank you for this detailed explanation. This is a tricky topic, and I'm
still thinking about it. This is my first draft of trying to clarify my
thoughts...
> I would say this would happen more for on-cpu profiling
Yes, good point, this will also affect on-CPU profiling where all the time
spent off-CPU would be 'filled-in'.
I hadn't considered the 'filling in' algorithm -- it may mitigate most of
the impacts for off-CPU profiling for my use case by 'filling in' the time
spent off-CPU.
> I believe we could use the "tracing time" as source for the flame graph,
> but would that be closer to the truth in case of sampling holes? I'm not so
> sure, we actually don't know what happens in a sampling hole.
I'd like to understand more about this concern. Are you worried about big
holes (i.e. longer than the sampling interval) unintentionally caused by
data-collection problems?
When you have a long hole, I think it's usually due to thread blocking, and
off-CPU profiling tries to get better data for the cause of that blocked
time by taking a stack trace when the thread is descheduled (using the
sched:sched_switch event, I think). So you can see that the 'hole' is
caused by (e.g.) a futex wait, or an IO write.
Many Linux perf files are recorded using 'on-CPU' tracing and don't have
this off-CPU sched_switch stack trace; we should be mindful of those.
> If we do have durations for the samples.
We do :-) In the raw perf script output, we have cpu-clock data
(nanoseconds) associated with each sample.
I'll need to think some more about whether the 'filling in' algorithm you
have is good enough -- I think you're using the timestamp, which I suppose
is the same information as we get from the cpu-clock.
> My preferred option would be the latter, but this is more work.
I'm OK with doing more work to get this right. I wasn't sure whether the
'processed format' is a stable format for storage or data interchange -- is
it? If so, then using the processed format sounds good; it has a lot of
advantages.
I'll think about this some more. I'd want to make sure this has reasonable
output for both on-CPU and off-CPU profiles.
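As an aside, here is a simplified sketch (assumed; not the profiler's actual fill-in algorithm) of deriving per-sample durations from timestamp deltas. When sampling is regular this carries roughly the same information as the per-sample cpu-clock value, but a duration derived from deltas stretches across off-CPU gaps, which is exactly where the two diverge.

```javascript
// Sample timestamps in milliseconds; the large gap models a thread
// that was blocked (off-CPU) between two samples.
const timestamps = [0, 0.25, 0.5, 9600.5, 9600.75];

// Nominal sampling interval: 4000 Hz -> 0.25 ms.
const interval = 0.25;

// Each sample's duration = gap to the next sample; the last sample
// falls back to the nominal interval since it has no successor.
const durations = timestamps.map((t, i) =>
  i + 1 < timestamps.length ? timestamps[i + 1] - t : interval
);

console.log(durations); // [0.25, 0.25, 9600, 0.25, 0.25]
```

A cpu-clock weight for the third sample would instead report only the on-CPU time, so the two signals answer different questions for off-CPU profiles.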
I would recommend using the weight parameter here. If I were to approach this, I would add it to the Gecko profiler format, and then ensure that timing information in tooltips uses the correct name. I'm not sure it's worth converting to the processed format just yet, but you can also add things like memory allocation information to the processed format.

SUM(traced timing): This is the traced time. It is useful when we don't know the length of the sample, but it is somewhat misleading, since gaps in sampling come from things like locks.

SUM(sample weight): I would really like this to be fully supported and implemented well in the UI. It's really nice for imported profiles.

SUM(sample count * sampling interval): I don't know that we should show this value, as it's so misleading. As someone who works on the profiler, I kind of like it because it gives me some information I find useful, but as a profiler user I think it would be misleading.
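A toy comparison of the three totals above (all values illustrative, times in milliseconds) shows how far apart they can be when one long off-CPU sample is present:

```javascript
// Illustrative per-sample data: four short on-CPU samples plus one
// 50 ms sample taken while the thread was blocked.
const interval = 1;                      // nominal sampling interval, ms
const sampleWeights = [1, 1, 1, 50, 1];  // per-sample weight (e.g. cpu-clock)
const tracedTimes   = [1, 1, 1, 50, 1];  // gap-filled durations from timestamps

const sum = (xs) => xs.reduce((a, b) => a + b, 0);

const tracedTotal = sum(tracedTimes);                // SUM(traced timing)
const weightTotal = sum(sampleWeights);              // SUM(sample weight)
const naiveTotal  = sampleWeights.length * interval; // SUM(count * interval)

console.log(tracedTotal, weightTotal, naiveTotal); // 54 54 5
```

The naive count-times-interval total undercounts the blocked time by an order of magnitude here, which is the misleading behavior described above.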
FYI, I created a very rough PR to fix this; at least, it should be working: #4320. Is anyone still interested? If so, I may prioritize finishing up the PR.
Yes, we're still interested! Thanks a lot for looking at this :-)
I'm still interested too! I have been working around this issue by looking
at weighted profiles with pprof, but would far prefer using the time-series
view available in Firefox Profiler.
This comes up particularly with off-cpu profiling, where one stack sample (when a thread blocks on IO or mutex) might represent a large wall-clock duration.
Compare this screenshot, where a segment has 237 samples and takes 49ms:
With this screenshot, where a segment has 75 samples and takes 9600ms:
I'd expect the 9600 ms segment to be about 9600/49 ≈ 196 times wider, but instead the 49 ms span is wider.
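The arithmetic behind that expectation, using the numbers from the two screenshots:

```javascript
// Segment A: many samples, little wall-clock time (on-CPU work).
// Segment B: few samples, huge wall-clock time (blocked off-CPU).
const a = { samples: 237, ms: 49 };
const b = { samples: 75, ms: 9600 };

// Width weighted by sample count: A appears about 3x wider than B...
console.log((a.samples / b.samples).toFixed(1)); // "3.2"

// ...but weighted by wall-clock time, B should be about 196x wider.
console.log((b.ms / a.ms).toFixed(1)); // "195.9"
```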
Here's an offcpu profile to reproduce with:
2021-09-17-200119506-obfuscated.firefox.json.gz
This was generated using Android simpleperf with off-CPU tracing at 4000 Hz for 30 s:
Then converted to firefox profiler format.