
Flame Graph segment width seems to be weighted by sample count rather than time #3558

Open · mhansen opened this issue Sep 17, 2021 · 8 comments · May be fixed by #4320

@mhansen (Contributor) commented Sep 17, 2021

This comes up particularly with off-CPU profiling, where one stack sample (taken when a thread blocks on IO or a mutex) might represent a large wall-clock duration.

Compare this screenshot, where a segment has 237 samples and takes 49 ms:

[screenshot 1: flame graph segment with 237 samples / 49 ms]

with this screenshot, where a segment has 75 samples and takes 9600 ms:

[screenshot 2: flame graph segment with 75 samples / 9600 ms]

I'd expect the 9600 ms segment to be about 9600/49 ≈ 196 times as wide, but instead the 49 ms segment is the wider one.

Here's an off-CPU profile to reproduce with:
2021-09-17-200119506-obfuscated.firefox.json.gz

This was generated with Android simpleperf, using off-CPU tracing at 4000 Hz for 30 s:

```
$ simpleperf/app_profiler.py -p com.google.android.apps.maps -r "-f 4000 -g -e cpu-clock --trace-offcpu --duration 30" --activity com.google.android.maps.MapsActivity --skip_collect_binaries
```

Then it was converted to the Firefox Profiler format.
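A minimal sketch of that conversion step, assuming simpleperf's bundled gecko_profile_generator.py script (a hypothetical invocation; the exact flags may vary by simpleperf version):

```
$ # assumes the perf.data from the app_profiler.py run above is in the working directory
$ simpleperf/gecko_profile_generator.py | gzip > profile.firefox.json.gz
```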


@mhansen (Contributor, Author) commented Sep 17, 2021

I suppose my first question should be: is this considered a bug? Perhaps there are reasons to weight by sample count rather than by time? I'm struggling to think of any, but I don't want to rule it out.

It seems a bit confusing to me that the first screenshot says 49.7 ms (16%) when the overall profile is about 30 s long; 16% of 30 s would be about 4.8 s.

@julienw (Contributor) commented Sep 17, 2021

You're running into the (current) limitations of importing data from non-Gecko sources, and of our history as a sampling profiler dedicated to Gecko.

Historically the data comes from a sampling profiler. We were approximating the time spent with the simple formula samplingInterval * sampleCount, and this is basically what you see in the flame graph currently.

But this doesn't work for the stack graph, where we need to fill the holes. So we had to come up with an algorithm to determine whether we stay in a function over several samples, which gives us a duration. That duration obviously differs from the simple formula's result, especially when there are holes in the sampling.

So we decided to use the same algorithm as the stack graph to compute this "tracing time" and show it in the tooltip, in addition to the "sample count" values, because the samples are what we actually observed.

I believe we could use the "tracing time" as the source for the flame graph, but would that be closer to the truth in the case of sampling holes? I'm not so sure; we don't actually know what happens during a sampling hole.

Now fast forward to your issue. One problem is that we never really treated Linux perf as a first-class data provider, so we made it work somewhat, with the assumption that our few users would know the limitations.

I believe there's another way to fix that. I don't really know what the source data looks like; you'll know better. If we do have durations for the samples, we could use the concept of "weight", which we don't use enough but which our various algorithms take into account:

```js
/**
 * Profile samples can come in a variety of forms and represent different information.
 * The Gecko Profiler by default uses sample counts, as it samples on a fixed interval.
 * These samples are all weighted equally by default, with a weight of one. However in
 * comparison profiles, some weights are negative, creating a "diff" profile.
 *
 * In addition, tracing formats can fit into the sample-based format by reporting
 * the "self time" of the profile. Each of these "self time" samples would then
 * provide the weight, in duration. Currently, the tracing format assumes that
 * the timing comes in milliseconds (see 'tracing-ms') but if needed, microseconds
 * or nanoseconds support could be added.
 *
 * e.g. The following tracing data could be represented as samples:
 *
 *     0 1 2 3 4 5 6 7 8 9 10
 *     | | | | | | | | | | |
 *     - - - - - - - - - - -
 *     A A A A A A A A A A A
 *         B B D D D D
 *         C C E E E E
 *
 * This chart represents the self time.
 *
 *     0 1 2 3 4 5 6 7 8 9 10
 *     | | | | | | | | | | |
 *     A A C C E E E E A A A
 *
 * And finally this is what the samples table would look like.
 *
 *     SamplesTable = {
 *       time:   [0,   2,   4,   8],
 *       stack:  [A,   ABC, ADE, A],
 *       weight: [2,   2,   4,   3],
 *     }
 */
export type WeightType = 'samples' | 'tracing-ms' | 'bytes';

type SamplesLikeTableShape = {
  stack: Array<IndexIntoStackTable | null>,
  time: Milliseconds[],
  // An optional weight array. If not present, then the weight is assumed to be 1.
  // See the WeightType type for more information.
  weight: null | number[],
  weightType: WeightType,
  length: number,
};
```

One caveat: this isn't currently part of the sample data in the Gecko format; it's only part of the sample data in the processed format. So either we could augment the Gecko format to support it, or we could make the Linux perf importer output the processed format instead of the Gecko format. My preferred option would be the latter, but it's more work.
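To make the weight idea concrete, here's a minimal sketch of how an importer might fill that weight array from per-sample durations. The input shape, the field names, and toWeightedSamples are hypothetical illustrations, not existing profiler code:

```js
// Hypothetical input: one entry per perf sample, carrying a stack index,
// the sample timestamp in ms, and the sampled cpu-clock duration in ns.
type InputSample = { stackIndex: number | null, timeMs: number, durationNs: number };

function toWeightedSamples(inputSamples: InputSample[]) {
  const stack = [];
  const time = [];
  const weight = [];
  for (const sample of inputSamples) {
    stack.push(sample.stackIndex);
    time.push(sample.timeMs);
    // Convert nanoseconds to milliseconds so the weights match 'tracing-ms'.
    weight.push(sample.durationNs / 1e6);
  }
  return { stack, time, weight, weightType: 'tracing-ms', length: time.length };
}
```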

What do you think?

@mhansen (Contributor, Author) commented Sep 18, 2021 via email

@gregtatum (Member) commented Sep 20, 2021

> We do :-) In the raw perf script output, we have cpu-clock data (nanoseconds) associated with each sample.

I would recommend using the weight parameter here. If I were to approach this, I would add it to the Gecko Profiler format and then ensure that the timing information in tooltips uses the correct name. I'm not sure it's worth converting to the processed format just yet, though the processed format also lets you add things like memory allocation information.
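For illustration, here's roughly what that could look like in the Gecko format's schema-based samples table; the weight column is the hypothetical addition, while the schema/data layout mirrors the format's existing stack and time columns:

```js
// Sketch of an augmented Gecko-format samples table (not the current format):
const samples = {
  schema: { stack: 0, time: 1, weight: 2 },
  data: [
    // [stackIndex, timeMs, weightMs]
    [5, 10.5, 0.25],  // ordinary on-CPU sample: weight equals the interval
    [7, 10.75, 9600], // off-CPU sample that blocked for 9.6 s
  ],
};
```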

There are a few aggregates we could show:

SUM(traced timing)

This is the traced time. It's useful when we don't know the length of a sample, but it's somewhat misleading, since gaps in the sampling can come from things like locks.

SUM(sample weight)

I would really like this to be fully supported and implemented well in the UI. It's really nice for imported profiles.

SUM(sample count * sampling interval)

I don't know that we should show this value, as it's misleading. As someone who works on the profiler, I kind of like it because it gives me information I find useful, but for a profiler user I think it would be confusing.
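As a worked illustration of why that last aggregate misleads for off-CPU profiles, here's the arithmetic with the numbers from this issue's screenshots (75 samples, 9600 ms, sampled at 4000 Hz):

```js
// Plain arithmetic, not profiler code.
const samplingIntervalMs = 1000 / 4000; // 4000 Hz => 0.25 ms per sample
const sampleCount = 75;                 // samples in the off-CPU segment

// SUM(sample count * sampling interval): what an unweighted flame graph
// effectively reflects when every sample counts as one interval.
const countBasedMs = sampleCount * samplingIntervalMs; // 18.75 ms

// SUM(sample weight): what the per-sample cpu-clock durations add up to.
const weightBasedMs = 9600;

console.log(weightBasedMs / countBasedMs); // 512x discrepancy
```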

julienw added the memory label on Dec 3, 2021
ryoqun linked pull request #4320 on Nov 14, 2022 (may close this issue)
@ryoqun commented Nov 14, 2022

FYI, I created a very rough PR to fix this; at the least, it should be working: #4320

Is anyone still interested? If so, I may prioritize finishing up the PR.

@julienw (Contributor) commented Nov 16, 2022

Yes, we're still interested! Thanks a lot for looking at this :-)

@mhansen (Contributor, Author) commented Nov 19, 2022 via email

@Waazer commented Nov 23, 2022

#We don't care about restrictions
