To understand this job at all we would need to see it as a pipeline or tree of subprocesses. This plot is already a timeline but we have no causality here and the vertical lines are making a real hash of it. What we want to see is that:
process A lived from T1 to T2
process A forked off process B which lived from T3 to T4 (normally T1 < T3 < T4 < T2 but it's possible for the parent to exit before the child so this is not a Law)
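To make the wish above concrete, here is a minimal sketch of the data we would need, not a description of anything the tool does today. It assumes each process record carries a parent pid (`ppid`), which the current samples do not provide; the `Proc` type, its field names, and the `build_forest` and `render` helpers are all hypothetical. It also ignores pid reuse, which a real implementation would have to handle.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Proc:
    pid: int
    ppid: Optional[int]  # ASSUMPTION: parent pid is recorded; today it is not
    name: str
    start: int           # first time the process was observed (T1, T3, ...)
    end: int             # last time the process was observed (T2, T4, ...)
    children: list = field(default_factory=list)


def build_forest(procs):
    """Link each process to its parent; return the roots sorted by start time."""
    by_pid = {p.pid: p for p in procs}
    roots = []
    for p in procs:
        parent = by_pid.get(p.ppid) if p.ppid is not None else None
        if parent is None:
            # No known parent (or the parent was never sampled): treat as a root.
            roots.append(p)
        else:
            parent.children.append(p)
    for p in procs:
        p.children.sort(key=lambda c: c.start)
    roots.sort(key=lambda r: r.start)
    return roots


def render(roots):
    """Return one indented line per process, with its lifetime."""
    lines = []

    def walk(p, parent, depth):
        note = ""
        if parent is not None and p.end > parent.end:
            # Allowed: the parent may exit before the child does.
            note = " (outlives parent)"
        lines.append(f"{'  ' * depth}{p.name} [{p.start}..{p.end}]{note}")
        for c in p.children:
            walk(c, p, depth + 1)

    for r in roots:
        walk(r, None, 0)
    return lines


if __name__ == "__main__":
    procs = [
        Proc(1, None, "A", 1, 10),
        Proc(2, 1, "B", 3, 6),
        Proc(3, 1, "C", 8, 12),
    ]
    print("\n".join(render(build_forest(procs))))
```

Processes whose parent never showed up in a sample end up as roots here, which is probably the best we can do with sparse sampling.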
It's also clear with these long-running jobs that it is necessary to be able to zoom in on parts of the profile to make sense of anything, but I'll add a separate bug about that.
The existing profile command sort of manages to show this (try -fmt csv,cpu) because it has a timeline running down and a process line running across, and it's easy to see which processes are alive when and how many resources they use (well, easy: the plot is a thousand characters across and that's only because most slots are empty most of the time). But we have no parent/child information so we don't know who created whom. Of course, with 5-minute samples and this kind of fork frequency it's possible that large parts of the process graph are not visible to us anyway.
Going to label this "discussion" b/c I have no idea what we'd have to do to make more sense of these types of data.
An alternative or complement to a timeline view is a hierarchy view, i.e. a process tree. This process tree can't restrict itself to the Unix view of processes: it should also show individual elements of an array as parts of the tree underneath the array job, for example (see #675).
I'm getting some vibes here from regular callgraph profiling, in which it's possible to drill down into big consumers (and ignore the small fry) but also where it's possible to invert the tree and focus on common leaf processes, which seems maybe relevant above. Or perhaps there is some other kind of grouping that makes sense.
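As a rough illustration of the callgraph analogy, the sketch below shows both directions on a toy tree: drilling down by inclusive CPU while pruning the small fry, and the inverted view that sums CPU over leaf processes grouped by command name. The tuple layout `(name, self_cpu_seconds, children)`, the function names, and the threshold are all made up for the example; nothing here reflects the tool's actual data model.

```python
from collections import defaultdict

# ASSUMPTION: a node is (name, self_cpu_seconds, children); purely illustrative.


def inclusive(node):
    """CPU used by this process plus all of its descendants."""
    _name, cpu, kids = node
    return cpu + sum(inclusive(k) for k in kids)


def drill_down(node, threshold, depth=0):
    """Top-down view: keep only subtrees whose inclusive CPU >= threshold."""
    total = inclusive(node)
    if total < threshold:
        return []  # small fry: prune the whole subtree
    rows = [(node[0], total, depth)]
    for kid in node[2]:
        rows.extend(drill_down(kid, threshold, depth + 1))
    return rows


def leaf_totals(node, totals=None):
    """Inverted view: sum self CPU of leaf processes, grouped by command name."""
    if totals is None:
        totals = defaultdict(float)
    name, cpu, kids = node
    if not kids:
        totals[name] += cpu
    for kid in kids:
        leaf_totals(kid, totals)
    return totals


if __name__ == "__main__":
    job = ("make", 1.0, [
        ("cc", 5.0, []),
        ("cc", 7.0, []),
        ("sh", 0.5, [("ld", 2.0, [])]),
    ])
    for name, total, depth in drill_down(job, threshold=3.0):
        print(f"{'  ' * depth}{name}: {total}")
    print(dict(leaf_totals(job)))
```

Grouping by command name is only one possible choice; grouping by array element or by job step might turn out to be more useful.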