Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

core(tracehouse): sort trace events by nesting order #9230

Merged
merged 6 commits into from
Aug 5, 2019

Conversation

patrickhulce
Copy link
Collaborator

@patrickhulce patrickhulce commented Jun 18, 2019

Context
We expect trace events to be in the correct order when we try to build out our picture of main thread activity (main-thread-tasks, page-dependency-graph, etc). Up until today, we thought that meant just stable sorting by .ts based on previous conversations with tracing folks. Then @aslushnikov discovered that we should also be sorting by dur (aslushnikov/tracium#11). This isn't the full story though because that only covers X events and this situation pops up with interleaved B/X events as well.

This PR sorts our trace events by .ts and then breaks ties by sorting in the correct parent/child nesting order for parsing in main-thread-tasks rather than just stable sort.

I've convinced myself by this point that we can't solve the problem just this way though. There are traces in the wild that simply do not have a possible sorted order that is logically consistent and if we ever want to handle those we must make our trace event consumers robust enough to handle inconsistency. (If you've made it this far in the wall of text, enjoy this cookie 🍪:D) If we're doing that then there' no strong reason to have the duplicate-ish logic of robust event handling here, and we could continue to sort by timestamp and have that be the only invariant to traceEvents coming out of traceOfTab if folks prefer that approach.

Questions

  • Do we want to support these incoherent cases at all? IMO, yes.
  • If we do, should we try sorting the events in the order we thought we were receiving them like this PR does? IMO, yes.

Related Issues/PRs
aslushnikov/tracium#8
aslushnikov/tracium#11
#7764

@brendankenny
Copy link
Member

There are traces in the wild that simply do not have a possible sorted order that is logically consistent

that doesn't really seem possible with just B/E/X events, unless you mean sorted order + additional constraints we know to be true? Nesting? Or more than that

(currently actually reviewing but was curious :)

@patrickhulce
Copy link
Collaborator Author

that doesn't really seem possible with just B/E/X events

Hahaha, well that's exactly the problem!

A child shouldn't possibly end after its parent, but sometimes in the wild the timestamps we're given indicate this.

Example:

Parent Event (1 X Event)
0 ------------------------------------- 100
Child Event (1 B + 1 E Event)
                50 -------------------- 101

Parent Event (1 X Event)
0 ------------------------------------- 100
Child Event (1 B + 1 E Event)
0 -------------------- 50

No amount of sorting the first 3 events will ever lead to an interpretation that is logically consistent with how traces/tasks should work, but situations like next set of 3 events can be solved by sorting events in the correct nesting order (with X event starting the task first).

@patrickhulce
Copy link
Collaborator Author

I haven't been pushing this because of the extra need for consumption changes, but a call either way if we're pursuing this is necessary for the next phase, so input welcome :)

@brendankenny
Copy link
Member

From my quick review a month ago this seemed like the right call. Even if we still have to do more work in trace consumers, at least this level of event sorting will be correct.

Thanks for your extreme patience :) let's get it back in the active queue

@patrickhulce
Copy link
Collaborator Author

What are the next steps to receive feedback in the active queue? :)

Copy link
Member

@brendankenny brendankenny left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looking good. The performance characteristics worry me a bit, but looking at some real traces, so few events go through this stage that it's not too bad at all. Current PR is about 38% slower than master on an example verge trace, trying out some ideas I mention below gets that down to about 23%, but we're talking 100 milliseconds so the absolute difference isn't very big.

@@ -66,10 +66,68 @@ class TraceProcessor {
}

/**
* This method sorts a group of trace events that have the same timestamp. We want to...
*
* 1. Put E events first, we finish off our existing events before we start new ones.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can a B/E pair ever have the same ts? If so, this would misorder them.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If so, there's no other way for us to know at this stage either because we now know the order they came in doesn't really mean anything. We'll have to handle this on consumption side like we do with the other problematic cases then.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If so, there's no other way for us to know at this stage either because we now know the order they came in doesn't really mean anything. We'll have to handle this on consumption side like we do with the other problematic cases then.

hmm, but in that case wouldn't just ordering them B then E fix it? The hard part would be knowing how to connect the two

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this kinda gets at the fact that there are still lots of edge cases where we have to change our consumption. this won't particularly matter with the PR I'm about to open so I have no strong feelings here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can a B/E pair ever have the same ts?

so I instrumented this code a bit and ran it over a good set of gnarly traces and never found any instance where this happened if we differentiate by event name and pid.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so I instrumented this code a bit and ran it over a good set of gnarly traces and never found any instance where this happened if we differentiate by event name and pid.

Actually, lighthouse-core/test/fixtures/traces/tracingstarted-after-navstart.json does fail at this, but unclear if it's from us editing it :)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this conflict should be resolved by #9491 + this existing work, so I would vote to punt, it's already captured in a test in #9491 👍

lighthouse-core/lib/tracehouse/trace-processor.js Outdated Show resolved Hide resolved
lighthouse-core/lib/tracehouse/trace-processor.js Outdated Show resolved Hide resolved
lighthouse-core/lib/tracehouse/trace-processor.js Outdated Show resolved Hide resolved
lighthouse-core/lib/tracehouse/trace-processor.js Outdated Show resolved Hide resolved
@@ -66,10 +66,68 @@ class TraceProcessor {
}

/**
* This method sorts a group of trace events that have the same timestamp. We want to...
*
* 1. Put E events first, we finish off our existing events before we start new ones.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can a B/E pair ever have the same ts?

so I instrumented this code a bit and ran it over a good set of gnarly traces and never found any instance where this happened if we differentiate by event name and pid.

lighthouse-core/lib/tracehouse/trace-processor.js Outdated Show resolved Hide resolved
lighthouse-core/lib/tracehouse/trace-processor.js Outdated Show resolved Hide resolved
Copy link
Member

@brendankenny brendankenny left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! This is going to be a great improvement

* 1. Put E events first, we finish off our existing events before we start new ones.
* 2. Order B/X events by their duration, we want parents to start before child events.
* 3. If we don't have any of this to go on, just use the position in the original array (stable sort).
*
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it worth calling out for future readers that this is expected to usually get tsGroupIndices lengths on the order of 10 (and often closer to 1?). Obviously future readers can use profilers, too, but it does jump out as something pretty heavyweight to run over a whole trace when actually it's run over a pretty small part of a trace

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah good call, it's a good reminder we'll all need for the next poor sap to touch trace events 👍 😆

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants