core(tracehouse): structure main thread task creation to be more forgiving #9491
Conversation
OK, so it turns out we already had an "invalid" trace checked into our fixtures and we just never noticed, because the main thread task logic didn't check that the corresponding B/E event names matched; it just assumed they were valid. Now we check them, and so we started throwing. This PR now also allows up to 1ms of slop time between parent/child events to allow for these minor mistakes, which should hopefully reduce the number of fatal trace processing errors.
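To make the two new rules concrete, here is a minimal sketch of roughly what they amount to (illustrative only; the names and exact shapes are assumptions, not the PR's actual code):

```js
// Illustrative sketch of the two rules described above (not the PR's actual code).
const SLOP = 1000; // 1ms of allowed slop, in microseconds (raw trace `ts` units)

// Rule 1: a B event's matching E event must have the same name (previously assumed).
function assertMatchingNames(beginEvent, endEvent) {
  if (beginEvent.name !== endEvent.name) {
    throw new Error(`Begin/End event mismatch: ${beginEvent.name} vs ${endEvent.name}`);
  }
}

// Rule 2: a child task may overshoot its parent's boundaries by up to the slop window.
function isAcceptableChild(parent, child) {
  return child.startTime >= parent.startTime - SLOP &&
    child.endTime <= parent.endTime + SLOP;
}
```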
Trying to review this (read: I'm not super familiar with this code). It looks good, but I think it needs some more documentation to really make it understandable.
  priorTaskData,
  timerInstallEventsReverseQueue
);
} else {
Remove else
Not sure how to reconcile this one with the request for more comments and documentation; this is explicitly here so that the reader knows what the else branch would have been and why it's not necessary.
FWIW, I think something like "// If there's still a currentTask, that means we're currently in the middle of a task, so nextTask is a child." up at the top of the if works (and then you can drop the else).
currentTask.endTime = traceEndTs;
currentTask = currentTask.parent;

/**
 * This function takes the raw trace events sorted in increasing timestamp order and outputs connected task nodes.
This description is very helpful for understanding this file; I wish it was higher up, or that there was some top-level description of this file.
There is a top-level description of this file that addresses all that this file does, while this comment discusses how this piece of the puzzle is computed (the only parts of this file that were touched were the trace parsing pieces).
It sounds like you think it might be worth splitting this file into different components?
I'm not quite sure what that would look like though. Maybe just...
- main-thread-tasks.js
- thread-task-creation.js
?
// Create a reversed copy of the array to avoid copying the rest of the queue on every mutation.
const taskEndEventsReverseQueue = taskEndEvents.slice().reverse();

for (let i = 0; i < taskStartEvents.length; i++) {
There are a lot of nested loops where one side searches left to right and then looks up right to left. Is there any way to iterate once, build a map, and look up already-seen events that way, going from O(x^2) to O(x)? This might be overcomplicated for code that is already complex.
^ That might not make 100% sense given this code is searching 2 separate lists and trying to handle nesting, but this definitely has a suspicious smell to it. Might just need better comments.
i was curious about the perf here too.
on a theverge trace (42MB), _createTasksFromStartAndEndEvents takes 21ms and its parent _createTasksFromEvents takes just 53ms. meanwhile, recordsFromLogs takes 1380ms.
so i think right now this is still pretty cheap and not worth optimizing.
meanwhile, recordsFromLogs takes 1380ms.
Like network recordsFromLogs?? Even theverge will have max 100s of requests. What's taking 1.3s?
while it's true the worst case runtime here is O(n^2), ~99.999% of the time the reverse loop is O(1), because the common case is that B/E events are in the correct order (think of every time we've run LH today without it throwing a fatal error; those are all cases where this reverse loop was O(1) on every event). The only reason we're doing this at all is to be able to handle the rare and unexpected case where Chrome put the events in an order that doesn't make sense. When that happens we always have to look at every event in the worst case, so there's no way to escape O(n^2).
Also, what paul said: it's fine right now, so it seems premature to worry about taking up extra memory with a map.
Seems like a good idea to put all this into a comment somewhere :)
EDIT: I forgot about nested events when originally writing this, but because we don't have stack sampling enabled the max depth should still be ~5, so still O(1) :D For example, the average number of iterations of the inner loop on theverge was 1.1387605983318398.
/*
An artistic rendering of the below trace:
█████████████████████████████TaskA██████████████████████████████████████████████
I would love some of these renderings in the code as comments explaining what it is doing! 🎨
let matchedEventIndex = -1;
let matchingNestedEventCount = 0;
let matchingNestedEventIndex = i + 1;
for (let j = taskEndEventsReverseQueue.length - 1; j >= 0; j--) {
if we reversed this array it's unclear why we're iterating from the back to front.
this improves the perf of the pop() and splice()s?
yep pop() <<< shift()
added a comment to this effect and a better function explanation 👍
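For anyone else reading, a tiny standalone sketch of the reversed-queue pattern being discussed (hypothetical data, not the PR's code):

```js
// The "next" end event sits at the *end* of the reversed copy, so the common
// case is a pop() (O(1)) rather than a shift() (which re-indexes the whole array).
const taskEndEvents = [{name: 'A'}, {name: 'B'}, {name: 'C'}];
const reverseQueue = taskEndEvents.slice().reverse(); // [C, B, A]

const next = reverseQueue.pop(); // {name: 'A'} — the earliest end event, removed in O(1)

// Rare case: a match deeper in the queue has to be spliced out instead,
// which only shifts the (usually short) tail behind it.
const j = reverseQueue.findIndex(evt => evt.name === 'C');
const [matched] = reverseQueue.splice(j, 1); // {name: 'C'}
console.log(next.name, matched.name); // 'A' 'C'
```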
for (let j = taskEndEventsReverseQueue.length - 1; j >= 0; j--) {
  const endEvent = taskEndEventsReverseQueue[j];
  // We are considering an end event, so we'll count how many nested events we saw along the way.
  while (matchingNestedEventIndex < taskStartEvents.length &&
looking at https://cs.chromium.org/chromium/src/third_party/catapult/tracing/tracing/extras/importer/trace_event_importer.html (and slice_group)... they use something very simple
basically since sync events are assumed to be sync... if we have an End, then it must match with the most recently found Begin event.
also interestingly they don't sort events or preprocess before doing this begin/end matching. perhaps making this assumption from the trace saves some headaches?
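For reference, that catapult-style matching is roughly the following (a simplified sketch, not Lighthouse or catapult code verbatim):

```js
// Simplified catapult-style matching: every E event closes the most recently
// opened B event. This assumes B/E events are well-formed and well-ordered,
// which is exactly the assumption that breaks on the problem traces.
function matchBeginEnd(events) {
  const stack = [];
  const tasks = [];
  for (const event of events) {
    if (event.ph === 'B') {
      stack.push(event);
    } else if (event.ph === 'E') {
      const begin = stack.pop();
      if (!begin) throw new Error('E event without a matching B event');
      tasks.push({name: begin.name, startTime: begin.ts, endTime: event.ts});
    }
  }
  return tasks;
}

// matchBeginEnd([{ph: 'B', name: 'TaskA', ts: 0}, {ph: 'E', name: 'TaskA', ts: 10}])
// => [{name: 'TaskA', startTime: 0, endTime: 10}]
```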
i tried out a basic implementation of this approach but it fails on:
- should handle out-of-order 0 duration tasks
- should handle child events that extend <1ms beyond parent event
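For context, "out-of-order 0 duration tasks" presumably means traces like the following made-up snippet, where a zero-duration task's E event shows up before its B event, so "E closes the most recent B" pairs the wrong events:

```js
// Made-up illustration (not the actual fixture): a zero-duration task whose
// E event is emitted just before its B event. A naive "E closes the most
// recent B" matcher would close TaskA here instead of TaskB.
const traceEvents = [
  {ph: 'B', name: 'TaskA', ts: 1000},
  {ph: 'E', name: 'TaskB', ts: 1500}, // zero-duration TaskB: E arrives first...
  {ph: 'B', name: 'TaskB', ts: 1500}, // ...then its matching B at the same ts
  {ph: 'E', name: 'TaskA', ts: 2000},
];
```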
perhaps making this assumption from the trace saves some headaches?
It absolutely does save headaches, and that's how the old version was written, but that assumption is also what fails on several real-world traces. Recovering from that error situation is exactly what this rewrite is about, so if there's a way to get the old approach to work and nicely recover from errors when that assumption breaks down, I'm definitely open to someone else running with their vision for it :)
 * @param {PriorTaskData} priorTaskData
 * @param {Array<LH.TraceEvent>} reverseEventsQueue
 */
static _assignAllTimerInstallsBetweenTasks(
are there other test cases we should add for this new method?
I already added lighthouse/lighthouse-core/test/lib/tracehouse/main-thread-tasks-test.js, Lines 156 to 197 in 6daf898:
it('should compute attributableURLs correctly across timers', () => {
  const baseTs = 1241250325;
  const url = s => ({args: {data: {url: s}}});
  const stackFrames = f => ({args: {data: {stackTrace: f.map(url => ({url}))}}});
  const timerId = id => ({args: {data: {timerId: id}}});

  /*
  An artistic rendering of the below trace:
  █████████████████████████████TaskA██████████████████████████████████████████████
  ████████████████TaskB███████████████████ █Timer Fire█
  ████EvaluateScript██████ █TaskE█
  | <-- Timer Install
  */
  const traceEvents = [
    ...boilerplateTrace,
    {ph: 'X', name: 'TaskA', pid, tid, ts: baseTs, dur: 100e3, ...url('about:blank')},
    {ph: 'B', name: 'TaskB', pid, tid, ts: baseTs + 5e3, ...stackFrames(['urlB.1', 'urlB.2'])},
    {ph: 'X', name: 'EvaluateScript', pid, tid, ts: baseTs + 10e3, dur: 30e3, ...url('urlC')},
    {ph: 'I', name: 'TimerInstall', pid, tid, ts: baseTs + 15e3, ...timerId(1)},
    {ph: 'E', name: 'TaskB', pid, tid, ts: baseTs + 55e3},
    {ph: 'X', name: 'TimerFire', pid, tid, ts: baseTs + 75e3, dur: 10e3, ...timerId(1)},
    {ph: 'X', name: 'TaskE', pid, tid, ts: baseTs + 80e3, dur: 5e3, ...stackFrames(['urlD'])},
  ];

  traceEvents.forEach(evt => {
    evt.cat = 'devtools.timeline';
    evt.args = evt.args || args;
  });

  const tasks = run({traceEvents});
  const taskA = tasks.find(task => task.event.name === 'TaskA');
  const taskB = tasks.find(task => task.event.name === 'TaskB');
  const taskC = tasks.find(task => task.event.name === 'EvaluateScript');
  const taskD = tasks.find(task => task.event.name === 'TimerFire');
  const taskE = tasks.find(task => task.event.name === 'TaskE');

  expect(taskA.attributableURLs).toEqual([]);
  expect(taskB.attributableURLs).toEqual(['urlB.1', 'urlB.2']);
  expect(taskC.attributableURLs).toEqual(['urlB.1', 'urlB.2', 'urlC']);
  expect(taskD.attributableURLs).toEqual(['urlB.1', 'urlB.2', 'urlC']);
  expect(taskE.attributableURLs).toEqual(['urlB.1', 'urlB.2', 'urlC', 'urlD']);
});
some general feedback, but this looks great. I like splitting the construction phases a lot.
taskEndEvent = {ph: 'E', ts: traceEndTs};
} else if (matchedEventIndex === taskEndEventsReverseQueue.length - 1) {
  // Use .pop() in the common case where the immediately next event is needed.
  // It's ~25x faster, https://jsperf.com/pop-vs-splice.
FWIW for a trace that ends up ~70% popped vs 30% spliced, I see no performance difference eliminating the popped path altogether.
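For what it's worth, the ~25x jsperf claim is about the per-operation overhead of removing the last element via splice vs pop; a rough Node.js micro-benchmark sketch of that comparison (assumed setup, numbers vary by machine and engine):

```js
// Rough micro-benchmark sketch: removing the last element a million times.
function time(label, removeLast) {
  const queue = Array.from({length: 1e6}, (_, i) => ({i}));
  const start = process.hrtime.bigint();
  while (queue.length) removeLast(queue);
  console.log(label, Number(process.hrtime.bigint() - start) / 1e6, 'ms');
}

time('pop()   ', queue => queue.pop());
time('splice()', queue => queue.splice(queue.length - 1, 1)[0]);
```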
for (let i = 0; i < taskStartEvents.length; i++) {
  const taskStartEvent = taskStartEvents[i];
  if (taskStartEvent.ph === 'X') {
it might be clearer to handle this separately altogether instead of combining with the B case?
the solution to this isn't clear to me, so I might need some help :)
what's clear to me now is that every start task case is handled here, which is comforting compared to making sure I have my bases covered by combining the results of several methods, if that's what the suggestion is? IMO it's maybe a wash considering the mental overhead of this is only 3 lines? might be missing what the "separately altogether" looks like though 🤷♂
yeah, it wouldn't be much different, just something like if (event.ph === 'X') taskXEvents.push(event); down in _createTasksFromEvents and a separate function _createTasksFromXEvents that's basically just these first three lines.
The main thing for me wasn't having this code here (though it would keep it slightly more focused), it was the matchingNestedEventIndex business below, where I was trying to figure out if the name check was sufficient to rule out X events, whether having them in there would mess up the matching ordering in taskEndEventsReverseQueue, etc. All of that works fine, and there's no performance advantage, but it might be nice to separate them to ditch any mental overhead of having B and X together.
Not a huge deal, it also works as it is.
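If it helps, a sketch of what that split might look like (hypothetical shapes; only the function names come from the discussion above):

```js
// Hypothetical split: complete ('X') events become tasks directly, while B/E
// events still go through the begin/end matching path.
function _createTasksFromXEvents(taskXEvents) {
  return taskXEvents.map(event => ({
    event,
    startTime: event.ts,
    endTime: event.ts + (event.dur || 0),
    children: [],
  }));
}

function _createTasksFromEvents(events) {
  const taskXEvents = [];
  const taskStartEvents = [];
  const taskEndEvents = [];
  for (const event of events) {
    if (event.ph === 'X') taskXEvents.push(event);
    else if (event.ph === 'B') taskStartEvents.push(event);
    else if (event.ph === 'E') taskEndEvents.push(event);
  }
  // ...match taskStartEvents against taskEndEvents as before, then merge the
  // results with the X-derived tasks and sort by startTime.
  return _createTasksFromXEvents(taskXEvents);
}
```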
const nextTask = sortedTasks[i];

// Update `currentTask` based on the elapsed time.
// The `nextTask` may be after currentTask has ended.
while (
  currentTask &&
  Number.isFinite(currentTask.endTime) &&
when is this an issue?
Given the new construction method it should never be an issue when given a valid trace. Unfortunately I have no more confidence that this code will be given a valid trace these days 😢
(to be fair though, it should be much much rarer than before, where dur itself is not a finite number)
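To illustrate the kind of event that guard is defending against (a hypothetical event, not from a real trace):

```js
// Hypothetical: an 'X' event missing its `dur` field produces a non-finite
// endTime, which the Number.isFinite(currentTask.endTime) check skips over.
const badEvent = {ph: 'X', name: 'TaskA', ts: 1000}; // dur is undefined
const endTime = badEvent.ts + badEvent.dur;          // 1000 + undefined => NaN
console.log(Number.isFinite(endTime));               // false
```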
LGTM!
(with the really important lint fix)
haha gg eslint
let's do it.
Summary
tl;dr - a rewrite of main-thread-tasks to make way for us dropping invalid trace events in the future
This PR converts our main thread task creation to focus on individual tasks first, then moves on to the hierarchy step. We were being fancy before by doing everything in a single pass, which is much more efficient and easier when the trace events are valid, but next to impossible to understand if you're trying to recover from invalid data. By switching our approach, it's much easier to identify unmatched pairs of events, overextending tasks, etc.
It's mostly identical functionality-wise to our current setup, with the exception of no longer requiring the logic in #9230 to handle same-timestamped events, so it should be minimal impact as-is. The matching of B/E events now necessitates a more forgiving parent/child event timing rule so we handle missorted events, and we allow 1ms of slop for parent/child start/end times. The next step would be that instead of throw new Error in main-thread-tasks, it pushes the trace event onto an invalidTraceEvents array so that we can continue and the user is still informed that something squirrely was going on (a rough sketch of that direction is at the end of this description).

Related Issues/PRs
moves us a step closer to addressing #7764
eliminates the need for #9230 in order to address #7764, though as discussed in the PR description it's still worth doing IMO, as it sets this code up to follow the hot path and is just a more sensible way to receive trace events.
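As a rough sketch of the "push onto invalidTraceEvents instead of throwing" direction mentioned above (a hypothetical shape, not implemented in this PR):

```js
// Hypothetical future shape: remember suspect events rather than throwing fatally.
const invalidTraceEvents = [];

function handleInvalidEvent(event, reason) {
  // Instead of `throw new Error(reason)`, record the event and keep processing
  // so the run can finish while still surfacing that something looked squirrely.
  invalidTraceEvents.push({event, reason});
}

// e.g. handleInvalidEvent({ph: 'E', name: 'TaskB', ts: 1500}, 'E event without matching B');
```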