
core(tracehouse): structure main thread task creation to be more forgiving #9491

Merged (11 commits) on Sep 26, 2019

Conversation

@patrickhulce (Collaborator) commented Jul 31, 2019

Summary
tl;dr - a rewrite of main-thread-tasks to make way for us dropping invalid trace events in the future

This PR converts our main thread task creation to focus on individual tasks first, and only then move on to the hierarchy step. We were being fancy before by doing everything in a single pass, which is much more efficient and easier when the trace events are valid but next to impossible to follow when you're trying to recover from invalid data. By switching our approach, it's much easier to identify unmatched pairs of events, overextending tasks, etc.
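
In pseudo-code, the new shape is roughly the following (a purely illustrative sketch, not the code in main-thread-tasks.js; the createTasks function and the task object shape here are made up for the example):

// Purely illustrative sketch of the two-phase shape; the real code is far more defensive.
function createTasks(taskStartEvents, taskEndEvents) {
  // Phase 1: pair B events with their matching E events into flat tasks.
  // (In the well-formed case, an E event closes the most recent unclosed B event with the same name.)
  const openByName = new Map();
  const tasks = [];
  const events = [...taskStartEvents, ...taskEndEvents].sort((a, b) => a.ts - b.ts);
  for (const event of events) {
    if (event.ph === 'B') {
      const open = openByName.get(event.name) || [];
      open.push(event);
      openByName.set(event.name, open);
    } else if (event.ph === 'E') {
      const startEvent = (openByName.get(event.name) || []).pop();
      if (startEvent) tasks.push({event: startEvent, startTime: startEvent.ts, endTime: event.ts, children: []});
    }
  }

  // Phase 2: sort the flat tasks and attach parent/child relationships.
  tasks.sort((a, b) => a.startTime - b.startTime || b.endTime - a.endTime);
  const ancestors = [];
  for (const task of tasks) {
    while (ancestors.length && ancestors[ancestors.length - 1].endTime <= task.startTime) ancestors.pop();
    const parent = ancestors[ancestors.length - 1];
    if (parent) {
      task.parent = parent;
      parent.children.push(task);
    }
    ancestors.push(task);
  }
  return tasks;
}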

It's mostly identical functionality-wise to our current setup, with the exception of no longer requiring the logic in #9230 to handle same-timestamped events, so the impact should be minimal as-is. Matching B/E events now necessitates a more forgiving parent/child timing rule so that we can handle missorted events: we allow 1ms of slop between parent/child start/end times.

The next step would be for main-thread-tasks to push the offending trace event onto an invalidTraceEvents array instead of throwing a new Error, so that we can continue and the user is still informed that something squirrely was going on.
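
Hypothetically, that next step would look something like this (a sketch only; invalidTraceEvents and collectInvalidChildren are made-up names, and nothing like this lands in this PR):

function collectInvalidChildren(task, invalidTraceEvents) {
  for (const child of task.children) {
    if (child.endTime > task.endTime) {
      // Today this case is roughly: throw new Error('Child cannot end after parent');
      // Instead, record the offending event and keep building the rest of the task graph.
      invalidTraceEvents.push(child.event);
    }
  }
}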

Related Issues/PRs
moves us a step closer to addressing #7764

eliminates the need for #9230 in order to address #7764, though as discussed in that PR's description, it's still worth doing IMO, since it sets this code up to follow the hot path and is just a more sensible way to receive trace events.

@patrickhulce (Collaborator, Author) commented Aug 5, 2019

OK, so it turns out we already had an "invalid" trace checked into our fixtures and we just never noticed, because the main thread task logic didn't check that the corresponding B/E event names matched; it just assumed they were valid. Now we check them, and so we started throwing Child cannot end after parent errors.

This PR now also allows up to 1ms of slop between parent/child events to accommodate these minor mistakes, which should hopefully reduce the number of fatal trace processing errors.
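
For reference, the kind of check being described looks roughly like this (a minimal sketch, assuming timestamps in microseconds as in raw trace events; the names are illustrative, not the exact ones in the code):

const PARENT_CHILD_SLOP_US = 1000; // 1ms of slop expressed in trace-event microseconds

function childFitsInParent(parent, child) {
  // Treat the child as contained by the parent as long as it only spills over by <=1ms.
  return child.startTime >= parent.startTime - PARENT_CHILD_SLOP_US &&
    child.endTime <= parent.endTime + PARENT_CHILD_SLOP_US;
}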

@patrickhulce changed the title from "core(tracehouse): structure main thread creation to be more forgiving" to "core(tracehouse): structure main thread task creation to be more forgiving" on Aug 6, 2019
@paulirish added the 5.3 label on Aug 27, 2019

@exterkamp (Member) left a comment

Trying to review this (read: I'm not super familiar with this code). It looks good, but I think it needs some more documentation to really make it understandable.

priorTaskData,
timerInstallEventsReverseQueue
);
} else {

Member:

Remove else

@patrickhulce (Author):

Not sure how to reconcile this one with the request for more comments and documentation; this is explicitly here so that the reader knows what the else branch would have been and why it's not necessary.

Member:

FWIW, I think something like // If there's still a currentTask, that means we're currently in the middle of a task, so nextTask is a child. up at the top of the if works (and then we can drop the else).

currentTask.endTime = traceEndTs;
currentTask = currentTask.parent;
/**
* This function takes the raw trace events sorted in increasing timestamp order and outputs connected task nodes.

Member:

This description is very helpful for understanding this file; I wish it were higher up or there were some top-level description of this file.


@patrickhulce (Author):

There is a top-level description of this file that covers everything the file does, while this comment discusses how this particular piece of the puzzle is computed (the only parts of this file that were touched were the trace-parsing pieces).

It sounds like you think it might be worth splitting this file into different components?

I'm not quite sure what that would look like though. Maybe just...

  • main-thread-tasks.js
  • thread-task-creation.js

?

// Create a reversed copy of the array to avoid copying the rest of the queue on every mutation.
const taskEndEventsReverseQueue = taskEndEvents.slice().reverse();

for (let i = 0; i < taskStartEvents.length; i++) {

Member:

There are a lot of nested loops where one side searches left to right and then looks back up right to left. Is there any way to iterate once, build a map, and look up seen events that way, going from O(x^2) to O(x)? This might be overcomplicated for code that is already complex.

^ That might not make 100% sense given this code is searching 2 separate lists and trying to handle nesting, but it definitely gives the code a suspicious smell. Might just need better comments.

Member:

i was curious about the perf here too.

on a theverge trace (42MB), _createTasksFromStartAndEndEvents takes 21ms and its parent _createTasksFromEvents takes just 53ms.

meanwhile, recordsFromLogs takes 1380ms.

so i think right now this is still pretty cheap and not worth optimizing.

Member:

meanwhile, recordsFromLogs takes 1380ms.

Like network recordsFromLogs?? Even theverge will have max 100s of requests. What's taking 1.3s?

Member:

findNetworkQuietPeriods

[profiler screenshot]

@patrickhulce (Author) commented Sep 6, 2019

While it's true that the worst-case runtime here is O(n^2), ~99.999% of the time the reverse loop is O(1), because the common case is that B/E events are in the correct order (think of every time we've run LH today without it throwing a fatal error; those are all the cases where this reverse loop was O(1) on every event). The only reason we're doing this at all is to be able to handle the rare and unexpected case where Chrome put the events in an order that doesn't make sense. When that happens we always have to look at every event in the worst case, so there's no way to escape O(n^2).

Also, what paul said: it's fine right now, so it seems premature to worry about taking up extra memory with a map.

Seems like a good idea to put all this into a comment somewhere :)

EDIT: I forgot about nested events when originally writing this, but because we don't have stack sampling enabled the max depth should still be ~5, so still O(1) :D Example: the average number of iterations of the inner loop on theverge was 1.1387605983318398.
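
To make the shape of the loop being discussed concrete, here is a condensed sketch (simplified: it matches on name only and omits the nested-event counting and other handling in the real _createTasksFromStartAndEndEvents):

function pairStartAndEndEvents(taskStartEvents, taskEndEvents) {
  const tasks = [];
  // Reversed copy so the common-case removal is a cheap pop() from the end.
  const taskEndEventsReverseQueue = taskEndEvents.slice().reverse();
  for (const taskStartEvent of taskStartEvents) {
    let matchedEventIndex = -1;
    // Search from the back; with well-ordered events this hits on the first iteration,
    // so the inner loop is effectively O(1) per start event.
    for (let j = taskEndEventsReverseQueue.length - 1; j >= 0; j--) {
      if (taskEndEventsReverseQueue[j].name === taskStartEvent.name) {
        matchedEventIndex = j;
        break;
      }
    }
    if (matchedEventIndex === -1) continue; // unmatched B event; the real code recovers here
    const taskEndEvent = matchedEventIndex === taskEndEventsReverseQueue.length - 1 ?
      taskEndEventsReverseQueue.pop() : // common case, far cheaper than splice()
      taskEndEventsReverseQueue.splice(matchedEventIndex, 1)[0];
    tasks.push({event: taskStartEvent, startTime: taskStartEvent.ts, endTime: taskEndEvent.ts});
  }
  return tasks;
}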

lighthouse-core/lib/tracehouse/main-thread-tasks.js (outdated review thread, resolved)

/*
An artistic rendering of the below trace:
█████████████████████████████TaskA██████████████████████████████████████████████

Member:

I would love some of these renderings in the code as comments explaining what it is doing! 🎨

let matchedEventIndex = -1;
let matchingNestedEventCount = 0;
let matchingNestedEventIndex = i + 1;
for (let j = taskEndEventsReverseQueue.length - 1; j >= 0; j--) {

Member:

if we reversed this array, it's unclear why we're iterating from back to front.

does this improve the perf of the pop() and splice()s?

@patrickhulce (Author):

yep, pop() <<< shift(). Added a comment to this effect and a better function explanation 👍

for (let j = taskEndEventsReverseQueue.length - 1; j >= 0; j--) {
const endEvent = taskEndEventsReverseQueue[j];
// We are considering an end event, so we'll count how many nested events we saw along the way.
while (matchingNestedEventIndex < taskStartEvents.length &&

Member:

looking at https://cs.chromium.org/chromium/src/third_party/catapult/tracing/tracing/extras/importer/trace_event_importer.html (and slice_group), they use something very simple:
basically, since sync events are assumed to be sync, if we have an End, then it must match the most recently found Begin event.

also, interestingly, they don't sort events or preprocess before doing this begin/end matching. perhaps taking this assumption from the trace saves some headaches?
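
For comparison, that catapult-style matching boils down to something like this (a sketch of the approach being described, not code from this PR):

// Every E event closes the most recently opened B event: no sorting, no name checks.
function pairWithStack(events) {
  const openStack = [];
  const pairs = [];
  for (const event of events) {
    if (event.ph === 'B') {
      openStack.push(event);
    } else if (event.ph === 'E') {
      const startEvent = openStack.pop();
      if (startEvent) pairs.push({startEvent, endEvent: event});
    }
  }
  return pairs;
}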

Member:

i tried out a basic implementation of this approach but it fails on:

should handle out-of-order 0 duration tasks
should handle child events that extend <1ms beyond parent event

@patrickhulce (Author):

perhaps making this assumption from the trace saves some headaches?

It absolutely does save headaches, and that's how the old version was written, but that assumption is also what fails on several real-world traces. Recovering from that error situation is exactly what this rewrite is about, so if there's a way to get that old approach to work and nicely recover from errors when the assumption breaks down, I'm definitely open to someone else running with their vision for it :)

* @param {PriorTaskData} priorTaskData
* @param {Array<LH.TraceEvent>} reverseEventsQueue
*/
static _assignAllTimerInstallsBetweenTasks(

Member:

are there other test cases we should add for this new method?

@patrickhulce (Author):

I already added

it('should compute attributableURLs correctly across timers', () => {
const baseTs = 1241250325;
const url = s => ({args: {data: {url: s}}});
const stackFrames = f => ({args: {data: {stackTrace: f.map(url => ({url}))}}});
const timerId = id => ({args: {data: {timerId: id}}});
/*
An artistic rendering of the below trace:
█████████████████████████████TaskA██████████████████████████████████████████████
████████████████TaskB███████████████████ █Timer Fire█
████EvaluateScript██████ █TaskE█
| <-- Timer Install
*/
const traceEvents = [
...boilerplateTrace,
{ph: 'X', name: 'TaskA', pid, tid, ts: baseTs, dur: 100e3, ...url('about:blank')},
{ph: 'B', name: 'TaskB', pid, tid, ts: baseTs + 5e3, ...stackFrames(['urlB.1', 'urlB.2'])},
{ph: 'X', name: 'EvaluateScript', pid, tid, ts: baseTs + 10e3, dur: 30e3, ...url('urlC')},
{ph: 'I', name: 'TimerInstall', pid, tid, ts: baseTs + 15e3, ...timerId(1)},
{ph: 'E', name: 'TaskB', pid, tid, ts: baseTs + 55e3},
{ph: 'X', name: 'TimerFire', pid, tid, ts: baseTs + 75e3, dur: 10e3, ...timerId(1)},
{ph: 'X', name: 'TaskE', pid, tid, ts: baseTs + 80e3, dur: 5e3, ...stackFrames(['urlD'])},
];
traceEvents.forEach(evt => {
evt.cat = 'devtools.timeline';
evt.args = evt.args || args;
});
const tasks = run({traceEvents});
const taskA = tasks.find(task => task.event.name === 'TaskA');
const taskB = tasks.find(task => task.event.name === 'TaskB');
const taskC = tasks.find(task => task.event.name === 'EvaluateScript');
const taskD = tasks.find(task => task.event.name === 'TimerFire');
const taskE = tasks.find(task => task.event.name === 'TaskE');
expect(taskA.attributableURLs).toEqual([]);
expect(taskB.attributableURLs).toEqual(['urlB.1', 'urlB.2']);
expect(taskC.attributableURLs).toEqual(['urlB.1', 'urlB.2', 'urlC']);
expect(taskD.attributableURLs).toEqual(['urlB.1', 'urlB.2', 'urlC']);
expect(taskE.attributableURLs).toEqual(['urlB.1', 'urlB.2', 'urlC', 'urlD']);
});
which is enough to cover the case I had in mind for this method, but maybe there's another case I'm missing?

@brendankenny (Member) left a comment

some general feedback, but this looks great. I like splitting the construction into phases a lot.

lighthouse-core/lib/tracehouse/main-thread-tasks.js (outdated review thread, resolved)
taskEndEvent = {ph: 'E', ts: traceEndTs};
} else if (matchedEventIndex === taskEndEventsReverseQueue.length - 1) {
// Use .pop() in the common case where the immediately next event is needed.
// It's ~25x faster, https://jsperf.com/pop-vs-splice.

Member:

FWIW for a trace that ends up ~70% popped vs 30% spliced, I see no performance difference eliminating the popped path altogether.


for (let i = 0; i < taskStartEvents.length; i++) {
const taskStartEvent = taskStartEvents[i];
if (taskStartEvent.ph === 'X') {

Member:

it might be clearer to handle this separately altogether instead of combining with the B case?

@patrickhulce (Author):

the solution to this isn't clear to me, so I might need some help :)

what's clear to me now is that every start-task case is handled here, which is comforting compared to making sure I have my bases covered by combining the results of several methods (if that's what the suggestion is). IMO it's maybe a wash considering the mental overhead of this is only 3 lines? might be missing what the separately-altogether version looks like though 🤷‍♂

Member:

yeah, it wouldn't be much different, just something like if (event.ph === 'X') taskXEvents.push(event); down in _createTasksFromEvents and a separate function _createTasksFromXEvents that's basically just these first three lines.

The main thing for me wasn't having this code here (though it would keep it slightly more focused), it was the matchingNestedEventIndex business below, where I was trying to figure out if the name check was sufficient to rule out X events, if having them in there would mess up the matching ordering in taskEndEventsReverseQueue, etc. All of that works fine, and there's no performance advantage, but it might be nice to separate them to ditch any mental overhead of having B and X together.

Not a huge deal, it also works as it is.
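
Concretely, the split being suggested might look something like this (a hedged sketch; createTasksFromXEvents and the task shape here are assumptions for illustration, not what actually landed):

function createTasksFromXEvents(taskXEvents) {
  return taskXEvents.map(startEvent => {
    // X (complete) events carry their own duration, so the matching end event is synthetic.
    const taskEndEvent = {ph: 'E', ts: startEvent.ts + (startEvent.dur || 0)};
    return {event: startEvent, startTime: startEvent.ts, endTime: taskEndEvent.ts};
  });
}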

const nextTask = sortedTasks[i];

// Update `currentTask` based on the elapsed time.
// The `nextTask` may be after currentTask has ended.
while (
currentTask &&
Number.isFinite(currentTask.endTime) &&

Member:

when is this an issue?

@patrickhulce (Author):

Given the new construction method, it should never be an issue when given a valid trace. Unfortunately I have no more confidence that this code will be given a valid trace these days 😢

@patrickhulce (Author):

(to be fair though, it should be much, much rarer than before, where dur itself is not a finite number)

lighthouse-core/lib/tracehouse/main-thread-tasks.js (outdated review thread, resolved)
lighthouse-core/lib/tracehouse/main-thread-tasks.js (outdated review thread, resolved)

@brendankenny (Member) left a comment

LGTM!

@brendankenny (Member):

(with the really important lint fix)

@brendankenny (Member):

haha gg eslint

@paulirish (Member) left a comment

let's do it.
