new_audit: add TTI Companion Metric to JSON #8975

Merged
merged 14 commits on Jun 5, 2019

Conversation

deepanjanroy
Contributor

PTAL - this implements the Sum of queueing time beyond 50ms TTI Companion Metric. A few questions / things I want to call out:

  • We don't have a good name for this metric right now. Currently we're calling it Cumulative Long Queuing Delay (Cumulative instead of total/sum-of because we use the word 'cumulative' in 'cumulative layout jank'.) Didn't want to use the word 'jank' either because of how overloaded it is. If you have better ideas for the name, please let me know, but we can probably name it better later if we decide to ship it.
  • This is currently showing up on the report page. How do I stop doing that?
  • I haven't fixed the proto-test yet. Wanted to get some more high level feedback before I spent time fixing that.
  • Currently measuring it between FCP and TTI (a rough sketch of the computation follows at the end of this comment).
    • FCP as lower bound: Penalizing for long tasks before FCP felt counter-productive, because you should probably optimize for fast FCP and use up all the main thread you need to do that.
    • TTI as upper bound: There was very little difference between TTI and TTI + 10s, so using TTI just lets the tests run faster. I wish we could remove the dependency on TTI, but don't have a good idea on how to do that.
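For concreteness, a minimal sketch of that computation (illustrative only, not this PR's code; the function and parameter names are made up, and the 50ms threshold and FCP..TTI clipping follow the description above):

    // A task [start, end] keeps input queuing above 50ms during [start, end - 50];
    // the metric sums the portions of those regions that fall between FCP and TTI.
    const LONG_QUEUING_DELAY_THRESHOLD = 50;

    /** @param {Array<{start: number, end: number}>} tasks Main-thread tasks, times in ms. */
    function computeCumulativeLongQueuingDelay(tasks, fcpTimeInMs, interactiveTimeMs) {
      let total = 0;
      for (const task of tasks) {
        // Clip the "queuing delay above 50ms" region of this task to the FCP..TTI window.
        const regionStart = Math.max(task.start, fcpTimeInMs);
        const regionEnd = Math.min(task.end - LONG_QUEUING_DELAY_THRESHOLD, interactiveTimeMs);
        total += Math.max(regionEnd - regionStart, 0);
      }
      return total;
    }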

@deepanjanroy
Contributor Author

Also, looking at the pr title lint - what's a good title for this? report, metric, or misc?

@patrickhulce
Collaborator

Awesome thanks so much for this @deepanjanroy!

If you have better ideas for the name, please let me know, but we can probably name it better later if we decide to ship it.

I believe @hoten came up with the first ever pronounceable metric acronym, if he'd like to share it :)

This is currently showing up on the report page. How do I stop doing that?

You should be able to just remove the group property in the config entry.
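For illustration, the config entry would go from something like the following to the line below it (a sketch only; the audit id, weight, and group name here are assumptions, not this PR's exact config), which per this comment keeps the audit's result in the JSON output without showing it on the report page:

    // before: rendered in the report's metrics section
    {id: 'cumulative-long-queuing-delay', weight: 0, group: 'metrics'},
    // after: still run and present in the JSON, but not rendered
    {id: 'cumulative-long-queuing-delay', weight: 0},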

Penalizing for long tasks before FCP felt counter-productive

Big +1 👍

I wish we could remove the dependency on TTI, but don't have a good idea on how to do that.

IMO the only way we're going to be able to do this is something like the old fixed cost curve from eFID. That has the nice property that dragging out the trace doesn't end up affecting things much but tends to underpenalize bad experiences that happen very late, so I'm not sure how to resolve that issue really. On the one hand, pushing out the bad experience means people are indeed much less likely to experience it but if it can wait that long it probably should have just been broken up and run earlier in the first place soooooo... 🤷‍♂

pr title lint

I think we'll go with new_audit: add TTI companion to JSON or something similar, like we did with Max Potential FID at first? #5842

@deepanjanroy deepanjanroy changed the title Implement TTI Companion Metric new_audit: add TTI companion Metric to JSON May 16, 2019
@deepanjanroy deepanjanroy changed the title new_audit: add TTI companion Metric to JSON new_audit: add TTI Companion Metric to JSON May 16, 2019
@deepanjanroy
Contributor Author

You should be able to just remove the group property in the config entry.
Done.

I wish we could remove the dependency on TTI, but don't have a good idea on how to do that.

IMO the only way we're going to be able to do this is something like the old fixed cost curve from eFID. That has the nice property that dragging out the trace doesn't end up affecting things much but tends to underpenalize bad experiences that happen very late, so I'm not sure how to resolve that issue really. On the one hand, pushing out the bad experience means people are indeed much less likely to experience it but if it can wait that long it probably should have just been broken up and run earlier in the first place soooooo...

Hm, the downside of a fixed cost curve is that it assumes the same distribution for all sites, and building a more sophisticated model (maybe taking into account when other key big objects are first painted, as opposed to just the first contentful paint - I think you suggested something like that) is not easy. For now I think measuring until TTI is probably the best tradeoff.

pr title lint

I think we'll go with new_audit: add TTI companion to JSON or something similar, like we did with Max Potential FID at first? #5842

Done - thanks!

Collaborator
@patrickhulce left a comment

this looks great @deepanjanroy awesome job navigating the confusing world of simulated/observed LH metrics! 🎉 👏 👏

const settings = {throttlingMethod: 'provided'};
const context = {options, settings, computedCache: new Map()};

return Audit.audit(artifacts, context).then(output => {
Collaborator

nit: async/await and expect for new tests

though @brendankenny do you disagree with using expect for all new tests?

Member

though @brendankenny do you disagree with using expect for all new tests?

no, it sounds good to me. I've (mostly) been matching the assert or expect used in existing test files just because I personally don't like when they're mixed, but totally 👍👍 for expect in new test files

Contributor Author

Didn't realize I was mixing node assert and jest expect. Changed them all to jest expect since it has a nicer API.

Collaborator

oh wait sorry, still super nit but I'd prefer async/await for new tests too :)

Contributor Author

Whoops, forgot to do async/await last time. Done.
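For reference, the .then()-based snippet quoted above would end up roughly like this after the switch (a sketch only; the test name and assertion are placeholders, and options/artifacts come from the surrounding test file):

    it('computes cumulative long queuing delay', async () => {
      const settings = {throttlingMethod: 'provided'};
      const context = {options, settings, computedCache: new Map()};

      const output = await Audit.audit(artifacts, context);
      expect(output.numericValue).toBeGreaterThan(0);
    });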


it('only looks at tasks within FMP and TTI', () => {
const events = [
// TODO(deepanjanroy@): Is there an interval data structure in lighthouse?
Collaborator

good point, not really at the moment

Contributor Author

Hm ok. I'm going to punt on introducing that for now.

@connorjclark
Collaborator

I believe @hoten came up with the first ever pronounceable metric acronym, if he'd like to share it :)

:)

Time In Long Tasks (TILT). It's not comprehensive, but it gets the point across.

@GoogleChrome deleted a comment from googlebot May 20, 2019
@googlebot

So there's good news and bad news.

👍 The good news is that everyone that needs to sign a CLA (the pull request submitter and all commit authors) have done so. Everything is all good there.

😕 The bad news is that it appears that one or more commits were authored or co-authored by someone other than the pull request submitter. We need to confirm that all authors are ok with their commits being contributed to this project. Please have them confirm that here in the pull request.

Note to project maintainer: This is a terminal state, meaning the cla/google commit status will not change from this state. It's up to you to confirm consent of all the commit author(s), set the cla label to yes (if enabled on your project), and then merge this pull request when appropriate.

ℹ️ Googlers: Go here for more info.

@deepanjanroy
Contributor Author

Time In Long Tasks (TILT). It's not comprehensive, but it gets the point across.

I love the acronym, but I worry it's too easy to interpret it as the sum of long task durations, as opposed to the sum of time beyond 50ms. I'm currently calling the "time beyond 50ms" the "long queuing delay region", so the sum of all these regions is the "cumulative long queuing delay". If we can find a better name for the "time beyond 50ms in tasks", a better name for the whole metric should follow from there.

@deepanjanroy
Contributor Author

Updated everything except the proto test (working on that) so please take another look.

As a tangent: I'm not sure if I've been spamming everyone here. I've never used the new GitHub review UI before, and I'm still figuring out how to batch comments properly; let me know if there are any conventions you follow.

@connorjclark
Collaborator

I love the acronym, but I worry it's too easy to interpret it as the sum of long task durations, as opposed to the sum of time beyond 50ms. I'm currently calling the "time beyond 50ms" the "long queuing delay region", so the sum of all these regions is the "cumulative long queuing delay". If we can find a better name for the "time beyond 50ms in tasks", a better name for the whole metric should follow from there.

Thought about it a bit more... landed on an alternative to "long queuing delay region" - what about just "excessive"? Then we could do ETILT :) The excessive qualifies the time measurement, and suggests that it's not just a straight total.

As a tangent: I'm not sure if I've been spamming everyone here. I've never used the new GitHub review UI before, and I'm still figuring out how to batch comments properly; let me know if there are any conventions you follow.

Totally fine.


const UIStrings = {
title: 'Cumulative Long Queuing Delay',
description: '[Experimental metric]. Sum of Task Lengths beyond 50ms, between ' +
Member

This description seems ambiguous. Is it the sum of tasks that are beyond 50ms in length? Or the sum of task lengths minus 50ms?

Ideas:
Sum of task lengths that occur between FCP and TTI that delay additional task queuing by more than 50ms.
Or make it less technical?
Sum of tasks that delay additional task queuing. This is an estimate of task delay based on main-thread busyness. <- don't know if that's really true 😆 just thinking out loud for ideas.

Contributor Author

Hm you're right. Reworded to "[Experimental metric] Total time period between FCP and Time to Interactive during which queuing time for any input event would be higher than 50ms." Is that clear enough?
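In other words, the UIStrings entry quoted above would now read roughly as follows (a sketch; exact line-wrapping aside):

    const UIStrings = {
      title: 'Cumulative Long Queuing Delay',
      description: '[Experimental metric] Total time period between FCP and Time to ' +
        'Interactive during which queuing time for any input event would be higher than 50ms.',
    };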

Collaborator
@patrickhulce left a comment

I think this LGTM from an HTTPArchive data gathering perspective! I'm super curious how the percentiles hold up there.

// was 0ms. These numbers include 404 pages, so rounding up the scoreMedian to 300ms and
// picking 25ms as PODR. See curve at https://www.desmos.com/calculator/x3nzenjyln
scoreMedian: 300,
scorePODR: 25,
Collaborator

These are some seriously fast pages 😮

// sites on mobile, 25-th percentile was 270ms, 10-th percentile was 22ms, and 5th percentile
// was 0ms. These numbers include 404 pages, so rounding up the scoreMedian to 300ms and
// picking 25ms as PODR. See curve at https://www.desmos.com/calculator/x3nzenjyln
scoreMedian: 300,
Collaborator

this does feel a bit weird that a single 350 ms task immediately after FCP and total mainthread silence after that already gives you a 50

if we were to start breaking our 75/95 rule, I think I'd want to start it here...

Contributor Author

Changed to 600/200. We now have 600ms of jank = 0.5, and 100ms of jank = 0.999. A 350ms task will have 300ms of jank and get a 0.885.
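For readers following the scoring discussion: the raw value is mapped onto a 0-1 score with a log-normal curve where scoreMedian maps to 0.5 and smaller values approach 1. A rough standalone approximation of that idea (not Lighthouse's actual scoring code, which derives the curve's spread from scorePODR; the sigma below is a hand-picked stand-in):

    // Score = complementary CDF of a log-normal distribution: the median maps to 0.5
    // and smaller values approach 1. `sigma` controls how quickly the score falls off.
    function approxLogNormalScore(valueMs, medianMs, sigma = 0.75) {
      if (valueMs <= 0) return 1;
      const z = Math.log(valueMs / medianMs) / sigma;
      // Logistic approximation of the standard normal CDF: Phi(z) ~= 1 / (1 + e^(-1.702 * z)).
      return 1 / (1 + Math.exp(1.702 * z));
    }

    approxLogNormalScore(600, 600); // 0.5 at the median
    approxLogNormalScore(100, 600); // ~0.98, i.e. close to 1 for small values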

const settings = {throttlingMethod: 'provided'};
const context = {options, settings, computedCache: new Map()};

return Audit.audit(artifacts, context).then(output => {
Collaborator

oh wait sorry, still super nit but I'd prefer async/await for new tests too :)


const events = [
{start: 1000, end: 1110, duration: 110}, // Contributes 10ms.
{start: 2000, end: 2100, duration: 100}, // Contributes 50ms.
Collaborator

wouldn't this only contribute 50 anyway, sans clipping?

I guess the whole idea of TTI occurring in the middle of a task isn't really possible though anyhow :)

Contributor Author

Ah yes it would. Changed a test case a little bit to test clipping better.
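To spell out the clipping arithmetic with made-up TTI values (illustrative only, following the "queuing time beyond 50ms" reading sketched earlier in this thread):

    // Task {start: 2000, end: 2100} keeps input queuing above 50ms during [2000, 2050].
    // TTI at 2100 (or anywhere past 2050): contributes 50ms - the same as without clipping,
    //   which is why the original test case didn't really exercise clipping.
    // TTI at 2030 (inside that region): contributes only 30ms, so clipping now matters.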

Contributor Author
@deepanjanroy left a comment

Addressed comments and fixed tests.

I made a bikeshedding doc here for the name of the metric (internal because we haven't had much public discussion around this metric yet). Feel free to edit directly and discuss there, but I don't want to block landing the metric over naming it since it's all very experimental right now.

Also, how do I make the cla/google check happy?

Please take a look!


const UIStrings = {
title: 'Cumulative Long Queuing Delay',
description: '[Experimental metric]. Sum of Task Lengths beyond 50ms, between ' +
Contributor Author

Hm you're right. Reworded to "[Experimental metric] Total time period between FCP and Time to Interactive during which queuing time for any input event would be higher than 50ms." Is that clear enough?

// sites on mobile, 25-th percentile was 270ms, 10-th percentile was 22ms, and 5th percentile
// was 0ms. These numbers include 404 pages, so rounding up the scoreMedian to 300ms and
// picking 25ms as PODR. See curve at https://www.desmos.com/calculator/x3nzenjyln
scoreMedian: 300,
Contributor Author

Changed to 600/200. We now have 600ms of jank = 0.5, and 100ms of jank = 0.999. A 350ms task will have 300ms of jank and get a 0.885.

const settings = {throttlingMethod: 'provided'};
const context = {options, settings, computedCache: new Map()};

return Audit.audit(artifacts, context).then(output => {
Contributor Author

Whoops, forgot to do async/await last time. Done.


const events = [
{start: 1000, end: 1110, duration: 110}, // Contributes 10ms.
{start: 2000, end: 2100, duration: 100}, // Contributes 50ms.
Contributor Author

Ah yes it would. Changed a test case a little bit to test clipping better.

@deepanjanroy
Contributor Author

Ping on this: Anything else I need to do for merging?

@patrickhulce
Collaborator

Sorry @deepanjanroy for the silence! My approval still stands; ideally it gets a once-over from another "metricy" person on the team.

@brendankenny @paulirish any takers?

Member
@brendankenny left a comment

some superficial style/nit feedback, but it looks great to me.

const output = await Audit.audit(artifacts, context);

expect(output.numericValue).toBeCloseTo(48.3, 1);
expect(output.score).toBeCloseTo(1, 2);
Member
@brendankenny Jun 4, 2019

maybe add a comment here that it's expected to be // very nearly 1 or something if it really isn't expected to reach 1

Contributor Author

It actually does reach 1. Changed to .toBe(1).

static getEstimateFromSimulation(simulation, extras) {
// Intentionally use the opposite FCP estimate, a more pessimistic FCP means that more tasks are
// excluded from the CumulativeLongQueuingDelay computation, so a higher FCP means lower value
// for the same work.
Member

this gets a bit confusing. What about something like

    // Intentionally use the opposite FCP estimate. A pessimistic FCP is higher than an optimistic FCP,
    // which means more tasks are excluded from the CumulativeLongQueuingDelay computation. So a more
    // pessimistic FCP gives a more optimistic CumulativeLongQueuingDelay for the same work.

Member

A pessimistic FCP is higher than an optimistic FCP

is that strictly true? I don't know :)

Collaborator

is that strictly true?

pessimistic FCP is strictly higher than or equal to optimistic FCP :)

It includes everything in optimistic plus the script-initiated render-blocking priority requests

Contributor Author

Reworded this comment (and the comment below about TTI.)
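For context, the "opposite estimate" the comment describes amounts to something like the following (a sketch by analogy with the interactiveResult excerpt below; the exact property names on extras are an assumption):

    // When computing the optimistic CumulativeLongQueuingDelay estimate, intentionally
    // take the pessimistic (higher) FCP, so more pre-FCP tasks are excluded, and vice versa.
    const fcpTimeInMs = extras.optimistic
      ? extras.fcpResult.pessimisticEstimate.timeInMs
      : extras.fcpResult.optimisticEstimate.timeInMs;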

: extras.interactiveResult.pessimisticEstimate.timeInMs;

// Require here to resolve circular dependency.
const CumulativeLongQueuingDelay = require('./cumulative-long-queuing-delay.js');
Member

nit: move down to just above the return statement

Contributor Author

I'm using the CumulativeLongQueuingDelay.LONG_QUEUING_DELAY_THRESHOLD constant unfortunately, so this is the furthest I can move it down.
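As background on the pattern under discussion: a require placed inside the function body is only evaluated when the function first runs, which is what breaks the cycle. A generic sketch with hypothetical file names (not this PR's code):

    // a.js and b.js require each other, so a.js defers loading b.js until compute() runs.
    // a.js
    module.exports.compute = function compute(durationMs) {
      const b = require('./b.js'); // deferred require: avoids the cycle at module-load time
      return Math.max(durationMs - b.THRESHOLD, 0);
    };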

});
}

return events.sort((a, b) => a.start - b.start);
Member

it makes some sense to always be sorted, but if it really is a perf concern, I don't think anything depends on it being that way?

Contributor Author

Ah, you're right - this sorting is not necessary. I was initially doing the clipping in a different way that relied on sorting, but not anymore, and I forgot to change this. Removed the sorting.

@deepanjanroy
Contributor Author

Addressed comments. Do I need to do anything about the cla/google error?

@brendankenny
Member

Do I need to do anything about the cla/google error?

No, that's an annoying thing where it doesn't always recognize our changes through the GitHub "suggestion" UI as coming from us, even though the author and committer are you and the Co-Authored-By is someone with a CLA...

Still LGTM!
