Increased failures due to 60s timeout on PSI/LR #7174

paulirish · 2019-02-06T20:09:48Z

We've seen an increase in our error rate on the LR backend:

(The red line is our render error rate)

It appears to be related to our latency, which has also increased somewhat.

Since we have the .timings data available, I looked at it, and it points to the major problem:

The chart here shows the 95th percentile of each of these timings. lh:runner:auditing stays flat regardless of percentile; loadPage-defaultPass is the only timing that makes a big jump. (This is a little tricky, though: any run that errors does NOT report these timings, so a gatherer that hypothetically took 45s would never be visible to us.)
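The caveat about errored runs is a form of survivorship bias: the percentiles are computed only over runs that managed to report .timings. A minimal sketch, using a hypothetical Run record (not the LR backend's actual logging schema) and nearest-rank percentiles, shows how slow runs can vanish from the p95 exactly when they start failing.

```typescript
// Hypothetical record of one LR run; the real logging schema is not shown in this thread.
interface Run {
  loadPageMs: number; // duration of loadPage-defaultPass
  errored: boolean;   // true if the run failed (e.g. hit the 60s render timeout)
}

// Nearest-rank percentile: the smallest sample such that at least p% of samples are <= it.
function percentile(values: number[], p: number): number {
  if (values.length === 0) throw new Error("no samples");
  const sorted = values.slice().sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(rank, 1) - 1];
}

// What the dashboard can see: errored runs report no .timings, so they are dropped.
function observedPercentile(runs: Run[], p: number): number {
  const reported = runs.filter((r) => !r.errored).map((r) => r.loadPageMs);
  return percentile(reported, p);
}

// What we would see if every run, including the failures, reported its timings.
function truePercentile(runs: Run[], p: number): number {
  return percentile(runs.map((r) => r.loadPageMs), p);
}
```

With made-up numbers, say 18 successful runs at 5s plus 2 runs that spent 45s in loadPage and then timed out, the observed p95 is 5s while the true p95 is 45s.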

The loadPage jump is certainly happening regardless. Here's that one timing in isolation, with the full heatmap:

Looking at the diff of LH changes in that push, I suspect #6944 "core(driver): waitForFCP when tracing". We also know that NO_FCP is our most commonly seen LighthouseError, so I suspect that more sites hitting the 35s maxWaitForLoad timeout means more sites hitting the 60s render timeout.
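The arithmetic behind this suspicion: if loadPage can consume up to 35s of a 60s budget, everything else in the run has to fit into the remaining 25s. A minimal sketch, assuming a run is just two sequential parts (loadPage, then everything else) and that the 60s render timeout covers the whole run; the RunTimings shape and the function names are hypothetical.

```typescript
// Values from this thread: maxWaitForLoad is 35s and the LR render timeout is 60s.
const MAX_WAIT_FOR_LOAD_MS = 35000;
const RENDER_TIMEOUT_MS = 60000;

// Hypothetical split of one run into two sequential parts (an assumption for illustration).
interface RunTimings {
  loadPageMs: number; // time spent waiting for the page; bounded by maxWaitForLoad
  restMs: number;     // everything else: other gathering, auditing, reporting
}

function totalRunMs(t: RunTimings): number {
  // loadPage gives up once maxWaitForLoad elapses, so it cannot exceed that cap.
  return Math.min(t.loadPageMs, MAX_WAIT_FOR_LOAD_MS) + t.restMs;
}

function hitsRenderTimeout(t: RunTimings): boolean {
  return totalRunMs(t) > RENDER_TIMEOUT_MS;
}

// Time left for everything after loadPage before the render timeout fires.
function budgetAfterLoadMs(loadPageMs: number): number {
  return RENDER_TIMEOUT_MS - Math.min(loadPageMs, MAX_WAIT_FOR_LOAD_MS);
}
```

Under these assumptions, the same 30s of post-load work fits easily when the page loads in 10s, but overruns the 60s render timeout when loadPage runs all the way to the 35s ceiling.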

What can we do about this?

@brendankenny mentioned a potential threshold on how long we'd hold out for FCP.
@patrickhulce wdyt?


patrickhulce · 2019-02-06T20:24:50Z

I also suspect waitForFCP here. Do we also see a drop in NO_FCP errors? I would hope and expect that we do.

If we do, then I'd say this is still a net positive.

If we don't, then it seems like all we've done when waiting for FCP is make the user wait longer to get the same error.

Either way, I like the idea of failing early after some maxWaitForFCP time that's shorter than maxWaitForLoad.
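A minimal sketch of the "fail early" idea, assuming the page's first-contentful-paint signal is available as a promise (fcpSeen, hypothetical). It races that promise against a maxWaitForFCP timer and reports NO_FCP as soon as the timer wins, instead of holding out for the full maxWaitForLoad. This is not the code that landed in #7356; it only illustrates the technique with Promise.race, and the FcpResult type is an invented stand-in for Lighthouse's NO_FCP error.

```typescript
// Outcome of waiting for first contentful paint (FCP); a stand-in for a NO_FCP error.
type FcpResult = { status: "fcp" } | { status: "no_fcp" };

// Hypothetical helper: race the page's FCP signal against a maxWaitForFCP timer.
// fcpSeen is an assumed promise that resolves when the page reports FCP.
// maxWaitForFCPMs is assumed to be shorter than maxWaitForLoad (35s in this thread).
function waitForFCP(fcpSeen: Promise<void>, maxWaitForFCPMs: number): Promise<FcpResult> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const clear = () => {
    if (timer !== undefined) clearTimeout(timer);
  };

  // Resolves with "no_fcp" once the cap elapses, so the caller can fail early.
  const timeout = new Promise<FcpResult>((resolve) => {
    timer = setTimeout(() => resolve({ status: "no_fcp" }), maxWaitForFCPMs);
  });

  const fcp = fcpSeen.then((): FcpResult => ({ status: "fcp" }));

  // Whichever settles first wins; always clear the timer so it cannot fire later.
  return Promise.race([fcp, timeout]).then(
    (result) => {
      clear();
      return result;
    },
    (err) => {
      clear();
      throw err;
    },
  );
}
```

A caller would treat "no_fcp" as a NO_FCP failure and stop immediately, so a page that never paints costs at most maxWaitForFCP rather than the whole maxWaitForLoad.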

paulirish assigned patrickhulce Feb 6, 2019

patrickhulce mentioned this issue Feb 11, 2019

tests: re-organize driver tests by method #7212

Merged

paulirish added needs-priority P1 and removed needs-priority labels Feb 12, 2019

patrickhulce mentioned this issue Mar 1, 2019

core(driver): add waitForFCP timeout #7356

Merged

patrickhulce closed this as completed in #7356 Mar 5, 2019
