Increased failures due to 60s timeout on PSI/LR #7174

Closed
paulirish opened this issue Feb 6, 2019 · 1 comment · Fixed by #7356

@paulirish
Member

We've seen an increase in our error rate on the LR backend:
image
(The red line is our render error rate)

It appears to be related to our latency, which has also increased somewhat.

Since we have our .timings data available, I looked at that, and it points to the major problem:
image

The chart here is the 95th percentile of each of these timings. lh:runner:auditing always remains flat, regardless of percentile. loadPage-defaultPass is the only timing to make a big jump. (This is a little tricky, because any run that errored did NOT report these timings, so hypothetically a gatherer that takes 45s would never be visible to us.)
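
To illustrate the caveat about errored runs not reporting timings, here is a minimal sketch with made-up numbers. The `Run` shape, the sample values, and the nearest-rank `percentile` helper are all assumptions for illustration, not the LR backend's actual data or code. It shows how a 95th percentile computed only over reporting runs can hide slow runs that timed out.

```typescript
// Hypothetical shape: one record per run (not the real backend schema).
interface Run {
  loadPageMs: number; // time spent in loadPage-defaultPass
  errored: boolean;   // errored runs do not report their timings
}

// Nearest-rank percentile over a sorted copy of the samples.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error('no samples');
  const sorted = samples.slice().sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(rank, 1) - 1];
}

// Made-up data: 18 runs that finish quickly and report timings...
const runs: Run[] = [];
for (let i = 0; i < 18; i++) {
  runs.push({loadPageMs: 5000 + i * 500, errored: false});
}
// ...plus 2 runs that took 45s, errored, and therefore reported nothing.
runs.push({loadPageMs: 45000, errored: true});
runs.push({loadPageMs: 45000, errored: true});

// What the dashboard can see: only the runs that reported.
const reported = runs.filter(r => !r.errored).map(r => r.loadPageMs);
// What actually happened: every run, including the errored ones.
const all = runs.map(r => r.loadPageMs);

const p95Reported = percentile(reported, 95);
const p95All = percentile(all, 95);

console.log({p95Reported, p95All});
```

In this toy data set the visible p95 stays far below the true one, so a flat line for a given timing does not rule out that timing being the slow part of the errored runs.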

The loadPage jump is certainly happening regardless. Here's that one timing in isolation, with the full heatmap:
image

Looking at the diff of LH changes that were in that push, I suspect #6944, "core(driver): waitForFCP when tracing". We also know that NO_FCP is our most commonly seen LighthouseError, so I suspect that more sites hitting the 35s maxWaitForLoad timeout means more runs hitting the 60s render timeout.
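
The budget arithmetic behind that suspicion can be sketched roughly as follows. Only the 35s and 60s figures come from this issue. The `otherWorkMs` parameter (remaining gathering, auditing, backend overhead) is a hypothetical stand-in, since its real value isn't given here, and the sketch assumes page load is capped at maxWaitForLoad.

```typescript
const RENDER_TIMEOUT_MS = 60000;    // the 60s render timeout
const MAX_WAIT_FOR_LOAD_MS = 35000; // the 35s maxWaitForLoad timeout

// Time left for everything other than page load, assuming the load
// phase never runs longer than maxWaitForLoad.
function remainingBudgetMs(loadPageMs: number): number {
  return RENDER_TIMEOUT_MS - Math.min(loadPageMs, MAX_WAIT_FOR_LOAD_MS);
}

// otherWorkMs is hypothetical: whatever the run spends outside page load.
function fitsInRenderTimeout(loadPageMs: number, otherWorkMs: number): boolean {
  return otherWorkMs <= remainingBudgetMs(loadPageMs);
}

// A page that loads in 10s leaves 50s of headroom; one that waits out the
// full 35s leaves only 25s, so the same amount of other work may no longer fit.
console.log(remainingBudgetMs(10000), remainingBudgetMs(35000));
console.log(fitsInRenderTimeout(10000, 30000), fitsInRenderTimeout(35000, 30000));
```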


What can we do about this?

@brendankenny mentioned a potential threshold on how long we'd hold out for FCP.
@patrickhulce wdyt?

@patrickhulce
Collaborator

I also suspect waitForFCP here. Do we also see a drop in NO_FCP errors? I would hope and expect that we do.

If we do, then I'd say this is still a net positive.

If we don't, then it seems like all we've done when waiting for FCP is make the user wait longer to get the same error.

Either way, I like the idea of failing early after some maxWaitForFCP time that's shorter than maxWaitForLoad.
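
As a rough sketch of that fail-early idea (a simulation of the decision logic only, not Lighthouse's actual driver code): the `Page` and `Limits` shapes, the `'load-timeout'` label, and the 15s value used below are all hypothetical. Only the 35s maxWaitForLoad comes from this thread.

```typescript
type Outcome =
  | {status: 'loaded'; elapsedMs: number}
  | {status: 'NO_FCP'; elapsedMs: number}
  | {status: 'load-timeout'; elapsedMs: number};

// Hypothetical description of how a page behaves.
interface Page {
  fcpMs: number | null; // when first contentful paint happens; null if never
  loadMs: number;       // when the page would otherwise count as loaded
}

interface Limits {
  maxWaitForFcpMs: number;  // proposed limit, shorter than maxWaitForLoadMs
  maxWaitForLoadMs: number; // 35s in this issue
}

function waitForPage(page: Page, limits: Limits): Outcome {
  // Fail early: nothing painted before the FCP deadline, so stop waiting.
  if (page.fcpMs === null || page.fcpMs > limits.maxWaitForFcpMs) {
    return {status: 'NO_FCP', elapsedMs: limits.maxWaitForFcpMs};
  }
  // FCP arrived in time; keep waiting for load, up to maxWaitForLoad.
  const doneMs = Math.max(page.fcpMs, page.loadMs);
  if (doneMs > limits.maxWaitForLoadMs) {
    return {status: 'load-timeout', elapsedMs: limits.maxWaitForLoadMs};
  }
  return {status: 'loaded', elapsedMs: doneMs};
}

const limits: Limits = {maxWaitForFcpMs: 15000, maxWaitForLoadMs: 35000};

const neverPaints = waitForPage({fcpMs: null, loadMs: 2000}, limits);
const normalPage = waitForPage({fcpMs: 1000, loadMs: 4000}, limits);
const slowLoad = waitForPage({fcpMs: 1000, loadMs: 60000}, limits);

console.log(neverPaints, normalPage, slowLoad);
```

Setting `maxWaitForFcpMs` equal to `maxWaitForLoadMs` in this sketch models the case where a page that never paints uses up the whole load wait before reporting NO_FCP. A shorter FCP limit returns the same error sooner and leaves more of the render timeout unused.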
