Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

core(driver): use last frameNavigated event to determine finalUrl #10339

Merged
merged 11 commits into from
May 4, 2020

Conversation

patrickhulce
Copy link
Collaborator

@patrickhulce patrickhulce commented Feb 14, 2020

Summary
Split out just the driver client-side redirect URL tracking bits out from #10325

  1. Use window.location URL of last frameNavigted event of the main frame for finalUrl in driver.
  2. Add ultrabasic client-side redirect smoke tests (I originally fixed redirects audit in this PR too but that ballooned into ~500 lines of changes, so will immediately follow up with that)

Related Issues/PRs
#8984
#10325

@patrickhulce patrickhulce requested a review from a team as a code owner February 14, 2020 04:16
@patrickhulce patrickhulce requested review from paulirish and removed request for a team February 14, 2020 04:16
finalUrl: 'http://localhost:10200/redirects-final.html',
audits: {
'redirects': {
score: '<1',
numericValue: '>=500',
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this numericValue was misleading because it has nothing to do with delay it's lantern-based, future followup resolves this issue in redirects audit, but just reorganizing the checks here first.

@@ -439,6 +439,10 @@ function _resolveIcuMessageInstanceId(icuMessageInstanceId, locale) {
const [_, icuMessageId, icuMessageInstanceIndex] = matches;
const icuMessageInstances = _icuMessageInstanceMap.get(icuMessageId) || [];
const icuMessageInstance = icuMessageInstances[Number(icuMessageInstanceIndex)];
if (!icuMessageInstance) {
Copy link
Collaborator Author

@patrickhulce patrickhulce Feb 14, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this was a drive-by for fixing -G/-A which is currently broken because of messages generated during gathering that can't be replaced in auditing phase

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thank you!!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

apologies but I can't recall if it'd be fine to land this hotfix now.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

brendan's about to fix anyway i'll revert 👍

@@ -439,6 +439,10 @@ function _resolveIcuMessageInstanceId(icuMessageInstanceId, locale) {
const [_, icuMessageId, icuMessageInstanceIndex] = matches;
const icuMessageInstances = _icuMessageInstanceMap.get(icuMessageId) || [];
const icuMessageInstance = icuMessageInstances[Number(icuMessageInstanceIndex)];
if (!icuMessageInstance) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thank you!!

@@ -1107,7 +1107,13 @@ class Driver {
// Bring `Page.navigate` errors back into the promise chain. See https://github.com/GoogleChrome/lighthouse/pull/6739.
await waitforPageNavigateCmd;

return this._endNetworkStatusMonitoring();
const finalUrlViaNetwork = this._endNetworkStatusMonitoring();
const finalUrlViaScript = await this.evaluateAsync('window.location.href', {useIsolation: true})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this can include #hashes which aren't in the networkUrl. first inclination is to strip out the hash, but i'm not sure

also this will be influenced by pushState/replaceState

Copy link
Collaborator

@connorjclark connorjclark Feb 14, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hashes are an important part of some client side routers. I think it should remain. altho, the warning we make when initial/final urls aren't equal might be noisy..

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is what equalWithExcludedFragments is for folks :) we already use it for the warning, findMainDocument and half a dozen or so audits. This might indeed start to reveal more places where we should've been using it but got lucky, but that seems like healthy improvement.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We also discussed pushState with real /path/ urls (not hashes). Do all of our uses of finalUrl expect the url may not be a real network resource?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do all of our uses of finalUrl expect the url may not be a real network resource?

We usually (always?) assume the opposite. I could see it being worth making a distinction between finalNavigatedUrl vs finalDisplayedUrl, but it seems like the final navigated url is the one we would do anything programmatic with?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@patrickhulce you've had me in suspense about what you were thinking about this since that meeting :)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, was this the extent of your concerns? lol I was waiting for your feedback from the meeting to respond 😆

The real long-term solution requires a reexamination of every audit that requests the main resource and deciding whether it should be actually using the root request, the intermediate document, or the final document. Ain't nobody got time for that right now, so I see a few options for what we do right now:

  • a simple change to findMainDocument that maintains the current behavior with this PR to just use the last network redirect URL when finalUrl is passed in (~2 line change to this PR, but mostly as-is, findMainDocument usage will be confusing but it's already a mess we need to start sorting out ASAP for 7.0)
  • more conservative, use the window.location.href as Brendan's suggested finalDisplayedUrl in a separate property (a bit of plumbing to thread this URL through everywhere without impacting existing finalUrl usage)
  • even more conservative, use the window.location.href for the warning only but we don't show it or expose it anywhere else

More details on my proposal
Currently the final URL is always the last network redirect URL. If we want to maintain this behavior for now regardless of if its correctness, then we can just ignore the finalUrl in findMainDocument and follow the redirectDestination of the root request when the argument is present.

I maintain that the current behavior isn't actually correct though, so long term we want to re-examine all of our uses of MainResource Let's analyze a few of the situations we could face here...

Ways that window.location.href could differ from the initialUrl

  • Network redirects
  • Client-side redirects
  • Push State
  • Any combination of the above

Ways that Lighthouse could fail

  • Fails to display the real finalUrl / add the warning
  • Fails to use the first request for performance metrics
  • Fails to use the correct request in individual audits

Operation Modes to Consider

  • Lantern
  • Observed

Audits using MainResource directly

  • legacy-javascript
  • performance-budget
  • redirects
  • third-party-summary
  • ttfb
  • uses-rel-preconnect
  • uses-rel-preload
  • charset
  • canonical
  • http-status-code
  • is-crawlable
  • critical-request-chains

Audits using findMainDocument

  • diagnostics
  • duplicated-javascript
  • all performance metrics

Gatherers using MainResource/findMainDocument

  • ScriptElements
  • MainDocumentContent
  • LinkElements
  • getPageLoadError (in gather runner)

For warnings today, we only handle network redirects correctly.
For performance metrics today, we only handle lantern and network redirects correctly.
For audits today, we handle mixed cases correctly, most are incorrect for client-side redirects IMO.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently the final URL is always the last network redirect URL. If we want to maintain this behavior for now regardless of if its correctness, then we can just ignore the finalUrl in findMainDocument and follow the redirectDestination of the root request when the argument is present.

I don't think anyone thinks we shouldn't track js redirects (see #8984 :) and I don't think anyone in that meeting (at least anyone talking) didn't agree that we should progress on this for 6.0.

Assuming that :), my question above was attempting to ask whether there is anywhere in lighthouse that cares about window.location.href vs the url of the last navigation request (which may go beyond the last redirectDestination if js redirects occurred) except the url displayed at the top of the report? If not, it seems like we should use that everywhere and refine audits, rather than starting with window.location.href and refine the other way.

even more conservative, use the window.location.href for the warning only but we don't show it or expose it anywhere else

I feel like the redirect warning is a bit of a red herring for history pushstate changes in particular, because do we even want to warn if it only differs for that reason? Part of the problem is no one recorded what the purpose of the warning is :)

#1917 was originally warning about cross-origin redirects because storage clearing is per origin, so a cross-origin redirect means storage may not be cleared for where the test page actually is and it may not actually be a cold load, which might have a significant effect. I'm not sure what else the warning is doing...just "fyi, something unexpected might be going on"? Reducing wasted time spent redirecting might be important for PSI, but it's not clear why the other clients should really care (devtools usually shouldn't even get redirected, except maybe a single example.com -> m.example.com change)

Copy link
Collaborator Author

@patrickhulce patrickhulce Apr 18, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my question above was attempting to ask whether there is anywhere in lighthouse that cares about window.location.href vs the url of the last navigation request

Ah, I misunderstood I thought you were concerned about changing the main resource that audits received. Essentially you're proposing that instead of window.location.href we take the last document request for a main frame that matches a navigation (and perhaps the origin of window.location.href)?

That intermediate step makes sense to me 👍

I'm not sure what else the warning is doing

I believe this resurfaced during bug triage. There are lots of cases where perf gets heavily influenced by redirects (especially in PSI as people type in random stuff without www/protocols/m./etc) where it's crucial to warn when then URL changes and if we warn there why not just warn everywhere we saw a change because it matters to non-PSI folks too?

do we even want to warn if it only differs for that reason

IMO, yes. Anytime that you're attempting to ask for the performance of X and you instead get the performance of some journey X -> Y, you should be notified that you didn't get just X and that's why numbers might be unexpected. Similar to but not as severe as the reason we fail on server errors, hence the error -> warn severity change :)

EDIT: and prime support for warning on the above is the exact case that triggered a lot of this in the first place, a massive 35325MB app bundle that just redirects to the login page could easily on an SPA be a pushState and XHR instead of document navigation

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

aight, I hope I understood because the deed is done :)

@patrickhulce
Copy link
Collaborator Author

FYI @paulirish I'm interpreting those initial comments as cursory and I am currently waiting on a review. Just let me know if you think otherwise :)

lighthouse-cli/test/fixtures/js-redirect.html Show resolved Hide resolved
lighthouse-core/gather/driver.js Show resolved Hide resolved
@@ -439,6 +439,10 @@ function _resolveIcuMessageInstanceId(icuMessageInstanceId, locale) {
const [_, icuMessageId, icuMessageInstanceIndex] = matches;
const icuMessageInstances = _icuMessageInstanceMap.get(icuMessageId) || [];
const icuMessageInstance = icuMessageInstances[Number(icuMessageInstanceIndex)];
if (!icuMessageInstance) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

apologies but I can't recall if it'd be fine to land this hotfix now.

@patrickhulce patrickhulce changed the title core(driver): use window.location for finalUrl core(driver): use last frameNavigated event to determine finalUrl Apr 29, 2020
@patrickhulce
Copy link
Collaborator Author

ah sorry this was on me just mislabeled (but I somehow fixed part of it??), suggestions look great 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants