deps: switch third-party-web dataset to entities-nostats subset #14548
for reference #9067 all that I could find written down about this:
edit: nice 🕵️ @connorjclark :)
At version 0.20.2, the difference is:
ls -ls *nostats.json | awk '{print $6 " " $10}'
111003 entities-httparchive-nostats.json
201320 entities-nostats.json

Where we are now (0.17.1), the difference is:
ls -ls *nostats.json | awk '{print $6 " " $10}'
106566 entities-httparchive-nostats.json
206842 entities-nostats.json
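
To put the byte counts above in relative terms, here is a minimal sketch that computes how much larger the full file is than the HTTPArchive subset for each version. The numbers are copied from the `ls` output; the `SIZES` table and `size_ratio` helper are illustrative names, not part of any library.

```python
# Byte sizes copied from the ls output above, keyed by third-party-web version.
SIZES = {
    "0.20.2": {
        "entities-httparchive-nostats.json": 111003,
        "entities-nostats.json": 201320,
    },
    "0.17.1": {
        "entities-httparchive-nostats.json": 106566,
        "entities-nostats.json": 206842,
    },
}


def size_ratio(version: str) -> float:
    """Return full-dataset size divided by httparchive-subset size."""
    files = SIZES[version]
    return files["entities-nostats.json"] / files["entities-httparchive-nostats.json"]


for version in SIZES:
    print(f"{version}: full set is {size_ratio(version):.2f}x the httparchive subset")
```

This gives roughly 1.81x at 0.20.2 and 1.94x at 0.17.1, so "double" is a fair approximation of the uncompressed cost.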
I think LH should switch to and depend on the full data set to maximize coverage. Would the additional (roughly double) size of the plaintext payload be a concern?
It's a ~6% increase to the devtools bundle, so not too much.
How big is the dt bundle these days? 6% of big can still be pretty big :)

It might be worth looking into the helpfulness per byte added. I assume there are power law distributions at work here, e.g. if 1/1000 Lighthouse runs get a single additional entity recognized, it might not be worth it. 1/100, maybe?

For the sample_v2 test problem: there are a few requests to

Maybe it is no big deal even if there are diminishing returns, but it's something to think about. For the actual call, I'm not sure who the dt bundle size watcher is these days (or if there is one).
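
One way to estimate "helpfulness per byte" would be to measure, over a sample of runs, how often the full set recognizes a domain the subset misses. The sketch below is only an illustration: `runs_with_gain`, the domain sets, and the sample runs are all hypothetical, and it uses exact domain matching, which is simpler than whatever matching the real entity lookup performs.

```python
from typing import Iterable, List, Set


def runs_with_gain(
    runs: List[Iterable[str]],
    subset_domains: Set[str],
    full_domains: Set[str],
) -> float:
    """Fraction of runs where the full set recognizes at least one
    request domain that the subset does not (exact match only)."""
    if not runs:
        return 0.0
    # Domains that only the full dataset knows about.
    extra = full_domains - subset_domains
    gained = sum(1 for domains in runs if extra & set(domains))
    return gained / len(runs)


# Hypothetical data: each inner list is the request domains seen in one run.
runs = [
    ["a.example", "b.example"],
    ["a.example"],
    ["c.example"],
    ["a.example"],
]
subset = {"a.example"}
full = {"a.example", "b.example"}

print(runs_with_gain(runs, subset, full))
```

Dividing that fraction by the number of added bytes would give a rough benefit-per-byte figure to compare against the 1/1000 or 1/100 thresholds mentioned above.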
Also, is that 6% pre-compression? I assume all those domain lists compress really well.
It's pre-compression, because I am not familiar with exactly how the devtools build / Chrome distribution is compressed. If we can just assume something like gzip, it's 427 KB vs 445 KB.
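
As a rough sketch of the gzip comparison, assuming gzip is a reasonable stand-in for the real distribution compression (which is not confirmed here): the helper names and the sample domain list below are made up for illustration, and only the 427 and 445 figures come from this thread.

```python
import gzip
import json


def gzip_size(data: bytes) -> int:
    """Size in bytes of the data after gzip compression (default level)."""
    return len(gzip.compress(data))


def percent_increase(before: float, after: float) -> float:
    """Relative growth from before to after, as a percentage."""
    return (after - before) / before * 100


# Hypothetical stand-in for a domain list; real data would be read from the
# dataset JSON files instead.
sample = json.dumps(
    [f"cdn{i}.example-tracker.com" for i in range(1000)]
).encode("utf-8")

print(len(sample), gzip_size(sample))  # raw size vs gzipped size

# Bundle sizes quoted above, assuming gzip.
print(f"{percent_increase(427, 445):.1f}%")  # 4.2%
```

Under that gzip assumption, the quoted sizes work out to about a 4.2% increase rather than 6%.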
Yep that's right! I would expect there to be diminishing returns here but, as @brendankenny points out, we definitely bias against 3Ps that are only really triggered on interaction, so from a fraggle rock perspective it might be worth including more. Happy to craft more bespoke cuts of the dataset to publish if there's a different set of tradeoffs that would benefit LH/DT these days :)
Seems OK to just include all the data if this is what we're looking at 🤷
I was tracking down why a recent upgrade of the library was losing one of the recognized entities, and in the process realized we aren't getting several entities that are in entities.json.
With 0.20.2, it outputs:
The entity moved out of the HTTPArchive dataset (as expected since it went below the threshold requirements to make it to that dataset).
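
A quick way to see which entities a subset drops is to diff the two files by entity name. This is a minimal sketch, not the script used in this PR: it assumes each entry is an object with a `name` field, and the function names and sample entities are hypothetical.

```python
from typing import Dict, List, Set


def entity_names(entities: List[Dict]) -> Set[str]:
    """Collect entity names, assuming each entry has a 'name' key."""
    return {entity["name"] for entity in entities}


def missing_from_subset(full: List[Dict], subset: List[Dict]) -> List[str]:
    """Names present in the full dataset but absent from the subset."""
    return sorted(entity_names(full) - entity_names(subset))


# Hypothetical entries standing in for the parsed JSON files.
full = [
    {"name": "Example Analytics", "domains": ["analytics.example.com"]},
    {"name": "Example Widgets", "domains": ["widgets.example.net"]},
]
subset = [
    {"name": "Example Analytics", "domains": ["analytics.example.com"]},
]

print(missing_from_subset(full, subset))
```

With real data, `full` and `subset` would come from `json.load` on the two dataset files; the paths depend on where the package is installed.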
Digging further: