scripts: compare-timings.js --compare #9776

Merged
merged 7 commits into from
Oct 7, 2019

connorjclark
Collaborator

For a script called compare-timings.js, it really didn't do much comparing :p

image

this is p. cool b/c you don't have to know before hand the specific measures that will change or look at every entry. Just look at the first few to see the biggest changes.

  console.table(results);
} else if (argv.output === 'json') {
  // eslint-disable-next-line no-console
  console.log(JSON.stringify(results, null, 2));
Collaborator Author

i'm curious if --output json will be useful to someone in the future

Collaborator

@patrickhulce patrickhulce left a comment

I will say based on DZL that we have so many timings that if you go hunting for deltas you're basically guaranteed to find "interesting" ones on every run, so I kinda prefer old approach of limiting to specific timings you expect to be different, but super handy either way! :)

lighthouse-core/scripts/compare-timings.js
const baseResults = aggregateResults(argv.name[0]);
const otherResults = aggregateResults(argv.name[1]);

const keys = [...new Set([...baseResults.map(r => r.key), ...otherResults.map(r => r.key)])];
Collaborator

image

Collaborator Author

lol was so happy this was <100 lines.
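
The key-union line quoted in this thread can be illustrated with a small standalone sketch. This is not the script's actual code: the sample data, the `compareByKey` helper, and the `mean` field on each entry are assumptions made up for illustration. It shows how taking the union of keys lets measures that exist in only one run still appear in the table, and how sorting by absolute delta surfaces the biggest changes first.

```javascript
// Hypothetical aggregated results; in the real script these would come from
// aggregateResults(). The shape {key, mean} is an assumption for this sketch.
const baseResults = [
  {key: 'measure-a', mean: 100},
  {key: 'measure-b', mean: 40},
];
const otherResults = [
  {key: 'measure-a', mean: 130},
  {key: 'measure-c', mean: 12},
];

/**
 * Pair up entries by key and compute the change in mean where both sides exist.
 * (Hypothetical helper, not part of compare-timings.js.)
 */
function compareByKey(base, other) {
  const baseByKey = new Map(base.map(r => [r.key, r]));
  const otherByKey = new Map(other.map(r => [r.key, r]));
  // Same union-of-keys idea as the quoted line: a Set removes duplicates.
  const keys = [...new Set([...baseByKey.keys(), ...otherByKey.keys()])];

  return keys
    .map(key => {
      const b = baseByKey.get(key);
      const o = otherByKey.get(key);
      // A delta only makes sense when the measure appears in both runs.
      const delta = b && o ? o.mean - b.mean : undefined;
      return {key, base: b && b.mean, other: o && o.mean, delta};
    })
    // Largest absolute change first; missing deltas are treated as zero change.
    .sort((x, y) => Math.abs(y.delta ?? 0) - Math.abs(x.delta ?? 0));
}

const rows = compareByKey(baseResults, otherResults);
console.table(rows);
```

Building the lookup maps once keeps the pairing linear in the number of entries rather than scanning both arrays for every key.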

@brendankenny
Member

> this is p. cool b/c you don't have to know before hand the specific measures that will change or look at every entry. Just look at the first few to see the biggest changes.

> I will say based on DZL that we have so many timings that if you go hunting for deltas you're basically guaranteed to find "interesting" ones on every run, so I kinda prefer old approach of limiting to specific timings you expect to be different

yes, this is a form of the multiple comparisons problem (see also p-hacking and https://xkcd.com/882/) and will lead to apparent changes that after investigation will turn out to have been caused by us confusing the sample for the population.
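
To put a rough number on the multiple comparisons concern, here is a minimal sketch. It assumes every comparison is an independent test at the same significance level, which real timings are not (many of them are correlated), so treat the figures as an illustration rather than a property of this script. The function names are made up for this example.

```javascript
/**
 * Probability of at least one false positive across `numComparisons`
 * independent tests, each run at significance level `alpha`.
 * (Hypothetical helper; independence is an assumption.)
 */
function familyWiseErrorRate(numComparisons, alpha = 0.05) {
  return 1 - Math.pow(1 - alpha, numComparisons);
}

/**
 * Bonferroni correction: the per-test threshold that keeps the overall
 * false-positive rate at or below `alpha`.
 */
function bonferroniThreshold(numComparisons, alpha = 0.05) {
  return alpha / numComparisons;
}

for (const n of [1, 20, 100]) {
  const fwer = familyWiseErrorRate(n).toFixed(3);
  const threshold = bonferroniThreshold(n).toExponential(2);
  console.log(`${n} comparisons: P(any false positive) = ${fwer}, Bonferroni threshold = ${threshold}`);
}
```

With 20 comparisons (the number of jelly bean colors in the linked comic) the chance of at least one spurious "change" is already well over half, which is why restricting the comparison to the measures you expect to move is the safer default.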

@connorjclark
Collaborator Author

> this is p. cool b/c you don't have to know before hand the specific measures that will change or look at every entry. Just look at the first few to see the biggest changes.

fair points everyone, let me retcon this. it's good for when you know you changed something of importance, but don't exactly know what the measure names are :) if you were to see some completely unrelated measure "changed" ... well, don't go hacking that p (did i say that right?)

seeing at-a-glance what the deltas are is useful regardless

'mean': mean.description,
'mean Δ': exists(mean.delta) ? round(mean.delta) : undefined,
'stdev': stdev.description,
'stdev Δ': exists(stdev.delta) ? round(stdev.delta) : undefined,
Member

@brendankenny brendankenny Oct 3, 2019

What's the goal of stddev deltas? If it's when trying to reduce variance then the coefficient of variation is more appropriate to handle when the means are different.

But my biggest issue is this is like quadrupling down on the mean here :) If you look at min|-|mean|-|max for these timings, I think you'll find virtually none of them are symmetric. A real test on the medians (or pick a percentile) or even better, using intervals is the right move here.

It's not blocking, but it is wrong :P
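
The two suggestions in this comment can be sketched in a few lines. The sample timings and helper names below are invented for illustration, and the sample standard deviation (n - 1 denominator) is an assumption, since the thread does not say which variant the script uses. The second run is exactly twice the first, so its stdev doubles while its coefficient of variation stays the same, and the single slow outlier pulls the mean well above the median.

```javascript
// Hypothetical right-skewed timings (ms); `other` is exactly 2x `base`.
const base = [100, 102, 98, 101, 180];
const other = base.map(x => x * 2);

function mean(xs) {
  return xs.reduce((sum, x) => sum + x, 0) / xs.length;
}

// Sample standard deviation (n - 1 denominator); an assumption for this sketch.
function sampleStdev(xs) {
  const m = mean(xs);
  const variance = xs.reduce((sum, x) => sum + (x - m) ** 2, 0) / (xs.length - 1);
  return Math.sqrt(variance);
}

// Stdev relative to the mean, so runs with different means are comparable.
function coefficientOfVariation(xs) {
  return sampleStdev(xs) / mean(xs);
}

// Percentile with linear interpolation between the closest ranks; p in [0, 1].
function percentile(xs, p) {
  const sorted = [...xs].sort((a, b) => a - b);
  const idx = (sorted.length - 1) * p;
  const lo = Math.floor(idx);
  const hi = Math.ceil(idx);
  return sorted[lo] + (sorted[hi] - sorted[lo]) * (idx - lo);
}

for (const [name, xs] of [['base', base], ['other', other]]) {
  console.log(name, {
    mean: mean(xs),
    median: percentile(xs, 0.5),
    stdev: sampleStdev(xs),
    cv: coefficientOfVariation(xs),
  });
}
```

Comparing medians (or another percentile) and the coefficient of variation is only a first step; the interval-based comparison mentioned in the comment would need resampling or a proper test, which this sketch does not attempt.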

Collaborator Author

i am a simple man, I understand means. I can add stdev to the default ignore.

Collaborator

@patrickhulce patrickhulce left a comment

WFM if it's workin' for you!
