scripts: compare-timings.js --compare #9776
  console.table(results);
} else if (argv.output === 'json') {
  // eslint-disable-next-line no-console
  console.log(JSON.stringify(results, null, 2));
i'm curious if --output json will be useful to someone in the future
I will say based on DZL that we have so many timings that if you go hunting for deltas you're basically guaranteed to find "interesting" ones on every run, so I kinda prefer the old approach of limiting to the specific timings you expect to be different, but super handy either way! :)
const baseResults = aggregateResults(argv.name[0]);
const otherResults = aggregateResults(argv.name[1]);

const keys = [...new Set([...baseResults.map(r => r.key), ...otherResults.map(r => r.key)])];
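For readers following along, here is a minimal sketch of how a key union like the one above could feed a per-measure delta. The data shape (`key`, `mean`), the measure names, and the `compareByKey` helper are all assumptions for illustration, not the script's actual code.

```javascript
// Hypothetical aggregated results; the real script's shape may differ.
const baseResults = [
  {key: 'measureA', mean: 120},
  {key: 'measureB', mean: 40},
];
const otherResults = [
  {key: 'measureA', mean: 150},
  {key: 'measureC', mean: 10},
];

// Union of keys, so a measure present in only one run still gets a row.
const keys = [...new Set([...baseResults.map(r => r.key), ...otherResults.map(r => r.key)])];

// Hypothetical helper: pair up both runs by key and compute the mean delta.
function compareByKey(keys, baseResults, otherResults) {
  return keys.map(key => {
    const base = baseResults.find(r => r.key === key);
    const other = otherResults.find(r => r.key === key);
    // A delta only makes sense when both runs recorded the measure.
    const delta = base && other ? other.mean - base.mean : undefined;
    return {key, base: base && base.mean, other: other && other.mean, delta};
  });
}

const rows = compareByKey(keys, baseResults, otherResults);
console.table(rows);
```

Measures that exist in only one run keep an `undefined` delta rather than a misleading number.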
lol was so happy this was <100 lines.
yes, this is a form of the multiple comparisons problem (see also p-hacking and https://xkcd.com/882/) and will lead to apparent changes that after investigation will turn out to have been caused by us confusing the sample for the population.
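To make the multiple comparisons point concrete, here is a rough simulation sketch. Everything in it is assumed for illustration (the uniform noise model, 200 measures, 5 runs each, a 5ms threshold); it is not based on real Lighthouse timings.

```javascript
// Hypothetical simulation: every "measure" is sampled twice from the SAME
// distribution, so any apparent delta is pure sampling noise.
function sampleMean(n) {
  let sum = 0;
  for (let i = 0; i < n; i++) {
    // Uniform noise in [90, 110): an assumed stand-in for a real timing.
    sum += 90 + Math.random() * 20;
  }
  return sum / n;
}

// Count how many unchanged measures still show a mean delta above the threshold.
function countSpuriousChanges(measureCount, runsPerMeasure, thresholdMs) {
  let flagged = 0;
  for (let i = 0; i < measureCount; i++) {
    const delta = sampleMean(runsPerMeasure) - sampleMean(runsPerMeasure);
    if (Math.abs(delta) > thresholdMs) flagged++;
  }
  return flagged;
}

const flagged = countSpuriousChanges(200, 5, 5);
console.log(`${flagged} of 200 unchanged measures moved by more than 5ms`);
```

With this many measures, some will almost always clear the threshold even though nothing changed, which is why an unrelated measure that "changed" deserves suspicion.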
fair points everyone, let me retcon this. it's good for when you know you changed something of importance, but don't exactly know what the measure names are :) if you were to see some completely unrelated measure "changed" ... well, don't go hacking that p (did i say that right?). seeing at-a-glance what the deltas are is useful regardless
Co-Authored-By: Patrick Hulce <[email protected]>
'mean': mean.description,
'mean Δ': exists(mean.delta) ? round(mean.delta) : undefined,
'stdev': stdev.description,
'stdev Δ': exists(stdev.delta) ? round(stdev.delta) : undefined,
What's the goal of stddev deltas? If it's when trying to reduce variance then the coefficient of variation is more appropriate to handle when the means are different.
But my biggest issue is this is like quadrupling down on the mean here :) If you look at min|-|mean|-|max for these timings, I think you'll find virtually none of them are symmetric. A real test on the medians (or pick a percentile) or even better, using intervals is the right move here.
It's not blocking, but it is wrong :P
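A small sketch of the two suggestions above, under assumed sample data: the coefficient of variation (stdev divided by mean) stays comparable when the means differ, and the median is less affected by a skewed tail than the mean. The helper names are hypothetical and not part of compare-timings.js.

```javascript
function mean(xs) {
  return xs.reduce((sum, x) => sum + x, 0) / xs.length;
}

// Sample standard deviation (n - 1 in the denominator).
function stdev(xs) {
  const m = mean(xs);
  const sumSquares = xs.reduce((sum, x) => sum + (x - m) ** 2, 0);
  return Math.sqrt(sumSquares / (xs.length - 1));
}

// Coefficient of variation: spread relative to the mean.
function coefficientOfVariation(xs) {
  return stdev(xs) / mean(xs);
}

function median(xs) {
  const sorted = [...xs].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Made-up timings: `other` is `base` scaled by 2, so the stdev doubles
// while the relative variation is unchanged.
const base = [100, 102, 98, 101, 99];
const other = [200, 204, 196, 202, 198];
console.log(stdev(base), stdev(other));
console.log(coefficientOfVariation(base), coefficientOfVariation(other));

// Made-up skewed timings: one slow run drags the mean far above the median.
const skewed = [10, 11, 12, 13, 90];
console.log(mean(skewed), median(skewed));
```

A raw stdev delta would flag the first pair as "noisier" even though its relative spread is identical, which is the case the coefficient of variation is meant to handle.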
i am a simple man, I understand means. I can add stdev to the default ignore.
WFM if it's workin' for you!
For a script called compare-timings.js, it really didn't do much comparing :p
This is p. cool b/c you don't have to know beforehand the specific measures that will change or look at every entry. Just look at the first few to see the biggest changes.
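As a rough illustration of "look at the first few", here is a sketch that orders rows by the size of their mean delta. It assumes each row carries a numeric `delta` (or `undefined` when a measure is missing from one run); the row shape and the `sortByBiggestChange` helper are hypothetical, not taken from the script.

```javascript
// Hypothetical helper: largest absolute delta first, rows without a delta last.
function sortByBiggestChange(rows) {
  const magnitude = row => (row.delta === undefined ? -1 : Math.abs(row.delta));
  // Copy before sorting so the caller's array is left untouched.
  return [...rows].sort((a, b) => magnitude(b) - magnitude(a));
}

// Made-up rows for illustration.
const exampleRows = [
  {key: 'measureA', delta: 5},
  {key: 'measureB', delta: -40},
  {key: 'measureC', delta: undefined},
  {key: 'measureD', delta: 12},
];

const sortedRows = sortByBiggestChange(exampleRows);
console.table(sortedRows);
```

Sorting on the absolute value keeps large regressions and large improvements together at the top.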