-
Notifications
You must be signed in to change notification settings - Fork 71
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Redefine EventNetwork
to reduce allocations
#238
Conversation
EventNetwork
to reduce allocations
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nothing wrong with a bit of defunctionalization for the sake of performance! 😊
To make sure that the small performance improvements are real: What happens if you use the new code as baseline and benchmark the old code against this baseline? Would it also report 1%-4% improvements due to statistical noise?
Co-authored-by: Heinrich Apfelmus <[email protected]>
I switched machines (to my more powerful desktop) so the numbers don't quite match, but here's what I get if I use this PR as the baseline, and switch back to
|
Also just want to add I tried running benchmarks on this PR once to get a baseline, and again against the baseline - the second run doesn't show any significant changes (that is, matches the previous run, so we're not just observing measurement error). |
Awesome, thanks for the diligence! 😊 |
The goal of this change is to reduce an allocation that occurs when firing the event network via with
fromAddHandler
. If we inspect the current STG, we see:sat_sydH
is the function that is used toregister
against theAddHandler
to fire the network. If we look at this in more detail, we see that after pattern matching on anEventNetwork
, we then allocate aStep
to pass tods_sydD
-ds_sydD
is therunStep
function.What we'd really like is to inline
runStep
, but because it's part ofEventNetwork
, this is very difficult. Fortunately,runStep
can be lifted out to the top-level, and the free variables pulled out and moved intoEventNetwork
. This breaks a bit of encapsulation (though we could recover this withmodule
).The impact of this change is noticable. I first ran the benchmarks with
cabal run benchmarks -- --stdev 1 --csv baseline.csv
. Next, I applied the changes in this PR, and re-rancabal run benchmarks -- --stdev 1 --baseline baseline.csv
. The results are:The most striking difference is "Boring". This is a "no-op" benchmark, which is now 30% faster (!), and also apparently allocates 0 bytes (though I don't entirely believe this, I think it's measurement error).