Instrumentation markers for logging inference? #47902
Labels: compiler:inference (Type inference), compiler:latency (Compiler latency), julep (Julia Enhancement Proposal)
TL;DR
The drive to minimize TTFX may benefit from the ability to send "instrumenting commands" to inference directly from Julia code.
For those who may remember a discussion (somehow I'm not finding it...) about the order in which compilation and timer instrumentation occur in @time, this has the same flavor.
Background
With native code caching (#47184), SnoopPrecompile, and the ability to fix invalidations, in principle we should often be able to drive "excess" TTFX (that beyond the workload itself) nearly to zero. So far, we're only rarely so successful: for example, #47889 seems to eliminate all invalidations that affect the demonstrated workloads, and yet the TTFX is more than a second for each. (It's milliseconds or less on the second execution of the workload.) While one can be happy about the gains we have, I was initially puzzled about why it's not at least an order of magnitude better.
Analysis of causes
From what I can tell, the answer appears to be a subtle interaction between the interpreter, scope, and tracking of inference. Here's the basic idea: currently SnoopPrecompile attempts to take a precompilation directive (a lightly modified version of CSV.jl's actual precompile file is the example here) and turn it into a single let block that holds the setup code and wraps the workload in @force_compile plus the calls that turn inference logging on and off.
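As a rough, hypothetical sketch of the shape of such an expansion: the toggle functions, the data, and the workload below are stand-ins invented for illustration, not SnoopPrecompile's or CSV.jl's actual code, and Base.Experimental.@force_compile assumes Julia 1.8 or later.

```julia
# Hypothetical stand-ins for the real inference-logging toggles; they only
# record the order in which things run at execution time.
const EVENTS = Symbol[]
start_inference_logging() = push!(EVENTS, :logging_on)
stop_inference_logging()  = push!(EVENTS, :logging_off)

# Stand-in workload: counts the newline characters in the data.
workload(data) = (push!(EVENTS, :workload); count(==('\n'), data))

# Setup code and workload live in one `let`, so no temporaries leak into the
# module namespace.
nrows = let
    data = "a,b\n1,2\n3,4\n"              # setup code (not meant to be cached)
    start_inference_logging()
    try
        # Per the discussion here, this marker appears to affect the whole
        # `let` block, not just the statements that follow it.
        Base.Experimental.@force_compile
        workload(data)
    finally
        stop_inference_logging()
    end
end
```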
Seems sensible, right? But the problem is this: the @force_compile seems to end up forcing compilation of the entire contents of the let block, and that means some of the items get inferred before we turn on inference logging. That's OK for methods owned by the package, but for methods the package doesn't own it results in omission from the pkgimage. For this example, one of ~16 things that don't get cached is pkgdir(::Module); of course you can argue that I should have put that in the setup code, and that its omission is actually a good thing (I wouldn't disagree). But the more important point is that @precompile_all_calls is supposed to guarantee caching of everything inside, and it's definitely not living up to that promise; in some cases that hurts TTFX in undesirable ways.
Possible solutions
Separate toplevel expressions
If we didn't have to worry about setup code and hiding temporary variables inside the let, we could just use separate top-level expressions. But I think the ability to not pollute the module namespace and to have the option of not force-caching setup code is pretty desirable.
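The namespace concern can be illustrated with a small sketch. The module and variable names are hypothetical, and length stands in for a real workload:

```julia
# Top-level style: every temporary becomes a global of the module.
module TopLevelStyle
payload = "a,b\n1,2\n"        # hypothetical setup data, now a module global
nchars = length(payload)      # stand-in for the real workload
end

# `let` style: the temporary stays local to the block.
module LetStyle
nchars = let payload = "a,b\n1,2\n"
    length(payload)
end
end

isdefined(TopLevelStyle, :payload)   # true: the temporary leaked into the module
isdefined(LetStyle, :payload)        # false: hidden inside the `let`
```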
Use of @eval
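As background for this option, here is a minimal sketch of how @eval interacts with let-local variables. @eval evaluates its expression at module top level, so a local is only visible if its value is spliced in with $. The variable names are hypothetical stand-ins for PRECOMPILE_DATA.

```julia
# With `$`, the value of the local is spliced into the expression before it is
# evaluated at top level, so this works.
n_ok = let payload = "a,b\n1,2\n"
    @eval length($payload)
end

# Without `$`, the top-level expression looks for a global named
# `payload_local`, which does not exist, so evaluation throws.
err = try
    let payload_local = "a,b\n1,2\n"
        @eval length(payload_local)
    end
catch e
    e
end
```

Here n_ok is 8 and err is an UndefVarError, which is the manual escaping burden that makes this option error-prone.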
Perhaps the next easiest is to wrap the user workload in @eval, but the need to escape the PRECOMPILE_DATA is annoying (presumably, package devs would have to do that manually, and it seems error-prone).
Instrumenting inference directly from user code
A final alternative is for us to have some special expressions, kind of like how :meta works now, that essentially implement this logic within inference itself: that is, we basically send an expression to inference that toggles logging. In other words, the workload might translate into something containing Expr(:meta, :track_inference, expr_workload), which is just a marker that gets processed during inference and triggers it to start logging before inferring expr_workload.
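To make the marker idea concrete, here is a toy sketch. The :track_inference marker is the hypothetical one proposed here and is not an existing Julia feature; toy_infer is an invented stand-in that only walks a list of expressions and has nothing to do with the real inference code.

```julia
# Build the proposed (hypothetical) marker around a workload expression.
track(ex) = Expr(:meta, :track_inference, ex)

# Recognize the marker: a :meta expression whose first argument is
# :track_inference and whose second argument is the payload.
is_track_marker(x) =
    x isa Expr && x.head === :meta && length(x.args) == 2 &&
    x.args[1] === :track_inference

# Toy "inference" pass: visits each statement and logs only the payloads that
# were wrapped in the marker.
function toy_infer(stmts)
    logged = Any[]
    for stmt in stmts
        if is_track_marker(stmt)
            push!(logged, stmt.args[2])   # logging is on for the payload only
        end
        # Unmarked statements (e.g. setup code) are processed without logging.
    end
    return logged
end

# Setup statement followed by a marked workload; nothing here is evaluated.
body = Any[:(data = setup()), track(:(workload(data)))]
logged = toy_infer(body)
```

In this sketch only workload(data) ends up in logged, while the setup statement is skipped, mirroring the goal of caching the workload without force-caching setup code.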