
[question] Multiple calls to @snoopi_deep #222

Closed
tlienart opened this issue Jan 17, 2021 · 17 comments

@tlienart

Hello,

I was trying out a few things in a project to reduce the number of inference triggers when I realised that if I ran @snoopi_deep after restarting Julia, I got a huge number of triggers, whereas on subsequent runs I only got 4, all from Base. Which result should I consider?

Also is it possible to filter the inference triggers by module?

Thanks!

@timholy
Owner

timholy commented Jan 19, 2021

Sorry to be slow to respond, #225 has been taking a lot of time.

You generally want to run @snoopi_deep in a fresh session. Once you infer a method, you generally don't need to reinfer it. (Redefining the method of course forces it to be reinferred.)

As far as filtering goes, for @snoopr-measured invalidations, you can use filtermod. That only filters on the root; we might want something that "hits" anything in a given module?
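
For context, a minimal sketch of that @snoopr/filtermod workflow might look like the following (the snooped package and the Franklin module are just placeholders here):

using SnoopCompileCore
invs = @snoopr using Franklin         # record invalidations caused by loading the package
using SnoopCompile                    # load the analysis code after snooping
trees = invalidation_trees(invs)      # organize the invalidations into trees by root cause
ftrees = filtermod(Franklin, trees)   # keep only trees whose root method lives in Franklin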

For @snoopi_deep inference triggers, you should be able to do something like

filter(itrig -> itrig.callerframes[end].linfo.def.module === Franklin, itrigs)

That should probably become filtermod on a list of itrigs.
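
A minimal sketch of how that filter fits into the full workflow (run_workload() is just a placeholder for whatever code you want to profile):

using SnoopCompile
tinf = @snoopi_deep run_workload()    # profile inference in a fresh session
itrigs = inference_triggers(tinf)     # extract the triggers from the timing tree
franklin_itrigs = filter(itrig -> itrig.callerframes[end].linfo.def.module === Franklin, itrigs)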

@tlienart
Author

Ah crap then I have 160 of those :/ (this is for another, much smaller module, which will eventually replace Franklin's parser: FranklinParser).

I think once I get the hang of reducing those, I might try to make a PR to extend the docs :)

Thanks again Tim!

@timholy
Owner

timholy commented Jan 19, 2021

It's pretty in flux right now. suggest can triage them to help eliminate ones you don't need to address, or that might need to be addressed elsewhere (e.g., Julia itself). For example, my investigations have led me to tackle the concatenation pipeline, and led to these PRs:

and the net effect is an improvement in both runtime performance and precompilability.

I will play with FranklinParser at some point and see if I can help provide some context and/or additional tools. I recognize the complexity of these tools and am working to make them accessible to a wide audience, and you're very much in that target audience. So complain if it's unclear or too much work and I'll see what I can do to decrease the effort.

For now, the invalidations should be a much narrower set than all inference failures. In the long run, it may be easier to identify problems from the inference failures, but as you say right now the list of inference failures can be disturbingly long.
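
As a rough sketch of the triage step mentioned above, suggest can be mapped over the triggers to annotate each one with a likely cause and remedy:

sugs = [suggest(itrig) for itrig in itrigs]   # one annotated entry per trigger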

@timholy
Owner

timholy commented Jan 19, 2021

The main thing I'd recommend is to check whether you've tried the profile-guided despecialization GUI yet. https://timholy.github.io/SnoopCompile.jl/dev/pgdsgui/
I find it's a nice way to get a sense for the latency cost of design decisions.
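
A hedged sketch of that workflow, per the linked docs (run_workload() is a placeholder, and a Matplotlib/PyPlot backend is assumed for the plotting):

using SnoopCompile, Profile, PyPlot
tinf = @snoopi_deep run_workload()   # inference cost per method
@profile run_workload()              # runtime cost for the same workload
mref, ax = pgdsgui(tinf)             # interactive plot of runtime vs inference cost per method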

@tlienart
Author

Thanks a lot Tim! One very clear benefit of all this is that it made me think "wait, does the compiler know what that thing is, or is it only known at runtime?", and while this is painful in some cases, I think overall it might push me and others to just do things better (e.g. I just stumbled upon the section in the Julia docs warning against abusing the dispatch system, something I was certainly doing).

I'll keep trying and will provide specific feedback once I'm a bit more comfortable with these tools!

@timholy
Owner

timholy commented Jan 20, 2021

Very good. You don't have to optimize everything; if "abusing dispatch" makes your code cleaner, and you don't really care about the runtime or latency hit, there's no problem with it. But if you are trying to make lean, low-latency, performant code, then yes, these things matter.

@tlienart
Author

But if you are trying to make lean, low-latency, performant code, then yes, these things matter.

Yes, well, ultimately I'd like the time to first serve in Franklin to be as low as possible, so if there's low-hanging fruit that helps with that (and if it makes me a better Julia programmer along the way) I'll do it!

@timholy
Owner

timholy commented Jan 20, 2021

👍 I see you still have a lot of invalidations. Fixing those is probably the highest payoff. To make your life easier I just pushed #226. I'll post a demo in tlienart/Franklin.jl#752.

@tlienart
Author

Thanks so much!

@tlienart
Author

I went from 160 to 9 inference triggers in FranklinParser 😅; the remaining ones are a bit hard to track down, but the situation is likely significantly better than it was before! Thanks again, this is an amazing package.

@timholy
Owner

timholy commented Jan 24, 2021

Woot! That's awesome news. Great work!

SnoopCompile 2.5.0 just came out a couple of moments ago. It has quite a few changes to the inference triggers analysis code. In particular, for #222 (comment) the new aggregation of triggers by method, plus the ability to filter, may help reduce the daunting number.
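
For reference, a minimal sketch of that aggregation (assuming itrigs is the vector returned by inference_triggers):

mtrigs = accumulate_by_source(Method, itrigs)   # group triggers by the method that generated them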

@tlienart
Author

tlienart commented Jan 25, 2021

It's now down to zero 😅 and I think the code is all the better for it.

One small note: to "remove" the final one, I had to inline a function that I use to create a const dictionary used elsewhere, something like:

const RULES = LittleDict{...}(
  key1 => method(params1),
  key2 => method(params2),
  ...
)

The suggestion from SnoopCompile was to try precompiling the function. It's not immediately clear to me why the inlining worked but maybe that could be an extra example in the docs?

@timholy
Owner

timholy commented Jan 25, 2021

That sounds good. Can you point me somewhere so I can understand the changes in greater detail?

@tlienart
Author

tlienart commented Jan 25, 2021

https://github.com/tlienart/FranklinParser.jl/tree/validator

Some of the changes are pretty major (the parsing before relied on "oracle functions" that encapsulated a condition a section of string had to meet; now this is done with concrete objects).

this is the inline: https://github.com/tlienart/FranklinParser.jl/blob/07d2ecd8de9ebc0b135d586fbf8d2da03c5695a7/src/tokens/utils.jl#L102-L117

the function forward_match is only used as a way to define rules like here: https://github.com/tlienart/FranklinParser.jl/blob/07d2ecd8de9ebc0b135d586fbf8d2da03c5695a7/src/tokens/markdown_tokens.jl#L8-L18

this: https://github.com/tlienart/FranklinParser.jl/blob/validator/src/tokens/find_tokens.jl is the function I ended up significantly refactoring based on the output from SnoopCompile.

if you compare it with https://github.com/tlienart/FranklinParser.jl/blob/main/src/tokens/find_tokens.jl you'll see some, probably obvious, elements:

  • making TokenFinder a proper object
  • refactoring of the validation of candidates (instead of using functions as fields of the "TokenFinder", use objects that do the validation).

I also got rid of a few NTuple{N, Char} where N and replaced them with Vector{Char}; it looks like this is better inference-wise?

@timholy
Owner

timholy commented Jan 26, 2021

Thanks!

So I presume you're referring to this:

julia> itrig = ftrigs[1]
Inference triggered to call MethodInstance for forward_match(::String, ::Vector{Char}, ::Bool) from forward_match (/home/tim/.julia/dev/FranklinParser/src/tokens/utils.jl:108) inlined into MethodInstance for forward_match(::String) (/home/tim/.julia/dev/FranklinParser/src/tokens/utils.jl:108)

julia> suggest(itrig)
/home/tim/.julia/dev/FranklinParser/src/tokens/utils.jl:108: invoked callee (forward_match(refstring::String) at utils.jl:108 may fail to precompile)

(I removed the @inline). The bottom line is that I think that invoked callee triggers probably reflect a bug in Julia. I've noticed that methods that supply default arguments sometimes fail to precompile. xref JuliaLang/julia#38983.

It's quite interesting that inlining circumvents this problem, but I'm not sure I want to recommend that in general---forced inlining can also increase latency since it prevents you from just exploiting already-compiled code. Some methods that don't precompile are so huge that forced inlining would be a bad idea.
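
For illustration, the workaround under discussion amounts to something like the following (only the three-argument signature is taken from the trigger above; the default-argument values here are hypothetical):

@inline forward_match(refstring::String) = forward_match(refstring, Char[], false)   # force-inline the thin convenience method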

On balance, I think we just need to fix this bug. Fortunately, in your case the cost is cheap:

julia> itrig.node
InferenceTimingNode: 0.001454/0.002313 on InferenceFrameInfo for FranklinParser.forward_match(::String, ::Vector{Char}, ::Bool) with 5 direct children

You probably won't really notice that 2.3ms very much. There are circumstances where this is more problematic than that (e.g., JuliaLang/julia#38983 (comment) as well as that JuliaInterpreter.step_expr! method).

the function I ended up significantly refactoring

Yes, that looks like a good move from an inferrability/latency perspective. There are cases where it's valid to do it the way you were doing it previously, but if that λ was getting called many times on short jobs, I suspect your change may also improve runtime performance. If you find that's not true (regex parsing can be slow), and if runtime performance matters more than latency, then of course you could go back to the λ approach. If FranklinParser defines all the functions that will ever be called from find_tokens, you can add precompile directives to at least make sure they're all ready to go. (You only get an inference trigger if it's not already inferred.) You'd still have to pay the cost of runtime dispatch, but that sometimes may be smaller than running a regex parser. (And sometimes, not.)
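
A hedged sketch of what those precompile directives could look like, placed at the end of the module definition (the find_tokens signature here is hypothetical; only the forward_match one is taken from the trigger above):

precompile(forward_match, (String, Vector{Char}, Bool))   # force inference at precompile time
precompile(find_tokens, (String,))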

I also got rid of a few NTuple{N, Char} where N to replace by Vector{Char}, looks like inference wise this is better?

If N is in a small list the former would be ok, e.g., making it Union{Tuple{Char}, Tuple{Char,Char}} rather than NTuple{N,Char} would probably give you better runtime performance than Vector{Char} and no extra cost in latency. But if N is not bounded, or even if it ranges from, say, 1 to 10, you're probably better off with the Vector{Char}. If you really have a performance bottleneck and you want to use the tuple, a "manual union splitting" block like

tf = if set isa NTuple{1,Char}
    c ∈ set
elseif set isa NTuple{2,Char}
    c ∈ set
elseif set isa NTuple{3,Char}
    c ∈ set
elseif set isa NTuple{4,Char}
    c ∈ set
...
end

can circumvent the inference problem from using NTuple{N,Char}. As long as the types listed after the isa are concrete types, the type-check is just a pointer comparison (i.e., very fast) and then the c ∈ set gets specialized for each length of tuple. That block is the same thing that would be created by Julia's automatic union-splitting, but that only happens for unions with 3 or fewer types, whereas with the manual approach you can choose how many cases to write out.

Again, these are ugly and should only be reserved for something that is very "hot" performance-wise. But it's nice that Julia makes this kind of code possible when it's really important.

@tlienart
Author

Thanks a lot Tim! This all makes sense to me. In terms of FranklinParser I probably won't try to optimise it much more, as there's a bunch of other low-hanging fruit to fix in Franklin itself first and runtime performance is already good enough (standard pages build in ~5-10ms). But I've enjoyed learning about all this a lot and will definitely keep an eye on SnoopCompile and its evolution, as it really helps me be aware of some peculiarities that are good to keep in mind 😄

Again, thanks a lot for all your advice and recommendations, I learned a lot!

@timholy
Owner

timholy commented Jan 26, 2021

Yeah, from looking at FranklinParser briefly I concluded it was good to go. Hopefully this adventure will bring you many long-term benefits!
