WIP: statistical allocation profiling #31915

Closed
wants to merge 2 commits

Conversation

tkluck
Contributor

@tkluck tkluck commented May 3, 2019

Julia has a mature statistical profiler. It sets a timer that captures a backtrace when it is triggered. By the law of large numbers, this gives insight into where an algorithm spends its time, without noticeably slowing the program down.

By comparison, finding out where allocations are happening is quite a bit more cumbersome. It requires starting Julia with a specific command-line switch, code execution is much slower, and after program exit the results are scattered over the file system.

This pull request is an attempt at bringing the ergonomics of statistical runtime profiling to allocations: "statistical allocation profiling". Similar to how, in the former case, `Profile.init` configures a delay between backtraces, this branch adds an option to specify the fraction of allocations that capture a backtrace.

Example usage:

```julia
using Profile
Profile.init(alloc_rate = 0.01)

doublefibonacci(n) = if n <= 2
    return [1, 1]
else
    return doublefibonacci(n - 1) .+ doublefibonacci(n - 2)
end

@profile for i = 1:1000; doublefibonacci(15); end

Profile.print() # but better to use e.g. ProfileView or StatProfilerHTML
```

State of this commit:

  • Linux support only
  • not thread-safe
  • no attempt at a friendly human interface yet; as it stands, the `Profile.init` API almost encourages a linear combination of runtime and allocation profiling, which makes no sense

I'm sending this as a WIP early so I can get feedback before investing time in productionizing this. What do you think?

@vtjnash
Sponsor Member

vtjnash commented May 3, 2019

See also #31534 (I haven't yet looked into either much to compare)

@tkluck
Contributor Author

tkluck commented May 3, 2019

@vtjnash thanks for the reference! I wasn't aware of that one.

From skimming the other PR, it looks like the main differences are:

  • @staticfloat's PR uses separate buffers for memory profiling. That allows capturing some extra specifics, but it also means much of the surrounding tooling needs to be adapted (e.g. ProfileView can't be used as-is).
  • @staticfloat's PR has no statistical component; it just tracks everything.
  • @staticfloat's PR has rich filtering options for different kinds of allocations.
  • @staticfloat's PR also extends the Profile package with a friendly human interface for this new way of profiling.

@yuyichao
Contributor

yuyichao commented May 3, 2019

Why is this not just a display feature? The profile data already contains backtraces that include the allocation functions. The only job should be to find those functions in the backtraces, and it should not involve changing the allocation code.

@tkluck
Contributor Author

tkluck commented May 3, 2019

> Why is this not just a display feature? The profile data already contains backtraces that include the allocation functions. The only job should be to find those functions in the backtraces, and it should not involve changing the allocation code.

Because that is scaled by time spent, not by the number of allocations; the latter is what this PR aims to measure.

This was an oversight in the previous commit.
@timholy
Sponsor Member

timholy commented May 4, 2019

Interesting. I like the tunable runtime overhead. While number of allocations is probably what I'd use this for most, sometimes one might want more info about the size of allocations. Using this approach, could one indirectly get that via an option to trigger every n bytes? (Or once the next rand()*n bytes get allocated, if you're worried about periodic phenomena.)

One option worth considering is to collaborate with @staticfloat to finish #31534, and perhaps integrate the tunable runtime overhead of this approach.
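
For concreteness, here is a rough sketch of what a byte-based trigger along the lines of this suggestion could look like. None of this is in the PR: the names `gc_maybe_sample_bytes`, `bytes_until_sample`, and `sample_interval_bytes` are made up for illustration, and `jl_profile_record_trace` follows the call used in the diff below with an assumed signature.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed hook: the name follows jl_profile_record_trace as used in this
 * PR's diff; the exact signature is a guess. */
void jl_profile_record_trace(void *ctx);

/* Illustrative globals; in the runtime these would live next to gc_num. */
static size_t bytes_until_sample = 0;            /* 0 = byte-based sampling disabled */
static size_t sample_interval_bytes = 1 << 20;   /* aim for ~one sample per MiB allocated */

/* Would be called from the allocation fast paths with the allocation size. */
static inline void gc_maybe_sample_bytes(size_t alloc_sz)
{
    if (bytes_until_sample == 0)
        return;                                  /* sampling disabled */
    if (alloc_sz >= bytes_until_sample) {
        /* Randomize the next gap (roughly uniform in [1, 2*interval]) so the
         * sampler does not lock onto periodic allocation patterns. */
        uint64_t r = (uint64_t)rand();
        bytes_until_sample =
            1 + (size_t)((r * 2 * (uint64_t)sample_interval_bytes) / ((uint64_t)RAND_MAX + 1));
        jl_profile_record_trace(NULL);
    }
    else {
        bytes_until_sample -= alloc_sz;
    }
}
```

Sampling proportional to allocated bytes would weight the resulting flame graph by allocation volume rather than allocation count, which is the distinction raised above.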

```diff
@@ -1108,6 +1116,8 @@ JL_DLLEXPORT jl_value_t *jl_gc_pool_alloc(jl_ptls_t ptls, int pool_offset,
         jl_gc_safepoint_(ptls);
     }
     gc_num.poolalloc++;
     if(gc_statprofile_sample_rate && rand() < gc_statprofile_sample_rate)
```
Contributor

@chethega chethega May 4, 2019


Following @timholy's comment on adjustable overhead: this implementation calls the RNG on every allocation, so even if the sample rate is close to zero, the overhead does not converge to zero.

An alternative would be something like
`if (gc_num.poolalloc++ == gc_num.next_pool_sample) { gc_num.next_pool_sample += gc_statprofile_pool_inverse_rate; jl_profile_record_trace(NULL); }`.

With `gc_num.next_pool_sample = 0`, this would trigger on the next wrap-around, i.e. never, and with `gc_statprofile_pool_inverse_rate` large it would trigger very rarely. We would pay only a single well-predicted branch on allocations we don't want to sample.

Similar treatment could be applied to the `gc_num.bigalloc`, `gc_num.allocd`, etc. counters. We should probably randomize the increment to avoid biases in loops whose period is close to commensurable with the inverse rate. While a Poisson distribution of the gaps (as your code provides) is statistically nicer, something like `1 + (inverse_rate * rand_uint16()) >> 15` is probably good enough.
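
A self-contained sketch of this countdown scheme, for illustration only: the standalone counters stand in for the `gc_num` fields mentioned above, the randomized-gap constant is an example value, and `jl_profile_record_trace` is assumed as before.

```c
#include <stdint.h>
#include <stdlib.h>

/* Assumed hook, as above (name from the PR's diff, signature guessed). */
void jl_profile_record_trace(void *ctx);

/* Stand-ins for the gc_num fields mentioned in the comment. */
static uint64_t poolalloc_count = 0;
static uint64_t next_pool_sample = 0;       /* 0 => only on wrap-around, i.e. effectively never */
static uint64_t pool_inverse_rate = 10000;  /* expected allocations between samples */

/* Unsampled allocations pay one increment plus one well-predicted branch. */
static inline void gc_count_pool_alloc(void)
{
    if (poolalloc_count++ == next_pool_sample) {
        /* Randomized gap, roughly uniform in [1, 2*pool_inverse_rate], so the
         * sampler does not resonate with loops whose period divides the rate. */
        uint64_t r15 = (uint64_t)(rand() & 0x7fff);  /* 15 random bits */
        next_pool_sample = poolalloc_count + 1 + ((pool_inverse_rate * r15) >> 14);
        jl_profile_record_trace(NULL);
    }
}
```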

Contributor Author


That's a great point. I'll run some timings to see how RNG overhead compares to the allocation itself. If it's significant, I'll investigate the right scheme to use here. If not, there's probably value in keeping Poisson.
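
One possible way to get such timings (purely illustrative, not from the PR) is a small standalone C benchmark comparing a `rand()` call to a small `malloc`/`free` pair. Note that Julia's pool allocator is considerably faster than `malloc`, so the RNG's relative cost on the real fast path would be even larger than what this reports.

```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

static double now_sec(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + 1e-9 * (double)ts.tv_nsec;
}

int main(void)
{
    const int N = 10000000;
    volatile long sink = 0;

    /* Cost of the RNG call alone. */
    double t0 = now_sec();
    for (int i = 0; i < N; i++)
        sink += rand();
    double t_rand = now_sec() - t0;

    /* Cost of a small heap allocation (stand-in for a pool allocation). */
    t0 = now_sec();
    for (int i = 0; i < N; i++) {
        char *p = malloc(32);
        if (p) {
            p[0] = (char)i;   /* touch the memory so the pair isn't optimized out */
            sink += p[0];
            free(p);
        }
    }
    double t_alloc = now_sec() - t0;

    printf("rand():          %6.1f ns/call\n", 1e9 * t_rand / N);
    printf("malloc/free(32): %6.1f ns/pair\n", 1e9 * t_alloc / N);
    return (int)(sink & 1);
}
```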

@tkluck
Contributor Author

tkluck commented May 4, 2019

@timholy thanks for the comments. I'll be glad to work together on combining these pull requests. @staticfloat what do you think?

@timholy
Sponsor Member

timholy commented May 13, 2019

@tkluck, thanks again for this. It was extremely useful in JuliaImages/ImageFiltering.jl#94 (comment); highly recommended for anyone else who wants to debug something similar. I am looking forward to whatever form this ends up taking!

@Sacha0
Member

Sacha0 commented Oct 20, 2022

Superseded by #42768? :)
