-
Notifications
You must be signed in to change notification settings - Fork 27
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make wrapdims more performant #126
Conversation
edit: the original examples were silly. The new examples are better. MWE using AxisKeys, BenchmarkTools, DataFrames, UniqueVectors
X1 = KeyedArray(randn(1000, 1500), a=1:1000, b=1:1500);
X2 = KeyedArray(randn(1100, 1600); a=1:1100, b=1:1600);
df1 = DataFrame(X1);
df2 = DataFrame(X2); On master: julia> @benchmark wrapdims($df1, UniqueVector, :value, :a, :b)
BenchmarkTools.Trial: 3 samples with 1 evaluation.
Range (min … max): 1.776 s … 1.803 s ┊ GC (min … max): 5.47% … 5.47%
Time (median): 1.783 s ┊ GC (median): 5.53%
Time (mean ± σ): 1.787 s ± 13.746 ms ┊ GC (mean ± σ): 5.64% ± 0.29%
█ █ █
█▁▁▁▁▁▁▁▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█ ▁
1.78 s Histogram: frequency by time 1.8 s <
Memory estimate: 1.11 GiB, allocs estimate: 43556065.
julia> @benchmark wrapdims($df2, UniqueVector, :value, :a, :b)
BenchmarkTools.Trial: 3 samples with 1 evaluation.
Range (min … max): 2.091 s … 2.120 s ┊ GC (min … max): 5.20% … 6.42%
Time (median): 2.102 s ┊ GC (median): 5.88%
Time (mean ± σ): 2.104 s ± 14.508 ms ┊ GC (mean ± σ): 5.84% ± 0.61%
█ █ █
█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█ ▁
2.09 s Histogram: frequency by time 2.12 s <
Memory estimate: 1.31 GiB, allocs estimate: 51940665. On this branch: julia> @benchmark wrapdims($df1, UniqueVector, :value, :a, :b)
BenchmarkTools.Trial: 119 samples with 1 evaluation.
Range (min … max): 39.520 ms … 69.519 ms ┊ GC (min … max): 0.00% … 0.00%
Time (median): 40.967 ms ┊ GC (median): 0.00%
Time (mean ± σ): 42.018 ms ± 3.384 ms ┊ GC (mean ± σ): 1.08% ± 3.09%
▂█▁
▅▅▆▇███▇▆█▆▆▇▄▄▃▄▁▁▃▄▃▃▃▃▄▃▃▃▁▁▁▃▁▁▁▁▁▃▄▃▃▁▃▁▁▃▃▁▁▁▁▁▁▁▃▁▁▃ ▃
39.5 ms Histogram: frequency by time 50.4 ms <
Memory estimate: 11.88 MiB, allocs estimate: 101.
julia> @benchmark wrapdims($df2, UniqueVector, :value, :a, :b)
BenchmarkTools.Trial: 98 samples with 1 evaluation.
Range (min … max): 47.551 ms … 67.658 ms ┊ GC (min … max): 0.00% … 0.00%
Time (median): 49.831 ms ┊ GC (median): 0.00%
Time (mean ± σ): 51.422 ms ± 3.819 ms ┊ GC (mean ± σ): 1.21% ± 3.14%
▁▁█ ▁
▄▄▃███▆▁██▄▃▁▄▁▃▁▁▁▁▅▄▆▃▁▄▄▃▁▃▃▁▁▃▁▃▄▁▁▃▁▁▁▃▁▃▁▁▁▁▃▁▁▁▃▁▁▁▃ ▁
47.6 ms Histogram: frequency by time 63.3 ms <
Memory estimate: 13.90 MiB, allocs estimate: 101. In short, this PR makes @rofinn any thoughts on where I should put some allocation-related unit tests? |
Also, I'm seeing the failure on |
Co-authored-by: Eric Davies <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. I'd just recommend adding a note on why we need the function barrier. If you'd like to add some extra tests regarding memory usage I'd recommend placing them in test/_packages.jl
like before.
https://github.com/mcabbott/AxisKeys.jl/blob/master/test/_packages.jl#L62
NOTE: I also checked and there's also an improvement, though not as severe, for row and column tables 👍🏻
Should I be worried about the integration test at all? |
I don't think so. The errors in AxisSets.jl are unrelated and appear to be a result of a nightly / dependency issue on AxisSets.jl.
|
Resolve performance issues associated with type instabilities in tables without types columns (e.g. DataFrames) by introducing a function barrier.