Duplicate Bessel function definitions #178
There seems to be no performance boost for the order-specific MPFR call over the generic one:

```julia
julia> x = 10; @time ccall((:mpfr_j0, :libmpfr), Int32, (Ref{BigFloat}, Ref{BigFloat}, Int32), z, x, ROUNDING_MODE[])
  0.000070 seconds (12 allocations: 394 bytes)
1

julia> n = 0; x = 10; @time ccall((:mpfr_jn, :libmpfr), Int32, (Ref{BigFloat}, Clong, Ref{BigFloat}, Int32), z, n, x, ROUNDING_MODE[])
  0.000070 seconds (12 allocations: 394 bytes)
```
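Since the two entry points perform identically, the `BigFloat` method could simply forward everything to the generic `mpfr_jn`. A minimal sketch, assuming Julia's bundled libmpfr and the `Base.MPFR.ROUNDING_MODE` constant (the wrapper name `besselj_big` is hypothetical):

```julia
# Minimal sketch: BigFloat J_n via the generic mpfr_jn entry point.
# `besselj_big` is a hypothetical name, not an existing API.
function besselj_big(n::Integer, x::BigFloat)
    z = BigFloat()  # output buffer at the current default precision
    ccall((:mpfr_jn, :libmpfr), Int32,
          (Ref{BigFloat}, Clong, Ref{BigFloat}, Int32),
          z, n, x, Base.MPFR.ROUNDING_MODE[])
    return z
end
```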
I looked around, and most other software also does not distinguish between orders 0, 1, and n.
Using BenchmarkTools one can see that the `BigFloat` implementations run at the same speed, whereas the order-0 `Float64` implementation is twice as fast as the generic one:

```julia
julia> n = 0; x = 10; @benchmark ccall((:mpfr_jn, :libmpfr), Int32, (Ref{BigFloat}, Clong, Ref{BigFloat}, Int32), z, n, x, ROUNDING_MODE[])
BenchmarkTools.Trial:
  memory estimate:  394 bytes
  allocs estimate:  12
  --------------
  minimum time:     30.922 μs (0.00% GC)
  median time:      31.226 μs (0.00% GC)
  mean time:        31.518 μs (0.00% GC)
  maximum time:     94.123 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1

julia> x = 10; @benchmark ccall((:mpfr_j0, :libmpfr), Int32, (Ref{BigFloat}, Ref{BigFloat}, Int32), z, x, ROUNDING_MODE[])
BenchmarkTools.Trial:
  memory estimate:  394 bytes
  allocs estimate:  12
  --------------
  minimum time:     30.202 μs (0.00% GC)
  median time:      30.553 μs (0.00% GC)
  mean time:        30.859 μs (0.00% GC)
  maximum time:     118.430 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1

julia> x = 10; @benchmark ccall(("j0", libm), Float64, (Float64,), x)
BenchmarkTools.Trial:
  memory estimate:  16 bytes
  allocs estimate:  1
  --------------
  minimum time:     625.947 ns (0.00% GC)
  median time:      642.450 ns (0.00% GC)
  mean time:        656.163 ns (1.80% GC)
  maximum time:     118.823 μs (99.40% GC)
  --------------
  samples:          10000
  evals/sample:     171

julia> nu = 0; x = 10; @benchmark ccall(("jn", libm), Float64, (Cint, Float64), nu, x)
BenchmarkTools.Trial:
  memory estimate:  16 bytes
  allocs estimate:  1
  --------------
  minimum time:     1.128 μs (0.00% GC)
  median time:      1.168 μs (0.00% GC)
  mean time:        1.179 μs (0.00% GC)
  maximum time:     4.927 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     10
```

What about only offering the generic interface?
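A single generic entry point is easy to illustrate with a pure-Julia ascending series, used here as a stand-in for the library kernels (the name `besselj_series` is hypothetical, and the series is only adequate for small to moderate `|x|`):

```julia
# Ascending series J_n(x) = Σ_m (-1)^m / (m! (m+n)!) (x/2)^(2m+n):
# one generic besselj(n, x) covers orders 0, 1, and n uniformly.
# Accumulating in BigFloat avoids cancellation in the alternating sum.
function besselj_series(n::Integer, x::Float64; terms::Int=25)
    s = zero(BigFloat)
    for m in 0:terms
        s += (-1)^m * big(x / 2)^(2m + n) /
             (factorial(big(m)) * factorial(big(m + n)))
    end
    return Float64(s)
end
```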
Agreed. Could you open a PR?
I'd probably recommend keeping the specialized order-0 and order-1 definitions. I'm not sure a microbenchmark like this will catch the performance gain, because of branch prediction (or however that propagates from the C library?). Almost all of these algorithms default to specialized calls for orders 0 and 1. Though it is still in development, in Bessels.jl it is faster to avoid the branches of the generic call:

```julia
julia> @benchmark Bessels._besselj(1.0, x) setup=(x=rand())
BenchmarkTools.Trial: 10000 samples with 999 evaluations.
 Range (min … max):  9.425 ns … 29.029 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     9.510 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   9.532 ns ±  0.412 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▄ █ ▅ ▃ ▂                                                ▁
  ▇▁▁▁▁█▁▁▁▁▁█▁▁▁▁▁█▁▁▁▁▁█▁▁▁▁▁█▁▁▁▁▆▁▁▁▁▁▄▁▁▁▁▁▁▁▁▁▁▁▅▁▁▁▁▅ █
  9.43 ns      Histogram: log(frequency) by time     9.84 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark Bessels.besselj1(x) setup=(x=rand())
BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
 Range (min … max):  6.375 ns … 29.542 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     6.459 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   6.551 ns ±  0.796 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▃ █ ▇ ▄ ▂ ▂ ▄ ▁ ▁                                        ▂
  ▆▁▁█▁▁█▁▁█▁▁█▁▁▁█▁▁▆▁▁█▁▁█▁▁█▁▁▁█▁▁▇▁▁▆▁▁█▁▁▁█▁▁▇▁▁▇▁▁▇▁▁▆ █
  6.38 ns      Histogram: log(frequency) by time     7.12 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.
```
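One way to remove the run-time order check entirely, when the order is known at compile time, is to lift it into dispatch. A hedged sketch with stand-in kernels (truncated series for illustration only; these are not the Bessels.jl implementations, and `besselj_val` is a hypothetical name):

```julia
# Stand-in kernels: leading terms of the ascending series (illustrative only).
besselj0_kernel(x) = 1 - x^2 / 4 + x^4 / 64
besselj1_kernel(x) = x / 2 - x^3 / 16
besselj_generic_kernel(n, x) = (x / 2)^n / factorial(n)

# Dispatch on Val(nu): the nu == 0 / nu == 1 checks are resolved by the
# compiler instead of being branches inside a single method body.
besselj_val(::Val{0}, x) = besselj0_kernel(x)
besselj_val(::Val{1}, x) = besselj1_kernel(x)
besselj_val(::Val{N}, x) where {N} = besselj_generic_kernel(N, x)
```

This only helps when the caller writes a literal order, of course; a run-time `nu` still needs the branch somewhere.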
It's a bit surprising to me that the branch is so significant in cost compared to the Bessel computation… Couldn't we set up the branch so that it costs less?
I was honestly a bit surprised too. I will probably open an issue over at Bessels.jl to continue this discussion there, because this 3–4 ns difference holds across all the input ranges…

```julia
julia> @benchmark Bessels.besselj1(x) setup=(x=rand()+100.0)
BenchmarkTools.Trial: 10000 samples with 997 evaluations.
 Range (min … max):  20.436 ns … 86.552 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     20.855 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   20.943 ns ±  1.397 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▃█▇▃▁█▇▄ ▁▃ ▁▃                                             ▂
  ████████▇███▇████▆▇██▇▆▆█▆▅▆▃▁▄▁▁▁▃▁▁▃▁▄▁▁▁▁▁▃▃▄▄▁▄▃▃▃▄▄▄▃▄ █
  20.4 ns      Histogram: log(frequency) by time      25.2 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark Bessels._besselj(1.0, x) setup=(x=rand()+100.0)
BenchmarkTools.Trial: 10000 samples with 996 evaluations.
 Range (min … max):  24.598 ns … 53.087 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     25.268 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   25.216 ns ±  1.259 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▃█ ▃▆
  ▂██▃▂▂▁▁▃██▆▄▂▂▂▂▁▁▂▂▂▂▂▂▁▂▁▂▁▂▁▂▁▁▁▁▁▂▂▂▂▂▁▁▁▂▁▂▂▂▁▂▂▂▁▂▁▂ ▃
  24.6 ns      Histogram: frequency by time          28.9 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.
```

I'm hoping this has something to do with the generic implementation for arbitrary `nu` still being actively worked on, so benchmarking might be a little off for now. Though I would think this function would be able to specialize better:

```julia
function _besselj(nu, x)
    nu == 0 && return besselj0(x)
    nu == 1 && return besselj1(x)
    if x < 20.0
        if nu > 60.0
            return log_besselj_small_arguments_orders(nu, x)
        else
            return besselj_small_arguments_orders(nu, x)
        end
    elseif 2 * x < nu
        return besselj_debye(nu, x)
    elseif x > nu
        return besselj_large_argument(nu, x)
    else
        return nothing
    end
end
```
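For anyone following the branch structure, here is a small self-contained helper that mirrors that region logic and just reports which algorithm would be selected (the function name and labels are illustrative, not Bessels.jl API):

```julia
# Mirrors the branch structure of _besselj above, returning a label for
# the algorithm each (nu, x) region would dispatch to. :uncovered marks
# the gap that currently returns `nothing` in the WIP implementation.
function besselj_region(nu, x)
    nu == 0 && return :order0
    nu == 1 && return :order1
    if x < 20.0
        return nu > 60.0 ? :log_small_arguments : :small_arguments
    elseif 2 * x < nu
        return :debye
    elseif x > nu
        return :large_argument
    else
        return :uncovered
    end
end
```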
What is the reason why there are duplicate definitions of the Bessel functions available? For instance, both the order-specific `besselj0(x)` / `besselj1(x)` and the generic `besselj(nu, x)` are offered. Isn't that an implementation detail that the library user should not care about?