
duplicate bessel function definitions #178

Open
PaulXiCao opened this issue Aug 20, 2019 · 7 comments

@PaulXiCao
Contributor

Why are there duplicate entry points for the Bessel functions? For instance:

besselj(nu, z)
besselj0(x)
besselj1(x)

Isn't the specialization for orders 0 and 1 an implementation detail that the library user should not have to care about?

@PaulXiCao
Contributor Author

There seems to be no performance boost for the MPFR versions (z is a preallocated BigFloat that receives the result):

julia> z = BigFloat(); x = BigFloat(10);

julia> @time ccall((:mpfr_j0, :libmpfr), Int32, (Ref{BigFloat}, Ref{BigFloat}, Int32), z, x, Base.MPFR.ROUNDING_MODE[])
  0.000070 seconds (12 allocations: 394 bytes)
1

julia> n = 0; @time ccall((:mpfr_jn, :libmpfr), Int32, (Ref{BigFloat}, Clong, Ref{BigFloat}, Int32), z, n, x, Base.MPFR.ROUNDING_MODE[])
  0.000070 seconds (12 allocations: 394 bytes)

@PaulXiCao
Contributor Author

I looked around, and most other software likewise does not distinguish between orders 0, 1, and n.

@PaulXiCao
Contributor Author

Using BenchmarkTools, one can see that the two BigFloat implementations run at the same speed, whereas the Float64 implementation is roughly twice as fast when calling besselj0(x) instead of besselj(0, x):

julia> z = BigFloat(); libm = Base.libm_name; # result buffer and libm library name

julia> n = 0; x = BigFloat(10); @benchmark ccall((:mpfr_jn, :libmpfr), Int32, (Ref{BigFloat}, Clong, Ref{BigFloat}, Int32), z, n, x, Base.MPFR.ROUNDING_MODE[])
BenchmarkTools.Trial: 
  memory estimate:  394 bytes
  allocs estimate:  12
  --------------
  minimum time:     30.922 μs (0.00% GC)
  median time:      31.226 μs (0.00% GC)
  mean time:        31.518 μs (0.00% GC)
  maximum time:     94.123 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1

julia> x = BigFloat(10); @benchmark ccall((:mpfr_j0, :libmpfr), Int32, (Ref{BigFloat}, Ref{BigFloat}, Int32), z, x, Base.MPFR.ROUNDING_MODE[])
BenchmarkTools.Trial: 
  memory estimate:  394 bytes
  allocs estimate:  12
  --------------
  minimum time:     30.202 μs (0.00% GC)
  median time:      30.553 μs (0.00% GC)
  mean time:        30.859 μs (0.00% GC)
  maximum time:     118.430 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1

julia> x=10; @benchmark ccall(("j0",libm),  Float64, (Float64,), x)
BenchmarkTools.Trial: 
  memory estimate:  16 bytes
  allocs estimate:  1
  --------------
  minimum time:     625.947 ns (0.00% GC)
  median time:      642.450 ns (0.00% GC)
  mean time:        656.163 ns (1.80% GC)
  maximum time:     118.823 μs (99.40% GC)
  --------------
  samples:          10000
  evals/sample:     171

julia> nu = 0; x = 10.0; @benchmark ccall(("jn", libm), Float64, (Cint, Float64), nu, x)
BenchmarkTools.Trial: 
  memory estimate:  16 bytes
  allocs estimate:  1
  --------------
  minimum time:     1.128 μs (0.00% GC)
  median time:      1.168 μs (0.00% GC)
  mean time:        1.179 μs (0.00% GC)
  maximum time:     4.927 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     10

What about offering only the besselj(n, x) interface, but dispatching internally to the specialized implementations for n = 0, 1 and to the generic one otherwise?
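A minimal sketch of that proposal (the internal name _besseljn is hypothetical, not the package's current API):

function besselj(n::Integer, x::Float64)
    # public entry point; the specialized kernels stay as internal fast paths
    n == 0 && return besselj0(x)
    n == 1 && return besselj1(x)
    return _besseljn(n, x)  # hypothetical generic order-n implementation
end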

@musm
Contributor

musm commented Mar 11, 2020

Agreed.

Could you open a PR?

@heltonmc
Member

heltonmc commented May 16, 2022

I'd probably recommend keeping the besselj0-and-friends syntax. SciPy actually does differentiate these, and these functions are some of the most used.

I'm not sure a microbenchmark like this will catch the performance gain, because the branch predictor can lock onto a constant order (and it is unclear how that propagates from the C library). Almost all of these algorithms default to specialized calls for nu = 0, 1, so it probably is good to separate them.
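One way to probe that concern (a sketch, not from the thread): randomize the order in the benchmark setup so the branch predictor cannot lock onto a constant n.

julia> using BenchmarkTools, SpecialFunctions

julia> @benchmark besselj(n, x) setup=(n = rand(0:4); x = rand())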

Though Bessels.jl is still in development, it is already faster to avoid the branches than to go through the generic besselj call:

julia> @benchmark Bessels._besselj(1.0, x) setup=(x=rand())
BenchmarkTools.Trial: 10000 samples with 999 evaluations.
 Range (min … max):  9.425 ns … 29.029 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     9.510 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   9.532 ns ±  0.412 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

       ▄     █     ▅     ▃     ▂                             ▁
  ▇▁▁▁▁█▁▁▁▁▁█▁▁▁▁▁█▁▁▁▁▁█▁▁▁▁▁█▁▁▁▁▆▁▁▁▁▁▄▁▁▁▁▁▁▁▁▁▁▁▅▁▁▁▁▅ █
  9.43 ns      Histogram: log(frequency) by time     9.84 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark Bessels.besselj1(x) setup=(x=rand())
BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
 Range (min … max):  6.375 ns … 29.542 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     6.459 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   6.551 ns ±  0.796 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

     ▃  █  ▇  ▄   ▂     ▂  ▄  ▁                ▁             ▂
  ▆▁▁█▁▁█▁▁█▁▁█▁▁▁█▁▁▆▁▁█▁▁█▁▁█▁▁▁█▁▁▇▁▁▆▁▁█▁▁▁█▁▁▇▁▁▇▁▁▇▁▁▆ █
  6.38 ns      Histogram: log(frequency) by time     7.12 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

@stevengj
Member

It's a bit surprising to me that the branch is so significant in cost compared to the Bessel computation…

Couldn't we set up the besselj(n,x) routine so that constant-propagation can eliminate the n==0 and n==1 checks when n is a literal 0 or 1?
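A sketch of that idea (the wrapper name is hypothetical; it relies on the wrapper being inlined so a literal order propagates into the branches):

@inline function _besselj_dispatch(n, x)
    n == 0 && return besselj0(x)
    n == 1 && return besselj1(x)
    return besselj(n, x)  # generic fallback
end

julia> f(x) = _besselj_dispatch(0, x)  # the order is a compile-time constant here

julia> @code_typed f(1.0)  # should reduce to a direct besselj0 call, no branches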

@heltonmc
Member

I was honestly a bit surprised too. I will probably open an issue over at Bessels.jl to continue this discussion there, because this 3-4 ns difference holds across all the input ranges...

julia> @benchmark Bessels.besselj1(x) setup=(x=rand()+100.0)
BenchmarkTools.Trial: 10000 samples with 997 evaluations.
 Range (min … max):  20.436 ns … 86.552 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     20.855 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   20.943 ns ±  1.397 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▃█▇▃▁█▇▄ ▁▃   ▁▃                                            ▂
  ████████▇███▇████▆▇██▇▆▆█▆▅▆▃▁▄▁▁▁▃▁▁▃▁▄▁▁▁▁▁▃▃▄▄▁▄▃▃▃▄▄▄▃▄ █
  20.4 ns      Histogram: log(frequency) by time      25.2 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark Bessels._besselj(1.0, x) setup=(x=rand()+100.0)
BenchmarkTools.Trial: 10000 samples with 996 evaluations.
 Range (min … max):  24.598 ns … 53.087 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     25.268 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   25.216 ns ±  1.259 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

   ▃█      ▃▆                                                  
  ▂██▃▂▂▁▁▃██▆▄▂▂▂▂▁▁▂▂▂▂▂▂▁▂▁▂▁▂▁▂▁▁▁▁▁▂▂▂▂▂▁▁▁▂▁▂▂▂▁▂▂▂▁▂▁▂ ▃
  24.6 ns         Histogram: frequency by time        28.9 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

I'm hoping this has something to do with the generic implementation for arbitrary nu still being under active development, so the benchmarking might be a little off for now.

Though I would think this function would be able to specialize better for nu=0,1?

function _besselj(nu, x)
    # fast paths for the two most common orders
    nu == 0 && return besselj0(x)
    nu == 1 && return besselj1(x)
    if x < 20.0
        # small-argument expansions (log-scaled variant for very large orders)
        if nu > 60.0
            return log_besselj_small_arguments_orders(nu, x)
        else
            return besselj_small_arguments_orders(nu, x)
        end
    elseif 2 * x < nu
        # order much larger than argument: Debye asymptotic expansion
        return besselj_debye(nu, x)
    elseif x > nu
        # argument larger than order: large-argument asymptotic expansion
        return besselj_large_argument(nu, x)
    else
        # intermediate regime not handled yet
        return nothing
    end
end
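If constant propagation proves unreliable here, one hypothetical alternative (not Bessels.jl code) is to lift the order into the type domain, which forces a dedicated method per literal order regardless of inlining heuristics:

besselj_val(::Val{0}, x) = besselj0(x)
besselj_val(::Val{1}, x) = besselj1(x)
besselj_val(::Val{N}, x) where {N} = _besselj(Float64(N), x)  # generic fallback

julia> besselj_val(Val(1), 2.5)  # compiles straight to the besselj1 kernel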
