
duplicate bessel function definitions #178

Open
PaulXiCao opened this issue Aug 20, 2019 · 7 comments

@PaulXiCao
Contributor

Why are there duplicate entry points for the Bessel functions? For instance:

besselj(nu, z)
besselj0(x)
besselj1(x)

Isn't the specialization for orders 0 and 1 an implementation detail that the library user should not have to care about?

@PaulXiCao
Contributor Author

There seems to be no performance boost for the MPFR versions (z is a preallocated BigFloat that receives the result):

julia> z = BigFloat(); x = BigFloat(10);

julia> @time ccall((:mpfr_j0, :libmpfr), Int32, (Ref{BigFloat}, Ref{BigFloat}, Int32), z, x, Base.MPFR.ROUNDING_MODE[])
  0.000070 seconds (12 allocations: 394 bytes)
1

julia> n = 0; @time ccall((:mpfr_jn, :libmpfr), Int32, (Ref{BigFloat}, Clong, Ref{BigFloat}, Int32), z, n, x, Base.MPFR.ROUNDING_MODE[])
  0.000070 seconds (12 allocations: 394 bytes)

@PaulXiCao
Contributor Author

I looked around, and most other software likewise does not distinguish between orders 0, 1, and n.

@PaulXiCao
Contributor Author

Using BenchmarkTools, one can see that the two BigFloat implementations run at the same speed, whereas the Float64 implementation is roughly twice as fast when calling besselj0(x) instead of besselj(0, x):

julia> z = BigFloat(); libm = Base.libm_name; # result buffer and libm library name

julia> n = 0; x = BigFloat(10); @benchmark ccall((:mpfr_jn, :libmpfr), Int32, (Ref{BigFloat}, Clong, Ref{BigFloat}, Int32), z, n, x, Base.MPFR.ROUNDING_MODE[])
BenchmarkTools.Trial: 
  memory estimate:  394 bytes
  allocs estimate:  12
  --------------
  minimum time:     30.922 μs (0.00% GC)
  median time:      31.226 μs (0.00% GC)
  mean time:        31.518 μs (0.00% GC)
  maximum time:     94.123 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1

julia> x = BigFloat(10); @benchmark ccall((:mpfr_j0, :libmpfr), Int32, (Ref{BigFloat}, Ref{BigFloat}, Int32), z, x, Base.MPFR.ROUNDING_MODE[])
BenchmarkTools.Trial: 
  memory estimate:  394 bytes
  allocs estimate:  12
  --------------
  minimum time:     30.202 μs (0.00% GC)
  median time:      30.553 μs (0.00% GC)
  mean time:        30.859 μs (0.00% GC)
  maximum time:     118.430 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1

julia> x=10; @benchmark ccall(("j0",libm),  Float64, (Float64,), x)
BenchmarkTools.Trial: 
  memory estimate:  16 bytes
  allocs estimate:  1
  --------------
  minimum time:     625.947 ns (0.00% GC)
  median time:      642.450 ns (0.00% GC)
  mean time:        656.163 ns (1.80% GC)
  maximum time:     118.823 μs (99.40% GC)
  --------------
  samples:          10000
  evals/sample:     171

julia> nu = 0; x = 10.0; @benchmark ccall(("jn", libm), Float64, (Cint, Float64), nu, x)
BenchmarkTools.Trial: 
  memory estimate:  16 bytes
  allocs estimate:  1
  --------------
  minimum time:     1.128 μs (0.00% GC)
  median time:      1.168 μs (0.00% GC)
  mean time:        1.179 μs (0.00% GC)
  maximum time:     4.927 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     10

What about offering only the besselj(n, x) interface, but dispatching internally to the specialized implementations for n = 0, 1 and to the generic one otherwise?
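A minimal sketch of that proposal (the internal name _besseljn is hypothetical, not the package's current API):

function besselj(n::Integer, x::Float64)
    # public entry point; the specialized kernels stay as internal fast paths
    n == 0 && return besselj0(x)
    n == 1 && return besselj1(x)
    return _besseljn(n, x)  # hypothetical generic order-n implementation
end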

@musm
Contributor

musm commented Mar 11, 2020

Agreed.

Could you open a PR?

@heltonmc
Member

heltonmc commented May 16, 2022

I'd probably recommend keeping the besselj0-and-friends syntax. SciPy actually does differentiate these, and these functions are some of the most used.

I'm not sure a microbenchmark like this will catch the performance gain, because the branch predictor can lock onto a constant order (and it is unclear how that propagates from the C library). Almost all of these algorithms default to specialized calls for nu = 0, 1, so it probably is good to separate them.
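One way to probe that concern (a sketch, not from the thread): randomize the order in the benchmark setup so the branch predictor cannot lock onto a constant n.

julia> using BenchmarkTools, SpecialFunctions

julia> @benchmark besselj(n, x) setup=(n = rand(0:4); x = rand())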

Though Bessels.jl is still in development, it is already faster to avoid the branches than to go through the generic besselj call:

julia> @benchmark Bessels._besselj(1.0, x) setup=(x=rand())
BenchmarkTools.Trial: 10000 samples with 999 evaluations.
 Range (min … max):  9.425 ns … 29.029 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     9.510 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   9.532 ns ±  0.412 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

       ▄     █     ▅     ▃     ▂                             ▁
  ▇▁▁▁▁█▁▁▁▁▁█▁▁▁▁▁█▁▁▁▁▁█▁▁▁▁▁█▁▁▁▁▆▁▁▁▁▁▄▁▁▁▁▁▁▁▁▁▁▁▅▁▁▁▁▅ █
  9.43 ns      Histogram: log(frequency) by time     9.84 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark Bessels.besselj1(x) setup=(x=rand())
BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
 Range (min … max):  6.375 ns … 29.542 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     6.459 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   6.551 ns ±  0.796 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

     ▃  █  ▇  ▄   ▂     ▂  ▄  ▁                ▁             ▂
  ▆▁▁█▁▁█▁▁█▁▁█▁▁▁█▁▁▆▁▁█▁▁█▁▁█▁▁▁█▁▁▇▁▁▆▁▁█▁▁▁█▁▁▇▁▁▇▁▁▇▁▁▆ █
  6.38 ns      Histogram: log(frequency) by time     7.12 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

@stevengj
Member

It's a bit surprising to me that the branch is so significant in cost compared to the Bessel computation…

Couldn't we set up the besselj(n,x) routine so that constant-propagation can eliminate the n==0 and n==1 checks when n is a literal 0 or 1?
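A sketch of that idea (the wrapper name is hypothetical; it relies on the wrapper being inlined so a literal order propagates into the branches):

@inline function _besselj_dispatch(n, x)
    n == 0 && return besselj0(x)
    n == 1 && return besselj1(x)
    return besselj(n, x)  # generic fallback
end

julia> f(x) = _besselj_dispatch(0, x)  # the order is a compile-time constant here

julia> @code_typed f(1.0)  # should reduce to a direct besselj0 call, no branches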

@heltonmc
Member

I was honestly a bit surprised too. I will probably open an issue over at Bessels.jl to continue this discussion there, because this 3-4 ns difference holds across all the input ranges...

julia> @benchmark Bessels.besselj1(x) setup=(x=rand()+100.0)
BenchmarkTools.Trial: 10000 samples with 997 evaluations.
 Range (min … max):  20.436 ns … 86.552 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     20.855 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   20.943 ns ±  1.397 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▃█▇▃▁█▇▄ ▁▃   ▁▃                                            ▂
  ████████▇███▇████▆▇██▇▆▆█▆▅▆▃▁▄▁▁▁▃▁▁▃▁▄▁▁▁▁▁▃▃▄▄▁▄▃▃▃▄▄▄▃▄ █
  20.4 ns      Histogram: log(frequency) by time      25.2 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark Bessels._besselj(1.0, x) setup=(x=rand()+100.0)
BenchmarkTools.Trial: 10000 samples with 996 evaluations.
 Range (min … max):  24.598 ns … 53.087 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     25.268 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   25.216 ns ±  1.259 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

   ▃█      ▃▆                                                  
  ▂██▃▂▂▁▁▃██▆▄▂▂▂▂▁▁▂▂▂▂▂▂▁▂▁▂▁▂▁▂▁▁▁▁▁▂▂▂▂▂▁▁▁▂▁▂▂▂▁▂▂▂▁▂▁▂ ▃
  24.6 ns         Histogram: frequency by time        28.9 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

I'm hoping this has something to do with the generic implementation for arbitrary nu still being under active development, so the benchmarking might be a little off for now.

Though I would think this function would be able to specialize better for nu=0,1?

function _besselj(nu, x)
    # fast paths for the two most common orders
    nu == 0 && return besselj0(x)
    nu == 1 && return besselj1(x)
    if x < 20.0
        # small-argument expansions (log-scaled variant for very large orders)
        if nu > 60.0
            return log_besselj_small_arguments_orders(nu, x)
        else
            return besselj_small_arguments_orders(nu, x)
        end
    elseif 2 * x < nu
        # order much larger than argument: Debye asymptotic expansion
        return besselj_debye(nu, x)
    elseif x > nu
        # argument larger than order: large-argument asymptotic expansion
        return besselj_large_argument(nu, x)
    else
        # intermediate regime not handled yet
        return nothing
    end
end
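If constant propagation proves unreliable here, one hypothetical alternative (not Bessels.jl code) is to lift the order into the type domain, which forces a dedicated method per literal order regardless of inlining heuristics:

besselj_val(::Val{0}, x) = besselj0(x)
besselj_val(::Val{1}, x) = besselj1(x)
besselj_val(::Val{N}, x) where {N} = _besselj(Float64(N), x)  # generic fallback

julia> besselj_val(Val(1), 2.5)  # compiles straight to the besselj1 kernel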
