
Automatic differentiation (AD) not supported #17

Open
heltonmc opened this issue May 9, 2022 · 22 comments

@heltonmc (Member) commented May 9, 2022

Currently, the implementations don't support automatic differentiation with packages such as ForwardDiff.jl.

It's probably best to define ChainRules such as in https://github.com/JuliaMath/SpecialFunctions.jl/blob/master/src/chainrules.jl, or to adapt those implementations directly as this package matures.
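
For reference, a minimal sketch of what an argument-derivative rule could look like, in the style of SpecialFunctions.jl's chainrules.jl. This assumes a scalar besselk(v, x) exists here and uses the standard identity dK_v/dx = -(K_{v-1} + K_{v+1})/2; the order derivative is left unimplemented:

using ChainRulesCore

ChainRulesCore.@scalar_rule(
    besselk(v, x),
    (
        ChainRulesCore.@not_implemented("derivative with respect to the order v"),
        -(besselk(v - 1, x) + besselk(v + 1, x)) / 2,  # dK_v/dx
    ),
)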

@oscardssmith (Member)

Also, you should look at https://github.com/cgeoga/BesselK.jl. It allows for autodiff of besselk with respect to the order, which is pretty cool. It also apparently uses some fanciness to automatically generate some of the polynomial coefficients rather than requiring hand derivation.

@heltonmc (Member Author)

Just making a note of this here. There was some interest on Slack about getting derivatives to work with respect to both the argument and the order. SpecialFunctions.jl only provides rules for argument derivatives; BesselK.jl shows that order derivatives are possible with some care.

This should be possible once we get arbitrary orders done, and since this seems like functionality people are interested in, I'll work on it after we finish the arbitrary orders. Though, historically, ForwardDiff.jl gives me fits...

@heltonmc (Member Author) commented Jul 7, 2022

Commenting here since the besselk package was mentioned. I'm currently exploring the possibility of numerically integrating these Bessel functions. The advantage is that you only need one algorithm, and I'm sure we can optimize a simple for loop quite well that works for any argument and order.

Another advantage is that you can also take analytic derivatives with respect to order and argument. Here are three functions that calculate besselk and its derivatives with respect to argument and order, based on modifying [1]. We should probably spend some time optimizing away the repeated cosh calls, since their arguments are just incremented by a fixed amount each loop. Either way, these seem excellent: they converge rapidly and can be used to any desired precision (assuming a well-chosen h)!

function besselk(v, x::T) where T
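    # trapezoidal rule for K_v(x) = ∫₀^∞ cosh(v*t) * exp(-x*cosh(t)) dt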
    MaxIter = 5000
    tol = eps(T)
    h = T(0.1)
    t = zero(T)

    out = 0.5*exp(-x)

    for _ in 1:MaxIter
        t += h
        tmp = cosh(v*t)*exp(-x*cosh(t))
        out += tmp
        tmp / out < tol && break
    end
    return out * h
end

function dxbesselk(v, x::T) where T
    MaxIter = 5000
    tol = eps(T)
    h = T(0.1)
    t = zero(T)

    out = -0.5*exp(-x)

    for _ in 1:MaxIter
        t += h
        # d/dx of the integrand: -cosh(t)*cosh(v*t)*exp(-x*cosh(t))
        tmp = -cosh(t)*cosh(v*t)*exp(-x*cosh(t))
        out += tmp
        tmp / out < tol && break
    end
    return out * h
end
function dvbesselk(v, x::T) where T
    MaxIter = 5000
    tol = eps(T)
    h = T(0.1)
    t = zero(T)

    out = zero(T)

    for _ in 1:MaxIter
        t += h
        # d/dv of the integrand: t*sinh(v*t)*exp(-x*cosh(t))
        tmp = t*sinh(v*t)*exp(-x*cosh(t))
        out += tmp
        tmp / out < tol && break
    end
    return out * h
end
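
A quick sanity-check sketch of how these compose (the identity is standard; the values below are just an example input, not recorded output):

# the argument derivative should match -(K_{v-1}(x) + K_{v+1}(x)) / 2
v, x = 1.5, 4.0
lhs = dxbesselk(v, x)
rhs = -(besselk(v - 1, x) + besselk(v + 1, x)) / 2
abs(lhs - rhs) / abs(rhs)  # should be small for moderate v and x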

[1] Schwartz, Charles. "Numerical calculation of Bessel functions." International Journal of Modern Physics C 23.12 (2012): 1250084.

@cgeoga (Contributor) commented Jul 25, 2022

I was just passing through and curious how things here were going, and this is pretty neat! From a little tinkering, it seems like these might be a bit slow to converge when the argument x is small. I would have thought that so many exp and cosh calls would just totally kill performance, but even as provided it's within, like, a factor of 4 of what I did a lot of grinding on. I also would have thought that the tmp variable would be an issue because it is a "big*tiny" operation, but the accuracy is also really impressive. Just with the standard identities for cosh(x+y) and sinh(x+y), I wrote this version that does one of the optimizations I assume you had in mind:

function _besselk(v, x::T) where T
    MaxIter = 5000
    tol = eps(T)
    h = T(0.1)
    t = h
    out = 0.5*exp(-x)
    # pre-compute a few nice quantities:
    (cosh_t,  sinh_t)  = (cosh(t),   sinh(t))
    (cosh_vt, sinh_vt) = (cosh(v*t), sinh(v*t))
    (cosh_h,  sinh_h)  = (cosh(h),   sinh(h))
    (cosh_vh, sinh_vh) = (cosh(v*h), sinh(v*h))
    for _ in 1:MaxIter
        tmp  = cosh_vt*exp(-x*cosh_t)
        out += tmp
        tmp / out < tol && break
        # update the quantities, t -> t+h, vt -> v(t+h):
        (cosh_t, sinh_t)   = (cosh_t*cosh_h + sinh_t*sinh_h,
                              sinh_t*cosh_h + cosh_t*sinh_h)
        (cosh_vt, sinh_vt) = (cosh_vt*cosh_vh + sinh_vt*sinh_vh,
                              sinh_vt*cosh_vh + cosh_vt*sinh_vh)
    end
    return out * h
end

Maybe something that just has one branch based on the size of x would make sense? There are a lot of things that work really well when x is small, so that's actually a pretty easy problem to fix.

Cool stuff.

@oscardssmith (Member)

In floating point, all multiplies retain full accuracy (unless they underflow or overflow). Also, exp and cosh are pretty fast; they're both in the range of the cost of 2 divisions.

@cgeoga (Contributor) commented Jul 25, 2022

Ah, for some reason I thought that would be an issue. Also interesting about exp and cosh being fast. Out of curiosity, what then is the explanation for my version, which computes the cosh terms manually with the cosh(x+y) = ... identities, being so much faster?

@oscardssmith (Member)

They're fast, but not free.

@cgeoga (Contributor) commented Jul 25, 2022

Fair enough, I guess that's the power of micro-optimization at this level. I wouldn't have guessed they were in the cost range of 2 divisions. That's pretty neat.

Anyways, I will definitely be keeping an eye on this! I would still think that for small enough and big enough x (and big enough v) there will potentially be faster options, but at the very least for x between, like, 4 and 25, which was by far the most painful part of the domain in my project, this is a very compelling method.

@heltonmc (Member Author) commented Jul 26, 2022

Many thanks @cgeoga for looking at and optimizing that code block! I found it about 2.5-3x faster than my original implementation. The additions in the optimized version do increase the relative error slightly compared to the all-multiply version, which, as Oscar mentioned, retains really good accuracy (though it's slower). In some quick tests it looked like the error increased from ~5 to ~15 ULPs, which is still pretty good considering it's much faster.

But yes! I always want numerical integration to work for things like this, since it is so much easier and more flexible, but the convergence is variable. As you mentioned, it converges more slowly for small x, and also for increasing v. The small-x case is not as concerning because you use the power series there, and large v is also not concerning because you use the uniform asymptotic expansion there. One other concern is how to choose h, which needs to be smaller to retain the same accuracy for large x. So although the integral may converge faster there, we also need a smaller h to retain the same accuracy. But for large x that's also not an issue, because you just use the asymptotic expansion for large argument.

So these may actually have a really good application window. I've been spending my current time on bessely, but the toughest region to get accuracy in is actually between the power series (x<7) and the asymptotic expansion (x>20) for smallish orders (v < 30), which is a really odd window. What I'm thinking might be a very powerful tool is to employ numerical integration in this region (7<x<20) while keeping v small (0<nu<2) to rapidly improve convergence, and then fill everything else with forward recurrence up until the uniform asymptotic expansion is valid (see the recurrence sketch below). So essentially I would modify this numerical integration to give v and v+1 in a region with fast convergence. I don't think the modification to give both v and v+1 would actually increase the computational cost much, because the numerical integration is so latency bound if you just have one or two multiplies within a loop that depend on each other.
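
For concreteness, a minimal sketch of that fill-in step (besselk_up_recurrence is a hypothetical name; kv and kvp1 are assumed to be K_v(x) and K_{v+1}(x) from the integration):

# forward recurrence K_{n+1}(x) = (2n/x)*K_n(x) + K_{n-1}(x),
# which is stable for besselk moving away from order 0
function besselk_up_recurrence(x, kv, kvp1, v, v_end)
    while v < v_end
        kv, kvp1 = kvp1, muladd(2 * (v + 1) / x, kvp1, kv)
        v += 1
    end
    return kv  # K_{v_end}(x)
end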

On the other hand, I'm currently exploring just using a Chebyshev basis for this region and writing a really custom evaluation scheme to generate it..... the numerical integral for bessely is kind of a pain, so this has been attractive. Hoping to look at besselk again soon!

Edit: And yes, this is definitely going to be slow, as I feel like the other asymptotic expansions can just be optimized so much more (https://github.com/heltonmc/Bessels.jl/pull/23). One thing to consider is a different contour to integrate along (maybe some type of Hankel contour).... but even if we improved convergence, operating in the complex domain might take away a lot of that gain.

@cgeoga (Contributor) commented Jul 26, 2022

That makes a lot of sense. That region of x in [7, 20] or so was by far the most painful part for me too. Keeping v small and using forward recurrence seems like a great idea to me. When you say using a Chebyshev basis, do you mean in full 2D with respect to both v and x?

Apologies if this is a bit too off-topic for this issue, and if so I'm happy to open a new issue to discuss it, but I played around a little with using a continued fraction representation to get K_{v+m} / K_{v+m+1} for m sufficiently large that K_{v+m+1} could be really well-approximated by the uniform asymptotic expansion, and then doing the backwards recursion to get back to K_v. I have some code sitting around that does the relevant continued fraction from the Cuyt book with the modified Lentz algorithm. I had a bit of trouble making it both fast and accurate, but your uniform asymptotic expansion code is a lot better than mine, so maybe it would be appealing?

Again apologies if a big code dump here is not good issue etiquette, but here is my basic code:

# gives besk(v+1,x)/besk(v,x)
function besk_cf_ratio_opt(v, z, nmax=100, tol=1e-16) # this takes about 60ns on my machine
  # do the modified Lentz method:
  (hn, Dn, Cn) = (1e-50, zero(v), 1e-50)
  onev = one(v)
  twov = onev+onev
  jf   = onev
  qi   = inv(twov+twov)
  vv   = v*v
  for j in 1:nmax
    an  = (vv - ((twov*jf-onev)^2)*qi)
    bn  = 2*(z+jf)
    Cn  = bn+an/Cn      # skipping zero check...
    Dn  = inv(bn+an*Dn) # skipping zero check...
    del = Dn*Cn
    hn *= del
    (abs(del-1)<tol) && break
    jf += onev
  end
  (twov*v+onev+twov*z)/(twov*z) + hn/z
end

function _besk_intermediate_asv(v, z, offset=5, order=10)
  rat        = besk_cf_ratio_opt(v+offset, z)
  upper_val  = besselk_asv(v+offset+1, z, order) # hypothetical stand-in: your asymptotic expansion for besselk(v+offset+1, z)
  lower_val  = inv(rat)*upper_val
  _vp2       = v+offset+1
  _vp1       = v+offset
  (mp2, mp1) = (upper_val, lower_val)
  m          = mp2 - (2*_vp1/z)*mp1
  for j in 1:(offset-1)
    _vp1 -= one(v)
    mp2   = mp1
    mp1   = m
    m     = mp2 - (2*_vp1/z)*mp1
  end
  m
end

I don't really know about continued fractions and have never taken a numerical analysis course, so some of the parameter choices here are probably not good. But even with my uniform asymptotic function, this did a little better than what I currently use in that region but was a little bit slower.

What are your thoughts about that strategy?

@heltonmc (Member Author) commented Jul 26, 2022

Yes, fully 2-D. It looks like you can get 16 digits of accuracy with about 20 Chebyshev polynomials, which can be evaluated quite fast (e.g., with Clenshaw's recurrence; see the sketch below).
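
For reference, a minimal 1-D sketch of that evaluation step (Clenshaw's recurrence; a 2-D version in (v, x) would apply this along each dimension):

# evaluates p(t) = sum(c[k+1] * T_k(t) for k in 0:n) on t in [-1, 1]
function chebyshev_clenshaw(c::AbstractVector{T}, t::T) where T
    b1 = b2 = zero(T)
    for k in length(c):-1:2
        b1, b2 = muladd(2t, b1, c[k] - b2), b1
    end
    return muladd(t, b1, c[1] - b2)
end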

This is a fine place to discuss! One issue to be careful with is that backward recurrence with besselk is never stable. (This is actually untrue for bessely, where forward recurrence is always stable and backward recurrence is stable as long as nu<x, though that is usually never helpful.) So looking through the code, I would imagine that increasing the offset would actually increase the error (as you are recursing through more values). An offset of 1-5 might be stable and give accurate results, but the error will increase dramatically if you use an offset larger than, say, 10. The actual error will depend on how fast the values you are recursing through decrease. This is kind of a problem considering we want to shift the order pretty far. Say our asymptotic expansions are valid when nu>40, but we want to calculate besselk(10, 15.0); I would imagine this won't be stable. Have you checked if this would work here?

We do currently employ a continued fraction, which I will say is by far the slowest code in our entire code base right now. We use it to go from besselk to besseli with the Wronskian, but I will probably try to avoid using it. The only time this would make sense is if we could go the other way: say we have besseli, can use the large-order expansion, and use downward recursion (as it's stable) to then get to besselk. That would be nice, but I'm not sure this is actually possible.

Edit: And honestly, the Debye expansion is so fast that I can probably optimize it some more, so we can produce both K{v} and K{v+1} in constant time faster than computing the continued fraction to get the ratio.

Edit 2: It looks like we would need about an (18, 30) Chebyshev grid, which isn't as efficient as I thought...

@cgeoga (Contributor) commented Jul 26, 2022

Ah, right, I forgot about the backwards recurrence being bad. That would explain why this didn't work for me for larger offsets, just as you predict. Oof. You know, it's funny, I really only ever need it for v in, like, [0.25, 5], so I never really think about v=50. But between this and an issue somebody just opened on BesselK.jl about wanting a faster besselk(0, x) (I sent them here), I think I should adjust my README to make that clearer.

I hear you about the Debye expansion thing. This is really pretty exciting. Do you have a direct series implementation that will work for integer v though? I don't.

@heltonmc (Member Author)

I don't have a power series yet for besselk, but I'm hoping to look at that next once I get bessely done. There is a power series form for besselk when v is an integer using digamma functions, but I don't know how well that would perform computationally. It might be better to use interpolation... but I don't think I would handle it that way. I would probably have fast versions of besselk(0, x) and besselk(1, x) and use forward recurrence until you want to employ the asymptotic expansions. Though, I have not given much thought to derivatives, honestly. I really would like to get implementations for all real and complex values before tackling the derivatives...
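
For reference, the n = 0 case of that integer-order series (DLMF 10.31.2, where the digamma terms reduce to harmonic numbers) looks roughly like this sketch (besselk0_series is a hypothetical name):

# K_0(x) = -(log(x/2) + γ)*I_0(x) + Σ_{k≥1} H_k * (x²/4)^k / (k!)²
function besselk0_series(x::T; kmax = 50) where T
    γ = T(MathConstants.eulergamma)
    q = x * x / 4
    i0 = one(T); term = one(T)  # running term (x²/4)^k / (k!)²
    s = zero(T); Hk = zero(T)   # harmonic-weighted tail and H_k
    for k in 1:kmax
        term *= q / k^2
        Hk += one(T) / k
        i0 += term
        s += Hk * term
    end
    return -(log(x / 2) + γ) * i0 + s
end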

@heltonmc (Member Author) commented Jul 27, 2022

Ok - after diving into this problem a little bit, I think I have a working solution. I'm not sure why I never came back to this, but so far the best method is to use the hypergeometric series here. @oscardssmith mentioned this in https://github.com/heltonmc/Bessels.jl/issues/16 and it was very easy to set up, but see the comments there about convergence; honestly, I couldn't guarantee accuracy for a variety of inputs (essentially you can't really guarantee accuracy for those functions over a large input range). At the time I was looking for an algorithm for when x ~ nu, and it converged slowly.....

Flash forward to this discussion, and it seems like a very good fit if we consider the right input ranges. @cgeoga, although the code you shared probably won't work with Debye expansions using backward recurrence, it could work if we could generate besselk for very low orders (0<v<1) and use forward recurrence. Then we can use that to get v+1 and use forward recurrence to fill everything else. And looking at that code a little more, it actually converges fastest when v is really small, so this works perfectly as we are keeping a very tight window on v....

Ok, so now the trick is we need something that gives very accurate results for small orders and converges reasonably fast... well, that is the hypergeometric series. So, piecing together @cgeoga's code block above (https://github.com/heltonmc/Bessels.jl/issues/17#issuecomment-1195726642), our recurrence discussion in #25, and some code from HypergeometricFunctions.jl, we get a somewhat cumbersome source for this.

function besselk_mid_arguments(v, x)
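    # reduce to a seed order v_low in [0, 1), where the 2F0 series converges
    # fastest; compute K_{v_low} and K_{v_low+1}, then forward-recurse up to v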
    v_low = v - floor(v)
    k0 = besselk_hypergeometric(v_low, x)
    k1 = besk_cf_ratio_opt(v_low, x)*k0

    k2 = k1
    x2 = 2 / x
    arr = range(start=v_low+1, stop=v, step=1)
    for n in arr
        a = x2 * n
        k2 = muladd(a, k1, k0)
        k0 = k1
        k1 = k2
    end
    return k0
end

function besselk_hypergeometric(v, x)
    a = v+1/2; b = 2v+1; c = 2x
    return drummond2F0(a, a-b+1, -1/c)*c^(-a)*sqrt(pi)/exp(x)*(2x)^v
end
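
# Note: besselk_hypergeometric above uses the identity
#   K_v(x) = sqrt(pi/(2x)) * exp(-x) * ₂F₀(v + 1/2, 1/2 - v; ; -1/(2x)),
# i.e. the resummed form of the large-argument asymptotic series.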

# taken from https://github.com/JuliaMath/HypergeometricFunctions.jl/blob/master/src/drummond.jl
# ₂F₀(α,β;z)
using LinearAlgebra
@inline errcheck(x, y, tol) = isfinite(x) && isfinite(y) && (norm2(x-y) > max(norm2(x), norm2(y))*tol)
@inline norm2(x) = norm(x)
function drummond2F0(α::T1, β::T2, z::T3; kmax::Int = 10_000) where {T1, T2, T3}
    T = promote_type(T1, T2, T3)
    absα = abs(T(α))
    absβ = abs(T(β))
    if norm(z) < eps(real(T)) || norm(α*β) < eps(absα*absβ)
        return one(T)
    end
    ζ = inv(z)
    Nlo = ζ/(α*β)
    Dlo = ζ/(α*β)
    Tlo = Nlo/Dlo
    Nmid = (2ζ-(α+1)*(β+1))*Nlo + 2ζ
    Dmid = (2ζ-(α+1)*(β+1))*Dlo
    Tmid = Nmid/Dmid
    if norm((α+1)*(β+1)) < eps((absα+1)*(absβ+1))
        return Tmid
    end
    Nmid /= (α+1)*(β+1)
    Dmid /= (α+1)*(β+1)
    Nhi = (3ζ-(α+2)*(β+2)-(α+β+3))*Nmid - (α+β+3-ζ)*Nlo + ζ
    Dhi = (3ζ-(α+2)*(β+2)-(α+β+3))*Dmid - (α+β+3-ζ)*Dlo
    Thi = Nhi/Dhi
    if norm((α+2)*(β+2)) < eps((absα+2)*(absβ+2))
        return Thi
    end
    Nhi /= (α+2)*(β+2)
    Dhi /= (α+2)*(β+2)
    k = 2
    while k < kmax && errcheck(Tmid, Thi, 10eps(real(T)))
        Nhi, Nmid, Nlo = ((k+2)*ζ-(α+k+1)*(β+k+1)-k*(α+β+2k+1))*Nhi - k*(α+β+3k-ζ)*Nmid - k*(k-1)*Nlo, Nhi, Nmid
        Dhi, Dmid, Dlo = ((k+2)*ζ-(α+k+1)*(β+k+1)-k*(α+β+2k+1))*Dhi - k*(α+β+3k-ζ)*Dmid - k*(k-1)*Dlo, Dhi, Dmid
        Thi, Tmid, Tlo = Nhi/Dhi, Thi, Tmid
        if norm((α+k+1)*(β+k+1)) < eps((absα+k+1)*(absβ+k+1))
            return Thi
        end
        Nhi /= (α+k+1)*(β+k+1)
        Dhi /= (α+k+1)*(β+k+1)
        k += 1
    end
    return isfinite(Thi) ? Thi : isfinite(Tmid) ? Tmid : Tlo
end

# taken from Chris Geoga https://github.com/cgeoga
# https://github.com/heltonmc/Bessels.jl/issues/17#issuecomment-1195726642
function besk_cf_ratio_opt(v, z, nmax=100, tol=1e-16) # this takes about 60ns on my machine
    # do the modified Lentz method:
    (hn, Dn, Cn) = (1e-50, zero(v), 1e-50)
    onev = one(v)
    twov = onev+onev
    jf   = onev
    qi   = inv(twov+twov)
    vv   = v*v
    for j in 1:nmax
      an  = (vv - ((twov*jf-onev)^2)*qi)
      bn  = 2*(z+jf)
      Cn  = bn+an/Cn      # skipping zero check...
      Dn  = inv(bn+an*Dn) # skipping zero check...
      del = Dn*Cn
      hn *= del
      (abs(del-1)<tol) && break
      jf += onev
    end
    (twov*v+onev+twov*z)/(twov*z) + hn/z
end

Ok, so now let's look at the accuracy in some problematic regions: x in [7, 20] and nu around 15.

julia> 1 -  besselk_mid_arguments(15.0, 10.0)/SpecialFunctions.besselk(15.0, 10.0)
-1.1102230246251565e-15
julia> 1 -  besselk_mid_arguments(16.2, 13.4)/SpecialFunctions.besselk(16.2, 13.4)
1.1102230246251565e-16

So you can see this approach is very accurate, providing at least 15 digits of accuracy. I'm not sure how accurate the SpecialFunctions.jl implementation is, so I can't distinguish past that.

And now the performance checks. I will say I have not spent any time optimizing this, but I would imagine we can do better.

julia> @benchmark besselk_mid_arguments(15.2, x) setup=(x=15.0+rand())
BenchmarkTools.Trial: 10000 samples with 446 evaluations.
 Range (min … max):  231.128 ns … 511.025 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     238.509 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   243.026 ns ±  10.436 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

            ██▁     ▆▄▆▄  ▆▄ ▂▃▁▄▁  ▃▁  ▂▄▃        ▂▁           ▂
  █▆▁▁▁▁▁▄▄▃███▇▇▆▇██████▇██▇█████▇████▇███▇▆▅██▅▆▆██▅▆▆▅▆▅▅▆▆█ █
  231 ns        Histogram: log(frequency) by time        267 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

Now, I wouldn't say this is blazing fast by any means, but I think it is very good as our general fallback for problematic regions! It works for integer and decimal nu and gives very high accuracy results (it also works for complex arguments). I'm sure we can spend some time optimizing this a bit, and I would love to have a version of this in here in the future (@cgeoga, if you're willing to contribute your code block here).

@oscardssmith (Member)

This looks like it could be made a lot faster by factoring out small bits. It looks like there is a lot of duplicated work.

@heltonmc (Member Author) commented Jul 27, 2022

@oscardssmith absolutely. I think we should edit the source code from HypergeometricFunctions.jl and have a version of it here, or submit a PR over there. I really haven't had time to look into it carefully enough to see where we can make up some performance.

@cgeoga (Contributor) commented Jul 27, 2022

Wow. That is some seriously impressive composition. I also played a lot with HypergeometricFunctions.jl and found that it was just too slow in certain parts of the domain, but these speeds are already on par with AMOS pre-optimization, so that's seriously cool.

Not to derail, but one more idle thought: I tinkered a bit this morning with trying to use the forward recursion but starting from big negative values of v, so you could use the CF and compute Bessels.besselk_large_order(50, x) for besselk(-50, x) and then forward recurse. But it seems like you can't pull a fast one like that. If this strategy ends up still being unsatisfyingly slow, though, I wonder if a more thoughtful version of it might be worth looking at. In particular, because it shows up in the Matérn covariance function, BesselK has some special methods for (x^v)*besselk(v, x), which obeys a forward recursion that is just slightly different (see the sketch below), and combining that with a big negative v might actually be pretty nice because besselk(-big, x)*(x^(-big)) is not a tiny or huge number for intermediate x. It's not hard to modify uniform asymptotic expansion code to compute the modified form directly.
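
For reference, a minimal sketch of that modified recursion (scaled_besselk_up is a hypothetical name; gv and gvp1 are assumed to hold g_v(x) = (x^v)*besselk(v, x) and g_{v+1}(x)):

# multiplying K_{v+1} = (2v/x)*K_v + K_{v-1} through by x^{v+1} gives
# g_{v+1}(x) = 2v*g_v(x) + x^2*g_{v-1}(x)
function scaled_besselk_up(x, gv, gvp1, v, v_end)
    x2 = x * x
    while v < v_end
        gv, gvp1 = gvp1, muladd(2 * (v + 1), gvp1, x2 * gv)
        v += 1
    end
    return gv  # g_{v_end}(x)
end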

Honestly my understanding of numerical stability isn't very good, so I don't have a good intuition about the potential problems here. Is there any way that a trick like this might work, or is doing the forward recursion but starting with giant numbers that get smaller and then get bigger again the same problem as backwards recursion?

@heltonmc (Member Author) commented Jul 27, 2022

Yes. I found the same thing: as a general tool it is just way too slow, but if you limit the order to be small and use recurrence, it can be a pretty good tool, considering that finding another algorithm in this range is difficult (though the numerical integration in this range is powerful too....), so maybe we lean on that? I don't know; that will take some comparisons.... though I like the simplicity of the numerical integration a lot. But I like the idea of really limiting the order between 0 and 1 to increase convergence and then using the forward recurrence....

Ya, that would be kind of tricky, but I don't think it will be stable. In general, besselk will be stable moving away from order 0, and besseli will be stable moving towards order 0. You just need to ask whether the values you are recursing through are increasing or decreasing.
If they are increasing, like this:

julia> besselk(1.0, 10.0)
1.864877345382559e-5

julia> besselk(10.0, 10.0)
0.0016142553003906702

julia> besselk(100.0, 10.0)
4.5966740842695555e85

You are going to have a stable recurrence without loss of precision.... but what about the other way?

julia> besselk(-100, 10.0)
4.5966740842695555e85

julia> besselk(-10, 10.0)
0.0016142553003906702

julia> besselk(-1, 10.0)
1.864877345382559e-5

You can see that the value is decreasing (obviously besselk(-v, x) = besselk(v, x)), so this recurrence will be unstable.

@cgeoga (Contributor) commented Jul 27, 2022

Nice, that's a very helpful gestalt, thanks! It makes good sense intuitively too. Guess I can't pull a fast one on floats like that. Oh well.

I agree with you about that argument range being so rough, though. AMOS and the Temme paper both talk about the Miller recursion algorithm, but I tinkered with that very early in our project and kind of dismissed it for some reason. Maybe that was incorrect, but I see your point about how there aren't a ton of other options for this part of the domain.

@heltonmc (Member Author)

Ya, I think it comes down to whether we use (1) the hypergeometric series or (2) numerical integration in this region. Both are pretty good here. I believe AMOS and the Temme paper use the Miller algorithm to solve the hypergeometric series? I remember briefly looking at that paper and found the explanation.... confusing? So I would imagine these approaches are very, very similar, in that they use the hypergeometric form and then forward recurrence. I think it's a really powerful approach, but we might be able to optimize the numerical integral's step size to get better performance with way less code....

I really liked the version you shared originally that is more optimized but loses a digit or two. I'm wondering if we can just focus on increasing the accuracy of that version.....

On a side note... bessely has the same problems and then some, because there isn't a nice integral representation that converges quickly, nor a nice hypergeometric form. Did you get the besk(v+1,x)/besk(v,x) form from a paper? You wouldn't happen to know the form for bessely?

@cgeoga (Contributor) commented Jul 27, 2022

I actually got that continued fraction form from this book. Chapter 17 pertains to Bessel functions and appears pretty light on information for bessely, but it does have a lot of nice representations for the Hankel function and besselj, so maybe there are some tricks to be played there?

With regard to squeezing out just a few extra digits using a simpler method, there is one other option in that vein, I think. The exponential-improvement corrections to the asymptotic expansion (NIST 10.40(iv)) really do a lot, and for (v, x) = (0.3, 10.1), two numbers that I chose pretty randomly, with the full correction and order 7 I get an atol of 1.6e-15 at a runtime cost of 97ns. With an order of 10, I get an atol of 7e-17 at 150ns.

I have that implementation here, although I apologize in advance for how unreadable it is. If that is potentially appealing, I'd be happy to try to put together a readable version that uses some of the performance tools I've learned from our discussions here. Unlike the gamma function, the upper incomplete gamma function is easy to write exactly and very quickly in pure Julia, so it's pretty zippy even though the formulae in NIST 10.40(iv) would make you think it really drags.
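
(For a sense of why that's cheap: at positive integer arguments the upper incomplete gamma function reduces to a finite sum. A minimal sketch, with a hypothetical name; the non-integer case needs more care:)

# for integer a ≥ 1: Γ(a, x) = (a-1)! * exp(-x) * Σ_{k=0}^{a-1} x^k / k!
function upper_incomplete_gamma_int(a::Int, x::T) where T
    s = one(T); term = one(T)
    for k in 1:(a - 1)
        term *= x / k  # running x^k / k!
        s += term
    end
    return factorial(a - 1) * exp(-x) * s
end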

EDIT: An example of the function call is

BesselK._besselk_as(v, x, order, true, false)

where that last false says not to compute (x^v)*besselk(v,x).

@heltonmc (Member Author) commented Aug 2, 2022

Many thanks for including this! I'm starting to look at the modified versions again, so I'm looking into this now. I'd ideally like to reuse as much code as possible, so if you would like to submit a PR, please do.

One thing to keep in mind, though: for the modified versions we are really only interested in relative tolerances for these functions, because they do not cross the axis, so we should be able to provide relative tolerances for all values. I was really hoping to provide ~14.5-15 digits of accuracy for these functions, which I think is possible.
So for your example..

julia> (v, x) = 0.3, 10.1

julia> 1 - SpecialFunctions.besselk(v, x) / BesselK._besselk_as(v, x, 10, true, false)
4.6033177270032866e-12

julia> 1 - SpecialFunctions.besselk(v, x) / BesselK._besselk_as(v, x, 20, true, false)
7.205347429817266e-14

So even with an order of 20 we are almost getting 14 digits, whereas SpecialFunctions.jl gives a relative tolerance of -4.440892098500626e-16.

Though, it seems like for this regime the expansions just converge too slowly....

julia> (v, x) = 0.3, 80.1
julia> 1 - SpecialFunctions.besselk(v, x) / BesselK._besselk_as(v, x, 6, true, false)
4.440892098500626e-16

So we will just have to pick where we want to actually employ this where it converges relatively quickly!
