Asymptotic expansion for large arg and small-ish order #48
Conversation
Codecov Report
@@ Coverage Diff @@
## master #48 +/- ##
==========================================
+ Coverage 93.76% 93.79% +0.03%
==========================================
Files 18 18
Lines 1491 1515 +24
==========================================
+ Hits 1398 1421 +23
- Misses 93 94 +1
Oh, sorry. I commented without looking through the source code carefully. The higher orders are filled using recurrence (I was baffled by that accuracy for a second). Okay, can I think about where to make the cutoffs for a bit? Are you proposing that for lowish orders (v < 20) and large arguments this method should be used? It seems like it can give almost an extra digit of accuracy in that domain, so I am definitely in favor of including it there. But I think the maximum order definitely matters: recurrence can be slow. We also need to be careful about underflow for very large arguments, because the low orders might be zero while the higher orders are not.
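On the underflow point: one standard way to sidestep it for very large arguments is to compute the exponentially scaled value exp(x) * K_v(x), which stays O(1), and let the caller decide whether multiplying by exp(-x) is safe. A minimal sketch under that assumption; the function name and the fixed order here are illustrative, not this PR's API:

```julia
# Illustrative sketch (not the PR's API): exponentially scaled
# large-argument series for exp(x) * besselk(v, x). Leaving the
# exp(-x) prefactor to the caller avoids underflow at huge x.
function besselk_asexp_scaled(v, x::T, order::Int=10) where T
    fv2  = 4 * v * v
    term = one(T)
    ser  = term
    for k in 1:order
        offset = 2k - 1
        term *= (fv2 - offset^2) / (8 * x * k)
        ser  += term
    end
    return sqrt(T(pi) / (2 * x)) * ser
end
```

For v = 1/2 the series terminates immediately (4v^2 - 1 = 0), so the scaled value is exactly sqrt(pi/(2x)), which makes a convenient spot check.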
Wow, that's a detailed investigation! Thanks for taking such a thoughtful look into it. Absolutely take whatever time you need to think about cutoffs and stuff. I would say that ….

As another note on underflow, if it continues to be an issue then maybe there should actually be a check of ….

EDIT: oh, and I just picked …
src/besselk.jl
Outdated
function besselk_asymptoticexp(v, x::T) where T
    _besselk_asymptoticexp(v, float(x), 10)
end

besselk_asexp_cutoff(nu, x) = (nu < 20.0) && (x > 25.0)
Should we have different cutoffs for Float64 and Float32?
I thought about that, but I don't actually have much of an intuition for what ways they should be different. I'll play around with it a bit.
I added a const ASEXP_CUTOFF in the consts file. For now they're both 35, which is what I thought I had typed in this function actually. But I'm sort of still thinking of that value as a placeholder.
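Since the question above is whether Float32 and Float64 should get different cutoffs, one way to express that is to dispatch the cutoff on the argument's type. A sketch with placeholder values; the constants and names here are hypothetical, not the ones in the PR:

```julia
# Hypothetical per-precision cutoffs; the numeric values are placeholders.
const ASEXP_CUTOFF_F64 = 35.0
const ASEXP_CUTOFF_F32 = 25.0f0  # fewer digits needed, so it could kick in earlier

asexp_cutoff(nu, x::Float64) = (nu < 20.0)   && (x > ASEXP_CUTOFF_F64)
asexp_cutoff(nu, x::Float32) = (nu < 20.0f0) && (x > ASEXP_CUTOFF_F32)
```

Dispatching like this keeps the branch decision in the type domain, so each method compiles to a straight comparison with no runtime type check.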
I'm wondering if the order could be determined by the ratio x/nu... so that it could be determined at runtime with no branches? This would require some care, obviously, but I don't think that will affect performance too much (I doubt that loop is unrolling). This would have several advantages if such a relation could work out...

The underflow we can probably worry about at the end. I usually do the calculation in Float64 anyway to the desired tolerance, because with these asymptotic expansions we are usually dealing with really large or small numbers that are easy to underflow... but yes, I think for very large arguments they are not being filled with recurrence.

I think the steps here are to first determine where the expansion can be explicitly applied without the use of recurrence. What is the maximum order we can use here? I see some notes on this, but can we use high orders here?

Edit: here it is using an order of 60 in the asymptotic expansion without recurrence. I have widened the range a bit. You can see my favorite bug in ArbNumerics popping up, but we should be able to extract the cutoffs pretty easily from these...
When you say order …
[Figure: asymptotic expansion order 60 (nu is on the y axis)]
I thought so, but wow. I've never pushed the expansion order anywhere near that high. There is an …
Quickly looking at it: you are using a float value, so that is only …
src/besselk.jl
Outdated
for _ in 1:order
    # add to the series:
    term_v = ak_numv / (factj * _z * eightj)
    ser_v += term_v
Can we use a simple convergence check here? We may not want to calculate the relative tolerance, because that would require a division, but we could just break here when term_v < tol. We'd need to verify that term_v always starts out greater than one for this to work, though; if v < 1 it looks like we start somewhere close to one.

This would avoid having to dynamically determine the order. I don't think it will add too much performance cost, and we only need to check one variable.
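A sketch of what that early exit could look like in the series loop. The kernel here is the standard large-argument series with illustrative names, and tol is taken as eps(T) as discussed, so the check is a cheap comparison with no division:

```julia
# Sketch: large-argument series loop with an absolute-tolerance early exit.
# Since the series starts at one(T), comparing |term| against eps(T) is a
# reasonable stand-in for a relative check without paying for a division.
function besselk_series_earlyexit(v, x::T; maxiter=1000) where T
    fv2  = 4 * v * v
    term = one(T)
    out  = term
    for i in 1:maxiter
        offset = 2i - 1
        term *= (fv2 - offset^2) / (8 * x * i)
        out  += term
        abs(term) <= eps(T) && break  # terms no longer affect the sum
    end
    return out
end
```

For v = 1/2 the loop exits after one iteration with the exact answer 1, since the first multiplier is (4v^2 - 1)/(8x) = 0.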
I wouldn't have thought this would be very helpful considering that the expansion diverges for all finite x, but with some print statements and stuff it looks plausible. What would the tol be? Presumably it does need to depend on v and x somehow, though, right?
Yes, that is true. However, we can confine the range of inputs where we employ this expansion and test it pretty rigorously. It may diverge after a point, but ideally our tolerance breaks before that, and I think we can control for this.

Well, we could just check for relative tolerance, but that requires a division. I think tol can just be eps, but it really depends on what term_v starts out at: if it's larger than 1, then simple absolute error will work. Let's think through it, but this would be easier than determining the number of terms from x and v and then just running for that amount. Though if there is a straightforward relation, that will be easy too.
I think if you really like the crushing accuracy, maybe we should take the exponential improvement part into consideration. I didn't include it here because it calls the upper incomplete gamma function, and my hand implementation of that requires the exponential integral, which would thus require re-writing or lifting from ….

EDIT: I think I need to call it a day, but in case you are interested in tinkering in the meantime, here is my upper incomplete gamma for when the first arg is a negative integer. I think what might actually be easy is to use the last a_k and z^k, add a few terms from the top of the correction sum, and maybe iterate backwards to just add a few more terms. But I'll get to looking at it this weekend, so no pressure.
Co-authored-by: Michael Helton <[email protected]>
Sorry for the radio silence here. Still thinking about the exponential improvement thing. It might be worthwhile, but it would take a pretty careful implementation. The expression just uses powers of ….

EDIT: I clicked "commit suggestion" on your suggested speedup and I guess something is broken now. I haven't looked into it a ton yet, but I will at some point.
Not a problem! I think the exponentially improved version is definitely worthwhile. Often, including the remainder piece improves the convergence properties of the later terms, but I'm not sure it actually improves convergence for the earlier terms. I would imagine this is great for higher precision calculations (…).

Ya, it looks like my suggestion didn't actually delete the old code that I was trying to replace, so you now have a bunch of hanging code: my suggestion plus the old implementation. I'll be able to look more carefully at this next week.

I would be willing to merge just the asymptotic expansions, and then if we want to get to the exponentially improved version we could do that separately in the future. I think the regular expansions are definitely worth it for now (once we fix cutoffs and stuff).
This reverts commit b72e80f.
Fix formatting problem introduced by the auto-commit.
I'm kind of wondering if we should make this code as similar as possible to the asymptotic expansion for … (Lines 300 to 325 in 8eb7c77). This is just the basic NIST form http://dlmf.nist.gov/10.40.E2, right? So keeping it as similar to the … function as possible:

function besselk_large_argument(v, x::T) where T
    a = exp(-x / 2)
    coef = a * sqrt(pi / 2x)
    return T(_besseli_large_argument(v, x) * coef * a)
end

besselk_large_argument_scaled(v, x::T) where T = T(_besselk_large_argument(v, x) * sqrt(pi / 2x))

function _besselk_large_argument(v, x::T) where T
    MaxIter = 5000
    S = promote_type(T, Float64)
    v, x = S(v), S(x)
    fv2 = 4 * v^2
    term = one(S)
    out = term
    for i in 1:1000
        offset = muladd(2, i, -1)
        term *= muladd(offset, -offset, fv2) / (8 * x * i)
        out = muladd(term, term, out)
        abs(term) <= eps(T) && break
    end
    return out
end

I'm not sure if that's even faster, because obviously it has that convergence check we discussed, but it does also take into account that we need the scaled version as well.
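For reference, the NIST form mentioned above (DLMF 10.40.2) is, in terms of the Hankel coefficients $a_k(\nu)$:

```latex
K_\nu(z) \sim \sqrt{\frac{\pi}{2z}}\, e^{-z} \sum_{k=0}^{\infty} \frac{a_k(\nu)}{z^k},
\qquad
a_k(\nu) = \frac{\bigl(4\nu^2-1^2\bigr)\bigl(4\nu^2-3^2\bigr)\cdots\bigl(4\nu^2-(2k-1)^2\bigr)}{k!\,8^k},
\quad a_0(\nu) = 1.
```

The term ratio is $\frac{a_k(\nu)/z^k}{a_{k-1}(\nu)/z^{k-1}} = \frac{4\nu^2-(2k-1)^2}{8zk}$, which is exactly the `muladd(offset, -offset, fv2) / (8 * x * i)` update used in the loops above.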
But again, we do need to decide how we want to handle possibly using recurrence, or where/if we want to employ that. This is a possibility not possible in the ….

Edit: We should probably benchmark a little more to see which version is faster, but quickly looking at:

function _besselk_large_argument(v, x::T) where T
    MaxIter = 5000
    S = promote_type(T, Float64)
    v, x = S(v), S(x)
    fv2 = 4 * v^2
    term = one(S)
    out = term
    for i in 1:1000
        offset = muladd(2, i, -1)
        term *= muladd(offset, -offset, fv2) / (8 * x * i)
        out = muladd(term, term, out)
        #abs(term) <= eps(T) && break
    end
    return out
end

function _besselk_as_pair(v, x::T) where T
    fv = 4*v*v
    z = 8*x
    knu = one(T)
    ak_nu = fv - 1
    for i in 1:1000
        term_v = ak_nu / z
        knu += term_v
        a = muladd(2, i, one(T))^2
        z *= 8*x*(i + one(T))
        ak_nu *= fv - a
    end
    return knu
end

It looks like …
A good point---it should presumably be possible to re-write at least the series part in one function that gets re-used by both.

Assuming I put that into this PR, where would you like this function that serves both …? It would be nice to ensure that the compiler somehow knows that all ….

With regards to recurrence, I originally put that in there because it seemed to help at the (approximation) orders that I expected to use, like 5-10. If you're cranking it all the way up to 60, I wouldn't be surprised if the recurrence didn't actually help all that much with accuracy and could just be removed.

I also think I'd vote for merging something like this without the exponential improvement for now. I am pretty concerned that even with the fastest possible …
I agree on just merging something of this form for sure! This function can just go into the besselk file, and then we can call it in the appropriate file with the scaled versions etc. But yes, let's try to keep it as simple as possible. We will need to define a cutoff where we can employ this, though, because including the higher series terms leaves the possibility that it won't ever converge and the terms will actually start growing. The ….

Yes, it might be best to not rely on recurrence here, honestly, because I think this will be able to be employed in a larger range given the higher orders.
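For context on the recurrence being discussed: K_v satisfies the three-term recurrence K_{v+1}(x) = K_{v-1}(x) + (2v/x) K_v(x), which for K is stable in the direction of increasing order. A minimal sketch, with hypothetical names:

```julia
# Sketch: forward (upward) recurrence for modified Bessel K.
# Starting from K_{v-1}(x) = kvm1 and K_v(x) = kv, advance n steps
# and return (K_{v+n-1}(x), K_{v+n}(x)).
function besselk_up_recurrence(x, kvm1, kv, v, n)
    for _ in 1:n
        kvm1, kv = kv, kvm1 + (2 * v / x) * kv
        v += 1
    end
    return kvm1, kv
end
```

The half-integer closed forms make an easy check: K_{1/2}(x) = sqrt(pi/(2x)) e^{-x}, K_{3/2}(x) = K_{1/2}(x)(1 + 1/x), and one recurrence step should reproduce K_{5/2}(x) = K_{1/2}(x)(1 + 3/x + 3/x^2).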
Makes sense. Tinkering around here, and there's actually something I don't get, if you don't mind. In your ….

As written, your version and my version of the series don't give the same answer, and when I sub in your version then all the tests fail. But they don't fail for ….

If I swap in a modified form of your code that looks like this:

function _besselk_as_new(v, x::T) where T
    fv = 4*v*v
    term = one(v)
    out = term
    for i in 1:[some iter]
        offset = muladd(2, i, -1)
        term *= muladd(offset, -offset, fv)/(8*x*i)
        out += term
    end
    return out
end

I get the same output now. Am I missing something?
Looking at my original code... I didn't convert the kernel right, so it was still using the ….

function besselk_large_argument(v, x::T) where T
    a = exp(-x / 2)
    coef = a * sqrt(pi / 2x)
    return T(_besselk_large_argument(v, x) * coef * a)
end

besselk_large_argument_scaled(v, x::T) where T = T(_besselk_large_argument(v, x) * sqrt(pi / 2x))

function _besselk_large_argument(v, x::T) where T
    MaxIter = 5000
    S = promote_type(T, Float64)
    v, x = S(v), S(x)
    fv2 = 4 * v^2
    term = one(S)
    res = term
    s = term
    for i in 1:MaxIter
        offset = muladd(2, i, -1)
        term *= muladd(offset, -offset, fv2) / (8 * x * i)
        res = muladd(term, s, res)
        abs(term) <= eps(T) && break
    end
    return res
end

This is the appropriate form. So it is really just the …
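As a quick sanity check on a kernel of this form (reproduced below so the snippet is self-contained): for nu = 1/2 the series terminates after one term, so the kernel is exactly 1, and the assembled value can be compared against the closed form K_{1/2}(x) = sqrt(pi/(2x)) e^{-x}.

```julia
# Self-contained copy of the corrected kernel for testing purposes.
function _besselk_large_argument(v, x::T) where T
    MaxIter = 5000
    S = promote_type(T, Float64)
    v, x = S(v), S(x)
    fv2 = 4 * v^2
    term = one(S)
    res = term
    s = term
    for i in 1:MaxIter
        offset = muladd(2, i, -1)
        term *= muladd(offset, -offset, fv2) / (8 * x * i)
        res = muladd(term, s, res)
        abs(term) <= eps(T) && break
    end
    return res
end

# For nu = 1/2, fv2 = 1 and the first multiplier is (1 - 1)/(8x) = 0,
# so the loop breaks immediately with res == 1.
kernel = _besselk_large_argument(0.5, 10.0)
value  = kernel * sqrt(pi / 20) * exp(-10.0)  # should equal K_{1/2}(10)
```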
Ah, I see. I really need to get more into this ….

Anyways, pushed up that version here. I was thinking about the code duplication question a bit and started doubting myself on how useful it actually was to merge the series code for ….

From some earlier skims, I gather that the last thing you'd like to discuss is the cutoff for using this. The current one that I put in here of …
Yes - I agree. Keeping the functions separate is fine with me. And yes, the cutoff could be something to discuss further, but no matter what it has a quadratic dependence. I will look further into it in the future, but it does also bound the total computational time by putting an upper bound on ….

I think it's definitely time to get this merged.
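To make the "quadratic dependence" concrete, one illustrative shape for such a cutoff is to require the argument threshold to grow like nu^2. The constants below are purely hypothetical placeholders, not the PR's values:

```julia
# Purely illustrative: a cutoff whose argument threshold grows
# quadratically in the order, so that expansion accuracy is maintained
# as nu grows. The constants 35 and 30 are hypothetical placeholders.
asexp_cutoff_quadratic(nu, x) = x > 35.0 + nu * nu / 30.0
```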
Okay, here's a start to an asymptotic expansion. This one is probably again going to be a bit controversial to you guys, because once the SIMD-powered uniform asymptotic expansion things go live, this will probably only be faster for nu = 30 or smaller, say. But then, that is the entirety of the spatial statistics applications, so in my biased opinion it's probably worth a branch. I make the current cutoff 20, though, because for the largest args the Float32 tests are failing. Not entirely sure what to make of that.

I haven't formatted things how you like it because I assume there will be plenty of iteration here. But even without the exponential improvement I'm getting pretty good accuracies here, and meaningfully better speeds for small enough v.

Curious to hear what you guys think!
I should also say, I recognize that there are a lot of redundant checks at this point about the size of x and nu and stuff. So I know that will need a little work.