Aabb rewrite #1270
Increases program run speed by about 10%, and the code is easier to understand using interval arithmetic. A separate change to `ray.origin()` and `ray.direction()` returning const refs adds another 5% speedup.

In addition, I discovered that in many cases `fmin()` and `fmax()` are performance bombs, likely because these functions have special handling for NaNs. Instead, I switched to ternary expressions like `a < b ? a : b`, which had a large performance impact. For the `aabb::hit()` function, this improved performance of the new interval code from 100% worse to about 10% better.

- Added a new `span()` function that returns an interval of two doubles, regardless of their order.
- Added a new `interval::is_empty()` function that also returns true when either of the bounds is a NaN.
- Added a new `interval::intersect()` function.
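To make the three additions described above concrete, here is a minimal sketch of how they might look. The function names come from the description; the bodies, the plain `struct` layout, and the `interval_demo()` helper are assumptions made for illustration and are not taken from the PR's actual code. In particular, treating a degenerate interval such as `[2, 2]` as non-empty is a choice of this sketch.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical minimal interval type (assumption: not the book's full class).
struct interval {
    double min, max;

    // Empty when the bounds are inverted OR when either bound is NaN.
    // Every ordered comparison involving NaN is false, so !(min <= max)
    // covers both cases with a single test.
    bool is_empty() const { return !(min <= max); }

    // Overlap of two intervals, written with ternaries instead of
    // fmax()/fmin() so that no special NaN handling is invoked.
    interval intersect(const interval& other) const {
        return interval{
            min > other.min ? min : other.min,
            max < other.max ? max : other.max
        };
    }
};

// Interval covering two doubles given in either order.
inline interval span(double a, double b) {
    return a <= b ? interval{a, b} : interval{b, a};
}

// Illustrative usage (hypothetical helper, not part of the PR).
inline void interval_demo() {
    const interval s = span(5.0, 2.0);            // order does not matter
    assert(s.min == 2.0 && s.max == 5.0);

    const interval a{0.0, 3.0};
    const interval overlap = a.intersect(s);      // [2, 3]
    assert(overlap.min == 2.0 && overlap.max == 3.0);
    assert(!overlap.is_empty());

    const interval b{0.0, 1.0};
    const interval none = b.intersect(s);         // bounds become [2, 1]
    assert(none.is_empty());

    const double nan = std::numeric_limits<double>::quiet_NaN();
    const interval bad_min{nan, 1.0};
    const interval bad_max{0.0, nan};
    assert(bad_min.is_empty());
    assert(bad_max.is_empty());
}
```

The single negated comparison in `is_empty()` relies on IEEE-754 comparison semantics, so it would not be reliable under compiler modes that assume NaNs never occur (for example, fast-math options).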
- Use a consistent wrap format for long captions.
- Always have a blank line before a closing `</div>` tag following a listing, due to Markdeep behavior.
- Minor correction to one caption typo.
- Fix neglected indent for a code listing.
The `d` is more consistent with the code using `d` for direction.
- Add clarifying comment for corner cases of ray-aabb intersection.
- Fix some listings.
- Use new `span()` function.
- Update `aabb::hit()` code.
- Deprecate section "An Optimized AABB Hit Method".
- Document new `interval` functions.
books/RayTracingTheNextWeek.html
@@ -462,20 +463,20 @@
 How do we find the intersection between a ray and a plane? Recall that the ray is just defined by a
 function that--given a parameter $t$--returns a location $\mathbf{P}(t)$:

-$$ \mathbf{P}(t) = \mathbf{A} + t \mathbf{b} $$
+$$ \mathbf{P}(t) = \mathbf{A} + t \mathbf{d} $$
I think this is fine as it is but I wanted to point out that the "Ray-Sphere Intersection" chapter in InOneWeekend defines a ray as $\mathbf{P}(t) = \mathbf{Q} + t\mathbf{d}$ (i.e. it uses "Q" instead of "A").
I don't know if it's important to keep math symbols consistent across the books but might as well change that too.
I like it. Will do. I'd love to use
const interval& ax = axis(a);
auto ao = r.origin()[a];
auto ad = r.direction()[a];

auto t_interval = span((ax.min - ao) / ad, (ax.max - ao) / ad);
ray_t = ray_t.intersect(t_interval);

if (ray_t.is_empty())
I'm not entirely surprised that changing the min/max calls to ternaries helped but I think this will ultimately depend on the compiler. Both ternaries and min/max intrinsics on floats likely bottom out at similar instructions but I haven't fully analyzed them down to instruction cycle counts.
Does the new code perform better than Andrew Kensler's version? I would be surprised as the new version has more branches and divisions afaict. If we're going down the path of optimizations, perhaps the code should keep the `invD` optimization as it is very common.
> I think this will ultimately depend on the compiler

Not quite. The ternary expression just lets infinities and NaNs flow according to IEEE-754. The `fmin`/`fmax` C++ library functions intercept NaN values so that the finite/infinite values are returned unless both parameters are NaN. In most cases for our code, this doesn't matter, so the extra (often dramatically slower) work is unnecessary.
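The behavioral difference described here can be checked directly. The sketch below compares `std::fmin` with a ternary minimum on NaN inputs; `ternary_min` and `nan_min_demo` are hypothetical names introduced only for this illustration, and the sketch says nothing about relative speed, which would have to be measured per compiler and platform.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Ternary minimum: a plain comparison with no NaN interception.
inline double ternary_min(double a, double b) { return a < b ? a : b; }

// Illustrative checks (hypothetical helper).
inline void nan_min_demo() {
    const double nan = std::numeric_limits<double>::quiet_NaN();

    // std::fmin treats a NaN argument as missing data and returns the
    // other argument; it yields NaN only when both arguments are NaN.
    assert(std::fmin(nan, 1.0) == 1.0);
    assert(std::fmin(1.0, nan) == 1.0);
    assert(std::isnan(std::fmin(nan, nan)));

    // With the ternary, any comparison involving NaN is false, so the
    // second operand is returned whatever it happens to be.
    assert(ternary_min(nan, 1.0) == 1.0);        // nan < 1.0 is false, returns b
    assert(std::isnan(ternary_min(1.0, nan)));   // 1.0 < nan is false, returns b (NaN)

    // For ordinary finite inputs the two agree.
    assert(ternary_min(2.0, 1.0) == std::fmin(2.0, 1.0));
}
```

Because the ternary result depends on operand order when a NaN is present, code that switches to it needs its own NaN policy, which is presumably why the PR pairs it with an `is_empty()` that reports NaN bounds as empty.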
I remembered hitting this in geospatial calculations for our Tableau codebase as well, with the preferred solution avoiding the use of these library functions.
And yes, the new code outperforms Andrew Kensler's version.
In additional weirdness, when I make the following changes, I get a 3% slowdown:
for (int a = 0; a < 3; a++) {
const interval& ax = axis(a);
auto ao = r.origin()[a];
- auto ad = r.direction()[a];
+ auto adinv = 1.0 / r.direction()[a];
- auto t_interval = span((ax.min - ao) / ad, (ax.max - ao) / ad);
+ auto t_interval = span((ax.min - ao) * adinv, (ax.max - ao) * adinv);
ray_t = ray_t.intersect(t_interval);
if (ray_t.is_empty())
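For readers who want to experiment with the loop under discussion, here is a self-contained sketch of the division form. The `interval` members and `span()` follow the names used in this PR, but their bodies are assumptions; `ray3`, `box3`, the free function `hit()`, and `hit_demo()` are hypothetical stand-ins for the book's `ray` and `aabb` classes. It also assumes IEEE-754 floating point, so that dividing by a zero direction component yields an infinity or NaN rather than trapping.

```cpp
#include <cassert>

// Hypothetical minimal interval (assumed bodies, see lead-in).
struct interval {
    double min, max;
    bool is_empty() const { return !(min <= max); }   // also true for NaN bounds
    interval intersect(const interval& o) const {
        return interval{min > o.min ? min : o.min, max < o.max ? max : o.max};
    }
};

inline interval span(double a, double b) {
    return a <= b ? interval{a, b} : interval{b, a};
}

// Hypothetical stand-ins for the book's ray and aabb classes.
struct ray3 { double origin[3]; double direction[3]; };
struct box3 { interval axes[3]; };

// Slab test: intersect the ray's parameter range with each axis slab.
inline bool hit(const box3& box, const ray3& r, interval ray_t) {
    for (int a = 0; a < 3; a++) {
        const interval& ax = box.axes[a];
        const double ao = r.origin[a];
        const double ad = r.direction[a];

        // Range of t for which the ray lies between the two slab planes.
        // With ad == 0 the bounds become infinities (or NaN when the
        // origin sits exactly on a slab plane, which is_empty() rejects).
        const interval t_interval = span((ax.min - ao) / ad, (ax.max - ao) / ad);
        ray_t = ray_t.intersect(t_interval);

        if (ray_t.is_empty())
            return false;
    }
    return true;
}

// Illustrative usage (hypothetical helper).
inline void hit_demo() {
    const box3 unit{{{0.0, 1.0}, {0.0, 1.0}, {0.0, 1.0}}};
    const interval ray_t{0.0, 100.0};

    const ray3 through{{-1.0, 0.5, 0.5}, {1.0, 0.0, 0.0}};   // passes through the box
    const ray3 above{{-1.0, 2.0, 0.5}, {1.0, 0.0, 0.0}};     // parallel, outside the y slab
    const ray3 away{{2.0, 0.5, 0.5}, {1.0, 0.0, 0.0}};       // box is behind the origin

    assert(hit(unit, through, ray_t));
    assert(!hit(unit, above, ray_t));
    assert(!hit(unit, away, ray_t));
}
```

Swapping the two divisions for a multiplication by a precomputed reciprocal, as in the diff above, changes only the `t_interval` line, which makes this a convenient harness for timing both variants on a given compiler.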
books/RayTracingTheNextWeek.html
the following version of the code. It works extremely well on many compilers, and I have adopted it
as my go-to method:
<div class='together'>
The new code above introduces new interval functions we need to write: `interval::is_empty()`,
Consider rephrasing this as "The new code above relies on three new `interval` methods that we haven't defined: ...".
"relies on" is a little more clear compared to "introduces". I've seen the book use both "method" and "function" interchangeably, which is fine, though `method` makes it a bit clearer that you're talking about instance functions.
Rephrased. Note that two of the new functions are class functions, and one is a standalone function.
Ah, you're right. I missed that `span()` is a standalone function. In that case I would just rephrase this like so:
The new code above relies on three new functions that we haven't defined: `interval::is_empty()`, `interval::intersect()`, and `span()`.
books/RayTracingTheNextWeek.html
interval(const interval& a, const interval& b) {
    // Create the interval tightly enclosing the two input intervals.
    min = a.min <= b.min ? a.min : b.min;
    max = a.max >= b.max ? a.max : b.max;
}
I think this same constructor gets added in the listing below, following the sentence "First, we'll add a new interval constructor that takes two intervals as input:". I'm not sure it should also be in this listing.
Fixed.
NOTE: This PR is currently under investigation for performance implications across more platforms.
Quoting an offline thread between me and @armansito on Slack, to preserve comments here for posterity:
[Slack exchange between Arman Uguray and Steve Hollasch; the message text was not captured.]
On my system, replacing
I've run the two versions' debug builds against each other on Windows. I see even worse performance than @armansito did, unfortunately: an almost 38% slowdown with the new code. So, release compiles better or a small bit worse, debug significantly worse, and code that I subjectively feel is more clear. A conundrum.
OK, that seems to be overall consistent with my measurements. I don't feel great about regressing the performance but I think it makes sense for the book to prioritize clarity. What do you think about changing the basic description of the function to your new interval approach to improve the clarity BUT also keeping Andrew's optimized version in place as an alternative? I don't think it hurts to provide both versions since I think both clearly explaining the concepts and presenting a straightforward optimization are equally valuable in a graphics learning resource.
Yeah, that's not a bad idea. Still, I'm deep into "sunk cost fallacy" land with all the work I did; it's tempting to proceed, but it may not be the wisest choice. I'm still playing with things to see if there's an out. 30% is a high price to pay for "clarity".
(So gumption trapped. Trying to crawl out of this hole...) 😅
Closing this one out for subsequent possible rewrite.
Revise AABB hit function to use intervals