Commit
Fixes to winding number accumulation (#391)
* Prototype 8 bit winding number accumulation

  Changes the winding number accumulation in msaa8 mode to 8 bits per sample. This is prototype code and currently breaks the msaa16 mode; it is intended to diagnose whether the artifacts are strictly due to overflow, and to point the way to a real implementation.

  Prefix sums in both x and y direction are a little cleaner, avoiding a race (not UB because it's atomics).

* Make msaa16 mode use 8 bit accumulation

  This patch makes the msaa16 mode work again, using 8 bit accumulation of winding numbers. It could be merged to fix the artifacts in the cardioid example. Also, it's worth doing some evaluation to see how much performance slowdown there is.

  As future work, we probably want to be adaptive and use 8 bit accumulation when needed. If the performance hit from reduced occupancy due to the increased shared memory usage is significant, then we could consider other mitigations, including downgrading to msaa8 when overflow is possible.

* Add even-odd fill rule

  Create a specialized version of the fill function for the even-odd fill rule. The logic is simpler (and faster) because winding number accumulation can happen in one bit.

  There's a bunch of code duplication which can be cleaned up.

  It's expected this will have a merge conflict with #382. If that's merged first, I'll happily fix this one.

* Prepare for merge

* Add a bunch of comments

  I did my best to document some of the strange bit magic used in the algorithm. I also did just a bit of renaming to make things simpler, and for the mask expansion replaced `|` with `^` because it's easier to understand in terms of carry-less multiplication (and I expect performance to be identical).

* Typo
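To make the overflow concern concrete, here is a minimal sketch in Python (not the shader code from this commit). It assumes each sample's winding number lives in a fixed-width lane stored with a bias so negative values fit; the function names, the bias encoding, and the 4-bit comparison width are all hypothetical choices for illustration. It shows how a narrow lane can wrap back to zero and misreport a covered sample as empty under the nonzero rule, while the even-odd rule only ever needs the parity bit.

```python
def accumulate(deltas, bits):
    """Sum signed crossing deltas in one `bits`-wide lane, wrapping on overflow.

    The lane stores winding + bias so negative windings fit (an assumed
    encoding for this sketch, not taken from the commit).
    """
    bias = 1 << (bits - 1)
    lane_mask = (1 << bits) - 1
    lane = bias
    for d in deltas:
        lane = (lane + d) & lane_mask  # wraps exactly like a packed lane would
    return lane - bias


def covered_nonzero(deltas, bits):
    """Nonzero fill rule: a sample is covered when its winding number is not 0."""
    return accumulate(deltas, bits) != 0


def covered_even_odd(deltas):
    """Even-odd fill rule: only the parity matters, so one bit and XOR suffice."""
    parity = 0
    for d in deltas:
        parity ^= d & 1  # +1 and -1 both flip the parity bit
    return parity == 1


# Sixteen same-direction crossings, as can happen in heavily overlapping paths.
crossings = [1] * 16
narrow = accumulate(crossings, 4)  # wraps around to 0
wide = accumulate(crossings, 8)    # stays at 16
```

With these inputs the 4-bit lane reports a winding number of 0, so the nonzero rule would wrongly treat the sample as uncovered, whereas the 8-bit lane keeps the true value of 16. The parity computation is unaffected by wraparound, which is why a one-bit accumulator is enough for even-odd.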
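The remark about `|` versus `^` in mask expansion can be illustrated with a small sketch. The shift amounts and the 4-bit-mask-to-byte-lanes layout below are assumptions chosen for this example, not constants copied from the commit. The idea is that shift-and-combine steps whose terms never land on the same bit give identical results with either operator, and the `^` form is exactly a carry-less multiplication by a constant polynomial.

```python
def expand_xor(mask4):
    """Spread bit i of a 4-bit mask to bit 8*i using shift-and-XOR steps."""
    a = mask4 ^ (mask4 << 7)
    b = a ^ (a << 14)
    return b & 0x01010101  # keep only the low bit of each byte lane


def expand_or(mask4):
    """Same expansion written with OR instead of XOR."""
    a = mask4 | (mask4 << 7)
    b = a | (a << 14)
    return b & 0x01010101


def clmul(a, b):
    """Carry-less (GF(2) polynomial) multiplication of two non-negative ints."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


# (1 + x^7) * (1 + x^14) = 1 + x^7 + x^14 + x^21 as a bit pattern.
MULTIPLIER = (1 << 0) | (1 << 7) | (1 << 14) | (1 << 21)  # 0x204081

example = expand_xor(0b1010)  # samples 1 and 3 set -> bytes 1 and 3 get a 1
```

For every 4-bit input the shifted copies occupy disjoint bit positions, so `expand_or`, `expand_xor`, and `clmul(mask, MULTIPLIER) & 0x01010101` all agree; the XOR form simply makes the polynomial-multiplication reading direct.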