-
-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Merged by Bors] - Clustered forward rendering #3153
Conversation
Yo, what do I need to do to pull this down and try it out? I've got a demo project that supports hundreds of moving lights and objects that's been sorely in need of something like this: |
This is based on the pipelined-rendering branch of the bevy repo. So you would need to get your demo working with that by not using the prelude, using PipelinedDefaultPlugins instead of DefaultPlugins and using things from bevy_render2 / bevy_pbr2 etc instead of the old renderer. Hopefully that’s enough to get you going. If not it may be better to wait until it is merged into main. We’re weeks away from 0.6 now. :) |
Some of the reference material:
|
45696df
to
90557d4
Compare
f07fbe4
to
6cc04d7
Compare
I added these as a comment before the cluster-related code in src/light.rs as I couldn't really think of a better place to do it, but we can of course move any of it anywhere. |
f9e823b
to
0802ff4
Compare
Am I missing something, when given a window description and providing a size, say 500x500, you get lots of artifacts and Can reproduce in |
You're probably not missing anything, just I hadn't found a situation where that happened outside of stress tests. My guess would be that the light sources are close to the camera, and so because there are a lot of clusters close to the camera, the lights intersect many of them, and we run out of space. This is something that speaks to using a texture for the cluster light lists instead of trying to squeeze them into a 16kB uniform. However, a simpler solution that was used in one of the reference materials was to just only have one depth slice for the first 5 units along z, and then use the 'real' exponential mapping after that. I should probably just do that. I didn't because I wanted a request for N depth slices to actually produce N depth slices, rather than N - however many depth slices are in the first 5 units + 1, and I didn't find a good mathematical way of doing that. I think that was me being a bit technically pure with the implementation and the pragmatic approach is to do exactly that - make the first slice a fixed size. I'll do that as soon as I can, probably by dynamically calculating the first depth boundary >= M units along z, and using that as the far boundary of the first slice. |
Sorry mate, you're so far ahead of me on this that I hate to even take up your time. I should have been more clear, added pic to show the result, maybe I used the term artifact incorrectly. only change was adding
|
Yup, I could reproduce it when adding the window size as 500x500. Thanks for finding this issue! I’ll address it as soon as I have time to. |
Ok after reading your first response a few more times I get it, thanks for all you do, gl |
I was wrong in my initial response. This was actually about the way the clusters are configured. There can be maximum 4096 clusters, and I am using a constant 24 depth slices, which leaves the condition: |
It should not be disabled by default - oops! Also put it inside a shader def as it doesn't need to be there unless debugging.
Co-authored-by: Jakob Hellermann <[email protected]>
Co-authored-by: Jakob Hellermann <[email protected]>
When using a square aspect ratio, tile sizes were calculated such that the maximum number of clusters was exhausted. This new approach is aspect-ratio independent.
8a4a2a2
to
cdeb00b
Compare
Alrighty I've wrapped up my final pass on this. I think its in a good spot to merge. I do want to eventually revisit putting the Clusters calculation in the "user app" instead of in the render app, as that feels like a "renderer implementation detail". But I also agree that it would involve rethinking visibility a bit. Time to move forward! |
bors r+ |
# Objective Implement clustered-forward rendering. ## Solution ~~FIXME - in the interest of keeping the merge train moving, I'm submitting this PR now before the description is ready. I want to add in some comments into the code with references for the various bits and pieces and I want to describe some of the key decisions I made here. I'll do that as soon as I can.~~ Anyone reviewing is welcome to add review comments where you want to know more about how something or other works. * The summary of the technique is that the view frustum is divided into a grid of sub-volumes called clusters, point lights are tested against each of the clusters to see if they would affect that volume within the scene and if so, added to a list of lights affecting that cluster. Then when shading a fragment which is a point on the surface of a mesh within the scene, the point is mapped to a cluster and only the lights affecting that clusters are used in lighting calculations. This brings huge performance and scalability benefits as most of the time lights are placed so that there are not that many that overlap each other in terms of their sphere of influence, but there may be many distinct point lights visible in the scene. Doing all the lighting calculations for all visible lights in the scene for every pixel on the screen quickly becomes a performance limitation. Clustered forward rendering allows us to make an approximate list of lights that affect each pixel, indeed each surface in the scene (as it works along the view z axis too, unlike tiled/forward+). * WebGL2 is a platform we want to support and it does not support storage buffers. Uniform buffer bindings are limited to a maximum of 16384 bytes per binding. I used bit shifting and masking to pack the cluster light lists and various indices into a uniform buffer and the 16kB limit is very likely the first bottleneck in scaling the number of lights in a scene at the moment if the lights can affect many clusters due to their range or proximity to the camera (there are a lot of clusters close to the camera, which is an area for improvement). We could store the information in textures instead of uniform buffers to remove this bottleneck though I don’t know if there are performance implications to reading from textures instead if uniform buffers. * Because of the uniform buffer binding size limitations we can support a maximum of 256 lights with the current size of the PointLight struct * The z-slicing method (i.e. the mapping from view space z to a depth slice which defines the near and far planes of a cluster) is using the Doom 2016 method. I need to add comments with references to this. It’s an exponential function that simplifies well for the purposes of optimising the fragment shader. xy grid divisions are regular in screen space. * Some optimisation work was done on the allocation of lights to clusters, which involves intersection tests, and for this number of clusters and lights the system has insignificant cost using a fairly naïve algorithm. I think for more lights / finer-grained clusters we could use a BVH, but at some point it would be just much better to use compute shaders and storage buffers. * Something else to note is that it is absolutely infeasible to use plain cube map point light shadow mapping for many lights. It does not scale in terms of performance nor memory usage. There are some interesting methods I saw discussed in reference material that I will add a link to which render and update shadow maps piece-wise, but they also need compute shaders to work well. Basically for now you need to sacrifice point light shadows for all but a handful of point lights if you don’t want to kill performance. I set the limit to 10 but that’s just what we had from before where 10 was the maximum number of point lights before this PR. * I added a couple of debug visualisations behind a shader def that were useful for seeing performance impact of light distribution - I should make the debug mode configurable without modifying the shader code. One mode shows the number of lights affecting each cluster by tinting toward red for few lights or green for many lights (maxes out at 16, but not sure that’s a reasonable max). The other shows which cluster the surface at a fragment belongs to by tinting it with a randomish colour. This can help to understand deeper performance issues due to screen space tiles spanning multiple clusters in depth with divergent shader execution times. Also, there are more things that could be done as improvements, and I will document those somewhere (I'm not sure where will be the best place... in a todo alongside the code, a GitHub issue, somewhere else?) but I think it works well enough and brings significant performance and scalability benefits that it's worth integrating already now and then iterating on. * Calculate the light’s effective range based on its intensity and physical falloff and either just use this, or take the minimum of the user-supplied range and this. This would avoid unnecessary lighting calculations for clusters that cannot be affected. This would need to take into account HDR tone mapping as in my not-fully-understanding-the-details understanding, the threshold is relative to how bright the scene is. * Improve the z-slicing to use a larger first slice. * More gracefully handle the cluster light list uniform buffer binding size limitations by prioritising which lights are included (some heuristic for most significant like closest to the camera, brightest, affecting the most pixels, …) * Switch to using a texture instead of uniform buffer * Figure out the / a better story for shadows I will also probably add an example that demonstrates some of the issues: * What situations exhaust the space available in the uniform buffers * Light range too large making lights affect many clusters and so exhausting the space for the lists of lights that affect clusters * Light range set to be too small producing visible artifacts where clusters the light would physically affect are not affected by the light * Perhaps some performance issues * How many lights can be closely packed or affect large portions of the view before performance drops?
# Objective We keep getting issues where things break at small window sizes, e.g #3368 (caused by #3153), #3596 ('caused' by #3545) ## Solution - Add a test that we can make small windows. Currently, this fails on my machine with some quite scary vulkan errors: ``` 2022-01-08T22:55:13.770261Z ERROR wgpu_hal::vulkan::instance: VALIDATION [VUID-VkSwapchainCreateInfoKHR-imageExtent-01274 (0x7cd0911d)] Validation Error: [ VUID-VkSwapchainCreateInfoKHR-imageExtent-01274 ] Object 0: handle = 0x1adbd410a60, type = VK_OBJECT_TYPE_DEVICE; | MessageID = 0x7cd0911d | vkCreateSwapchainKHR() called with imageExtent = (225,60), which is outside the bounds returned by vkGetPhysicalDeviceSurfaceCapabilitiesKHR(): currentExtent = (225,56), minImageExtent = (225,56), maxImageExtent = (225,56). The Vulkan spec states: imageExtent must be between minImageExtent and maxImageExtent, inclusive, where minImageExtent and maxImageExtent are members of the VkSurfaceCapabilitiesKHR structure returned by vkGetPhysicalDeviceSurfaceCapabilitiesKHR for the surface (https://vulkan.lunarg.com/doc/view/1.2.198.1/windows/1.2-extensions/vkspec.html#VUID-VkSwapchainCreateInfoKHR-imageExtent-01274) 2022-01-08T22:55:13.770808Z ERROR wgpu_hal::vulkan::instance: objects: (type: DEVICE, hndl: 0x1adbd410a60, name: ?) 2022-01-08T22:55:13.787403Z ERROR wgpu_hal::vulkan::instance: VALIDATION [VUID-VkSwapchainCreateInfoKHR-imageExtent-01274 (0x7cd0911d)] Validation Error: [ VUID-VkSwapchainCreateInfoKHR-imageExtent-01274 ] Object 0: handle = 0x1adbd410a60, type = VK_OBJECT_TYPE_DEVICE; | MessageID = 0x7cd0911d | vkCreateSwapchainKHR() called with imageExtent = (225,56), which is outside the bounds returned by vkGetPhysicalDeviceSurfaceCapabilitiesKHR(): currentExtent = (225,52), minImageExtent = (225,52), maxImageExtent = (225,52). The Vulkan spec states: imageExtent must be between minImageExtent and maxImageExtent, inclusive, where minImageExtent and maxImageExtent are members of the VkSurfaceCapabilitiesKHR structure returned by vkGetPhysicalDeviceSurfaceCapabilitiesKHR for the surface (https://vulkan.lunarg.com/doc/view/1.2.198.1/windows/1.2-extensions/vkspec.html#VUID-VkSwapchainCreateInfoKHR-imageExtent-01274) ``` etc. This might be a new issue here, although I'm surprised it's vulkan giving this error; wgpu should stop it if this is illegal.
# Objective We keep getting issues where things break at small window sizes, e.g #3368 (caused by #3153), #3596 ('caused' by #3545) ## Solution - Add a test that we can make small windows. Currently, this fails on my machine with some quite scary vulkan errors: ``` 2022-01-08T22:55:13.770261Z ERROR wgpu_hal::vulkan::instance: VALIDATION [VUID-VkSwapchainCreateInfoKHR-imageExtent-01274 (0x7cd0911d)] Validation Error: [ VUID-VkSwapchainCreateInfoKHR-imageExtent-01274 ] Object 0: handle = 0x1adbd410a60, type = VK_OBJECT_TYPE_DEVICE; | MessageID = 0x7cd0911d | vkCreateSwapchainKHR() called with imageExtent = (225,60), which is outside the bounds returned by vkGetPhysicalDeviceSurfaceCapabilitiesKHR(): currentExtent = (225,56), minImageExtent = (225,56), maxImageExtent = (225,56). The Vulkan spec states: imageExtent must be between minImageExtent and maxImageExtent, inclusive, where minImageExtent and maxImageExtent are members of the VkSurfaceCapabilitiesKHR structure returned by vkGetPhysicalDeviceSurfaceCapabilitiesKHR for the surface (https://vulkan.lunarg.com/doc/view/1.2.198.1/windows/1.2-extensions/vkspec.html#VUID-VkSwapchainCreateInfoKHR-imageExtent-01274) 2022-01-08T22:55:13.770808Z ERROR wgpu_hal::vulkan::instance: objects: (type: DEVICE, hndl: 0x1adbd410a60, name: ?) 2022-01-08T22:55:13.787403Z ERROR wgpu_hal::vulkan::instance: VALIDATION [VUID-VkSwapchainCreateInfoKHR-imageExtent-01274 (0x7cd0911d)] Validation Error: [ VUID-VkSwapchainCreateInfoKHR-imageExtent-01274 ] Object 0: handle = 0x1adbd410a60, type = VK_OBJECT_TYPE_DEVICE; | MessageID = 0x7cd0911d | vkCreateSwapchainKHR() called with imageExtent = (225,56), which is outside the bounds returned by vkGetPhysicalDeviceSurfaceCapabilitiesKHR(): currentExtent = (225,52), minImageExtent = (225,52), maxImageExtent = (225,52). The Vulkan spec states: imageExtent must be between minImageExtent and maxImageExtent, inclusive, where minImageExtent and maxImageExtent are members of the VkSurfaceCapabilitiesKHR structure returned by vkGetPhysicalDeviceSurfaceCapabilitiesKHR for the surface (https://vulkan.lunarg.com/doc/view/1.2.198.1/windows/1.2-extensions/vkspec.html#VUID-VkSwapchainCreateInfoKHR-imageExtent-01274) ``` etc. This might be a new issue here, although I'm surprised it's vulkan giving this error; wgpu should stop it if this is illegal.
# Objective We keep getting issues where things break at small window sizes, e.g #3368 (caused by #3153), #3596 ('caused' by #3545) ## Solution - Add a test that we can make small windows. Currently, this fails on my machine with some quite scary vulkan errors: ``` 2022-01-08T22:55:13.770261Z ERROR wgpu_hal::vulkan::instance: VALIDATION [VUID-VkSwapchainCreateInfoKHR-imageExtent-01274 (0x7cd0911d)] Validation Error: [ VUID-VkSwapchainCreateInfoKHR-imageExtent-01274 ] Object 0: handle = 0x1adbd410a60, type = VK_OBJECT_TYPE_DEVICE; | MessageID = 0x7cd0911d | vkCreateSwapchainKHR() called with imageExtent = (225,60), which is outside the bounds returned by vkGetPhysicalDeviceSurfaceCapabilitiesKHR(): currentExtent = (225,56), minImageExtent = (225,56), maxImageExtent = (225,56). The Vulkan spec states: imageExtent must be between minImageExtent and maxImageExtent, inclusive, where minImageExtent and maxImageExtent are members of the VkSurfaceCapabilitiesKHR structure returned by vkGetPhysicalDeviceSurfaceCapabilitiesKHR for the surface (https://vulkan.lunarg.com/doc/view/1.2.198.1/windows/1.2-extensions/vkspec.html#VUID-VkSwapchainCreateInfoKHR-imageExtent-01274) 2022-01-08T22:55:13.770808Z ERROR wgpu_hal::vulkan::instance: objects: (type: DEVICE, hndl: 0x1adbd410a60, name: ?) 2022-01-08T22:55:13.787403Z ERROR wgpu_hal::vulkan::instance: VALIDATION [VUID-VkSwapchainCreateInfoKHR-imageExtent-01274 (0x7cd0911d)] Validation Error: [ VUID-VkSwapchainCreateInfoKHR-imageExtent-01274 ] Object 0: handle = 0x1adbd410a60, type = VK_OBJECT_TYPE_DEVICE; | MessageID = 0x7cd0911d | vkCreateSwapchainKHR() called with imageExtent = (225,56), which is outside the bounds returned by vkGetPhysicalDeviceSurfaceCapabilitiesKHR(): currentExtent = (225,52), minImageExtent = (225,52), maxImageExtent = (225,52). The Vulkan spec states: imageExtent must be between minImageExtent and maxImageExtent, inclusive, where minImageExtent and maxImageExtent are members of the VkSurfaceCapabilitiesKHR structure returned by vkGetPhysicalDeviceSurfaceCapabilitiesKHR for the surface (https://vulkan.lunarg.com/doc/view/1.2.198.1/windows/1.2-extensions/vkspec.html#VUID-VkSwapchainCreateInfoKHR-imageExtent-01274) ``` etc. This might be a new issue here, although I'm surprised it's vulkan giving this error; wgpu should stop it if this is illegal.
# Objective We keep getting issues where things break at small window sizes, e.g #3368 (caused by #3153), #3596 ('caused' by #3545) ## Solution - Add a test that we can make small windows. Currently, this fails on my machine with some quite scary vulkan errors: ``` 2022-01-08T22:55:13.770261Z ERROR wgpu_hal::vulkan::instance: VALIDATION [VUID-VkSwapchainCreateInfoKHR-imageExtent-01274 (0x7cd0911d)] Validation Error: [ VUID-VkSwapchainCreateInfoKHR-imageExtent-01274 ] Object 0: handle = 0x1adbd410a60, type = VK_OBJECT_TYPE_DEVICE; | MessageID = 0x7cd0911d | vkCreateSwapchainKHR() called with imageExtent = (225,60), which is outside the bounds returned by vkGetPhysicalDeviceSurfaceCapabilitiesKHR(): currentExtent = (225,56), minImageExtent = (225,56), maxImageExtent = (225,56). The Vulkan spec states: imageExtent must be between minImageExtent and maxImageExtent, inclusive, where minImageExtent and maxImageExtent are members of the VkSurfaceCapabilitiesKHR structure returned by vkGetPhysicalDeviceSurfaceCapabilitiesKHR for the surface (https://vulkan.lunarg.com/doc/view/1.2.198.1/windows/1.2-extensions/vkspec.html#VUID-VkSwapchainCreateInfoKHR-imageExtent-01274) 2022-01-08T22:55:13.770808Z ERROR wgpu_hal::vulkan::instance: objects: (type: DEVICE, hndl: 0x1adbd410a60, name: ?) 2022-01-08T22:55:13.787403Z ERROR wgpu_hal::vulkan::instance: VALIDATION [VUID-VkSwapchainCreateInfoKHR-imageExtent-01274 (0x7cd0911d)] Validation Error: [ VUID-VkSwapchainCreateInfoKHR-imageExtent-01274 ] Object 0: handle = 0x1adbd410a60, type = VK_OBJECT_TYPE_DEVICE; | MessageID = 0x7cd0911d | vkCreateSwapchainKHR() called with imageExtent = (225,56), which is outside the bounds returned by vkGetPhysicalDeviceSurfaceCapabilitiesKHR(): currentExtent = (225,52), minImageExtent = (225,52), maxImageExtent = (225,52). The Vulkan spec states: imageExtent must be between minImageExtent and maxImageExtent, inclusive, where minImageExtent and maxImageExtent are members of the VkSurfaceCapabilitiesKHR structure returned by vkGetPhysicalDeviceSurfaceCapabilitiesKHR for the surface (https://vulkan.lunarg.com/doc/view/1.2.198.1/windows/1.2-extensions/vkspec.html#VUID-VkSwapchainCreateInfoKHR-imageExtent-01274) ``` etc. This might be a new issue here, although I'm surprised it's vulkan giving this error; wgpu should stop it if this is illegal.
# Objective We keep getting issues where things break at small window sizes, e.g bevyengine#3368 (caused by bevyengine#3153), bevyengine#3596 ('caused' by bevyengine#3545) ## Solution - Add a test that we can make small windows. Currently, this fails on my machine with some quite scary vulkan errors: ``` 2022-01-08T22:55:13.770261Z ERROR wgpu_hal::vulkan::instance: VALIDATION [VUID-VkSwapchainCreateInfoKHR-imageExtent-01274 (0x7cd0911d)] Validation Error: [ VUID-VkSwapchainCreateInfoKHR-imageExtent-01274 ] Object 0: handle = 0x1adbd410a60, type = VK_OBJECT_TYPE_DEVICE; | MessageID = 0x7cd0911d | vkCreateSwapchainKHR() called with imageExtent = (225,60), which is outside the bounds returned by vkGetPhysicalDeviceSurfaceCapabilitiesKHR(): currentExtent = (225,56), minImageExtent = (225,56), maxImageExtent = (225,56). The Vulkan spec states: imageExtent must be between minImageExtent and maxImageExtent, inclusive, where minImageExtent and maxImageExtent are members of the VkSurfaceCapabilitiesKHR structure returned by vkGetPhysicalDeviceSurfaceCapabilitiesKHR for the surface (https://vulkan.lunarg.com/doc/view/1.2.198.1/windows/1.2-extensions/vkspec.html#VUID-VkSwapchainCreateInfoKHR-imageExtent-01274) 2022-01-08T22:55:13.770808Z ERROR wgpu_hal::vulkan::instance: objects: (type: DEVICE, hndl: 0x1adbd410a60, name: ?) 2022-01-08T22:55:13.787403Z ERROR wgpu_hal::vulkan::instance: VALIDATION [VUID-VkSwapchainCreateInfoKHR-imageExtent-01274 (0x7cd0911d)] Validation Error: [ VUID-VkSwapchainCreateInfoKHR-imageExtent-01274 ] Object 0: handle = 0x1adbd410a60, type = VK_OBJECT_TYPE_DEVICE; | MessageID = 0x7cd0911d | vkCreateSwapchainKHR() called with imageExtent = (225,56), which is outside the bounds returned by vkGetPhysicalDeviceSurfaceCapabilitiesKHR(): currentExtent = (225,52), minImageExtent = (225,52), maxImageExtent = (225,52). The Vulkan spec states: imageExtent must be between minImageExtent and maxImageExtent, inclusive, where minImageExtent and maxImageExtent are members of the VkSurfaceCapabilitiesKHR structure returned by vkGetPhysicalDeviceSurfaceCapabilitiesKHR for the surface (https://vulkan.lunarg.com/doc/view/1.2.198.1/windows/1.2-extensions/vkspec.html#VUID-VkSwapchainCreateInfoKHR-imageExtent-01274) ``` etc. This might be a new issue here, although I'm surprised it's vulkan giving this error; wgpu should stop it if this is illegal.
# Objective We keep getting issues where things break at small window sizes, e.g bevyengine#3368 (caused by bevyengine#3153), bevyengine#3596 ('caused' by bevyengine#3545) ## Solution - Add a test that we can make small windows. Currently, this fails on my machine with some quite scary vulkan errors: ``` 2022-01-08T22:55:13.770261Z ERROR wgpu_hal::vulkan::instance: VALIDATION [VUID-VkSwapchainCreateInfoKHR-imageExtent-01274 (0x7cd0911d)] Validation Error: [ VUID-VkSwapchainCreateInfoKHR-imageExtent-01274 ] Object 0: handle = 0x1adbd410a60, type = VK_OBJECT_TYPE_DEVICE; | MessageID = 0x7cd0911d | vkCreateSwapchainKHR() called with imageExtent = (225,60), which is outside the bounds returned by vkGetPhysicalDeviceSurfaceCapabilitiesKHR(): currentExtent = (225,56), minImageExtent = (225,56), maxImageExtent = (225,56). The Vulkan spec states: imageExtent must be between minImageExtent and maxImageExtent, inclusive, where minImageExtent and maxImageExtent are members of the VkSurfaceCapabilitiesKHR structure returned by vkGetPhysicalDeviceSurfaceCapabilitiesKHR for the surface (https://vulkan.lunarg.com/doc/view/1.2.198.1/windows/1.2-extensions/vkspec.html#VUID-VkSwapchainCreateInfoKHR-imageExtent-01274) 2022-01-08T22:55:13.770808Z ERROR wgpu_hal::vulkan::instance: objects: (type: DEVICE, hndl: 0x1adbd410a60, name: ?) 2022-01-08T22:55:13.787403Z ERROR wgpu_hal::vulkan::instance: VALIDATION [VUID-VkSwapchainCreateInfoKHR-imageExtent-01274 (0x7cd0911d)] Validation Error: [ VUID-VkSwapchainCreateInfoKHR-imageExtent-01274 ] Object 0: handle = 0x1adbd410a60, type = VK_OBJECT_TYPE_DEVICE; | MessageID = 0x7cd0911d | vkCreateSwapchainKHR() called with imageExtent = (225,56), which is outside the bounds returned by vkGetPhysicalDeviceSurfaceCapabilitiesKHR(): currentExtent = (225,52), minImageExtent = (225,52), maxImageExtent = (225,52). The Vulkan spec states: imageExtent must be between minImageExtent and maxImageExtent, inclusive, where minImageExtent and maxImageExtent are members of the VkSurfaceCapabilitiesKHR structure returned by vkGetPhysicalDeviceSurfaceCapabilitiesKHR for the surface (https://vulkan.lunarg.com/doc/view/1.2.198.1/windows/1.2-extensions/vkspec.html#VUID-VkSwapchainCreateInfoKHR-imageExtent-01274) ``` etc. This might be a new issue here, although I'm surprised it's vulkan giving this error; wgpu should stop it if this is illegal.
Objective
Implement clustered-forward rendering.
Solution
FIXME - in the interest of keeping the merge train moving, I'm submitting this PR now before the description is ready. I want to add in some comments into the code with references for the various bits and pieces and I want to describe some of the key decisions I made here. I'll do that as soon as I can.Anyone reviewing is welcome to add review comments where you want to know more about how something or other works.Also, there are more things that could be done as improvements, and I will document those somewhere (I'm not sure where will be the best place... in a todo alongside the code, a GitHub issue, somewhere else?) but I think it works well enough and brings significant performance and scalability benefits that it's worth integrating already now and then iterating on.
I will also probably add an example that demonstrates some of the issues: