Performance ideas / benchmarking #3

hybridherbst · 2023-10-10T13:20:14Z

Some ideas regarding performance. Ultimately it would be nice to get (a subset of) this working on a Quest 2 / 3; currently it runs at 5-10 fps and is very choppy. Then again, so do the other three.js implementations!

- Using compressed data at runtime. Seems you started on this already! There are some ideas regarding compression formats here: https://aras-p.info/blog/2023/09/27/Making-Gaussian-Splats-more-smaller/
- Using better-interleaved data, so that data fetching on the GPU is more localized (the same link above has some info).
- Using alpha hashing instead of transparency, and then rendering front-to-back instead to get some early-Z culling.
- Some kind of LOD system: not sure whether splats could be sorted by "importance" at runtime (e.g. less transparent ones are more important?), or whether the calculations would need to be done with fewer splats in the first place.
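To make the interleaving idea concrete, here is a minimal TypeScript sketch. The 14-float per-splat layout and the `interleave` helper are hypothetical, not this repository's format; the point is only that all attributes of one splat end up adjacent in memory, so a pass touching splat i reads one contiguous range instead of four separate arrays.

```typescript
// Hypothetical interleaved layout (not this repo's actual format), 14 float32 per splat:
// [px, py, pz, sx, sy, sz, qw, qx, qy, qz, r, g, b, a]
const STRIDE = 14;

// Separate per-attribute arrays ("structure of arrays").
interface SplatArrays {
  positions: Float32Array; // 3 values per splat
  scales: Float32Array;    // 3 values per splat
  rotations: Float32Array; // 4 values per splat (quaternion)
  colors: Float32Array;    // 4 values per splat (RGBA, 0..1)
}

// Copies the attributes into one buffer so that everything belonging to
// splat i lives in the contiguous range [i * STRIDE, (i + 1) * STRIDE).
function interleave(src: SplatArrays, count: number): Float32Array {
  const out = new Float32Array(count * STRIDE);
  for (let i = 0; i < count; i++) {
    const o = i * STRIDE;
    out.set(src.positions.subarray(i * 3, i * 3 + 3), o);
    out.set(src.scales.subarray(i * 3, i * 3 + 3), o + 3);
    out.set(src.rotations.subarray(i * 4, i * 4 + 4), o + 6);
    out.set(src.colors.subarray(i * 4, i * 4 + 4), o + 10);
  }
  return out;
}
```

Whether this actually helps depends on how the GPU fetches the data (e.g. from a data texture versus vertex attributes), so it is something to measure rather than assume.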

Regarding loading behaviour, I've experimented a bit with creating splats while the data is still loading; I'll see if I can make a PR for parts of that.
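Creating splats while loading mostly comes down to handling network chunks that do not end on a record boundary. A minimal sketch, assuming fixed-size 32-byte records; `SplatChunkParser` and its callback are hypothetical names, not part of this repository:

```typescript
// Assumed fixed record size (the common 32-byte .splat layout); adjust to the real format.
const BYTES_PER_SPLAT = 32;

// Accumulates arbitrarily sized chunks and hands out only complete splat
// records, keeping any trailing partial record until the next chunk arrives.
class SplatChunkParser {
  private pending: Uint8Array = new Uint8Array(0);
  private readonly onSplats: (records: Uint8Array, count: number) => void;

  constructor(onSplats: (records: Uint8Array, count: number) => void) {
    this.onSplats = onSplats;
  }

  push(chunk: Uint8Array): void {
    // Prepend the leftover bytes from the previous chunk.
    const merged = new Uint8Array(this.pending.length + chunk.length);
    merged.set(this.pending, 0);
    merged.set(chunk, this.pending.length);

    const count = Math.floor(merged.length / BYTES_PER_SPLAT);
    const usable = count * BYTES_PER_SPLAT;
    if (count > 0) this.onSplats(merged.subarray(0, usable), count);
    // slice() copies, so the leftover does not keep `merged` alive.
    this.pending = merged.slice(usable);
  }

  get leftoverBytes(): number {
    return this.pending.length;
  }
}
```

In a browser, chunks would come from `response.body.getReader()`: each `value` returned by `read()` is passed to `push`, and the callback can upload the new records and trigger a re-sort.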

It would also be interesting to load compressed data; again, Aras (link above) has some ideas on that, plus tooling to generate byte buffers that are already optimized (10-20x size reduction).

quadjr · 2023-10-10T15:37:44Z

Thank you for sharing this interesting information, and thanks for the pull requests. I'll check them later.

I bought a Quest 3 today and modified the code to support VR mode. It is very intriguing, though the performance still needs improvement. I'll look into the information you provided and think about it.

electrum-bowie · 2023-10-10T20:54:18Z

Please, pleaseee let me know too!

quadjr · 2023-10-11T12:00:24Z

> using compressed data at runtime. Seems you started on this already! There are some ideas regarding compression formats here: https://aras-p.info/blog/2023/09/27/Making-Gaussian-Splats-more-smaller/

I've read Aras's impressive work!
In my current implementation, each splat uses 256 bits, which means a 7.8x size reduction. I haven't evaluated the image quality yet, but I may be able to implement some of Aras's methods.
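For reference, 256 bits is 32 bytes per splat, while an uncompressed 3DGS .ply stores 62 float32 properties (248 bytes) per splat, so the 7.8x figure presumably compares against that (248 / 32 = 7.75). The sketch below assumes a 32-byte layout modeled on the widely used .splat format (float32 position and scale, 8-bit RGBA, 8-bit quaternion); whether this viewer packs its 256 bits the same way is an assumption, and `packSplat` / `unpackRotation` are hypothetical helpers.

```typescript
// Assumed 32-byte (256-bit) record, modeled on the common .splat layout:
//   bytes  0-11: position xyz, float32 little-endian
//   bytes 12-23: scale xyz, float32 little-endian
//   bytes 24-27: RGBA, uint8
//   bytes 28-31: rotation quaternion, uint8 mapped from [-1, 1]
const RECORD_BYTES = 32;
// Uncompressed 3DGS .ply: position (3) + normal (3) + SH coefficients (48)
// + opacity (1) + scale (3) + rotation (4) = 62 float32 values.
const PLY_BYTES = 62 * 4; // 248
const SIZE_RATIO = PLY_BYTES / RECORD_BYTES; // 7.75

interface Splat {
  pos: number[];   // 3 floats
  scale: number[]; // 3 floats
  rgba: number[];  // 4 bytes, 0..255
  rot: number[];   // 4 floats, unit quaternion
}

function packSplat(s: Splat, view: DataView, offset: number): void {
  for (let k = 0; k < 3; k++) view.setFloat32(offset + 4 * k, s.pos[k], true);
  for (let k = 0; k < 3; k++) view.setFloat32(offset + 12 + 4 * k, s.scale[k], true);
  for (let k = 0; k < 4; k++) view.setUint8(offset + 24 + k, s.rgba[k]);
  for (let k = 0; k < 4; k++) {
    // Quantize each quaternion component to 8 bits; clamp because q = 1 maps to 256.
    const q = Math.round(s.rot[k] * 128 + 128);
    view.setUint8(offset + 28 + k, Math.min(255, Math.max(0, q)));
  }
}

function unpackRotation(view: DataView, offset: number): number[] {
  const q: number[] = [];
  for (let k = 0; k < 4; k++) q.push((view.getUint8(offset + 28 + k) - 128) / 128);
  return q;
}
```

The 8-bit quaternion loses some precision (about 1/128 per component), which is the kind of quality trade-off an image-quality evaluation would need to check.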

> using alpha hashing instead of transparency, and then rendering front-to-back instead to get some early Z cutoff

Could you elaborate on this idea?

quadjr · 2023-10-11T23:15:46Z

I’ve studied alpha hashing. I’ll test it later. 🤓

quadjr · 2023-10-12T14:01:46Z

I've tested alpha hashing, but it didn't improve performance.
I think it won't reduce memory traffic because contiguous pixel values are read in a single operation, so discarding individual pixels doesn't reduce memory traffic.
Here is the code I tested:
https://github.com/quadjr/aframe-gaussian-splatting/tree/feature/alpha-hashing
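For readers unfamiliar with the technique, here is a minimal CPU-side sketch of the alpha-hashing decision. In a real renderer it would run in the fragment shader with `discard`; plain TypeScript is used here only so it can be tested. It uses a screen-space integer hash (the "lowbias32" mixer) as a simplification, whereas hashed alpha testing as published by Wyman and McGuire hashes object-space coordinates for stability. It also hints at why the gain may be small: every covered fragment still runs the shader and evaluates its alpha before being kept or discarded.

```typescript
// "lowbias32" integer hash: a bijective 32-bit mixer with well-spread output bits.
function hash32(x: number): number {
  x ^= x >>> 16;
  x = Math.imul(x, 0x7feb352d);
  x ^= x >>> 15;
  x = Math.imul(x, 0x846ca68b);
  x ^= x >>> 16;
  return x >>> 0;
}

// Per-pixel threshold in [0, 1), stable for a given pixel.
// Screen-space simplification: coordinates are assumed to be below 65536.
function alphaThreshold(px: number, py: number): number {
  const key = ((py & 0xffff) << 16) | (px & 0xffff);
  return hash32(key) / 4294967296;
}

// Alpha hashing: instead of blending, keep the fragment as fully opaque with
// probability `alpha` and discard it otherwise. Kept fragments can then write
// depth, so the result no longer depends on draw order.
function keepFragment(alpha: number, px: number, py: number): boolean {
  return alpha > alphaThreshold(px, py);
}
```

With depth writes enabled, kept fragments are opaque, so drawing splats front-to-back would let the depth test reject hidden fragments early; the noise this produces usually needs temporal accumulation or many overlapping splats to average out.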

electrum-bowie · 2023-10-12T17:29:53Z

@quadjr the alpha-hashing branch is 100% identical to the main branch

quadjr · 2023-10-14T07:35:49Z

@electrum-bowie
Sorry, I've pushed the code now.
https://github.com/quadjr/aframe-gaussian-splatting/tree/feature/alpha-hashing

quadjr · 2023-10-14T07:51:44Z

@hybridherbst
I've made several improvements based on your ideas.

For the LOD system, small, highly transparent splats far from the camera are now removed during the sorting process.
This has significantly improved performance.
The removal threshold still needs a more principled derivation.
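A minimal sketch of such a culling pass, run before sorting so that removed splats are never sorted or drawn. The importance metric (opacity times approximate angular size) and the `minImportance` knob are illustrative assumptions, not the criterion this viewer uses:

```typescript
// Hypothetical input for a pre-sort culling pass.
interface CullInput {
  centers: Float32Array;  // xyz per splat in view space (camera looks down -z)
  maxScale: Float32Array; // largest axis scale per splat, world units
  opacity: Float32Array;  // 0..1 per splat
}

// Returns the indices of splats worth sorting and drawing.
function cullSplats(s: CullInput, count: number, minImportance: number): number[] {
  const kept: number[] = [];
  for (let i = 0; i < count; i++) {
    const depth = -s.centers[3 * i + 2];
    if (depth <= 0) continue; // behind the camera
    // Small-angle approximation of the splat's angular size, in radians.
    const angularSize = s.maxScale[i] / depth;
    // Faint and far-away (or tiny) splats contribute little to the image.
    if (s.opacity[i] * angularSize >= minImportance) kept.push(i);
  }
  return kept;
}
```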

Data compression might improve performance, but I first need to set up an image-quality evaluation pipeline.
Alpha hashing and data localization probably won't improve performance.

I've also implemented incremental loading.

I've done almost everything I can think of for now. I'll shift my focus to the generation software.
I believe I can make further improvements to it. 🤓

hybridherbst · 2023-10-15T18:18:17Z

Thank you, that does sound like great improvements!

The current threshold of -0.001 did have a very noticeable quality impact on my "FH Portrait" dataset, though; as a quick test I set it to -0.0001, which looks fine, but I haven't searched for a proper upper bound. I'll do some more testing with your updates.
EDIT: On Quest, -0.001 actually looks fine, so the value may need to be FOV-based.

One question out of curiosity, the sortSplats method currently allocates new arrays on each run – doesn't that have a performance impact and/or would it be better to cache those instead?

quadjr · 2023-10-16T11:40:24Z

Thank you for the reports.
The threshold should be determined by the size of the splat on the screen, and it can be calculated using FOV and resolution.
I will work on implementing the threshold calculation later.
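The FOV/resolution dependence can be made explicit with the pinhole focal length in pixels, f = (H / 2) / tan(fovY / 2), so a splat of world radius r at depth z covers roughly f * r / z pixels. A sketch, assuming a vertical FOV in degrees (the convention of three.js `PerspectiveCamera.fov`); `minPx` is a hypothetical threshold in pixels:

```typescript
// Focal length in pixels for a perspective camera with vertical FOV `fovYDeg`
// rendering an image `heightPx` pixels tall.
function focalLengthPx(fovYDeg: number, heightPx: number): number {
  const fovY = (fovYDeg * Math.PI) / 180;
  return heightPx / 2 / Math.tan(fovY / 2);
}

// Approximate on-screen radius, in pixels, of a splat with world-space
// radius `radius` at view-space distance `depth`.
function projectedRadiusPx(radius: number, depth: number, fovYDeg: number, heightPx: number): number {
  return (focalLengthPx(fovYDeg, heightPx) * radius) / depth;
}

// Cull when the splat would cover less than `minPx` pixels.
function isTooSmall(radius: number, depth: number, fovYDeg: number, heightPx: number, minPx: number): boolean {
  return projectedRadiusPx(radius, depth, fovYDeg, heightPx) < minPx;
}
```

Because f depends on both FOV and render height, a fixed threshold that ignores them removes a different amount of on-screen detail on a desktop monitor than in a headset, which would be consistent with -0.001 looking fine on Quest but not on desktop.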

> One question out of curiosity, the sortSplats method currently allocates new arrays on each run – doesn't that have a performance impact and/or would it be better to cache those instead?

Yeah, there are some unnecessary allocations during the loading and sorting processes.
I'm currently prioritizing the model-generation side.
Once that's done, I will optimize this viewer's memory usage and allocations.
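Caching can be as simple as keeping scratch typed arrays between sorts and growing them only when the splat count increases. A sketch with a hypothetical `SortScratch` helper (not part of this repository):

```typescript
// Scratch buffers reused across sorts; they are reallocated only when the
// splat count outgrows the current capacity.
class SortScratch {
  depths: Float32Array = new Float32Array(0);
  indices: Uint32Array = new Uint32Array(0);

  ensure(count: number): void {
    if (this.depths.length >= count) return;
    // Grow geometrically so incremental loading does not reallocate on every batch.
    const capacity = Math.max(count, this.depths.length * 2);
    this.depths = new Float32Array(capacity);
    this.indices = new Uint32Array(capacity);
  }
}
```

Callers would work on `depths.subarray(0, count)` and `indices.subarray(0, count)` so the spare capacity is ignored; `subarray` creates a view, not a copy.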

JiamingSuen · 2023-10-20T16:47:10Z

Maybe consider integrating https://github.com/mkkellogg/GaussianSplats3D, which uses a wasm module for sorting. The author has also done some other interesting optimizations.
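Independent of wasm, one approach used in some web splat viewers is to replace a comparison sort with a counting sort on quantized depth, which is O(n) per frame. The sketch below uses 16-bit depth buckets and orders splats back-to-front for alpha blending; it is not GaussianSplats3D's implementation, and the bucket count is an assumption.

```typescript
// Back-to-front ordering via a counting sort on depth quantized to 16 bits.
// `depths` are positive view-space distances; `out` receives splat indices,
// farthest first. Splats in the same bucket keep their original order.
function sortByDepthCounting(depths: Float32Array, count: number, out: Uint32Array): void {
  let minD = Infinity;
  let maxD = -Infinity;
  for (let i = 0; i < count; i++) {
    const d = depths[i];
    if (d < minD) minD = d;
    if (d > maxD) maxD = d;
  }
  const BUCKETS = 65536;
  const scale = maxD > minD ? (BUCKETS - 1) / (maxD - minD) : 0;
  const keys = new Uint16Array(count);
  const counts = new Uint32Array(BUCKETS);
  for (let i = 0; i < count; i++) {
    // Farther splats get smaller keys so they are drawn first.
    const key = BUCKETS - 1 - Math.floor((depths[i] - minD) * scale);
    keys[i] = key;
    counts[key]++;
  }
  // Exclusive prefix sum: first output slot for each bucket.
  let sum = 0;
  for (let b = 0; b < BUCKETS; b++) {
    const c = counts[b];
    counts[b] = sum;
    sum += c;
  }
  for (let i = 0; i < count; i++) out[counts[keys[i]]++] = i;
}
```

Quantizing to 65536 buckets trades exact ordering for speed; splats that land in the same bucket may blend in a slightly wrong order, which is usually hard to see.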

softyoda · 2023-11-05T12:41:14Z

Hi, will the mkkellogg .splat format (which adds further optimizations, see mkkellogg/GaussianSplats3D#28) be compatible with the .splat format of this implementation?

dlazares · 2023-11-29T19:29:52Z

@quadjr @hybridherbst
We've already done the work on making splats smaller!

I made this repo to share our small splats for renderer testing.
We have these running in our renderer at 90FPS on Quest 3 in the browser. I'd love to help out so we can get something more shareable.
https://github.com/gmix-tech/small_splats

I tested this branch with our small splat and I'm only getting 45 FPS from A-Frame in VR mode on Quest 3. It's unclear to me whether that's due to A-Frame itself or to this component's implementation.

Feel free to ping me at [email protected] if you wanna chat more about this

dmarcos · 2024-03-08T05:49:35Z

@dlazares 90fps sounds great! How can I give your renderer a try? I couldn't find the repo. I'm working on integrating a component into A-Frame core. Thanks so much!

quadjr self-assigned this Oct 10, 2023

quadjr added the "enhancement" (New feature or request) label Oct 10, 2023

quadjr mentioned this issue Oct 12, 2023 in "2 fps on Quest 2 #7" (closed)
