Fix GPUParticles are not rendered for older AMD GPUs with OpenGL+Angle #96413

Maran23 · 2024-08-31T22:14:23Z

Using the algorithm (bottom of the paper) as proposed by @clayjohn: https://www.researchgate.net/publication/362275548_Accuracy_and_performance_of_the_lattice_Boltzmann_method_with_64-bit_32-bit_and_customized_16-bit_number_formats

Made sure that my fix does not cause any regressions (see project below):

This should be reviewed by someone much more familiar than me.

GPUParticles now work fine for older AMD GPUs as specified in the issue, as well as e.g. my much newer GPU, that did not suffer from this problem (works before, works after).
I plan to test this on other GPUs as well soon.

cc @Alex2782 - testing on an Adreno 3xx device would be nice as well :)
cc @JonqsGames - since you did the initial change that caused this 'regression' in #72914
cc @clayjohn - as you wrote and reviewed most of the OpenGL stuff :D

Project for testing, that contains all issues we once had regarding the conversion:
ParticleBug.zip

Everything is named like to GH issue and it is 'linked' in the description so you can check everything fast.
| | |

Tested project above on:

AMD Ryzen 5 4500U with integrated AMD Radeon RX Vega 6 (broken before)
Intel(R) Core(TM) i5-4670 CPU with dedicated AMD Radeon RX 580 Series (broken before)
AMD Ryzen 9 5900X with dedicated AMD Radeon RX 6900 XT
AMD Ryzen 7 5800X3D with dedicated NVIDIA GeForce RTX 4070 Ti

Alex2782 · 2024-09-01T01:04:59Z

OK on LG Nexus 7 - Adreno 320 (MRP #88815 and "2D Platformer Demo")

// Compatibility renames. These are exposed with the "godot_" prefix
// to work around two distinct Adreno bugs:

I have commented out all “renames”, 2D Platformer also looked normal, should other MRP be tested?

/*
#define packUnorm4x8 godot_packUnorm4x8
#define unpackUnorm4x8 godot_unpackUnorm4x8
#define packSnorm4x8 godot_packSnorm4x8
#define unpackSnorm4x8 godot_unpackSnorm4x8
#define packHalf2x16 godot_packHalf2x16
#define unpackHalf2x16 godot_unpackHalf2x16
#define packUnorm2x16 godot_packUnorm2x16
#define unpackUnorm2x16 godot_unpackUnorm2x16
#define packSnorm2x16 godot_packSnorm2x16
#define unpackSnorm2x16 godot_unpackSnorm2x16
*/

Maran23 · 2024-09-01T02:40:47Z

I have commented out all “renames”, 2D Platformer also looked normal, should other MRP be tested?

Particles and Shader are most important here.

Maybe your Adreno device is fine? I'm not really into the 'Adreno problems', but checking the git history:

Expose emulated *Unorm4x8 glsl functions in non-android builds #69521 Added some redefines of the un/pack method due to wrong Adreno exposure of ES310 methods
Fix CPU/GPUParticles2D bugs on Compatibility Rendering (GLES3) on Adreno 3XX devices. #88816 Added some redefines of the un/pack method due to buggy Adreno implementations of that methods

If you are lucky, you do not suffer from any of those problems of that wrong exposure or buggy implementation of the un/pack methods.
In any case, if everything still works (completely the same WITH and WITHOUT the un/pack methods), we are actually very good to go here!
A device with the correct implementation of the un/pack methods (without redefines) should behave the same as with our redefined un/pack methods.

clayjohn · 2024-09-01T03:38:11Z

Hmmm, this section of code created a lot of problems for us (as you can see in the history). The old if (h_e == uint(0x0000)) { actually was broken on certain AMD devices (#73320). It was fixed by making the code branchless in #73332. I suspect that this change will reintroduce #73320.

We should introduce the same logic in a branchless manner like in #73332

Or perhaps we should do something more robust and use the "ultrafast method" from https://www.researchgate.net/publication/362275548_Accuracy_and_performance_of_the_lattice_Boltzmann_method_with_64-bit_32-bit_and_customized_16-bit_number_formats (described here: https://stackoverflow.com/a/60047308)

clayjohn · 2024-09-01T04:11:43Z

Given that #95797 only happens with ANGLE, I suspect it is more likely a precision issue rather than a problem with the actual conversion. Its likely that inserting the branch just forces different codegen by the driver.

I would see if using high precision helps before committing to changing the conversion code.

Maran23 · 2024-09-01T14:36:53Z

The old if (h_e == uint(0x0000)) { actually was broken on certain AMD devices (#73320).

Funny enough, the lines did not appear still on all the older AMD devices I tested (with ANGLE ofc). And also fixed by this PR.
I will check the "ultrafast method" (which is actually very similar to what we do already).

I would see if using high precision helps before committing to changing the conversion code.

Is this something you will check? I can help with testing if needed!

clayjohn · 2024-09-01T18:05:19Z

Is this something you will check? I can help with testing if needed!

I don't have a device that can reproduce your issue, so I can't really check.

I am just skeptical about this solution as it removes the check for 0 exponent and replaces it with a check for infinity. Both might be needed, but its clear the root of the problem is bad code gen by the driver. Since OpenGL works fine on the same hardware I suspect the issue comes from the different between OpenGL and OpenGL ES. The most likely culprit is floating point precision as desktop GL drivers almost always ignore precision.

Maran23 · 2024-09-01T18:10:24Z

The most likely culprit is floating point precision as desktop GL drivers almost always ignore precision.

Is there something I can try here?

Currently trying out the fast algorihm. Also attached a project that contains nodes for all issues we had in the past for fast checking. Will also keep it up-to-date if there is more.

Maran23 · 2024-09-01T18:20:47Z

Tested the new fast algorithm solution and it works! Works for my system where everything was good before, and also worked for the broken system. Can check with other GPUs in the next days as well (e.g. Intel and Nvidia). Adreno device would be nice as well.

Checked that the following issues do not reappear (as in the project attached on this PR):

Using a better and faster algorithm for the float conversions

clayjohn

Oh well. This approach is a few more instructions than we had before which is not ideal, but it is more correct and handles more edge cases.

I looked into my precision theory deeper and I am not sure, in theory all unspecified floats and ints should be high precision. So it may be something else in the driver code gen causing the issue.

Since you have confirmed that this works using many of the prior bug reports I think it makes sense to go with this.

The final thing is the licensing issue. There were concerns expressed about the licence previously. Since the code is basically a one-liner, I don't think a licence can be attached to it. The FSF considers any chunk of code fewer than 15 lines to be too trivial to copyright.

akien-mga · 2024-09-02T10:32:50Z

Thanks!

Maran23 requested a review from a team as a code owner August 31, 2024 22:14

Maran23 force-pushed the gpuparticles-amd-fix branch from a7823e0 to 6e6c3f7 Compare August 31, 2024 22:18

AThousandShips added bug topic:rendering labels Sep 1, 2024

AThousandShips added this to the 4.4 milestone Sep 1, 2024

AThousandShips added the cherrypick:4.3 Considered for cherry-picking into a future 4.3.x release label Sep 1, 2024

Maran23 force-pushed the gpuparticles-amd-fix branch from 6e6c3f7 to 86d9f05 Compare September 1, 2024 18:02

Fix GPUParticles are not rendered for older AMD GPUs with OpenGL+Angle

9cc9df5

Using a better and faster algorithm for the float conversions

Maran23 force-pushed the gpuparticles-amd-fix branch from 86d9f05 to 9cc9df5 Compare September 1, 2024 18:22

clayjohn approved these changes Sep 2, 2024

View reviewed changes

akien-mga merged commit 4e5dd4f into godotengine:master Sep 2, 2024
19 checks passed

clayjohn mentioned this pull request Sep 13, 2024

Editor is blinking with strange color lines in Compatibility mode #96943

Closed

matheusmdx mentioned this pull request Oct 17, 2024

OpenGL: Depth fog renders incorrectly in 4.4.dev #97875

Closed

clayjohn removed the cherrypick:4.3 Considered for cherry-picking into a future 4.3.x release label Oct 18, 2024

clayjohn mentioned this pull request Oct 18, 2024

Revert "Fix GPUParticles are not rendered for older AMD GPUs with OpenGL+Angle" #98327

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix GPUParticles are not rendered for older AMD GPUs with OpenGL+Angle #96413

Fix GPUParticles are not rendered for older AMD GPUs with OpenGL+Angle #96413

Maran23 commented Aug 31, 2024 •

edited

Loading

Alex2782 commented Sep 1, 2024 •

edited

Loading

Maran23 commented Sep 1, 2024

clayjohn commented Sep 1, 2024

clayjohn commented Sep 1, 2024

Maran23 commented Sep 1, 2024 •

edited

Loading

clayjohn commented Sep 1, 2024

Maran23 commented Sep 1, 2024

Maran23 commented Sep 1, 2024

clayjohn left a comment

akien-mga commented Sep 2, 2024

Fix GPUParticles are not rendered for older AMD GPUs with OpenGL+Angle #96413

Fix GPUParticles are not rendered for older AMD GPUs with OpenGL+Angle #96413

Conversation

Maran23 commented Aug 31, 2024 • edited Loading

Alex2782 commented Sep 1, 2024 • edited Loading

Maran23 commented Sep 1, 2024

clayjohn commented Sep 1, 2024

clayjohn commented Sep 1, 2024

Maran23 commented Sep 1, 2024 • edited Loading

clayjohn commented Sep 1, 2024

Maran23 commented Sep 1, 2024

Maran23 commented Sep 1, 2024

clayjohn left a comment

Choose a reason for hiding this comment

akien-mga commented Sep 2, 2024

Maran23 commented Aug 31, 2024 •

edited

Loading

Alex2782 commented Sep 1, 2024 •

edited

Loading

Maran23 commented Sep 1, 2024 •

edited

Loading