Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix GPUParticles are not rendered for older AMD GPUs with OpenGL+Angle #96413

Merged
merged 1 commit into from
Sep 2, 2024

Conversation

Maran23
Copy link
Contributor

@Maran23 Maran23 commented Aug 31, 2024

Fixes: #95797

Using the algorithm (bottom of the paper) as proposed by @clayjohn: https://www.researchgate.net/publication/362275548_Accuracy_and_performance_of_the_lattice_Boltzmann_method_with_64-bit_32-bit_and_customized_16-bit_number_formats

Made sure that my fix does not cause any regressions (see project below):

This should be reviewed by someone much more familiar than me.

GPUParticles now work fine for older AMD GPUs as specified in the issue, as well as e.g. my much newer GPU, that did not suffer from this problem (works before, works after).
I plan to test this on other GPUs as well soon.


cc @Alex2782 - testing on an Adreno 3xx device would be nice as well :)
cc @JonqsGames - since you did the initial change that caused this 'regression' in #72914
cc @clayjohn - as you wrote and reviewed most of the OpenGL stuff :D


Project for testing, that contains all issues we once had regarding the conversion:
ParticleBug.zip

Everything is named like to GH issue and it is 'linked' in the description so you can check everything fast.
| image | image |


Tested project above on:

  • AMD Ryzen 5 4500U with integrated AMD Radeon RX Vega 6 (broken before)
  • Intel(R) Core(TM) i5-4670 CPU with dedicated AMD Radeon RX 580 Series (broken before)
  • AMD Ryzen 9 5900X with dedicated AMD Radeon RX 6900 XT
  • AMD Ryzen 7 5800X3D with dedicated NVIDIA GeForce RTX 4070 Ti

@Maran23 Maran23 requested a review from a team as a code owner August 31, 2024 22:14
@Alex2782
Copy link
Contributor

Alex2782 commented Sep 1, 2024

OK on LG Nexus 7 - Adreno 320 (MRP #88815 and "2D Platformer Demo")


// Compatibility renames. These are exposed with the "godot_" prefix
// to work around two distinct Adreno bugs:

I have commented out all “renames”, 2D Platformer also looked normal, should other MRP be tested?

/*
#define packUnorm4x8 godot_packUnorm4x8
#define unpackUnorm4x8 godot_unpackUnorm4x8
#define packSnorm4x8 godot_packSnorm4x8
#define unpackSnorm4x8 godot_unpackSnorm4x8
#define packHalf2x16 godot_packHalf2x16
#define unpackHalf2x16 godot_unpackHalf2x16
#define packUnorm2x16 godot_packUnorm2x16
#define unpackUnorm2x16 godot_unpackUnorm2x16
#define packSnorm2x16 godot_packSnorm2x16
#define unpackSnorm2x16 godot_unpackSnorm2x16
*/

@Maran23
Copy link
Contributor Author

Maran23 commented Sep 1, 2024

I have commented out all “renames”, 2D Platformer also looked normal, should other MRP be tested?

Particles and Shader are most important here.

Maybe your Adreno device is fine? I'm not really into the 'Adreno problems', but checking the git history:

If you are lucky, you do not suffer from any of those problems of that wrong exposure or buggy implementation of the un/pack methods.
In any case, if everything still works (completely the same WITH and WITHOUT the un/pack methods), we are actually very good to go here!
A device with the correct implementation of the un/pack methods (without redefines) should behave the same as with our redefined un/pack methods.

@clayjohn
Copy link
Member

clayjohn commented Sep 1, 2024

Hmmm, this section of code created a lot of problems for us (as you can see in the history). The old if (h_e == uint(0x0000)) { actually was broken on certain AMD devices (#73320). It was fixed by making the code branchless in #73332. I suspect that this change will reintroduce #73320.

We should introduce the same logic in a branchless manner like in #73332

Or perhaps we should do something more robust and use the "ultrafast method" from https://www.researchgate.net/publication/362275548_Accuracy_and_performance_of_the_lattice_Boltzmann_method_with_64-bit_32-bit_and_customized_16-bit_number_formats (described here: https://stackoverflow.com/a/60047308)

@clayjohn
Copy link
Member

clayjohn commented Sep 1, 2024

Given that #95797 only happens with ANGLE, I suspect it is more likely a precision issue rather than a problem with the actual conversion. Its likely that inserting the branch just forces different codegen by the driver.

I would see if using high precision helps before committing to changing the conversion code.

@AThousandShips AThousandShips added this to the 4.4 milestone Sep 1, 2024
@AThousandShips AThousandShips added the cherrypick:4.3 Considered for cherry-picking into a future 4.3.x release label Sep 1, 2024
@Maran23
Copy link
Contributor Author

Maran23 commented Sep 1, 2024

The old if (h_e == uint(0x0000)) { actually was broken on certain AMD devices (#73320).

Funny enough, the lines did not appear still on all the older AMD devices I tested (with ANGLE ofc). And also fixed by this PR.
I will check the "ultrafast method" (which is actually very similar to what we do already).

I would see if using high precision helps before committing to changing the conversion code.

Is this something you will check? I can help with testing if needed!

@clayjohn
Copy link
Member

clayjohn commented Sep 1, 2024

Is this something you will check? I can help with testing if needed!

I don't have a device that can reproduce your issue, so I can't really check.

I am just skeptical about this solution as it removes the check for 0 exponent and replaces it with a check for infinity. Both might be needed, but its clear the root of the problem is bad code gen by the driver. Since OpenGL works fine on the same hardware I suspect the issue comes from the different between OpenGL and OpenGL ES. The most likely culprit is floating point precision as desktop GL drivers almost always ignore precision.

@Maran23
Copy link
Contributor Author

Maran23 commented Sep 1, 2024

The most likely culprit is floating point precision as desktop GL drivers almost always ignore precision.

Is there something I can try here?

Currently trying out the fast algorihm. Also attached a project that contains nodes for all issues we had in the past for fast checking. Will also keep it up-to-date if there is more.

@Maran23
Copy link
Contributor Author

Maran23 commented Sep 1, 2024

Tested the new fast algorithm solution and it works! Works for my system where everything was good before, and also worked for the broken system. Can check with other GPUs in the next days as well (e.g. Intel and Nvidia). Adreno device would be nice as well.

Checked that the following issues do not reappear (as in the project attached on this PR):

Using a better and faster algorithm for the float conversions
Copy link
Member

@clayjohn clayjohn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh well. This approach is a few more instructions than we had before which is not ideal, but it is more correct and handles more edge cases.

I looked into my precision theory deeper and I am not sure, in theory all unspecified floats and ints should be high precision. So it may be something else in the driver code gen causing the issue.

Since you have confirmed that this works using many of the prior bug reports I think it makes sense to go with this.

The final thing is the licensing issue. There were concerns expressed about the licence previously. Since the code is basically a one-liner, I don't think a licence can be attached to it. The FSF considers any chunk of code fewer than 15 lines to be too trivial to copyright.

@akien-mga akien-mga merged commit 4e5dd4f into godotengine:master Sep 2, 2024
19 checks passed
@akien-mga
Copy link
Member

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

GPUParticles not rendered in Web Export or Windows ANGLE with AMD Radeon RX Vega 6-7 GPU
5 participants