Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Barriers after return in some compute shaders #94820

Open
clayjohn opened this issue Jul 26, 2024 · 1 comment
Open

Barriers after return in some compute shaders #94820

clayjohn opened this issue Jul 26, 2024 · 1 comment

Comments

@clayjohn
Copy link
Member

Tested versions

All released version of 4.x including the soon to be released 4.3

System information

All systems

Issue description

In many of our compute shaders we utilize memory barriers to synchronize threads before proceeding. Barriers must be uniform in terms of control flow, which means that they cannot be used in situations where one thread in a group may not reach them.

A common way where one thread may not reach a barrier in a group is if the barrier is behind an if statement and the condition is non-uniform. Another common way is if the shader returns before the barrier conditionally (non-uniform).

The latter is a common pattern in Godot. We do an early return for threads which won't contribute to the output of the shader.

The consequence of returning non-uniformly is that it is possible for the shader to crash, fail to run, or hang indefinitely. We first encountered this problem while working on HDDAGI and testing it on Intel devices.

This problem is likely responsible for a lot of the "Device Lost" errors that we get and should be remedied as soon as possible.

The solution is to move the non-uniform control flow statements to after all the barriers have been executed. For example, for the jumpflood stage of SDFGI, we could take the return here:

if (any(lessThan(global_pos, ivec3(0))) || any(greaterThanEqual(global_pos, ivec3(params.grid_size)))) {
return; //do nothing else, end here because outside range
}
//sync
groupMemoryBarrier();
barrier();

And simply move it to be after the barrier.

Steps to reproduce

Run Godot with one of the following effects:

  1. Auto Exposure
  2. SDFGI
  3. TAA
  4. Glow
  5. VoxelGI
  6. SSAO, SSR, SSIL, DoF
  7. GPUParticles

Minimal reproduction project (MRP)

The problem is internal to Godot

@clayjohn clayjohn added this to the 4.x milestone Jul 26, 2024
@clayjohn
Copy link
Member Author

The places with potential issues right now are:

  • luminance_reduce.glsl
  • sdfgi_preprocess.glsl
  • taa_resolve.glsl
  • copy.glsl
  • ss_effects_downsample.glsl
  • sort.glsl

luminance_reduce.glsl

Seems okay, control flow appears to be uniform

sdfgi_preprocess

Will be fixed by #86267, but might be worth fixing now anyway

MODE_JUMPFLOOD_OPTIMIZED

if (any(lessThan(global_pos, ivec3(0))) || any(greaterThanEqual(global_pos, ivec3(params.grid_size)))) {
return; //do nothing else, end here because outside range
}
//sync
groupMemoryBarrier();
barrier();

Need to move the branch below the barrier

MODE_OCCLUSION

if ((uvec3(lessThanEqual(region_offset_to, scroll_from)) | uvec3(greaterThanEqual(region_offset, scroll_to))) * scroll_mask == scroll_mask) { //all axes that scroll are out, exit
return; //region outside scroll bounds, quit
}
}

Need to move branch below the barriers. Last barrier here:

groupMemoryBarrier();
barrier();
for (int i = 0; i < 64; i++) {

taa_resolve.glsl

Seems okay, control flow is after all barriers

copy.glsl

Seems okay, the only non-uniform control flow happens after the barriers (i.e. inside the MODE_GAUSSIAN_BLUR branch)

ss_effects_downsample.glsl

Seems okay, Looks like there is no control flow. This should be checked to ensure there is no out of bounds memory access.

sort.glsl

Control flow appears to be uniform

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Up for grabs
Development

No branches or pull requests

1 participant