Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Resizing on metal-macos causes metal validation errors when high_dpi is enabled in sokol_app #872

Closed
sconybeare opened this issue Aug 25, 2023 · 8 comments

Comments

@sconybeare
Copy link

Hi, thanks for the excellent graphics library!

I noticed that when using the xcode metal debugging tools, my program was crashing on resize. Here's the most minimal reproduction I could find (basically the triangle example with extra stuff removed and high_dpi=true in sokol_main):

#define SOKOL_IMPL
#define SOKOL_METAL
#include "sokol_app.h"
#include "sokol_gfx.h"
#include "sokol_glue.h"
#include "triangle.glsl.h"

static struct {
  sg_pipeline pip;
  sg_bindings bind;
  sg_pass_action pass_action;
} state;

static void init(void) {
  sg_setup(&(sg_desc) {
    .context = sapp_sgcontext(),
  });

  float vertices[] = {
    0.0f, 0.5f, 0.5f, 1.0f, 0.0f, 0.0f, 1.0f,
    0.5f, -0.5f, 0.5f, 0.0f, 1.0f, 0.0f, 1.0f,
    -0.5f, -0.5f, 0.5f, 0.0f, 0.0f, 1.0f, 1.0f,
  };
  state.bind.vertex_buffers[0] = sg_make_buffer(&(sg_buffer_desc) {
    .data = SG_RANGE(vertices),
    .label = "triangle-vertices",
  });
  sg_shader shd = sg_make_shader(triangle_shader_desc(sg_query_backend()));
  state.pip = sg_make_pipeline(&(sg_pipeline_desc) {
    .shader = shd,
    .layout = {
      .attrs = {
        [ATTR_vs_position].format = SG_VERTEXFORMAT_FLOAT3,
        [ATTR_vs_color0].format = SG_VERTEXFORMAT_FLOAT4,
      },
    },
    .label = "triangle-pipeline",
  });

  state.pass_action = (sg_pass_action) {
    .colors[0] = { .load_action=SG_LOADACTION_CLEAR, .clear_value={ .a = 1.0f }},
    .depth = { .load_action=SG_LOADACTION_CLEAR, },
  };
}

static void frame(void) {
  sg_begin_default_pass(&state.pass_action, sapp_width(), sapp_height());
  sg_apply_pipeline(state.pip);
  sg_apply_bindings(&state.bind);
  sg_draw(0, 3, 1);
  sg_end_pass();
  sg_commit();
}

void cleanup(void) {
  sg_shutdown();
}

sapp_desc sokol_main(int argc, char **argv) {
  return (sapp_desc) {
    .width = 640,
    .height = 480,
    .init_cb = init,
    .frame_cb = frame,
    .cleanup_cb = cleanup,
    .high_dpi = true,
  };
}

with this shader code:

@vs vs
in vec4 position;
in vec4 color0;

out vec4 color;

void main() {
    gl_Position = position;
    color = color0;
}
@end

@fs fs
in vec4 color;
out vec4 frag_color;

void main() {
    frag_color = color;
}
@end
@program triangle vs fs

Running this with MTL_DEBUG_LAYER=1 set and then resizing the window results in a crash with the following text printed out:

2023-08-25 00:39:28.618 triangle[82276:3267875] Metal API Validation Enabled
-[MTLDebugDevice notifyExternalReferencesNonZeroOnDealloc:]:2885: failed assertion `The following Metal object is being destroyed while still required to be alive by the command buffer 0x14f02ce00 (label: <no label set>):
<MTLToolsObject: 0x14d750440> -> <AGXG14XFamilyTexture: 0x14d750960>
    label = MTKView Depth Stencil 
    textureType = MTLTextureType2D 
    pixelFormat = MTLPixelFormatDepth32Float_Stencil8 
    width = 1280 
    height = 960 
    depth = 1 
    arrayLength = 1 
    mipmapLevelCount = 1 
    sampleCount = 1 
    cpuCacheMode = MTLCPUCacheModeDefaultCache 
    storageMode = MTLStorageModePrivate 
    hazardTrackingMode = MTLHazardTrackingModeTracked 
    resourceOptions = MTLResourceCPUCacheModeDefaultCache MTLResourceStorageModePrivate MTLResourceHazardTrackingModeTracked  
    usage = MTLTextureUsageRenderTarget 
    shareable = 0 
    framebufferOnly = 0 
    purgeableState = MTLPurgeableStateNonVolatile 
    swizzle = [MTLTextureSwizzleRed, MTLTextureSwizzleGreen, MTLTextureSwizzleBlue, MTLTextureSwizzleAlpha] 
    isCompressed = 1 
    parentTexture = <null> 
    parentRelativeLevel = 0 
    parentRelativeSlice = 0 
    buffer = <null> 
    bufferOffset = 0 
    bufferBytesPerRow = 0 
    iosurface = 0x0 
    iosurfacePlane = 0 
    allowGPUOptimizedContents = YES'
zsh: abort      MTL_DEBUG_LAYER=1 ./triangle

sokol headers used were from sokol commit 47d92ff86298fc96b3b84d93d0ee8c8533d3a2d2, which is the tip of master at the time of posting

@floooh
Copy link
Owner

floooh commented Aug 25, 2023

Hmm, I've seen this exact same problem after updating to macOS 13.x around the end of 2022:

#762

...but this issue was fixed by this MR:

#763

I'll try to reproduce the problem in the evening (it's strange because my Mac is my main development platform, and I should have noticed such an obvious issue).

In the meantime, can you check if the problem still happens when you change the commandBufferWithUnretainedReferences call here:

https://github.com/floooh/sokol/blob/47d92ff86298fc96b3b84d93d0ee8c8533d3a2d2/sokol_gfx.h#L11495C49-L11495C86

...to:

_sg.mtl.cmd_buffer = [_sg.mtl.cmd_queue commandBuffer];

(so that it looks like in the line below where the separate present_cmd_buffer is created.

@floooh
Copy link
Owner

floooh commented Aug 25, 2023

...cannot reproduce on my Mac with some furious resizing btw (so far I just enabled .high_dpi in the triangle sample in sokol-samples), neither when running from the Xcode debugger (this enables the validation layer by default), or running from the command line with MTL_DEBUG_LAYER enabled.

I'm on an M1 14" MBP with macOS Ventura 13.4.1 (c) (22F770820d), Xcode version is 14.2

PS: do you compile with ARC enabled or disabled? (it shouldn't matter but since it is some sort of lifetime issue it could be related)

@floooh
Copy link
Owner

floooh commented Aug 25, 2023

...let me update to the latest xcode version...

@floooh
Copy link
Owner

floooh commented Aug 25, 2023

Ok, I finally got the error too... took a long while of wildly resizing though, and I can't seem to reproduce it reliably.

I also updated to the latest Xcode version, but it's probably not related.

I'll try to investigate later today. I was hoping that just doing the present call in a separate command buffer with retained references would be enough to fix the issue, but looks like that's not enough.

It's also complicated by the fact the resource in question is created and managed by MTKView, which I don't have access to. Worst case would be to do all rendering in a command buffer with retained references, but maybe there's another solution (e.g. maybe I can extract the depth-stencil-surface from the MTLRenderPassDescriptor I'm getting from MTKView, and add a manual retain/release call somewhere in sokol-gfx - but that doesn't sound like a robust solution, I would expect that MTKView takes care of the lifetimes of the objects is manages).

-[MTLDebugDevice notifyExternalReferencesNonZeroOnDealloc:]:2885: failed assertion `The following Metal object is being destroyed while still required to be alive by the command buffer 0x101059800 (label: <no label set>):
<MTLToolsObject: 0x100e49500> -> <AGXG13XFamilyTexture: 0x100e4eab0>
    label = MTKView Depth Stencil 
    textureType = MTLTextureType2D 
    pixelFormat = MTLPixelFormatDepth32Float_Stencil8 
    width = 2892 
    height = 44 
    depth = 1 
    arrayLength = 1 
    mipmapLevelCount = 1 
    sampleCount = 1 
    cpuCacheMode = MTLCPUCacheModeDefaultCache 
    storageMode = MTLStorageModePrivate 
    hazardTrackingMode = MTLHazardTrackingModeTracked 
    resourceOptions = MTLResourceCPUCacheModeDefaultCache MTLResourceStorageModePrivate MTLResourceHazardTrackingModeTracked  
    usage = MTLTextureUsageRenderTarget 
    shareable = 0 
    framebufferOnly = 0 
    purgeableState = MTLPurgeableStateNonVolatile 
    swizzle = [MTLTextureSwizzleRed, MTLTextureSwizzleGreen, MTLTextureSwizzleBlue, MTLTextureSwizzleAlpha] 
    isCompressed = 1 
    parentTexture = <null> 
    parentRelativeLevel = 0 
    parentRelativeSlice = 0 
    buffer = <null> 
    bufferOffset = 0 
    bufferBytesPerRow = 0 
    iosurface = 0x0 
    iosurfacePlane = 0 
    allowGPUOptimizedContents = YES'

floooh added a commit that referenced this issue Aug 25, 2023
See #872

This does away with the separate retained-reference command buffer
for the present call, and instead uses the regular 'deferred
release queue' for the default pass MTLRenderPassDescriptor.
Since the render-pass-descriptor holds a strong-ref to
swapchain surfaces this should make sure that those surfaces
outlive their command buffer when they are released by
MTKView because of a window resize. This means there's a memory
spike during window resize, but I guess there's no way around that.
@floooh
Copy link
Owner

floooh commented Aug 25, 2023

Can you give the attempted fix in the branch issue872-metal-validation-error a try? This reverts the previous solution to use a separate command buffer with retained references only for the present call, and instead pins the swapchain resources into memory via the regular 'deferred release queue' which I'm already using for all other Metal resource objects.

See this change: master...issue872-metal-validation-error

This means there's no refcounting overhead (because the command buffer is stil created with unretained-references), but slightly increases memory usage during window resize (but that's expected to happen, because swapchain resources which had been released by MTKView during the window resize need to stick around a few frames before it's safe to delete them).

@sconybeare
Copy link
Author

That seems to fix it for me, too. Thank you!

@floooh
Copy link
Owner

floooh commented Aug 25, 2023

Ok, I think I'll merge the fix into master some time over the weekend (or latest on Monday). Will close this issue when the fix is merged.

@floooh
Copy link
Owner

floooh commented Aug 28, 2023

Closing this because PR #873 has been merged. Thanks for the bug report!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants