Early frees on CPU Implementations #3193
This might end up having the same fix as #3160 (which is also easier to reproduce).
This is now reproducing in CI on Linux: https://github.com/gfx-rs/wgpu/actions/runs/5173769981/jobs/9319358001?pr=3830
The
This is unfortunately still reproducing on CI (see #4728).
Confirmed that this is still happening. It happens extremely often with Ruffle, and it's starting to make our visual tests quite flaky.
Now that the arcanization dust has settled, we should be able to properly investigate this.
It seems like CI for #5222 hits this reliably in the
Some background. This is not a coherent explanation of anything, just me writing down what seemed possibly relevant: Direct3D 12's complaint is that users are not permitted to call In the
So I think all that's necessary for this bug to occur is for
I found a workaround for #5222, but https://github.com/jimblandy/wgpu/tree/repro-wgpu-3193 has the code that was crashing on CI.
Confirmed that that branch still crashes. That suggests that this change removed the behavior that triggers the bug.
@jimblandy: Similar to what's noted in the large repro case in the OP, the major difference in that change is the addition of a call to
Investigating now. I suspect this has to do with the order in which we attempt to free resources. I have a WIP PR that seems to fix the issue by forcing a command encoder to be discarded when
N.B. that we already have a repro. case in the form of the
Looks like there's still another cause somewhere. This is on 0.19.3:
Perhaps we're not yet correctly handling the lifetime of discarded command encoders with respect to fences, then? At least for the DX12 backend, the error message makes sense to me under that hypothesis: we're not syncing on a D3D12 fence to guarantee that backend work is actually finished before we try to reset.
MSDN docs that seem particularly relevant:
The example code in the overview seems particularly interesting.
FTR: I've also seen this issue in Firefox on DX12 testing for CTS, though I don't have a log link handy ATM.
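To make the hypothesis above concrete, here is a minimal sketch of "wait on the fence before resetting". It does not touch D3D12 or wgpu at all: `Fence`, `CommandAllocator`, and `reset_when_idle` are hypothetical stand-ins invented for illustration, and the "GPU" is just a thread that bumps a counter. The only point is the ordering: a reset issued before the fence reaches the allocator's last submission is an error, while a reset issued after waiting is fine.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

/// Hypothetical stand-in for a GPU fence: a monotonically increasing counter
/// that the "GPU" bumps when a submission has finished executing.
struct Fence {
    completed: AtomicU64,
}

impl Fence {
    fn new() -> Self {
        Fence { completed: AtomicU64::new(0) }
    }

    /// Called by the "GPU" once submission `value` is done.
    fn signal(&self, value: u64) {
        self.completed.fetch_max(value, Ordering::Release);
    }

    fn completed_value(&self) -> u64 {
        self.completed.load(Ordering::Acquire)
    }

    /// Block the calling thread until the fence has reached `value`.
    fn wait(&self, value: u64) {
        while self.completed_value() < value {
            thread::yield_now();
        }
    }
}

/// Hypothetical stand-in for a command allocator. It remembers the last
/// submission index that used its memory.
struct CommandAllocator {
    last_submission: u64,
}

impl CommandAllocator {
    /// Resetting while the last submission is still in flight is an error,
    /// mirroring the kind of complaint discussed in this thread.
    fn reset(&mut self, fence: &Fence) -> Result<(), String> {
        let done = fence.completed_value();
        if done < self.last_submission {
            return Err(format!(
                "reset while submission {} is in flight (fence at {})",
                self.last_submission, done
            ));
        }
        self.last_submission = 0;
        Ok(())
    }
}

/// The ordering under discussion: wait on the fence first, then reset.
fn reset_when_idle(alloc: &mut CommandAllocator, fence: &Fence) -> Result<(), String> {
    fence.wait(alloc.last_submission);
    alloc.reset(fence)
}

fn main() {
    let fence = Arc::new(Fence::new());
    let mut alloc = CommandAllocator { last_submission: 1 };

    // Nothing has signalled the fence yet, so an early reset is rejected.
    assert!(alloc.reset(&fence).is_err());

    // Simulated GPU: finishes submission 1 a little later.
    let gpu = {
        let fence = Arc::clone(&fence);
        thread::spawn(move || {
            thread::sleep(Duration::from_millis(20));
            fence.signal(1);
        })
    };

    // Waiting on the fence first makes the reset legal.
    assert!(reset_when_idle(&mut alloc, &fence).is_ok());
    gpu.join().unwrap();
}
```

A real backend would presumably block on an OS event or poll during device maintenance rather than spin, but the invariant would be the same.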
Unassigning from myself to reflect that I'm not giving this active attention ATM.
I'm somewhat consistently seeing this when running this test on a Windows laptop:
Notably, the test takes ~70 seconds before it fails.
Edit: after I posted this, it stopped consistently failing 😭
I think #5251 resolved the For the remaining
These all point to improper recycling of encoders in pending writes, which I believe I recently fixed in 61739d9 (#5910). At the time I wasn't aware that this was an invariant of encoders, but it makes sense in retrospect.
@Dinnerbone @torokati44 @Imberflur could you try trunk, or any commit after the one referenced above, to see if this issue was resolved?
I'm seeing a different error at the moment:
https://github.com/torokati44/ruffle/actions/runs/9959259396/job/27515702109#step:8:4168 EDIT: But the ones we used to see this error on (
Would you look at that, it actually passed! 🥳
I think we can finally close this then!
yay! patch release when? 😛
The next release is scheduled for tomorrow, actually. I don't think we can easily do a patch release, since there have been a bunch of refactors to that area of the code.
The size of the given `data` might be less than the size of the staging buffer. This issue became apparent with the refactor in 6f16ea4 (#5946) since there is now an assert in `StagingBuffer.write()`. Ruffle ran into this in #3193 (comment).
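As a rough illustration of the situation that commit message describes, the sketch below uses a hypothetical `StagingBuffer` (a plain `Vec<u8>`, not wgpu's type) whose size has been rounded up, so the caller's `data` is shorter than the buffer. The `align_up` helper and the alignment of 4 are assumptions chosen for the example. The write copies into the prefix and rejects only data that is too large, rather than requiring the lengths to match exactly.

```rust
/// Hypothetical stand-in for a staging buffer; this is NOT wgpu's type.
struct StagingBuffer {
    bytes: Vec<u8>,
}

/// Round `size` up to the next multiple of `align` (illustrative helper).
fn align_up(size: usize, align: usize) -> usize {
    (size + align - 1) / align * align
}

impl StagingBuffer {
    fn new(size: usize) -> Self {
        StagingBuffer { bytes: vec![0; size] }
    }

    /// Copy `data` into the start of the buffer. `data` may be shorter than
    /// the buffer (for example because the buffer size was padded), but it
    /// must never be longer.
    fn write(&mut self, data: &[u8]) -> Result<(), String> {
        if data.len() > self.bytes.len() {
            return Err(format!(
                "data ({} bytes) does not fit in staging buffer ({} bytes)",
                data.len(),
                self.bytes.len()
            ));
        }
        self.bytes[..data.len()].copy_from_slice(data);
        Ok(())
    }
}

fn main() {
    let data = [1u8, 2, 3, 4, 5];

    // Assumed alignment of 4 bytes: 5 bytes of data get an 8-byte buffer.
    let mut staging = StagingBuffer::new(align_up(data.len(), 4));
    assert_eq!(staging.bytes.len(), 8);

    // A shorter-than-buffer write succeeds and leaves the padding zeroed.
    staging.write(&data).unwrap();
    assert_eq!(staging.bytes, vec![1, 2, 3, 4, 5, 0, 0, 0]);

    // Oversized data is still rejected.
    assert!(staging.write(&[0u8; 9]).is_err());
}
```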
Found in #3174 (comment)
Related: #3031, #2285
Description
Getting DX12 errors
or
depending on whether `queue.submit` is called in the repro below.
This feels like a timing issue (also pointed out by @kvark in #2285 (comment)), since I could only reproduce this locally by increasing `array_size` to `2048`. I also can't reproduce the issue on actual hardware (tried on an Nvidia dGPU and an Intel iGPU).
Repro steps
Expected vs observed behavior
No errors.
Platform
Windows 11, wgpu master (08b160c)