-
-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix crash on Linux Nvidia 550 driver #12542
Fix crash on Linux Nvidia 550 driver #12542
Conversation
Welcome, new contributor! Please make sure you've read our contributing guide and we look forward to reviewing your pull request shortly ✨ |
Looks like I'm also undoing a change from #12055, where |
CI timed out but logs say that test was successful. Is this a real error? |
Why not just always ignore surface outdated errors? Rather than only for linux with nvidia 550 drivers |
The error is not expected anywhere else. But yeah always ignoring it would reduce the amount of code. I'm not opposed to doing it that way. |
The error was real but now fixed. |
I'm concerned about specifically checking for the 550 driver. This seems to be a fix by Nvidia, so 560, 570 etc. will contain this behaviour. And given that the error was effectively ignored in versions prior to 550, I think this should be safe to just ignore on Nvidia Linux as a whole? |
Yeah it was a panic before so it shouldn't be too bad to ignore it. I'll do that. |
1d385a7
to
05e1444
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a windowing expert, but cherry-picking these commits solved various crashes I get since I'm on linux + new Nvidia drivers so I'm very happy with this.
Why did you remove the |
After #12055 both branches do the same thing because |
584f230
to
3eef57d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't say anything about the changes themselves as I don't understand windowing code, but testing on my machine I can confirm this does resolve the issue. (Void Linux, X.Org, nvidia-550.54.14_1 driver)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Silently absorbing the error seems like a future issue waiting to happen. Can we downgrade this to an error or warning log?
I added a debug level log because it happens pretty often and there is no way to fix it. |
An |
f712ea6
to
ec10fe7
Compare
ec10fe7
to
c891c01
Compare
render_instance | ||
.enumerate_adapters(wgpu::Backends::VULKAN) | ||
.iter() | ||
.any(|adapter| adapter.get_info().name.starts_with("NVIDIA")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if starts_with()
is good enough here. We've had a bug somewhere else because the vendor name wasn't at the start.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In addition to the name
field there seems to be driver
that just says "NVIDIA" and nothing else. Would that be better? I don't have other cards to test with.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure how quirky these AdapterInfo
strings can get but maybe just changing it to name.to_uppercase().contains("NVIDIA")
could work. You could also add a fallback check of the driver
string to account for cases in which the name
would not match on an NVIDIA device, no idea if this would be necessary though. This is out of the scope of this PR but the same starts_with()
method is currently used in defining may_erroneously_timeout
in the lines above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wouldn't add random extra checks if we don't know what edge case we are handling. This has worked fine for everyone who tested it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Checking for vendor id would probably be the best for reliability.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can confirm with a little test system that this spams true
:
(EDIT: I have an nvidia card of course..)
fn vendor_info_hex(adapter: Res<RenderAdapter>) {
let nvidia = 0x10de;
let vendor = adapter.get_info().vendor;
info!("Is nvidia? {}", vendor == nvidia);
}
found the ID value here: https://envytools.readthedocs.io/en/latest/hw/pciid.html#introduction
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to wgpu there might be other bytes non-zero: https://docs.rs/wgpu/latest/wgpu/struct.AdapterInfo.html#structfield.vendor
in the ID, so might need to mask off only the 16 bits we care about
let vendor = adapter.get_info().vendor & 0xFFFF;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for investigating. I made this change. I just compare to a constant, but an enum based abstraction for the vendor check would be better in the future and should be used everywhere instead of strings.
c891c01
to
8c940ac
Compare
# Objective Fix crashing on Linux with latest stable Nvidia 550 driver when resizing. The crash happens at startup with some setups. Fixes #12199 I think this would be nice to get into 0.13.1 ## Solution Ignore `wgpu::SurfaceError::Outdated` always on this platform+driver. It looks like Nvidia considered the previous behaviour of not returning this error a bug: "Fixed a bug where vkAcquireNextImageKHR() was not returning VK_ERROR_OUT_OF_DATE_KHR when it should with WSI X11 swapchains" (https://www.nvidia.com/Download/driverResults.aspx/218826/en-us/) What I gather from this is that the surface was outdated on previous drivers too, but they just didn't report it as an error. So behaviour shouldn't change. In the issue conversation we experimented with calling `continue` when this error happens, but I found that it results in some small issues like bevy_egui scale not updating with the window sometimes. Just doing nothing seems to work better. ## Changelog - Fixed crashing on Linux with Nvidia 550 driver when resizing the window ## Migration Guide --------- Co-authored-by: James Liu <[email protected]>
# Objective Fix crashing on Linux with latest stable Nvidia 550 driver when resizing. The crash happens at startup with some setups. Fixes #12199 I think this would be nice to get into 0.13.1 ## Solution Ignore `wgpu::SurfaceError::Outdated` always on this platform+driver. It looks like Nvidia considered the previous behaviour of not returning this error a bug: "Fixed a bug where vkAcquireNextImageKHR() was not returning VK_ERROR_OUT_OF_DATE_KHR when it should with WSI X11 swapchains" (https://www.nvidia.com/Download/driverResults.aspx/218826/en-us/) What I gather from this is that the surface was outdated on previous drivers too, but they just didn't report it as an error. So behaviour shouldn't change. In the issue conversation we experimented with calling `continue` when this error happens, but I found that it results in some small issues like bevy_egui scale not updating with the window sometimes. Just doing nothing seems to work better. ## Changelog - Fixed crashing on Linux with Nvidia 550 driver when resizing the window ## Migration Guide --------- Co-authored-by: James Liu <[email protected]>
Objective
Fix crashing on Linux with latest stable Nvidia 550 driver when resizing. The crash happens at startup with some setups.
Fixes #12199
I think this would be nice to get into 0.13.1
Solution
Ignore
wgpu::SurfaceError::Outdated
always on this platform+driver.It looks like Nvidia considered the previous behaviour of not returning this error a bug:
"Fixed a bug where vkAcquireNextImageKHR() was not returning VK_ERROR_OUT_OF_DATE_KHR when it should with WSI X11 swapchains" (https://www.nvidia.com/Download/driverResults.aspx/218826/en-us/)
What I gather from this is that the surface was outdated on previous drivers too, but they just didn't report it as an error. So behaviour shouldn't change.
In the issue conversation we experimented with calling
continue
when this error happens, but I found that it results in some small issues like bevy_egui scale not updating with the window sometimes. Just doing nothing seems to work better.Changelog
Migration Guide