Games crash on Nvidia due to memory allocation failures #1100
Comments
The same error with World of Tanks. |
The shader cache should have nothing to do with memory allocation issues. |
Ok. I'll try to reproduce this error, but when it appeared I downgraded the kernel, dxvk and wine, and the problem was the same. Only one thing was not changed: the NV drivers (the only version available for installation in the AUR was the new one). The problem has existed for about two weeks. |
Maybe not, but as I have pointed out in another thread, "dirty shader cache", it seems to me I have fewer crashes with a fresh .nv cache (delete the GLCache folder AND the WoW/Retail/Cache folder). If I keep clearing it regularly there are fewer crashes, but more stuttering at the start. Crashing while zoning COULD perhaps mean something weird happens when DXVK shader compilation is done? I assume that the shader compilation business with WoW goes something along the lines of: |
misyltoad/d9vk#170 - possibly connected. From my observations, crashes are more frequent if #103 - I was happy with this fix: some "heavy" games were able to use my whole VRAM, then RAM, swap, ... and still stay alive 😄 . Or Test cache behaviour:
p.s. |
If you can grab /proc/slabinfo or slabtop output that would be helpful. As is the output from You could for example bump /proc/sys/vm/swappiness as a test, it would tell the kernel to be more active in freeing memory. Your gist doesn't show any swap at all which is odd. |
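For anyone who wants to attach the swap state alongside their logs, here is a minimal sketch that summarises it from `/proc/meminfo` and `/proc/sys/vm/swappiness`. The function names are made up for illustration; only the two `/proc` paths are real, and the script only reads values, it does not change swappiness.

```python
from pathlib import Path


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: number}.

    Most fields are reported in kB; a few (e.g. HugePages_Total) are counts.
    """
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def swap_summary(meminfo_text, swappiness_text):
    """Return swap totals (in kB) and the current swappiness value."""
    info = parse_meminfo(meminfo_text)
    total = info.get("SwapTotal", 0)
    free = info.get("SwapFree", 0)
    return {
        "swap_total_kb": total,
        "swap_used_kb": total - free,
        "swappiness": int(swappiness_text.strip()),
    }


def live_swap_summary():
    """Read the real files on a Linux system (not called automatically)."""
    return swap_summary(
        Path("/proc/meminfo").read_text(),
        Path("/proc/sys/vm/swappiness").read_text(),
    )
```

Calling `live_swap_summary()` on the affected machine right after a crash and pasting the result next to the DXVK log should make it obvious whether swap exists and whether it was in use.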
The average application doesn't even know or care about how much RAM you have at all. Someone on the VKx discord found that if VRAM is full, |
Yes, it was not a technical description.
swap:512MiB, swappiness:10. Swap in my system is used only as a "fallback"; it is rarely filled and serves as a "be ready" indicator. Also, it's zram. Well, the Superposition test is still the thing: I am able to reproduce the issue running the "1080p" profile. It quits immediately when VRAM gets filled. The "720p" profile is fine with ~1200/1300MB used/allocated. I installed |
This is also an issue with Borderlands GOTY Enhanced. It seems to occur when loading new map areas or the title sequence. It does not happen once I have loaded successfully into a map, until I have been playing for around 15-20 minutes. For example, after loading in, traveling between separate map areas (loading sequence) does not produce a crash no matter how many times you travel. But trying to load a new area after ~10 minutes crashes the game. Regarding #1100 (comment), clearing an already built cache makes the game crash on launch with the same errors nearly every single time until the 3rd or 4th launch. Very strange. At first I thought this was an issue with Reshade, however it appears that this happens less often with Reshade active. Perhaps this is just placebo. d3d11.log (note: I removed a few thousand lines of compiling shader outputs, above paste limit) Specs: Cheers (side question: Is this a recent development? I've never noticed this with any other games before, although previous DXVK versions have the same error) |
Still crashing, and performance appears to remain the same. |
Is Borderlands a 32-bit game? In that case your issue is most likely something else, on Proton you can try |
The Enhanced version (remastered/released a couple months ago) I'm playing is 64bit, the remastered versions are also updated to DX11 from DX9. update: ge-wine does support |
I had the error in D9VK on my system with 32 GB on a 2080 Ti. Both RAM and VRAM were barely 25% used when I got this error. It has nothing to do with availability. Also interesting is that I can hit the error with BL2 in a couple of minutes, but I've been playing Bloodstained Ritual of the Night for much longer without a problem. Could it be something new that is not included in Proton yet? The errors are also relatively new to D9VK (as in, builds older than Monday 10 June were fine). |
The couple of times I have actually had any monitoring up while this crash happened with World of Warcraft and DXVK, the DXVK HUD showed a bump in allocated memory to around 3.6GB-4GB, while nvidia-smi showed barely 2GB. This is with an RTX 2070 8GB card. |
I placed |
There have been no memory allocation changes at all for several months. Only 138dde6 (from today) changes things a bit, but most likely won't affect this issue at all. I also somehow doubt that this can be fixed within DXVK since it's the |
Yeah, just tried it and still crashed unfortunately. New lines in log:
|
As mentioned, I haven't actually run into this one myself with DXVK in Proton 4.2-7. But assuming that D9VK still shares the same memory allocation code, something changed in the last 10 days that made it highly sensitive. Maybe there is a hint there. |
There are a few people with Proton having similar crashing issues: https://www.protondb.com/app/729040 |
Well, found a little snippet to allocate VRAM via CUDA.
Needs cuda-dev-kit from nVidia (or distro). Compile with: That way you can allocate and "spend" vram without actually spending it.. What happened if i spend 6GB vram, was that WoW started as normal, and did not crash even tho after running around a bit and zoning++ vram was topped out at 7.9GB+ on my 8GB card. Did not crash, not notice any huge issues, but did not test more than maybe 10-15 minutes. However, using "gpufill" to load 7GB ram The "shared memory" thing between vram<->sysram probably does not work the same way that swap does i guess? Ie. in a memory starving situation things gets put to swap on disk, but once memory gets freed, it does not continue to be used from swap. I have no clue what is supposed to happen in a situation like that tho? Will do some more testing with this, and with the latest 138dde6 |
138dde6 Doing the same test as above with 7GB memory allocated with "gpufill", WoW loaded and had a lot higher fps, although with some stuttering and frame spikes. Closing "gpufill" to release the 7GB of VRAM brought the frametimes down and the fps up. Fairly playable, but I noticed GPU load was still 90%+, whereas normally, where I was standing, it is usually 45-50% with 30+ more fps.
One other thing I noticed was that nvidia-smi seemed to indicate less VRAM usage from WoW. Is this due to "reusing chunks" so that "actual" VRAM use is not so much? |
Since I am an incredibly slow learner, and a n00b, let me just ask this to TRY to get my head around this "allocated" thing. Reading from the DXVK HUD, the "allocation" is 4500+ MB. What is this "allocation", and is it "unlimited"? Is the allocation limited by VRAM + system RAM? (in my case 8 + 16 = 24GB) What I don't know is what is supposed to happen with this "dxvk allocation" if physical VRAM is full. From the tests it SEEMS as if it will happily use system RAM (as I guess this is the intended function). The "allocation" and "used" values do not change, but WoW (according to nvidia-smi) uses less physical VRAM if the game is started in a VRAM-starved situation vs. not. |
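To put numbers on the HUD-versus-driver comparison above, a rough sketch: it parses the CSV output of `nvidia-smi --query-gpu=memory.used,memory.total --format=csv,noheader,nounits` and subtracts it from a HUD value typed in by hand. The helper names are hypothetical, the HUD figure has to be read off the screen manually, and nvidia-smi reports usage for the whole GPU rather than for the game alone, so the difference is only an indication.

```python
import subprocess

# nvidia-smi query for used/total device memory in MiB, one CSV line per GPU.
QUERY = [
    "nvidia-smi",
    "--query-gpu=memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_memory(csv_text):
    """Return a list of (used_mib, total_mib) tuples, one per GPU line."""
    result = []
    for line in csv_text.strip().splitlines():
        used, total = (int(value.strip()) for value in line.split(","))
        result.append((used, total))
    return result


def hud_vs_driver(hud_allocated_mib, driver_used_mib):
    """How much more the DXVK HUD reports as allocated than the driver reports as used."""
    return hud_allocated_mib - driver_used_mib


def query_gpu_memory():
    """Run nvidia-smi and parse its output (requires the Nvidia driver tools)."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return parse_gpu_memory(out)
```

With the figures from this thread (HUD around 4500 MB allocated, roughly 2 GB in use on the device), `hud_vs_driver(4500, 2048)` gives a gap of about 2.4 GB that is presumably being served from somewhere other than VRAM.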
Indeed, but that would require recreating all Vulkan resources that are in system memory, as well as all views for those resources. This is an absolute nightmare, and I have no plans to do that. DXVK can let the driver do the paging so that it doesn't have to recreate any resources, however that only works on drivers which support |
SveSop, have you tried completely disabling GLCache with |
Since this extension IS available for Windows and nVidia, hopefully this COULD be a thing for Linux as well. Is it up to the driver not to mess this up? If I have 2GB physical VRAM, and DXVK allocates 4.5GB, it is feasible to think 2.5GB of that is allocated in system RAM, but if I have 8GB VRAM, it "should" be allocated in VRAM... but that does not seem to be the way things actually work, I guess. Can one blame the driver for putting stuff "where it sees fit", assuming |
While this is another silly Nvidia-exclusive story of pain, this shouldn't affect issues on xorg (and probably isn't important for anything but Wayland compositors themselves anyway). |
Getting same crash on work computer, with Radeon HD 8570 / R7 240/340 OEM |
440.26 was released. Wonder if this is related?
Would be nice for NVIDIA to provide at least some context. I don't see much of a point buying nvidia hardware anytime soon. |
I believe it is, but there's another issue regarding sysmem allocations. There's a patch floating around somewhere, but I don't know if it made its way into any official driver release yet. |
As the context provided in the changelog entry implies, this fixes a different issue with different symptoms. That was #1169 For the issue here, there has been a patch floating around that we were waiting for feedback on, and we didn't really get a lot of testing data from end users. It has now been added to our trunk, and will show up in the next release in our Vulkan beta sidebranch, as well as in an unspecified future official release. |
It seems this patch was introduced in the Vulkan beta branch with 435.19.03 a while back, but it's nice to have it in a release driver. @ahuillet This means adding this patch and playing for 3-4 hours a couple of times is sadly not enough to say "it fixes things". Perhaps with a large enough test base it could indicate something, and shipping it is the only way to go, as most people do not patch and compile their own drivers. Is there an extension one can easily use to check allocations in native Vulkan apps? Like the DXVK HUD outputs "allocated" and "used". It could perhaps be an interesting experiment to see how allocations are used in games like The Talos Principle or similar when compared to DXVK/Wine games. |
There was no context, unless you're privy to something else. Nowhere was Squad even mentioned, let alone the bug. There are several reports of memory allocations failing. 1169 may be triggered by some other specific code path within the driver, but the end result is the same as this one: memory is not allocated. The crashes are likely a result of where the allocation happened. When the errors are being reported by the hardware it's pretty far outside anything end users would understand. What else is NVIDIA expecting? It's literally impossible to diagnose a binary blob. Maybe it's a specific BIOS vendor or some combination of hardware. |
Could someone please summarize which 64 bit applications definitely are affected? |
Updated nvidia/dxvk 2 days ago, to current at that day. |
@alexzk1 Yes, the effect of fallback allocation from sysmem can result in reduced performance. I can confirm that with the driver fixes, games are more likely to lag instead of crashing. Though I never saw hard/complete system freezes as the result of that. In KDE there's a keyboard shortcut to invoke a window kill mouse pointer (Ctrl+Alt+Esc by default). It always worked, though it could take 20-30 seconds to kill the game window. Usually, this leaves processes lingering which should then be killed with What always helped the lags (after the driver fixes) in my case was lowering the texture quality. In many games, reducing the quality just one step usually helps a lot with only barely visible loss of render quality, especially if you're going from "ultra" to "very high", or from "very high" to "high". There's a patch floating around which patches the open source interface of the driver to use a different allocation strategy. It instructs the kernel not to immediately fail a kernel memory request but to let the driver retry while the kernel tries to free some memory for the allocation. It will still be in the fallback path, of course (allocating from sysmem). The patch is here: @wgpierce is asking here for trying the patch and reporting back the results. I tried this patch but I found it introduces lags in normal desktop usage in render-heavy browser tabs, and also in games (other types of lags not seen before), without really improving the situation of the existing problems. Not sure if this was coincidence or really a side-effect of the patch. I'll try to reproduce that later. |
This bug has always been about sysmem allocation failures. This isn't a fallback path. |
Yes, I know that. You probably wrote that because I wrote "what helped [...] in my case was lowering the texture quality". While this would indeed sound like a video memory allocation issue, it really isn't in my case: VRAM always had free space left (1GB+). But still, reducing the texture quality reduced crashes. There may be some interaction between handling texture uploads and sysmem allocations. With current drivers, it now doesn't crash but starts to lag/stutter at the same instant in the game. If that's not a result of the latest driver changes, what is it then? Why would this bug be affected by texture usage if it shouldn't be? Texture memory was never really an issue yet. With too high texture settings, games would either crash very early (during load), or have low FPS right from the start because texture reads are going to sysmem. |
Borderlands: The Pre-Sequel via forced Proton with the UHD texture pack installed crashes with D9VK....
With D9VK disabled, the game doesn't crash, but the performance is horrible. The GPU is an Nvidia 2060 SUPER. This is the Steam log... I don't know how to create a full DXVK log. |
Borderlands: The Pre-Sequel is a 32bit game so it probably runs out of address space just like Borderlands 2. |
I know, that's why I have PROTON_FORCE_LARGE_ADDRESS_AWARE=1 on both games.. Remember that if I use the default Proton OpenGL, The Pre-Sequel works without any crash... |
This is a DXVK/D9VK issue (higher memory usage). It's not the Nvidia driver issue. |
@CSahajdacny @K0bin Yes The Pre-Sequel runs out of address space with UHD textures. That game blows out the address space even with PROTON_FORCE_LARGE_ADDRESS_AWARE=1. That issue is the same as here (BL2 also suffers from it) and is not what the Nvidia devs are trying to fix in this issue. |
Weird. |
It can also be exacerbated by screen resolution. Discussion of it would be better suited in misyltoad/d9vk#170 |
Not sure what you guys did there, but I got banned for client modification of E:D. |
@doitsujin I can reliably reproduce this with d9vk and Heroes of Might and Magic 5 (in the main campaign and when starting the Dark Messiah addon). In the main campaign it crashes when starting a cutscene (the game automatically saves right before it, which means that loading the save triggers the crash) and the addon dies when starting. Is there something I can provide you with or is this purely a driver issue (skimmed the comments, but have a hard time extracting meaningful information)? |
Again, crashes with 32-bit games running out of memory are unrelated. |
@doitsujin Unless I misinterpreted what I'm seeing and what this is about, I do believe that I see the same problem. dxvk reports that the memory allocation failed, but the stats on the next line show that there is enough memory available. Also this doesn't happen without d9vk as far as I can tell (but I'd have to test that again). |
Again, you are testing 32 bit games which have only 2GB of virtual memory available, which isn't always enough for d9vk (or dxvk in general). This has nothing to do with this issue. |
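To make the address-space limit concrete, a back-of-the-envelope sketch. It assumes the usual Windows figures for a 32-bit process on a 64-bit system (2 GiB of user address space by default, up to 4 GiB when the executable is large-address-aware); Wine may differ slightly, the allocation sizes in the example are invented, and fragmentation is ignored, so this is an upper bound only.

```python
GIB = 1024 ** 3


def user_address_space(large_address_aware):
    """Upper bound on user-mode virtual address space for a 32-bit process.

    Assumes Windows-style limits on a 64-bit host: 2 GiB by default,
    4 GiB for large-address-aware executables.
    """
    return 4 * GIB if large_address_aware else 2 * GIB


def fits(requested_bytes, large_address_aware=False):
    """True if the sum of all mappings stays within the address space.

    Ignores fragmentation, so a True result does not guarantee success.
    """
    return sum(requested_bytes) <= user_address_space(large_address_aware)
```

For example, `fits([GIB, GIB, 1])` is `False` while `fits([GIB, GIB, 1], large_address_aware=True)` is `True`. That matches what is described above: the allocation can fail in a 32-bit game even though the machine itself has plenty of free RAM and VRAM.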
Anyway, closing since the original problem should mostly be fixed. |
440.59 should be the first stable driver revision to fix the issue. |
For some reason it looks like DXVK's device memory allocation strategy does not work reliably on Nvidia GPUs. This leads to game crashes with the characteristic
DxvkMemoryAllocator: Memory allocation failed
error in the log files. This issue has been reported in the following games:
#1099 (Bloodstained: Ritual of the Night)
#1087 (World of Warcraft)
If you run into this problem, please do not open a new issue. Instead, post a comment here, including the full DXVK logs, your hardware and driver information, and information about the game you're having problems with.
Update: Please check #1100 (comment) for further information on how to get useful debugging info.
Update 2: Please also see #1100 (comment).
Update 3: Please update to driver version 440.59.
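To check quickly whether a set of logs actually contains this error before posting, something like the following sketch could be used. Only the error string is taken from this issue; the function names and the default of three context lines are arbitrary choices, and the full logs should still be attached rather than just the excerpts.

```python
from pathlib import Path

# The characteristic error string quoted in this issue.
FAILURE = "DxvkMemoryAllocator: Memory allocation failed"


def find_allocation_failures(log_text, context=3):
    """Return each failing line together with up to `context` following lines."""
    lines = log_text.splitlines()
    hits = []
    for index, line in enumerate(lines):
        if FAILURE in line:
            hits.append(lines[index:index + 1 + context])
    return hits


def scan_logs(directory):
    """Map each *.log file name in `directory` to its list of failure excerpts."""
    results = {}
    for path in sorted(Path(directory).glob("*.log")):
        results[path.name] = find_allocation_failures(
            path.read_text(errors="replace")
        )
    return results
```

Running `scan_logs(".")` in the directory holding the game's log files (for example the `d3d11.log` mentioned in the comments) lists the excerpts per file; the lines following the error are kept because the memory statistics are reported there.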