nVidia drivers change in memory management #1285

vladmandic · 2023-06-03T14:14:46Z

vladmandic
Jun 3, 2023
Maintainer

it seems that nVidia changed memory management in the latest versions of drivers, specifically 532 and 535

new behavior is that once gpu vram is exhausted, it will actually use shared memory thus causing massive slowdown - easily 10x
good side (hey, have to look at it that way as well) as that OOM is far less likely - but at the cost of 10x performance drop, no chance
also, when spillover happens, memory pretty much spins out of control and returns to normal only after app restart (stopping generate does not do anything)

this feature is deep inside device drivers and completely outside of application control
even advanced gpu tuning utilities dont seem to have capabilities to turn it on/off or tune it

i've checked release notes and there is no mention of it, but there are too many reports (not just sd community) to ignore

version 531 seems to be the last unaffected version

zaileron · 2023-06-03T17:44:33Z

zaileron
Jun 3, 2023

Noticed this on 535. Nice that it doesn't CUDA out sometimes, but a lot more system stress.
Have seen the system ram slowly stairstep down over time, but Python does sit with 8+ gb of system ram used perpetually after this. Also OOM's more easily if you have a subsequent large task.

Was wondering how I was able to suddenly to 2048x2048 tiling...

0 replies

iDeNoh · 2023-06-03T20:08:11Z

iDeNoh
Jun 3, 2023

This sounds very much like amds smart access memory, but that has to be enabled in the bios. Any chance that this could be prevented with a bios change?

3 replies

vladmandic Jun 3, 2023
Maintainer Author

There are no settings for this.

I suspect disabling resizabe bar in bios would essentially nuke this option, but that has a performance penalties of its own.

iDeNoh Jun 3, 2023

Good point, sounds like a good opportunity for some users to submit product feedback lol

YHD233 Jun 21, 2023

I checked in gpuz that the resizebar of 2080ti is not turned on, but the memory will be used when the video memory is full when it is generated, resulting in a long wait before generating the next picture

msha3er · 2023-06-04T09:19:14Z

msha3er
Jun 4, 2023

My SD is getting too slow specially with CN, I guess that's happened after Nvdia latest update, how I can fix it, please

6 replies

Tillerz Jun 4, 2023

Roll back the driver to ~~532~~ 531 or older. And/or wait for someone to find a valid workaround to fix it. :)

vladmandic Jun 4, 2023
Maintainer Author

Roll back the driver to 532 or older. And/or wait for someone to find a valid workaround to fix it. :)

you mean 531 or older?

msha3er Jun 4, 2023

Thank you guys

JelloWizard Jun 19, 2023

i just installed 536. gonna try it and report back.

NoMansPC Jun 20, 2023

I'm on this driver and can confirm that stable diffusion is still much slower than before.

wzgrx · 2023-06-06T13:05:03Z

wzgrx
Jun 6, 2023

Any downgraded NVIDIA driver? The 535 NVIDIA driver has slowed my SD speed by 50 times

11 replies

zaileron Jun 7, 2023

Seems to be aware of how much memory the GPU has and how much memory will be requested. Maybe a checkbox to error-out or otherwise handle memory requests known to be larger than the GPU's capacity?

Unfortunate part of having to idiot-proof these sorts of tools :P

morphinapg Jun 10, 2023

It's slower if you exceed your GPU VRAM capacity, then your system shunts it to system RAM to prevent OOM errors. You're likely making images that are too big for your GPU.

Why the images don't OOM with version 531 but effectively do for 535 is hard to say though. Did 535 kill low-vram options?

It seems they're copying to system RAM far too soon, way before it would OOM. Something is seriously wrong with the threshold where this behavior is happening. It shouldn't happen at all except for situations that would OOM in the past.

vladmandic Jun 10, 2023
Maintainer Author

i agree. bigger problem is that nvidia hasn't even officially confirmed this change in direction nor offered any way to enable/disable it.

howDoILearnPython Jun 11, 2023

It's slower if you exceed your GPU VRAM capacity, then your system shunts it to system RAM to prevent OOM errors. You're likely making images that are too big for your GPU.
Why the images don't OOM with version 531 but effectively do for 535 is hard to say though. Did 535 kill low-vram options?

It seems they're copying to system RAM far too soon, way before it would OOM. Something is seriously wrong with the threshold where this behavior is happening. It shouldn't happen at all except for situations that would OOM in the past.

I think this is card-dependant. The driver might not optimized for all VRAM/chip combination
My 3060 12G didn't experience slow down when VRAM usage at around 11.6GB (during koyha lora training)

But experienced significant slowdown at around 11.8GB VRAM and the shared memory being used (data from task manager)

Actually the behavior of using shared memory can effectively prevent OOM at the final denoise step (usually have memory surge).

morphinapg Jun 11, 2023

It's slower if you exceed your GPU VRAM capacity, then your system shunts it to system RAM to prevent OOM errors. You're likely making images that are too big for your GPU.
Why the images don't OOM with version 531 but effectively do for 535 is hard to say though. Did 535 kill low-vram options?

It seems they're copying to system RAM far too soon, way before it would OOM. Something is seriously wrong with the threshold where this behavior is happening. It shouldn't happen at all except for situations that would OOM in the past.

I think this is card-dependant. The driver might not optimized for all VRAM/chip combination My 3060 12G didn't experience slow down when VRAM usage at around 11.6GB (during koyha lora training)

But experienced significant slowdown at around 11.8GB VRAM and the shared memory being used (data from task manager)

Actually the behavior of using shared memory can effectively prevent OOM at the final denoise step (usually have memory surge).

Honestly, I think if it worked correctly, it could be a benefit in some situations, like where you spend 99% of the time within memory limits, but occasionally spike out of it (this happens to me in some Dreambooth configurations for example). But as it is now, on my 3080Ti, it was causing issues even when using only 10GB of my 12GB total.

Reverting to 531 solved my issue, but I really hope Nvidia addresses this because I don't want to be stuck dealing with an old driver forever.

vladmandic · 2023-06-06T14:06:15Z

vladmandic
Jun 6, 2023
Maintainer Author

actually my gut feeling is that they did it as a follow-up to nVidia CEO statement defending choice that just-launched RTX 4060Ti only has 8GB while some latest game titles require up to 10GB to even load - without force-enabling use of shared memory, their latest GPU would not work with latest games. that's ok if you can say "older cards don't support latest games", but you cant really say that for just-launched card.

actual quote from Computex 2023:

Huang defended the 8GB of VRAM and told gamers to focus more on how that VRAM is managed: “Remember the frame buffer is not the memory of the computer — it is a cache. And how you manage the cache is a big deal. It is like any other cache. And yes, the bigger the cache is, the better. However, you’re trading off against so many things.”

and on why latest games require more than 8GB of VRAM? because they are developed for consoles and only ported to PC without optimizations - game studios are rushing release dates. and latest generation of consoles have 16GB of shared memory, so thats about 12GB for shaders.

8 replies

vladmandic Jun 7, 2023
Maintainer Author

It's got to be a lot easier to downscale graphics to a console than to take your console graphics and upscale them so they're actually good on a PC,

yes, but it takes time and game studios don't want to miss sales dates. downscaling is easy, but you also need to optimize a bit.
remember cyberpunk launch fiasco? it was written for pc and downscaled for consoles, but they rushed release date and at the release it barely run on consoles and looked horrible.

and if it looks good on console and not on pc, you can always blame the gpu as market is soo non-uniform.
by the time anyone proves anything, there's going to be a patch of a sort (that's their logic).

Aptronymist Jun 7, 2023
Collaborator

Just goes to prove it, there's nothing some fail-upwards jackass in a suit with a corner office can't screw up.
The same people that screw everything up in pretty much every industry. Buzzwords, jargon, empty promises, and bullshit!

vladmandic Jun 7, 2023
Maintainer Author

no arguments there!

biorpg Sep 9, 2023

NVIDIA doesn't make such broad sweeping changes to their driver to accomodate gaming on a lower end card. There are a plethora of existing methods, including many specific to NVIDIA's driver for dynamically handling VRAM for games. Additionally, there is an enormous list of tweaks included in each driver package that adjust parameters for nearly every main-stream game you might play during the time that your GPU is historically relevant.

What NVIDIA does do, because it is the current direction of most of the industry, is implement 'guardrails' to the extent required of them, and then go beyond this happily applying their own agenda under the guise of 'safety', with a convenient scapegoat should anyone question the (apparant) act of hiding something- thereby relieving them of any need to specify what those (less-apparant) somethings are.

Ulf3000 Sep 21, 2023

that would mean the creator driver doesnt have this option enabled only the gaming driver?

bkosowski · 2023-06-07T15:16:20Z

bkosowski
Jun 7, 2023

The 532.03 driver's release notes has this:

Additionally, this Game Ready Driver introduces significant performance optimizations to deliver up to 2x inference performance on popular AI models and applications such as Stable Diffusion.

Coincidence?

2 replies

zaileron Jun 7, 2023

Totally different AFAIK.

vladmandic Jun 7, 2023
Maintainer Author

yup, totally different. thats optimizations specific to directml-onnx under codename 'olive' so it can trigger tensor cores more easily.

zaileron · 2023-06-07T19:44:50Z

zaileron
Jun 7, 2023

Sort of. It’s too easy to overdevelop on PC (and remember that all of these games are developed on PC). Tons of games designed for the 4090s the developer PCs are rocking then people run into trouble when they’re trying to get their 1060 to get it to run. Most AAA titles run great on high end hardware and the console they’re made for. It’s everyone else that gets shafted.

…

On Wed, Jun 7, 2023 at 12:14 Aptronymist ***@***.***> wrote: and on why latest games require more than 8GB of VRAM? because they are developed for consoles and only ported to PC without optimizations - game studios are rushing release dates. and latest generation of consoles have 16GB of shared memory, so thats about 12GB for shaders. That's so spot-on, it's the most idiotic thing that these companies develop for consoles and then port to PC. It's got to be a lot easier to downscale graphics to a console than to take your console graphics and upscale them so they're actually good on a PC, (not to mention the inevitable UI/control/camera issues), but they only give a crap about raking in the quick cash with their new 2023 edition of the same game that came out last year, and console games are a *huge* market. Ugh. — Reply to this email directly, view it on GitHub <#1285 (reply in thread)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AOKLG42TZI7VRJJLBUS7LS3XKCZGPANCNFSM6AAAAAAYZKETOY> . You are receiving this because you commented.Message ID: ***@***.***>

0 replies

Kumimono · 2023-06-07T20:01:58Z

Kumimono
Jun 7, 2023

Is there a difference between Game Ready Driver and Studio Driver?

1 reply

vladmandic Jun 7, 2023
Maintainer Author

in general, studio drivers are 1-2 releases behind and just more tested. but this is not considered a bug by nvidia, this is a design choice.
so even fi studio drivers work today, thats only because they haven't (yet) caught up with game ready drivers.

Hexag0nSun · 2023-06-08T04:44:30Z

Hexag0nSun
Jun 8, 2023

Thanks for the heads up, I was starting to think my setup is cursed or something. Returning to 531.79, if anyone finds earlier versions work better please post in this thread 🙏

1 reply

JelloWizard Jun 9, 2023

i would get burned or distorted images

fbellgr · 2023-06-08T13:26:37Z

fbellgr
Jun 8, 2023

Actually, I have two linux machines on 530.30.02 and I have started experiencing those slow downs on both suddenly in the past couple of days. Same thing on automatic1111. Something else has changed. I pull from git and upgrade torch nightly every day.

8 replies

vladmandic Jun 8, 2023
Maintainer Author

a) memory management, b) non-ideal cross-optimization set for a given platform, c) live preview model running at full instead of some other faster method, d) buggy gpu that cannot run in fp16 so its forced in fp32
that pretty much sums all i'm aware of.

fbellgr Jun 8, 2023

Live preview. I figured it out after your first reply.

Somehow, on both installs, there was suddenly no live preview mode selected and it apparently fell back on Full VAE. I distinctly remember having it on simple by default before.

Selecting simple made it go back to normal.

I can't be the only one to whom this has happened and then wrongly believe that they have an issue with drivers or torch.

I did not edit live preview mode before. Ever.

rkfg Aug 1, 2023

I run 535.54.03 on Linux and I have no slowdowns on 3090 Ti, when it exhausts the memory it OOMs as usual. So Linux users are not affected I think.

Beker-Aman Aug 2, 2023

what is live preview and how to select simple, am a complete noob.

vladmandic Aug 9, 2023
Maintainer Author

Settings -> Live preview

git4ai · 2023-06-08T13:28:13Z

git4ai
Jun 8, 2023

So? My 8G card can train the dreambooth now?

0 replies

howDoILearnPython · 2023-06-10T13:20:28Z

howDoILearnPython
Jun 10, 2023

I am using RTX3060 12GB with -medvram
Most of my image generated did't hit the 12GB cap., i didn't see any performance slowdown.

apart from that, i can do far more higher upscale without OOM.
i think this driver seems better in term of usability. Being slow, better than oom.

2 replies

Aptronymist Jun 17, 2023
Collaborator

Wait, why would you be running an RTX3060 12gb, the same as me, with --medvram on? You don't need that at all.

Ulf3000 Sep 21, 2023

if you know a bit about digital art and how to steer SD then you dont need to binch generate either, and generation time becomes less important

bbecausereasonss · 2023-06-10T16:54:46Z

bbecausereasonss
Jun 10, 2023

I was wondering why everything was feeling so slow... Effing nvidia....

0 replies

ShisoFox · 2023-06-11T16:48:50Z

ShisoFox
Jun 11, 2023

I have started experiencing freezes where image generation stops at 100% but never returns the finished image. Not sure if this is because of the driver update or something else.

2 replies

vladmandic Jun 11, 2023
Maintainer Author

That's something else, likely your live preview settings got mixed up during upgrade, just reset them.

Beker-Aman Aug 2, 2023

please can you elaborate more on this topic am having the same issue, for example if i generate 2 images it gets stuck in the first one in 96%. and i downgrade to 531 but i thin the issue you mentioned here is the culprit.

TheCrake · 2023-06-11T20:46:19Z

TheCrake
Jun 11, 2023

Everyone here should put in a support ticket with nVidia asking them to it so this can be disabled in the driver settings. Otherwise we might be stuck with this forever and eventually need to use super slow drivers.

2 replies

vladmandic Jun 11, 2023
Maintainer Author

Yeah, I'd tend to agree with that

Hexag0nSun Jun 13, 2023

I used the "Send feedback" option in Geforce Experience app for this, also linked this thread for further review. Ticked the checkbox that allows them to contact me back.

Traslogan · 2023-10-25T22:38:50Z

Traslogan
Oct 25, 2023

This appears to be an issue for me with an RTX3080Ti. It wasn't happening until a few days ago, at least not that I noticed, but then started happening. I often do image gen at 768x960 which is very comfortable for the 12GB of the 3080Ti, but maybe every third image is over 40% slower than it should be for absolutely no apparent reason and the GPU isn't even breaking a sweat at 7-8GB VRAM usage.

1 reply

Ulf3000 Oct 28, 2023

i have a 3060 with 12GB and it doesnt slow down below approx. 1600x1000px , after a restart i can even do close to fullhd .. there may be something wrong with your system , a vram hog which you didnt identify. For me it was vscode (probably becasue of a coding ai addon). i now run vscode through sandboxie and disallow access to gpu, same for discord client.

vladmandic · 2023-10-28T21:15:12Z

vladmandic
Oct 28, 2023
Maintainer Author

i'm experimenting with something, can multiple users post a line from their console log BEFORE and AFTER slowdown occurs:

17:13:54-032823 INFO Processed: images=1 time=2.94s its=3.40 memory={'ram': {'used': 6.01, 'total': 27.41}, 'gpu': {'used': 6.3, 'total': 22.49}, 'retries': 0, 'oom': 0}

for example, two runs with low and high resolution to trigger the bad behavior.

0 replies

zethfox · 2023-10-29T04:40:14Z

zethfox
Oct 29, 2023

I've mentioned that 16 gigs of something are loaded into VRAM every second gen in the memleak bug you previously debugged with my logs, recent git pull never fixed that part of it...I managed to smooth it out slightly by setting torch garbage collection to 50...it's current behavior is first gen is fine, second gen has 16g of VRAM stuck in use...3rd gen is slow slow...4th gen dumps whatever is in VRAM starting the cycle over...setting gc to 50 has it dump basically every gen for me, it still consumes 16g but never triggers the slow down unless i do batches of 5 or more...

0 replies

Dampfinchen · 2023-10-31T13:29:44Z

Dampfinchen
Oct 31, 2023

This is awesome. With driver 546.01 Nvidia introduced an option to disable shared memory for CUDA by simply ticking a box in the graphics driver application menu.

https://nvidia.custhelp.com/app/answers/detail/a_id/5490

So for those who prefer crashing instead of slowing down, the option is there now.

Thank you Nvidia for listening to the AI community!

19 replies

VL4DST3R Nov 1, 2023

i can confirm the previous driver was leaved around

I believe either windows or nvidia keeps the previous driver version for the eventuality that something goes bad and needs to roll back. I went ahead and downloaded the same tool you use and checked myself and although I never do clean installs, I only have the two most recent drivers listed there instead of ~5 years worth.

Tillerz Nov 1, 2023

I think the clean install only deletes the profiles.

VL4DST3R Nov 1, 2023

I believe that's what it says when you tick the box for it too, isn't it?

vladmandic Nov 1, 2023
Maintainer Author

and only partially. clean install is not really clean at all. for actual clean install, you need ddu to uninstall first and then run install.
for example, try switching gpu and doing clean install - you'll see that profiles for old gpu are left behind and things like geforce experience control overlay start badly misbehaving since they cannot bind to old gpu that's no longer in the system.

all-in-all, do upgrade. and delete obsolete drivers once-per-year if you like, really no point of doing it more often. and for actual clean install, run ddu first and then install.

jmilez Jan 12, 2024

THANK YOU!!!

This also helped on my other computer that recently had a Windows 10 to Windows 11 migration with a RTX2060 that was dog slow with my trading platform. I was considering rolling back to 10 because it was so slow to switch charts and windows. I turned off the GLOBAL memory sharing because I don't need it on that computer and responsiveness is about what it was running windows 10 (which would probably be even faster if I still was running Windows 10). So if you have an older computer with a Nvidia card and wondering why it is sluggish when you click on things, just turn memory sharing OFF! It doesn't only apply to Stable Diffusion...

DubbaThony · 2023-10-31T14:22:26Z

DubbaThony
Oct 31, 2023

So since Nvidia solved the problem on their side, issue is now just documenting that in installation notes to change the setting in control panel (and to have correct version of driver installed).

16 replies

vladmandic Nov 1, 2023
Maintainer Author

Many things are faster on WSL2 than Windows. Blender GPU rendering can be 30-60% faster, CPU based simulation bakes can be 10-20% faster. It's silly. The GPU numbers can be particularly egregious because Windows talks to the GPU through some middle layer that Linux does not. WSL embarrasses the heck out of windows across the board though.

That has nothing to do with Wsl and is wrong understanding how wsl works. Wsl uses passthough to GPU drivers installed in windows. Wsl does not talk to GPU directly. So by definition wsl cannot be faster in GPU tasks than windows - all it can be is equal.

Now, if you setup your app to use wrong rendering engine, thats on app. 3d apps inside Linux default to OpenGL. 3d apps in windows most commonly default to Directx. But apps such as blender are highly configurable.

VL4DST3R Nov 1, 2023

Yeah I was about to ask- those values make no sense. How can something like blender be magically 60% faster when doing the exact same workload at (presumably) the same gpu utilization?

All that should differ is the reduced overhead from having less things competing for resources on your machine on linux.

So unless your install is filled with insane levels of bloat or your hardware is so dated that those extra processes/hundreds of megs of ram taken make that big of a difference, it should be negligible (bar stuff like fs architecture differences like vlad said).

vladmandic Nov 1, 2023
Maintainer Author

Or just as simple as a bad choice of renderer.
For example, you can set blender to use dx9 if you want - so you won't be using any of the mesh shaders and overloading legacy vertex shaders. It's apps choice what it uses. Even browsers can run in different modes nowadays.

markrmiller Nov 2, 2023

People have done the benchmarks. There is no doubt WSL is faster. I used to run ML training on both to compare, these days I do nothing GPU in pure Windows. I didn't say WSL has direct access to the GPU, I said it doesn't have to go through some crap windows middle layer that Linux handles much better. Some people have profiled down to the differences - obviously not Microsoft.

Anyway, someone on the Nvidia forums said they would ask the Cuda team to look into WSL not working.

vladmandic Nov 2, 2023
Maintainer Author

I have done benchmarks. A lot. With SD and with other stuff. And what you're saying doesn't hold.
Unless you have provale numbers, don't push the theory.

AndreyRGW · 2023-10-31T14:58:54Z

AndreyRGW
Oct 31, 2023

v546.01
Driver Default Sysmem Fallback policy

VS

Prefer No Sysmem Fallback

768x512, 3x (2304x1536) hires fix, RTX 2060 12GB.

It seems to be a lot faster for me

4 replies

sinand99 Oct 31, 2023

So, with the new driver, we don't need medvram switch for 8GB gpus?

vladmandic Oct 31, 2023
Maintainer Author

new driver is not automagically make your vram bigger. so if you needed medvram, most likely you still do.
new driver makes it so that if you run out of vram, it will fail instead of lock up the system by slowing to a crawl.

AndreyRGW Oct 31, 2023

new driver is not automagically make your vram bigger. so if you needed medvram, most likely you still do. new driver makes it so that if you run out of vram, it will fail instead of lock up the system by slowing to a crawl.

Well by the way, as you can see from my screenshots, my vram should overflow and give out of memory. With the default driver policy stable diffusion starts using the ram, however, if I set "Prefer No Sysmem Fallback" then out of memory should happen, but it doesn't, it feels like some garbage collector starts working. I have checked this in several neural networks.

sinand99 Oct 31, 2023

Yes on my 3070 Ti and Prefer No Sysmem Fallback enabled, I can confirm now without medvram, I get cuda OoM error. With medvram, I can generate normally like before. But if I use hires.fix or img2img, it fails with cuda error again. Before this driver, it was working without errors. So this feature is a regression for me. Luckily I don't use them much, I upscale with Extras tab.

Kedaranatha · 2023-10-31T21:03:47Z

Kedaranatha
Oct 31, 2023

I’ve been experiencing the same issue with my 3090 when I’m running hires in XL. I haven’t encountered this error in a long time, even with the previous driverSent from my iPhoneOn Oct 31, 2023, at 1:55 PM, Sinan Dinç ***@***.***> wrote: Yes on my 3070 Ti and Prefer No Sysmem Fallback enabled, I can confirm now without medvram, I get cuda OoM error. With medvram, I can generate normally like before. But if I use hires.fix or img2img, it fails with cuda error again. Before this driver, it was working without errors. So this feature is a regression for me. Luckily I don't use them much, I upscale with Extras tab. —Reply to this email directly, view it on GitHub, or unsubscribe.You are receiving this because you commented.Message ID: ***@***.***>

1 reply

esbe1175 Nov 1, 2023

does version 546.01+ help you?

Fixed General Bugs

Windows Event Viewer incorrectly logs an error when NVTOPPS stops [4331318]
LG C3 OLED TV does not show up as a validated G-SYNC Compatible display in the NVIDIA Control Panel [4247350]
Stable Diffusion significant performance reduction after driver update [4166994] • Workaround outlined here: https://nvidia.custhelp.com/app/answers/detail/a_id/5490

morphinapg · 2023-11-17T10:25:47Z

morphinapg
Nov 17, 2023

My performance seems to have dropped by about 10% with the newest driver, even with sysmem fallback disabled. Anybody else noticing anything like that?

0 replies

cgidesign-de · 2024-02-09T15:27:08Z

cgidesign-de
Feb 9, 2024

Nvidia Driver 551.23 on Windows 11
Set to "No System Fallback".

But:
Whenever I set VAE to Full Quality vram goes to max and then randomly I get the massive slowdown of vram <> ram exchange.
Card is a 4070 Ti Super with 16 GB.
SD is from 2024-02-08 (but it happened with the version from last year as well)
System ram is 64 GB.

It only happens in the last stage, when the VAE is processed. It only happens with VAE full quality. It happens randomly - about one of four generations shows the issue.
It never happens with VAE full quality off.

Is this a know issue of this driver version? Is there anything I can do to solve it?

0 replies

cgidesign-de · 2024-02-09T17:08:02Z

cgidesign-de
Feb 9, 2024

Ok, update to the above post.
The behaviour only occures if I edit the prompt while the image generation is already running (which I did all the time). If I just hit start and wait till generation is finished, it works without the issue.
Very strange.

0 replies

grimulkan · 2024-02-09T19:02:07Z

grimulkan
Feb 9, 2024

The new driver does not do what the old driver did even with the added setting. If you are very close to the memory limit and the allocation is a certain size, it will still fallback to system even if you tell it not to in control panel (they do call it a preference in the settings).

In some cases, this is actually good, and spike VRAM loads get absorbed for very small throughput hit.

In other cases, like training, the VAE stage, very large image generations, or LLM generations of a certain size (2.5bit 70B LLMs), it can cause massive throughput reductions, whereas in the previous driver things would have been just fine.

If you have a really big allocation, then it triggers OOM (assuming system fallback is not preferred). So you have to be in this narrow regime, close to the VRAM limit, for the swapping/throughput loss to occur. That makes it seem a little random. It works just fine if I reboot with the older drivers in these cases.

It's really annoying if your workflow takes you to that regime. Basically NVIDIA still swaps to RAM a little too aggressively compared to the prior driver. But some workflows that may have OOMed due to spike loads now work better.

I think that's where we're at, and probably where we'll be stuck for a while without ~~Quadro drivers~~ (Quadro does the same as GeForce drivers actually) or Linux. It seems to have satisfied most of the complaining customers, leaving only niche users out, and Nvidia can just claim we shouldn't be doing these things on ~~a Gaming GPU even with Studio drivers, or something~~ WDDM.

Edit: If your GPU supports TCC mode that's still an option, though they disabled that for gaming GeForce (like 4090) for a while now. It works on Quadro.

0 replies

cgidesign-de · 2024-02-09T19:34:32Z

cgidesign-de
Feb 9, 2024

Thanks for explaining it. But it only happens on my side, if I edit the prompt during active image generation - it never happens if I don't edit the prompt. Once I found that out, I can reproduce it every time. So I think, something else is going on. Maybe an edit is allocating vram memory in some way and that triggers the fallback. But this is a amateur guess as I am not a developer. Anyway, for me the issue is solved as I can just leave the prompt alone. But maybe others suffer from the same issue and like to try this as well.

0 replies

kurageart · 2024-02-12T12:15:23Z

kurageart
Feb 12, 2024

any downsides on sticking with 525.147.05 (cu118) ? I'm on debian stable , I need the nvidia-toolkit-dev package for other projects, and this version is the only one on the official repos (up to sid). my only other option, beside changing distro , would be using the latest nvidia cuda toolkit 12.3 from nvidia, but then it would be too new for pythorch binaries (i guess to use that I'll need to compile manually)

0 replies

stepahin · 2024-03-27T15:28:07Z

stepahin
Mar 27, 2024

Hey guys! Came here from reddit from a discussion on the best driver for Kohya. Can you recommend which 4090 driver is best for Lora training speed on win11? I know the best option is to go Ubuntu, but I'll leave it for later.

I'm new to the party and only have 2.50s/it on 4090, butch size 5, xformers, gradient checkpoint on, bucketing on, default kohya settings, seems too slow... :/

2 replies

VL4DST3R Mar 27, 2024

I believe the last few drivers all now have the option within the control panel to disable memory fallback, which was the issue discussed here, so any of those (with said setting disabled) should perform just as well as any other.

grimulkan Mar 27, 2024

If you have large slowdowns and are close to the VRAM limit, make sure you're not falling into the narrow regime I described above. But yes, there is no choice in terms of driver version any newer one is as good (or as bad) as any other. Studio, Quadro, GRD, doesn't matter.

bapohka · 2024-04-01T14:30:04Z

bapohka
Apr 1, 2024

OutOfMemoryError: CUDA out of memory. Tried to allocate 96.00 MiB. GPU 0 has a total capacity of 8.00 GiB of which 0 bytes is free. Of the allocated memory 7.00 GiB is allocated by PyTorch, and 257.02 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)

i am not sure, is it the same type of problem?

0 replies

nVidia drivers change in memory management #1285

vladmandic Jun 3, 2023 Maintainer

Replies: 82 comments · 223 replies

vladmandic Jun 3, 2023 Maintainer Author

vladmandic Jun 4, 2023 Maintainer Author

vladmandic Jun 10, 2023 Maintainer Author

vladmandic Jun 6, 2023 Maintainer Author

vladmandic Jun 7, 2023 Maintainer Author

Aptronymist Jun 7, 2023 Collaborator

vladmandic Jun 7, 2023 Maintainer Author

vladmandic Jun 7, 2023 Maintainer Author

vladmandic Jun 7, 2023 Maintainer Author

vladmandic Jun 8, 2023 Maintainer Author

vladmandic Aug 9, 2023 Maintainer Author

Aptronymist Jun 17, 2023 Collaborator

vladmandic Jun 11, 2023 Maintainer Author

vladmandic Jun 11, 2023 Maintainer Author

vladmandic Oct 28, 2023 Maintainer Author

vladmandic Nov 1, 2023 Maintainer Author

vladmandic Nov 1, 2023 Maintainer Author

vladmandic Nov 1, 2023 Maintainer Author

vladmandic Nov 2, 2023 Maintainer Author

vladmandic
Jun 3, 2023
Maintainer

Replies: 82 comments 223 replies

vladmandic Jun 3, 2023
Maintainer Author

vladmandic Jun 4, 2023
Maintainer Author

vladmandic Jun 10, 2023
Maintainer Author

vladmandic
Jun 6, 2023
Maintainer Author

vladmandic Jun 7, 2023
Maintainer Author

Aptronymist Jun 7, 2023
Collaborator

vladmandic Jun 7, 2023
Maintainer Author

vladmandic Jun 7, 2023
Maintainer Author

vladmandic Jun 7, 2023
Maintainer Author

vladmandic Jun 8, 2023
Maintainer Author

vladmandic Aug 9, 2023
Maintainer Author

Aptronymist Jun 17, 2023
Collaborator

vladmandic Jun 11, 2023
Maintainer Author

vladmandic Jun 11, 2023
Maintainer Author

vladmandic
Oct 28, 2023
Maintainer Author

vladmandic Nov 1, 2023
Maintainer Author

vladmandic Nov 1, 2023
Maintainer Author

vladmandic Nov 1, 2023
Maintainer Author

vladmandic Nov 2, 2023
Maintainer Author