VRAM is not freed on errors #303
Labels: bug (Something isn't working), downstream-fix (Likely requires fixing another project downstream), enhancement (New feature or request)

Comments
rsxdalv replied:

Thanks for the tip, the models should definitely free memory on failure. I think some of them would free it on the next load, but that's not ideal.

As for the issue itself, do you see any error messages in the console? I don't think musicgen should be doing telemetry, and I have disabled Gradio telemetry. (By the way, I have almost no idea what people are *actually* using.)
On Thu, Apr 11, 2024, 8:48 AM rofoto wrote:

> When using musicgen the process completes and all files are created, but I have blocked outbound network traffic, and this causes an error when (I assume) musicgen tries to send out telemetry.
> This error does not affect the outputs, but it also puts the GPU in a state where VRAM is not freed, forcing a restart.
> This is not the only error that puts the GPU in this state. It appears that pretty much any error, including but not limited to torch.cuda.OutOfMemoryError and errors when trying to download models, puts the GPU in this state.
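A minimal sketch of what "free memory on failure" could look like, assuming a PyTorch-backed model; load_musicgen and generate are hypothetical placeholders, not this project's actual API:

```python
import gc

import torch

def generate_safely(prompt):
    model = None
    try:
        model = load_musicgen()          # hypothetical loader
        return generate(model, prompt)   # hypothetical generation call
    finally:
        # Runs on success and on any exception (blocked telemetry, OOM,
        # failed downloads, ...), so a failed run cannot leave the
        # weights pinned in VRAM.
        del model
        gc.collect()
        if torch.cuda.is_available():
            torch.cuda.empty_cache()
```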
rsxdalv replied:

> this is the only other error I am seeing
> That's why I thought telemetry was the potential issue.

That error is because you still have a frontend and a server; it generally comes from Gradio. Although it's not impossible that this could come from telemetry at some point, it's a different situation.
On Fri, Apr 12, 2024, 2:59 PM rofoto wrote:

> this is the only other error I am seeing:
>
>     "tts-6.0_webui\installer_files\env\lib\asyncio\proactor_events.py", line 165, in _call_connection_lost
>       self._sock.shutdown(socket.SHUT_RDWR)
>     ConnectionResetError: [WinError 10054] An existing connection was forcibly closed by the remote host.
>
> That's why I thought telemetry was the potential issue.
>
>> I think some of them would free it on the next load, but that's not ideal.
>
> After looking, it would appear that in some situations the VRAM is freed on the next run, but ideally it can be cleared at the end of generation, just in case.
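For the "cleared at the end of generation" idea, a minimal sketch, assuming the model is kept in a module-level cache between runs; the _model cache and the unload_model name are assumptions for illustration, not this codebase's internals:

```python
import gc

import torch

_model = None  # hypothetical module-level cache reused between runs

def unload_model():
    """Drop the cached model and hand its VRAM back to the driver."""
    global _model
    _model = None    # drop the last reference to the weights
    gc.collect()     # collect any lingering reference cycles
    if torch.cuda.is_available():
        # empty_cache() only releases blocks held by the caching
        # allocator; it cannot free tensors that are still referenced,
        # which is why the reference is dropped first.
        torch.cuda.empty_cache()
```

Calling unload_model() at the end of every run trades a model reload on the next run for the guarantee that a later pass (such as the multi-band step mentioned in the report) starts with a clean VRAM budget.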
rsxdalv added the bug (Something isn't working), enhancement (New feature or request), and downstream-fix (Likely requires fixing another project downstream) labels on Sep 11, 2024.
Original issue description (rofoto):

When using musicgen the process completes and all files are created, but I have blocked outbound network traffic, and this causes an error. If you enable multi-band after this, there is a chance that there will not be enough VRAM for it. Not sure this is intended behavior, since there seems to be a delayed cleanup between runs.

This error does not affect the outputs, but it also puts the GPU in a state where VRAM is not freed, forcing a restart.

This is not the only error that puts the GPU in this state. It appears that pretty much any error, including but not limited to torch.cuda.OutOfMemoryError and errors when trying to download models, puts the GPU in this state.
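A minimal sketch of the kind of guard this report suggests, assuming PyTorch: on any failure, release what the run allocated and re-raise so the error still surfaces. run_with_vram_guard is a hypothetical wrapper, not something the project ships:

```python
import gc

import torch

def run_with_vram_guard(fn, *args, **kwargs):
    try:
        return fn(*args, **kwargs)
    except Exception:
        # Any failure (torch.cuda.OutOfMemoryError, blocked telemetry,
        # failed model downloads, ...) lands here: free what we can and
        # flush the CUDA caching allocator before propagating the error.
        gc.collect()
        if torch.cuda.is_available():
            torch.cuda.empty_cache()
        raise
```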