💡[feat] Automatically release resources on timeout or when GPU utilization falls below a threshold
#9555
Open
KyanChen opened this issue
Jun 24, 2024
· 4 comments
Describe the problem
Please implement functionality, in both the command prompt (cmd) and the shell environment, that automatically releases resources on a timeout or when GPU utilization is too low.
Describe the solution you'd like
Please implement functionality, in both the command prompt (cmd) and the shell environment, that automatically releases resources on a timeout or when GPU utilization is too low.
Describe alternatives you've considered
No response
Additional context
No response
We do not have this as a built-in feature. One of our engineers developed an unofficial script that can terminate a process if it's not utilizing the GPU. You can integrate it into your CMDs to do what you want. Doing that with interactive shells or notebooks would probably be a bit trickier.
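For readers who want a starting point, here is a minimal sketch of that kind of watchdog. It is not the unofficial script mentioned above, which is not reproduced in this thread. It assumes `nvidia-smi` is on the `PATH`; the names `run_with_watchdog`, `read_gpu_utilization`, `min_util`, and `idle_limit_s` are made up for illustration. It wraps a command, polls GPU utilization, and terminates the child on a wall-clock timeout or after the GPU has stayed below the threshold for too long.

```python
import subprocess
import time
from typing import Callable, List, Optional, Sequence


def read_gpu_utilization() -> List[int]:
    """Return per-GPU utilization percentages as reported by nvidia-smi."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=utilization.gpu",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [int(line) for line in out.splitlines() if line.strip()]


def run_with_watchdog(
    cmd: Sequence[str],
    timeout_s: float,
    min_util: int,
    idle_limit_s: float,
    poll_s: float = 10.0,
    read_util: Callable[[], List[int]] = read_gpu_utilization,
) -> str:
    """Run cmd and stop it on timeout or sustained low GPU utilization.

    Returns "exited", "timeout", or "low_gpu_utilization".
    """
    proc = subprocess.Popen(cmd)
    start = time.monotonic()
    idle_since: Optional[float] = None
    reason = "exited"

    while proc.poll() is None:
        now = time.monotonic()
        if now - start > timeout_s:
            reason = "timeout"
            break
        # Assumption: the job counts as busy if ANY GPU on the node is busy.
        if max(read_util(), default=0) >= min_util:
            idle_since = None
        else:
            if idle_since is None:
                idle_since = now
            if now - idle_since >= idle_limit_s:
                reason = "low_gpu_utilization"
                break
        time.sleep(poll_s)

    if proc.poll() is None:
        proc.terminate()              # ask politely first
        try:
            proc.wait(timeout=10)
        except subprocess.TimeoutExpired:
            proc.kill()               # force-stop if it ignores the request
            proc.wait()
    return reason
```

Something like `run_with_watchdog(["python", "train.py"], timeout_s=3600, min_util=5, idle_limit_s=600)` could then serve as the entry point of a command, so that the command exits and its resources are freed once the wrapped process is stopped. The script name and thresholds here are placeholders.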
I cannot offer a reference implementation beyond the script I've already shared.
Your code looks fine at a quick glance. It does not look like it will handle the case where you have multiple GPUs per node. But hey, if it works for you, I see no problem with you using it.
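One possible way to approach the multi-GPU case is to look only at the GPUs assigned to the job and treat it as idle only when all of them are below the threshold. The sketch below rests on assumptions not stated in this thread: that the job's GPUs are listed as integer indices in `CUDA_VISIBLE_DEVICES` (UUID-style entries are ignored), and that per-GPU utilization has already been read, for example from `nvidia-smi`. The helper names `visible_gpu_indices` and `job_is_idle` are hypothetical.

```python
from typing import List, Mapping, Sequence


def visible_gpu_indices(env: Mapping[str, str], gpu_count: int) -> List[int]:
    """Return the GPU indices assigned to the job.

    Assumption: CUDA_VISIBLE_DEVICES holds comma-separated integer indices.
    If the variable is unset, every GPU on the node is considered visible.
    """
    raw = env.get("CUDA_VISIBLE_DEVICES")
    if raw is None:
        return list(range(gpu_count))
    indices = []
    for token in raw.split(","):
        token = token.strip()
        # Skip UUIDs, empty tokens, and out-of-range indices.
        if token.isdigit() and int(token) < gpu_count:
            indices.append(int(token))
    return indices


def job_is_idle(per_gpu_util: Sequence[int], indices: Sequence[int],
                min_util: int) -> bool:
    """True only if every GPU assigned to the job is below min_util.

    Note: a job with no assigned GPUs is reported as idle.
    """
    return all(per_gpu_util[i] < min_util for i in indices)


# Node with 4 GPUs; this job owns GPUs 2 and 3, another job keeps 0 and 1 busy.
utilization = [95, 90, 3, 1]
mine = visible_gpu_indices({"CUDA_VISIBLE_DEVICES": "2,3"}, gpu_count=4)
print(job_is_idle(utilization, mine, min_util=10))
```

Requiring all assigned GPUs to be idle is a conservative choice: a job that keeps even one of its GPUs busy is left running.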