Adding a new job to the queue for a particular device when that device is one job below the `max_jobs_per_worker` limit (e.g., it has completed 9 jobs with a limit of 10), while that worker is already running a job, can spawn a second competing worker that tries to use the same device.
It looks like the `worker.join()` call times out, and there is no fallback to `.terminate()` because that was breaking the queues on Windows. The `pool.recycle()` method otherwise handles everything well and spawns a new worker, which shares the device with the old one until they both run out of VRAM.
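A minimal, self-contained sketch of that race, assuming the pool replaces a worker whose `join()` times out without terminating it; the names here (`run_worker`, `recycle`, `JOIN_TIMEOUT`) are illustrative stand-ins, not the project's actual API:

```python
import os
import time
from multiprocessing import Process

JOIN_TIMEOUT = 1.0  # join() gives up after this; no terminate() fallback

def run_worker(device: str) -> None:
    # Stands in for a worker that is still busy with its current job.
    print(f"pid {os.getpid()} starting a long job on {device}", flush=True)
    time.sleep(5)

def recycle(device: str, old: Process) -> Process:
    old.join(JOIN_TIMEOUT)
    if old.is_alive():
        # join() timed out mid-job; without terminate(), the old worker
        # keeps running and shares the device with its replacement.
        print(f"join timed out, pid {old.pid} still holds {device}", flush=True)
    new = Process(target=run_worker, args=(device,))
    new.start()
    return new

if __name__ == "__main__":
    first = Process(target=run_worker, args=("cuda:0",))
    first.start()
    time.sleep(0.1)  # let the first worker begin its job
    second = recycle("cuda:0", first)  # now two live workers on one device
    first.join()
    second.join()
```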
I've tried a half-dozen different methods for having the server process kill the workers, but short of using `.terminate()`, none of them seem to be effective.
What does work is having the worker `exit()` itself when it is no longer the primary worker for that device. That is easy enough to track in the server process with a `Value` containing the PID of the primary (or only) worker for each device. Before the worker starts each job, it checks that it is still the current worker, and exits if not (sketched below).
There can be a brief memory leak while the two workers coexist, but the older worker reliably exits, which frees its memory and leaves the device to the newer one.
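A minimal sketch of that check, assuming the server shares a `multiprocessing.Value` holding the primary worker's PID; the helper names (`worker_main`, `run_job`) and the `Event` used to avoid a startup race are hypothetical, not the project's actual code:

```python
import os
import sys
from multiprocessing import Event, Process, Value

def run_job(job: int) -> None:
    print(f"pid {os.getpid()} ran job {job}", flush=True)

def worker_main(primary_pid, registered, jobs) -> None:
    registered.wait()  # wait until the server has recorded this worker's PID
    for job in jobs:
        # Before each job, confirm this process is still the primary worker
        # for the device; if not, exit and free the VRAM for the newer one.
        if primary_pid.value != os.getpid():
            sys.exit(0)
        run_job(job)

if __name__ == "__main__":
    primary_pid = Value("i", 0)  # PID of the primary worker for this device
    registered = Event()
    worker = Process(target=worker_main, args=(primary_pid, registered, [1, 2, 3]))
    worker.start()
    primary_pid.value = worker.pid  # server side: record the primary worker
    registered.set()
    worker.join()
```

Having the stale worker exit itself avoids calling `.terminate()` from the server entirely, which is what keeps the queues working on Windows.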