adding a new job to the queue when a busy worker is near the max job limit can spawn a second competing worker #219

ssube · 2023-03-06T00:29:10Z

Adding a new job to the queue for a device whose worker is already running a job and is one job below the max_jobs_per_worker limit (for example, it has completed 9 jobs with a limit of 10) can spawn a second, competing worker that tries to use the same device.

It looks like the worker.join() call times out, and there is no fallback to .terminate() because that was breaking the queues on Windows. The pool.recycle() method otherwise handles everything well and spawns a new worker, which then shares the device with the old worker until they both run out of VRAM.
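A timed-out join is easy to miss, because `Process.join(timeout)` in Python's multiprocessing module returns `None` whether or not the process actually exited. A minimal sketch of detecting that case, assuming the workers are plain `multiprocessing.Process` objects; `join_with_timeout` is a hypothetical helper for illustration and not part of this project's pool:

```python
from multiprocessing import Process


def join_with_timeout(worker: Process, timeout: float) -> bool:
    """Wait for a worker to exit; return True only if it really stopped."""
    # join() returns None both when the process exits and when the timeout
    # expires, so the two cases can only be told apart with is_alive().
    worker.join(timeout)
    return not worker.is_alive()
```

If this returns False and a replacement worker is spawned anyway, both processes keep running against the same device.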

ssube · 2023-03-06T13:37:24Z

I've tried a half-dozen different methods for having the server process kill the workers, but short of using .terminate(), none of them seem to be effective.

What does work is having the worker exit() itself when it is no longer the primary worker for that device. That's easy enough to track in the server process with a Value containing the PID of the primary/only worker for that device. Before the worker starts each job, it checks to make sure that it is the current worker, and exits if not.

There can be a brief memory leak while the two workers co-exist, but the older worker reliably exits, which frees its memory and leaves it for the newer one.
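The check described above can be sketched roughly as follows. This is an illustration under assumptions, not the code from the fix: `is_current_worker` and `run_jobs` are hypothetical names, the shared `Value` is assumed to hold an integer PID, and the loop breaks where a real worker would call exit().

```python
import os
from multiprocessing import Value
from typing import Callable, Iterable, Optional


def is_current_worker(current_pid, my_pid: int) -> bool:
    """Return True while this process is still the device's primary worker."""
    # current_pid is a multiprocessing.Value("i", ...) owned by the server.
    with current_pid.get_lock():
        return current_pid.value == my_pid


def run_jobs(
    current_pid,
    jobs: Iterable[Callable[[], None]],
    my_pid: Optional[int] = None,
) -> int:
    """Run jobs until they run out or this worker has been replaced.

    Returns the number of jobs that were run.
    """
    my_pid = os.getpid() if my_pid is None else my_pid
    completed = 0
    for job in jobs:
        # Check before starting each job, so a replaced worker never
        # begins new work on a device it no longer owns.
        if not is_current_worker(current_pid, my_pid):
            break  # a real worker process would exit here
        job()
        completed += 1
    return completed
```

In this sketch the server would write the new worker's PID into the `Value` when it spawns a replacement, and the older worker notices on its next check. A job that is already running is not interrupted, which matches the brief overlap described above.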

ssube added the status/new, type/bug, and scope/api labels Mar 6, 2023

ssube added this to the v0.9 milestone Mar 6, 2023

ssube added a commit that referenced this issue Mar 6, 2023

fix(api): track and repeatedly attempt to recycle leaking workers (#219)

7a3a81a

ssube modified the milestones: v0.9, v0.8 Mar 6, 2023

ssube added the status/progress label and removed the status/new label Mar 6, 2023

ssube added the status/fixed label and removed the status/progress label Mar 6, 2023

ssube closed this as completed Mar 7, 2023
