Description
Clear worker task lists on disable to ensure a consistent internal state.
Motivation and Context
One of the uncaught exceptions I have been seeing in my scheduler recently was raised during count_pending, when it tried to peek at the worker list of a task whose worker list was empty. I worked around this in my fork by ignoring the exception, but I think I've now found the real cause.
After a worker is marked disabled, it calls get_work or count_pending one last time before stopping. That call can fail in count_pending when the worker has a unique task: the task now has no workers, but we still try to look up the workers of each task in the worker's task list.
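To illustrate the failure mode, here is a minimal sketch using a toy scheduler. It is not the real scheduler code: the `ToyScheduler`, `Task`, and `Worker` classes, the `clear_tasks` flag, and the returned dictionary keys are all hypothetical names chosen for the example. It only assumes that disabling a worker removes it from each task's worker set, and that count_pending peeks at the workers of every task still listed on the worker.

```python
class Task:
    def __init__(self, task_id):
        self.id = task_id
        self.workers = set()  # ids of workers that have this task


class Worker:
    def __init__(self, worker_id):
        self.id = worker_id
        self.tasks = set()  # Task objects this worker has
        self.disabled = False


class ToyScheduler:
    """Hypothetical stand-in for the scheduler state, for illustration only."""

    def __init__(self):
        self.tasks = {}
        self.workers = {}

    def add_task(self, worker_id, task_id):
        worker = self.workers.setdefault(worker_id, Worker(worker_id))
        task = self.tasks.setdefault(task_id, Task(task_id))
        task.workers.add(worker_id)
        worker.tasks.add(task)

    def disable_worker(self, worker_id, clear_tasks=True):
        worker = self.workers[worker_id]
        worker.disabled = True
        # Remove the worker from every task it has.
        for task in worker.tasks:
            task.workers.discard(worker_id)
        # The fix sketched here: also clear the worker's own task list,
        # so the two sides of the relationship stay consistent.
        if clear_tasks:
            worker.tasks.clear()

    def count_pending(self, worker_id):
        worker = self.workers[worker_id]
        n_unique = 0
        for task in worker.tasks:
            # Peek at one worker of the task; raises StopIteration
            # when the task's worker set is empty.
            first = next(iter(task.workers))
            if len(task.workers) == 1 and first == worker_id:
                n_unique += 1
        return {"n_pending_tasks": len(worker.tasks),
                "n_unique_pending": n_unique}


if __name__ == "__main__":
    # Without clearing the task list, the final count_pending call fails.
    broken = ToyScheduler()
    broken.add_task("w1", "t1")
    broken.disable_worker("w1", clear_tasks=False)
    try:
        broken.count_pending("w1")
    except StopIteration:
        print("count_pending failed: task has no workers")

    # With the task list cleared on disable, the call succeeds.
    fixed = ToyScheduler()
    fixed.add_task("w1", "t1")
    fixed.disable_worker("w1")
    print(fixed.count_pending("w1"))
```

In this sketch, clearing the worker's task list on disable means count_pending never visits a task whose worker set has been emptied, which is the consistency this change aims for.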
Have you tested this? If so, how?
I've included unit tests, but I haven't tested this particular fix in production. Should we also catch the exception, to be safe?