Clear tasks on disabled workers #2208

daveFNbuck · 2017-08-11T22:36:32Z

Description

Clear worker task lists on disable to ensure a consistent internal state.

Motivation and Context

One of the uncaught exceptions I've been seeing in my scheduler recently was raised during count_pending, when it tried to peek at the worker list of a task whose worker list was empty. I worked around this in my fork by ignoring the exception, but I think I've now found the real cause.

After a worker is marked disabled, it will call get_work or count_pending one last time before stopping. This call can fail on count_pending when it has a unique task, as that task will now have no workers but we try to look up the workers from each task.

Have you tested this? If so, how?

Included unit tests, but I haven't tested this particular fix in production. Should we catch the exception too to be safe?

After a worker is marked disabled, it will call get_work or count_pending one last time before stopping. This call can fail on count_pending when it has a unique task, as that task will now have no workers but we try to look up the workers from each task. To guarantee a more consistent state, we now clear the task list from each worker on disable.
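
To make the failure mode concrete, here is a minimal toy model of the state described above. It is not Luigi's scheduler code: `ToyTask`, `ToyWorker`, `disable_worker`, `count_pending_last_scheduled` and `demo` are hypothetical names, and a plain list stands in for each task's worker collection.

```python
class ToyTask:
    """Hypothetical stand-in for a scheduler task."""

    def __init__(self, task_id):
        self.id = task_id
        self.workers = []  # ids of workers that can run this task, in order


class ToyWorker:
    """Hypothetical stand-in for a scheduler worker."""

    def __init__(self, worker_id):
        self.id = worker_id
        self.tasks = set()  # tasks this worker knows about
        self.disabled = False


def disable_worker(worker, clear_tasks):
    # Disabling removes the worker from every task's worker list.
    for task in worker.tasks:
        if worker.id in task.workers:
            task.workers.remove(worker.id)
    worker.disabled = True
    if clear_tasks:
        # The idea of this PR: also drop the worker's own task list,
        # so the two sides of the relationship stay consistent.
        worker.tasks.clear()


def count_pending_last_scheduled(worker):
    count = 0
    for task in worker.tasks:
        # Peeking at the last worker raises IndexError if the list is empty.
        count += int(task.workers[-1] == worker.id)
    return count


def demo(clear_tasks):
    worker = ToyWorker("w1")
    task = ToyTask("t1")  # a task unique to this worker
    task.workers.append(worker.id)
    worker.tasks.add(task)
    disable_worker(worker, clear_tasks=clear_tasks)
    try:
        return count_pending_last_scheduled(worker)
    except IndexError:
        return "IndexError"
```

In this sketch, `demo(clear_tasks=False)` hits the empty worker list on the final count, while `demo(clear_tasks=True)` has nothing left to iterate over and returns 0.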

daveFNbuck · 2017-08-12T00:21:04Z

Looks like one of the e-mail tests might be flaky :(

Tarrasch

Looks good. I'm ready to merge whenever you feel comfortable. If you don't think it's necessary to test in production I can merge now.

daveFNbuck · 2017-08-12T17:46:59Z

I think this is OK to merge now. I did deploy it to production with some extra logging to help figure out whether there are other causes for this, and for one other bug whose cause I still don't know. Hopefully I'll have at least one more fix on Monday.

Tarrasch · 2017-08-13T08:44:17Z

Thanks @daveFNbuck ! :)

Tarrasch approved these changes Aug 12, 2017

Tarrasch merged commit 876fe69 into spotify:master Aug 13, 2017

This was referenced Jun 29, 2022

no mo enum 34 #3180

Closed

enum34 be gone #3181

Closed

mdragilev mentioned this pull request Jun 28, 2024

for S3 contrib package move to boto3 Affirm/luigi#26

Merged
