
Queuing inhibits releasing of tasks #7396

Closed
fjetter opened this issue Dec 13, 2022 · 4 comments · Fixed by #7402

fjetter (Member) commented Dec 13, 2022

The transition logic for queued tasks is unstable and ordering-dependent when releasing tasks.

Specifically, _exit_processing_common pops tasks from the worker queues whenever something leaves the processing state, which includes the transition processing->released.

for qts in self._next_queued_tasks_for_worker(ws):
    if self.validate:
        assert qts.key not in recommendations, recommendations[qts.key]
    recommendations[qts.key] = "processing"

Recommendations are generated for a key to be transitioned into processing.

This new recommendation will then overwrite an earlier recommendation to release/forget this task

recommendations.update(new_recs)

such that the task is never forgotten but ends up in state processing instead.
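
For illustration, a toy sketch (not the scheduler code) of how the dict update silently drops the earlier recommendation:

# Toy example only: dict.update() overwrites an earlier recommendation for the same key.
recommendations = {"task-0": "released"}   # earlier decision: forget the task
new_recs = {"task-0": "processing"}        # produced while a task exits processing
recommendations.update(new_recs)
assert recommendations == {"task-0": "processing"}  # the "released" recommendation is lost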

Note: These tasks are then technically in a corrupt state because they have no TaskState.who_needs and no dependent with a who_needs. We don't check this in validate_state, which I believe is why this issue didn't pop up earlier (and I suggest not introducing such a check: walking the entire graph for every task is prohibitively expensive and would slow down our tests).

from distributed import as_completed
from distributed.utils_test import gen_cluster, inc


# Minimal reproducer
@gen_cluster(client=True)
async def test_forget_tasks_while_processing(c, s, a, b):
    futures = c.map(inc, range(1000))
    await futures[0]
    await c.close()
    assert not s.tasks


# Original reproducer
@gen_cluster(client=True)
async def test_large_map_first_work(c, s, a, b):
    futures = c.map(inc, range(1000))
    async for _ in as_completed(futures):
        break
    await c.restart()

cc @gjoseph92

fjetter (Member Author) commented Dec 13, 2022

The impact is relatively mild. We are obviously recomputing some tasks even though we shouldn't. Once the workers complete them, the scheduler will tell them to release the tasks again. We end up with "zombie" tasks in state released on the scheduler, but the workers will be clean.

When using restart, this situation can produce other follow-up failures since we have already cleaned up some other state, e.g. TaskPrefixes; see coiled/benchmarks#521 (comment)

gjoseph92 (Collaborator) commented

This makes sense, but I'm curious why assert qts.key not in recommendations is not triggering in these cases. When I run test_forget_tasks_while_processing, I don't see that assertion fail. I'll look into it more.

I assume this means that there's a previous batch of recommendations which recommends a task to released, but while processing that batch (with a new, empty recommendations dict) we recommend the task to processing and overwrite the old batch. I'm wondering how _exit_processing_common can identify this situation if state hasn't yet been changed for the task in question.
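
A purely illustrative sketch of that hypothesis (toy dicts only, not the actual transition machinery): the assertion only inspects the dict built for the current batch, so it cannot see the earlier release recommendation.

# Toy illustration of the hypothesis above, not the real transition engine.
earlier_batch = {"task-0": "released"}  # already being processed elsewhere

# _exit_processing_common builds its recommendations into a fresh dict ...
recommendations = {}
# ... so the guard only sees this empty dict and passes:
assert "task-0" not in recommendations
recommendations["task-0"] = "processing"
# When both batches are applied, "processing" wins and the release
# recommendation from earlier_batch is effectively undone.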

gjoseph92 self-assigned this Dec 13, 2022

fjetter (Member Author) commented Dec 13, 2022

I'm concerned this is a more deeply rooted issue caused by us just updating the recommendation dict here

recommendations.update(new_recs)

Having key collisions here will inevitably cause problems. I guess we were lucky so far?

gjoseph92 (Collaborator) commented

Yeah, I was also thinking that. You could argue recommendations might make more sense as a stack. I think we may implicitly rely on this behavior in too many places though.
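
As an illustration only, a minimal sketch of the stack idea (hypothetical structure; the current implementation uses a plain dict):

# Hypothetical sketch: recommendations kept as a stack of (key, state) pairs,
# so a later recommendation for the same key does not silently replace an
# earlier one before it has been applied.
recommendations = [("task-0", "released"), ("task-0", "processing")]
while recommendations:
    key, finish = recommendations.pop()
    # Every recommendation is handled explicitly; a conflicting earlier
    # recommendation still gets processed instead of being overwritten.
    print(f"transition {key} -> {finish}")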


Another thing is that maybe _next_queued_tasks_for_worker shouldn't be part of _exit_processing_common. That is, maybe it shouldn't run in response to a task transitioning (because there's really no relationship between the task that exits processing and the task we pop off the queue). Instead, maybe we should do it in response to stimuli, as a separate transitions cycle:

diff --git a/distributed/scheduler.py b/distributed/scheduler.py
index dbaa7cfa..153d6c80 100644
--- a/distributed/scheduler.py
+++ b/distributed/scheduler.py
@@ -5291,12 +5291,19 @@ class Scheduler(SchedulerState, ServerNode):
         recommendations, client_msgs, worker_msgs = r
         self._transitions(recommendations, client_msgs, worker_msgs, stimulus_id)
 
+        recommendations = self.stimulus_task_slot_opened(stimulus_id=stimulus_id)
+        self._transitions(recommendations, client_msgs, worker_msgs, stimulus_id)
+
         self.send_all(client_msgs, worker_msgs)
 
     def handle_task_erred(self, key: str, stimulus_id: str, **msg) -> None:
         r: tuple = self.stimulus_task_erred(key=key, stimulus_id=stimulus_id, **msg)
         recommendations, client_msgs, worker_msgs = r
         self._transitions(recommendations, client_msgs, worker_msgs, stimulus_id)
+
+        recommendations = self.stimulus_task_slot_opened(stimulus_id=stimulus_id)
+        self._transitions(recommendations, client_msgs, worker_msgs, stimulus_id)
+
         self.send_all(client_msgs, worker_msgs)
 
     def release_worker_data(self, key: str, worker: str, stimulus_id: str) -> None:

Lastly, by just adding this assertion we at least fail validation:

diff --git a/distributed/scheduler.py b/distributed/scheduler.py
index dbaa7cfa..7c6b763f 100644
--- a/distributed/scheduler.py
+++ b/distributed/scheduler.py
@@ -3088,6 +3088,7 @@ class SchedulerState:
         assert ts not in self.unrunnable
         assert ts not in self.queued
         assert all(dts.who_has for dts in ts.dependencies)
+        assert ts.who_wants or ts.waiters
 
     def _add_to_processing(self, ts: TaskState, ws: WorkerState) -> Msgs:
         """Set a task as processing on a worker and return the worker messages to send"""

So perhaps either:

  1. We should check in _next_queued_tasks_for_worker that the task is actually valid to run (ts.who_wants or ts.waiters). If it's not, leave it alone; another transition should deal with it imminently (a rough sketch follows this list).
  2. Check in _add_to_processing that the task should actually be run; if not, return None or something. All the transition_*_processing functions would then need to handle this case and recommend the key to released if so.
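
A rough sketch of option 1 (simplified; _queued_candidates_for is a hypothetical helper and the real method's selection logic differs):

# Simplified sketch of option 1 only; not the actual scheduler method.
def _next_queued_tasks_for_worker(self, ws):
    for qts in self._queued_candidates_for(ws):  # hypothetical helper
        if not (qts.who_wants or qts.waiters):
            # Nothing needs this task anymore: leave it on the queue and let
            # the pending release/forget transition deal with it.
            continue
        yield qts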

Currently, I think I'm most in favor of the option of calling _next_queued_tasks_for_worker in response to stimuli. It feels kind of logical that scheduling another queued task happens in response to a stimulus, not as a side effect of a transition.
