dm kcopyd: always complete failed jobs
This patch fixes a problem in dm-kcopyd that may leave jobs in the
complete queue indefinitely in the event of a backing storage failure.

This behavior has been observed while running a 100% write file fio
workload against an XFS volume created on top of a dm-zoned target
device. If the underlying storage of dm-zoned goes offline under I/O,
kcopyd sometimes never issues the end copy callback and dm-zoned
reclaim work hangs indefinitely waiting for that completion.

This behavior was traced down to the error handling code in the
process_jobs() function, which places the failed job on the
complete_jobs queue but doesn't wake up the job handler. In case of a
backing device failure, all outstanding jobs may end up on the
complete_jobs queue via this code path and then stay there forever,
because there are no more successful I/O jobs left to wake up the job
handler.

This patch adds a wake() call to always wake up the kcopyd job wait
queue for all I/O jobs that fail before dm_io() gets called for that job.

The patch also sets the write error status in all sub-jobs that fail
because their master job has failed.

Fixes: b73c67c ("dm kcopyd: add sequential write feature")
Cc: [email protected]
Signed-off-by: Dmitry Fomichev <[email protected]>
Reviewed-by: Damien Le Moal <[email protected]>
Signed-off-by: Mike Snitzer <[email protected]>
dmitry-fomichev authored and snitm committed Aug 15, 2019
1 parent cf3591e commit d1fef41
Showing 1 changed file with 4 additions and 1 deletion: drivers/md/dm-kcopyd.c
@@ -566,8 +566,10 @@ static int run_io_job(struct kcopyd_job *job)
 	 * no point in continuing.
 	 */
 	if (test_bit(DM_KCOPYD_WRITE_SEQ, &job->flags) &&
-	    job->master_job->write_err)
+	    job->master_job->write_err) {
+		job->write_err = job->master_job->write_err;
 		return -EIO;
+	}
 
 	io_job_start(job->kc->throttle);
 
@@ -619,6 +621,7 @@ static int process_jobs(struct list_head *jobs, struct dm_kcopyd_client *kc,
 			else
 				job->read_err = 1;
 			push(&kc->complete_jobs, job);
+			wake(kc);
 			break;
 		}
 