Postgres picks up timed out jobs before they time out #136

jacwellington · 2017-03-30T15:51:04Z

When running with more than one worker process using the Postgres backend, it is possible for the following to happen:

1. Worker 1 picks up a job.
2. Worker 1 works until the specified timeout period elapses and then starts the timeout process.
3. Worker 2 picks up the job BEFORE worker 1 finishes the timeout process.
4. Worker 1 finishes the timeout process.
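A minimal plain-Ruby model of the steps above, under stated assumptions: it assumes the Active Record backend treats a locked job as available again once `locked_at < now - max_run_time`. The helper `reservable?` is hypothetical, not part of delayed_job, and the timings (60 s limit, 500 ms of cleanup, a poll at 60.2 s) are made-up numbers chosen only to land inside the window.

```ruby
# Hypothetical model of the "is this locked job up for grabs?" check.
# Assumption: a locked job may be reserved again once
#   locked_at < now - max_run_time
# All times are integer milliseconds since worker 1 locked the job.
def reservable?(locked_at, now, max_run_time)
  locked_at.nil? || locked_at < now - max_run_time
end

max_run_time = 60_000 # assumed 60 s limit
locked_at    = 0      # worker 1 locks the job at t = 0

# Worker 1's timeout fires at t = 60_000. Suppose recording the failure and
# unlocking the job takes 500 ms, so the lock is released at t = 60_500.
cleanup_done_at = max_run_time + 500

# Worker 2 polls at t = 60_200, while worker 1 is still cleaning up.
worker2_poll = 60_200

puts worker2_poll < cleanup_done_at                     # => true
puts reservable?(locked_at, worker2_poll, max_run_time) # => true
```

Both checks print `true`: worker 2 sees the job as available while worker 1 still considers it in progress.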

This can leave the database in an inconsistent state, with multiple workers working on the same job even when the maximum number of attempts is set to 1. This doesn't seem to be a problem with MySQL because of the following code:

```ruby
# Removing the millisecond precision from now(time object)
# MySQL 5.6.4 onwards millisecond precision exists, but the
# datetime object created doesn't have precision, so discarded
# while updating. But during the where clause, for mysql(>=5.6.4),
# it queries with precision as well. So removing the precision
now = now.change(usec: 0)
```

We've implemented a monkey-patch fix on our system that looks like the following:

```ruby
# monkeypatch RM 12177 "Too many 'in_progress' requests"
# Intention:  Use an 'extended' max run time when generating the where clause for selecting the next available job.
# The extended max run time gives the application a small cushion of time to properly 'fail' those jobs that are
# expiring due to a 'timeout' situation.
module Delayed
  module Backend
    module ActiveRecord
      class Job
        class << self
          alias_method :original_reserve_method, :reserve
          def reserve(worker, max_run_time = Worker.max_run_time)
            extended_max_run_time = max_run_time + 2.seconds

            # run the original reserve method passing in the extended max run time value.
            original_reserve_method(worker, extended_max_run_time)
          end
        end
      end
    end
  end
end
```

I'm not sure how to implement this as a real patch, but I wanted to put it up here all the same.
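The cushion works by shrinking the window in which a second worker can grab the job. This sketch reuses the same assumed availability check (`reservable?` is a hypothetical helper, times are integer milliseconds, and the 60 s limit is an assumed value) to compare an unpatched poll with a patched one:

```ruby
# Hypothetical availability check, assumed to mirror the backend's lock test:
#   locked_at < now - max_run_time
def reservable?(locked_at, now, max_run_time)
  locked_at.nil? || locked_at < now - max_run_time
end

max_run_time = 60_000 # worker's own timeout in ms (assumed value)
cushion      = 2_000  # mirrors `+ 2.seconds` in the patch
locked_at    = 0

# Worker 2 polls 200 ms after worker 1's timeout fired.
puts reservable?(locked_at, 60_200, max_run_time)           # => true  (unpatched)
puts reservable?(locked_at, 60_200, max_run_time + cushion) # => false (patched)

# Once the cushion has elapsed, a genuinely stuck lock is reclaimable again.
puts reservable?(locked_at, 62_100, max_run_time + cushion) # => true
```

This only closes the race if worker 1's failure handling finishes within the 2-second cushion; a slower cleanup would reopen the window.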

eric-hemasystems · 2023-05-17T15:38:28Z

I also hit this issue, and I think your solution will work. I used a slightly different implementation, with prepend instead of alias_method, because alias_method can cause problems with code reloading and similar situations. prepend seems to be the preferred way to augment a method these days:

```ruby
Delayed::Backend::ActiveRecord::Job.class_eval do
  module ReserveWithDelay
    def reserve worker, max_run_time = Delayed::Worker.max_run_time
      super worker, max_run_time + 2.seconds
    end
  end

  singleton_class.prepend ReserveWithDelay
end
```

The overall idea is still the same; only the implementation differs.
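The prepend approach can be tried outside Rails with a stub. `FakeJob` below is a hypothetical stand-in for `Delayed::Backend::ActiveRecord::Job`, and plain integer seconds replace ActiveSupport's `2.seconds`. Because the module sits ahead of the singleton class in the ancestor chain, `super` reaches the original `reserve` directly, with no aliased copy that could end up pointing at the already-patched method if the file were loaded twice.

```ruby
# Hypothetical stand-in for Delayed::Backend::ActiveRecord::Job. The real
# reserve runs a query; this stub just returns the arguments it received.
class FakeJob
  def self.reserve(worker, max_run_time = 60)
    [worker, max_run_time]
  end
end

# Same shape as the prepend patch above, using plain integer seconds
# because ActiveSupport's 2.seconds is not available outside Rails.
module ReserveWithDelay
  def reserve(worker, max_run_time = 60)
    super(worker, max_run_time + 2)
  end
end

FakeJob.singleton_class.prepend(ReserveWithDelay)

p FakeJob.reserve("worker-1")             # => ["worker-1", 62]
p FakeJob.reserve("worker-1", 30)         # => ["worker-1", 32]
p FakeJob.singleton_class.ancestors.first # => ReserveWithDelay
```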

eric-hemasystems mentioned this issue on May 17, 2023: Job running on 2 workers at once after timeout #193 (open)
