Postgres picks up timed out jobs before they time out #136

Open
jacwellington opened this issue Mar 30, 2017 · 1 comment

jacwellington commented Mar 30, 2017

When running with more than one worker process using the Postgres backend, it is possible for the following to happen:

  1. Worker 1 picks up a job.
  2. Worker 1 works until the specified timeout period elapses and then starts the timeout process.
  3. Worker 2 picks up the job BEFORE worker 1 finishes the timeout process.
  4. Worker 1 finishes the timeout process.
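
The sequence above can be modelled without a database. This is only a minimal sketch, assuming the reservation query treats a locked job as available again once its `locked_at` is older than `now - max_run_time`. The `reservable?` helper and all timings are hypothetical and stand in for that comparison; they are not the library's actual query.

```ruby
# Hypothetical model of the lock-expiry check (not the real query).
# Assumption: a locked job is reservable again once
# locked_at < now - max_run_time.
def reservable?(locked_at, now, max_run_time)
  locked_at.nil? || locked_at < now - max_run_time
end

max_run_time = 10.0                       # seconds (assumed value)
locked_at    = 100.0                      # worker 1 locks the job at t = 100s
timeout_at   = locked_at + max_run_time   # worker 1's timeout fires at t = 110s
cleanup_done = timeout_at + 0.5           # assumed: failing the job takes 0.5s

# Worker 2 polls while worker 1 is still handling the timeout.
poll_time = timeout_at + 0.1

still_cleaning_up = poll_time < cleanup_done
lock_looks_stale  = reservable?(locked_at, poll_time, max_run_time)

puts "worker 1 still cleaning up: #{still_cleaning_up}"
puts "worker 2 may reserve the job: #{lock_looks_stale}"
```

Under these assumptions both lines print `true`: the lock looks expired to worker 2 the moment the timeout fires, while worker 1 has not yet marked the job as failed.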

This can leave the database in an inconsistent state, with multiple workers working on the same job even when the maximum number of attempts is set to 1. This doesn't seem to be a problem in MySQL because of:

            # Removing the millisecond precision from now(time object)
            # MySQL 5.6.4 onwards millisecond precision exists, but the
            # datetime object created doesn't have precision, so discarded
            # while updating. But during the where clause, for mysql(>=5.6.4),
            # it queries with precision as well. So removing the precision
            now = now.change(usec: 0)

We've implemented a monkey-patch fix on our system that looks like the following:

# monkeypatch RM 12177 "Too many 'in_progress' requests"
# Intention:  Use an 'extended' max run time when generating the where clause for selecting the next available job.
# The extended max run time gives the application a small cushion of time to properly 'fail' those jobs that are
# expiring due to a 'timeout' situation.
module Delayed
  module Backend
    module ActiveRecord
      class Job
        class << self
          alias_method :original_reserve_method, :reserve
          def reserve(worker, max_run_time = Worker.max_run_time)
            extended_max_run_time = max_run_time + 2.seconds

            # run the original reserve method passing in the extended max run time value.
            original_reserve_method(worker, extended_max_run_time)
          end
        end
      end
    end
  end
end
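
To illustrate what the cushion in the monkey-patch above buys, here is a rough sketch in plain Ruby. It assumes the same lock-expiry comparison (`locked_at < now - max_run_time`); the `reservable?` helper, the 0.5-second cleanup time, and the other timings are hypothetical.

```ruby
# Hypothetical model of the lock-expiry check (not the real query).
def reservable?(locked_at, now, max_run_time)
  locked_at.nil? || locked_at < now - max_run_time
end

max_run_time = 10.0                  # what the worker enforces (assumed value)
cushion      = 2.0                   # the extra seconds added in reserve
extended     = max_run_time + cushion

locked_at    = 100.0                             # worker 1 locks the job
cleanup_done = locked_at + max_run_time + 0.5    # assumed cleanup duration
poll_time    = locked_at + max_run_time + 0.1    # worker 2 polls mid-cleanup

puts "without cushion: #{reservable?(locked_at, poll_time, max_run_time)}"
puts "with cushion:    #{reservable?(locked_at, poll_time, extended)}"

# The lock only looks stale once the cushion has also elapsed.
first_eligible = locked_at + extended
puts "cleanup finishes before lock looks stale: #{cleanup_done < first_eligible}"
```

The cushion only helps if the timeout handling finishes within it, so the 2 seconds is a judgment call rather than a guarantee.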

I'm not sure how to implement this as a real patch, but I wanted to put this up there all the same.

@eric-hemasystems

I also hit this issue, and I think your solution will work. I did a slightly different implementation using prepend instead of alias_method, as alias_method can sometimes cause problems with code reloading, etc. prepend seems to be the preferred way to augment a method these days:

Delayed::Backend::ActiveRecord::Job.class_eval do
  module ReserveWithDelay
    def reserve worker, max_run_time = Delayed::Worker.max_run_time
      super worker, max_run_time + 2.seconds
    end
  end

  singleton_class.prepend ReserveWithDelay
end

But the overall idea is still the same. Just a different implementation.
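
For anyone unfamiliar with the technique, here is a small self-contained sketch of how prepending to a singleton class wraps a class method. `FakeJob` and `ReserveWithCushion` are hypothetical stand-ins, not part of the library; only the `prepend`/`super` mechanics are the point.

```ruby
# Hypothetical stand-in for the real job class.
class FakeJob
  def self.reserve(worker, max_run_time = 10)
    "#{worker} reserved with max_run_time=#{max_run_time}"
  end
end

module ReserveWithCushion
  CUSHION = 2 # seconds, mirroring the 2-second cushion used above

  # Same signature as the wrapped method; super calls the original.
  def reserve(worker, max_run_time = 10)
    super(worker, max_run_time + CUSHION)
  end
end

# Prepending to the singleton class puts the module ahead of the
# class's own class-level methods in the lookup chain.
FakeJob.singleton_class.prepend(ReserveWithCushion)

puts FakeJob.reserve("worker-1")      # worker-1 reserved with max_run_time=12
puts FakeJob.reserve("worker-1", 30)  # worker-1 reserved with max_run_time=32
```

Because the original method is never renamed, running this setup twice (for example after a code reload) does not create the alias loop that alias_method can.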
