`ActionMailer::MailDeliveryJob` executing twice #329

tobyaherbert · 2021-08-12T16:53:27Z

I’ve recently noticed that some of the ActionMailer::MailDeliveryJob jobs enqueued using the deliver_later method appear to be executed twice despite being enqueued for delivery only once.

GoodJob is configured with the default number of threads and Rails is scheduling the ActionMailer::MailDeliveryJob on the :default queue.

After a bit of digging through the log file, it looks like GoodJob could be executing the same job multiple times concurrently as the same ActiveJob UUID appears on both occurrences of the delivery job’s output which each have a different Message-ID.

I’ve uploaded a simplified copy of the log file here from which I’ve deduced the following:

Lines 1-10 and 13-22 have the same job UUID.
Lines 27 and 44 show a different Message-ID, though they are both part of multiline output (starting on lines 24 and 41 respectively) which have the same job UUID.
Lines 33 and 49 show different GoodJob workers having executed the same job UUID.
Lines 35 and 51 show GoodJob deleting the same job UUID from its table which seems strange.

As a temporary fix, I’ve moved all ActionMailer::MailDeliveryJob jobs to a :mailer queue which has just one thread to prevent this from happening.

Could this be a bug in GoodJob or is there something else I should check?

Thanks!

The text was updated successfully, but these errors were encountered:

bensheldon · 2021-08-12T18:30:17Z

@tobyaherbert thanks for reporting this and sharing the logs. This is unexpected.

Do you have the logs preceding slightly preceding what you shared?

On Line 11 and 12, there are these 2 lines, which is one GoodJob thread/worker fetching and locking the database record, and then verifying that it does have a lock. I only see this line once, and I'm curious if that is in the earlier in the logs of the other worker also fetching and verifying the lock:

11  GoodJob::Job Load (7.0ms)  SELECT "good_jobs".* FROM "good_jobs" WHERE "good_jobs"."id" IN (WITH "rows" AS MATERIALIZED ( SELECT "good_jobs"."id" FROM "good_jobs" WHERE "good_jobs"."queue_name" = 'default' AND "good_jobs"."finished_at" IS NULL AND ("good_jobs"."scheduled_at" <= '2021-08-11 14:59:32.115185' OR "good_jobs"."scheduled_at" IS NULL) ORDER BY priority DESC NULLS LAST ) SELECT "rows"."id" FROM "rows" WHERE pg_try_advisory_lock(('x' || substr(md5('good_jobs' || "rows"."id"::text), 1, 16))::bit(64)::bigint) LIMIT $1) ORDER BY priority DESC NULLS LAST  [["LIMIT", 1]]
12  GoodJob::Job Exists? (20.3ms)  SELECT 1 AS one FROM "good_jobs" LEFT JOIN pg_locks ON pg_locks.locktype = 'advisory' AND pg_locks.objsubid = 1 AND pg_locks.classid = ('x' || substr(md5('good_jobs' || "good_jobs"."id"::text), 1, 16))::bit(32)::int AND pg_locks.objid = (('x' || substr(md5('good_jobs' || "good_jobs"."id"::text), 1, 16))::bit(64) << 32)::bit(32)::int WHERE "good_jobs"."finished_at" IS NULL AND ("pg_locks"."pid" = pg_backend_pid()) AND "good_jobs"."id" = $1 LIMIT $2  [["id", "33091f18-02c1-491c-8aff-a267140558f9"], ["LIMIT", 1]]

I don't have an explanation, but I can walk you through my thinking in case it spurs some insight:

Rails will checkout a dedicated database connection for each thread.
GoodJob expects each thread/worker to have its own dedicated database connection, which it uses to create connection-level advisory locks.
The only way I can imagine that two threads/workers would be able to execute a single GoodJob Job (e.g. a record in the good_jobs table) at the same time would be if the threads/workers were sharing a database connection, or if a database connection in one thread/worker was closed/dropped in the middle of executing the job.

Could you confirm what version of Ruby and Rails you're using? And also whether you're using PgBouncer (a database connection pooler). And are you seeing this locally too, or just in production? Strange!

tobyaherbert · 2021-08-12T18:49:05Z

@bensheldon Thanks for the prompt reply!

Yep, I've got more logs from that period - Do you have somewhere I can send you a private link to see them?

The project is using Rails 6.1.4 with Ruby 3.0.2.
I haven't used PgBouncer and am using PostgreSQL server version 12.

Here's a snapshot of the database.yml file:

default: &default
  adapter: postgresql
  encoding: unicode
  pool: <%= ENV.fetch("RAILS_MAX_THREADS") { 10 } %>
  timeout: 5000

test:
  <<: *default
  host: localhost
  port: 7433
  username: rails
  password: rails
  database: rails_test

development:
  <<: *default
  host: localhost
  port: 7432
  username: rails
  password: rails
  database: rails_dev

staging:
  <<: *default
  database: rails_staging

production:
  <<: *default
  database: rails_production

Thanks again!

bensheldon · 2021-08-12T18:50:04Z

Happy to help. You can email them to me: [email protected]

bensheldon · 2021-08-13T01:18:02Z

Thanks for sending over those logs. I don't have an explanation for why this has happened.

There isn't anything special about ActionMailer::MailDeliveryJob.

I haven't observed this happen before, and there is GoodJob test code that create a big concurrency load to surface these kinds of problems. Some suggestions:

Log the output of ActiveRecord::Base.connection.object_id to see if the same database ID is being shared across threads.
Rails has a mechanism for sharing database connections across threads for tests. You can see if this is enabled: ActiveRecord::Base.instance_variable_get(:@lock_thread) #=> should be nil/false. This code replaced a much older trick but it doesn't look like you're working in a legacy codebase.

I am stumped 🤔

bensheldon · 2021-09-30T15:07:37Z

@tobyaherbert I think I may have duplicated what you're seeing in #388. which is related to code reloading in the Development environment.

I double-checked the logs you sent me, and they were for Development too. I just wanted to doublecheck that this behavior wasn't seen in the Production environment (because that would imply something different than #388).

tobyaherbert · 2021-09-30T15:37:35Z

@bensheldon I got sidetracked and completely forgot to respond to you, apologies!

Yes, we were testing our email sending feature in the development environment when we encountered the duplications.
We switched to using serial queues for most things which has mitigated the issues we were having (though I never tried it with more than one thread in the production environment).

bensheldon · 2021-09-30T18:47:50Z

@tobyaherbert I totally understand. I have a fix for the problem released in v2.3.1 if you want to give that a try.

tobyaherbert · 2021-10-01T08:50:22Z

@bensheldon Ah, that's great - I'll take a peek when I get a sec 👍

bensheldon · 2022-04-19T20:00:22Z

@tobyaherbert I'm closing this because of age. Please comment if you're still experiencing problems.

bensheldon added the bug Something isn't working label Aug 16, 2021

bensheldon mentioned this issue Sep 29, 2021

"WARNING: you don't own a lock of type ExclusiveLock" in Development #388

Closed

bensheldon closed this as completed Apr 19, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`ActionMailer::MailDeliveryJob` executing twice #329

`ActionMailer::MailDeliveryJob` executing twice #329

tobyaherbert commented Aug 12, 2021 •

edited

Loading

bensheldon commented Aug 12, 2021 •

edited

Loading

tobyaherbert commented Aug 12, 2021

bensheldon commented Aug 12, 2021

bensheldon commented Aug 13, 2021

bensheldon commented Sep 30, 2021

tobyaherbert commented Sep 30, 2021

bensheldon commented Sep 30, 2021

tobyaherbert commented Oct 1, 2021

bensheldon commented Apr 19, 2022

ActionMailer::MailDeliveryJob executing twice #329

ActionMailer::MailDeliveryJob executing twice #329

Comments

tobyaherbert commented Aug 12, 2021 • edited Loading

bensheldon commented Aug 12, 2021 • edited Loading

tobyaherbert commented Aug 12, 2021

bensheldon commented Aug 12, 2021

bensheldon commented Aug 13, 2021

bensheldon commented Sep 30, 2021

tobyaherbert commented Sep 30, 2021

bensheldon commented Sep 30, 2021

tobyaherbert commented Oct 1, 2021

bensheldon commented Apr 19, 2022

`ActionMailer::MailDeliveryJob` executing twice #329

`ActionMailer::MailDeliveryJob` executing twice #329

tobyaherbert commented Aug 12, 2021 •

edited

Loading

bensheldon commented Aug 12, 2021 •

edited

Loading