Make default retry behaviour safer #505

tamaloa · 2022-01-31T11:58:02Z

Currently the default for retries is:

By default, GoodJob will automatically and immediately retry a job when an exception is raised to GoodJob.

The documentation on this behaviour is excellent and quite extensive and does explain nicely how to limit the number of retries using ActiveJob settings.

In our case we configured all Jobs (ApplicationJobs and MailerJobs) to retry a limited number of times. On final retry the exception is not raised but reportet to our exception tracker. We did NOT set GoodJob.retry_on_unhandled_error = false as it did not seem necessary (our Unit tests showed the configured number of retries for failing tests).

Unfortunately we use other gems which also enqueue jobs (in this case searchkick). These do NOT inherit from ApplicationJob but from ActiveJob::Base (which makes sense for libraries). But without configuring GoodJob.retry_on_unhandled_error = false this means these jobs are retried an infinite amount of times (immediately). This we only noticed after a couple million exceptions had reached our tracker (and almost exceeded our quota). After investigating this issue we furthermore had problems replicating the behaviour in test environment (thus even if we had thought of covering this corner with a test it would have not showed up).

I think, in most cases it is not a good idea to infinitely retry jobs (at least not immediately). This at the least introduces high load and in the worst case repeats actions visible to the outside (how about sending out emails). Therefore in my opinion it would be safer to set GoodJob.retry_on_unhandled_error = false by default. To enable retries one should then use ActiveJob settings to configure what is needed (maybe change documentation to limited number of retries with Infinite only as a additional note).

If this not what is intended with good_job (safe may also mean safe to not "miss" a job) I would suggest adding a warning in the documentation.

The text was updated successfully, but these errors were encountered:

bensheldon · 2022-01-31T15:56:37Z

@tamaloa thank you for opening this issue. It's on my mind and I intend to change the behavior in GoodJob 3.0. I just made a new issue (#507) to capture some other recent discussions in one place.

tamaloa · 2022-02-01T12:41:01Z

The new defaults (#507) seem a good choice (and incidentally what have now set).

For the time until 3.0 is released would you be open for a small documentation change warning against above problem? If so I could take a stab at it.

bensheldon · 2022-02-01T16:02:11Z

Please! Thank you.

tamaloa mentioned this issue Feb 9, 2022

Document safer setting for retry_on_unhandled_error #517

Merged

bensheldon closed this as completed in #517 Feb 9, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Make default retry behaviour safer #505

Make default retry behaviour safer #505

tamaloa commented Jan 31, 2022

bensheldon commented Jan 31, 2022

tamaloa commented Feb 1, 2022

bensheldon commented Feb 1, 2022

Make default retry behaviour safer #505

Make default retry behaviour safer #505

Comments

tamaloa commented Jan 31, 2022

bensheldon commented Jan 31, 2022

tamaloa commented Feb 1, 2022

bensheldon commented Feb 1, 2022