Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make default retry behaviour safer #505

Closed
tamaloa opened this issue Jan 31, 2022 · 3 comments · Fixed by #517
Closed

Make default retry behaviour safer #505

tamaloa opened this issue Jan 31, 2022 · 3 comments · Fixed by #517

Comments

@tamaloa
Copy link
Contributor

tamaloa commented Jan 31, 2022

Currently the default for retries is:

By default, GoodJob will automatically and immediately retry a job when an exception is raised to GoodJob.

The documentation on this behaviour is excellent and quite extensive and does explain nicely how to limit the number of retries using ActiveJob settings.

In our case we configured all Jobs (ApplicationJobs and MailerJobs) to retry a limited number of times. On final retry the exception is not raised but reportet to our exception tracker. We did NOT set GoodJob.retry_on_unhandled_error = false as it did not seem necessary (our Unit tests showed the configured number of retries for failing tests).

Unfortunately we use other gems which also enqueue jobs (in this case searchkick). These do NOT inherit from ApplicationJob but from ActiveJob::Base (which makes sense for libraries). But without configuring GoodJob.retry_on_unhandled_error = false this means these jobs are retried an infinite amount of times (immediately). This we only noticed after a couple million exceptions had reached our tracker (and almost exceeded our quota). After investigating this issue we furthermore had problems replicating the behaviour in test environment (thus even if we had thought of covering this corner with a test it would have not showed up).

I think, in most cases it is not a good idea to infinitely retry jobs (at least not immediately). This at the least introduces high load and in the worst case repeats actions visible to the outside (how about sending out emails). Therefore in my opinion it would be safer to set GoodJob.retry_on_unhandled_error = false by default. To enable retries one should then use ActiveJob settings to configure what is needed (maybe change documentation to limited number of retries with Infinite only as a additional note).

If this not what is intended with good_job (safe may also mean safe to not "miss" a job) I would suggest adding a warning in the documentation.

@bensheldon
Copy link
Owner

@tamaloa thank you for opening this issue. It's on my mind and I intend to change the behavior in GoodJob 3.0. I just made a new issue (#507) to capture some other recent discussions in one place.

@tamaloa
Copy link
Contributor Author

tamaloa commented Feb 1, 2022

The new defaults (#507) seem a good choice (and incidentally what have now set).

For the time until 3.0 is released would you be open for a small documentation change warning against above problem? If so I could take a stab at it.

@bensheldon
Copy link
Owner

Please! Thank you.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants