Error Handling & Retries

akshat edited this page Sep 18, 2024 · 8 revisions

When a Job throws an uncaught exception during execution, Goose treats it as a failure. Several options control how a failed Job is handled as it goes through these steps:

  1. Job's error-handler is called
  2. If retries are remaining, Job is scheduled for retry with backoff
  3. Failed jobs can be executed from a different retry-queue as well
  4. If retries are exhausted, death-handler is called & Job is marked as dead
  5. Upon death, Job is stored in Dead Jobs queue
  6. Storage of Dead Jobs can be skipped as well using a config
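
The steps above can be read as a small decision function. The sketch below is a simplified, hypothetical model and not Goose's internal implementation: `next-step`, its return keys and the `retries-done` argument are names invented for illustration, and it assumes the error handler fires only while retries remain.

```clojure
;; Simplified, hypothetical model of the failure flow; not Goose's internals.
(defn next-step
  "Given a job's retry options, its original queue and the number of retries
  already attempted, returns which handler fires and what happens to the job."
  [{:keys [max-retries retry-queue skip-dead-queue]} queue retries-done]
  (cond
    ;; Steps 1-3: retries remain, so the job is rescheduled with backoff,
    ;; on the retry queue if one is configured.
    (< retries-done max-retries)
    {:handler :error-handler
     :action  :schedule-retry
     :queue   (or retry-queue queue)}

    ;; Step 6: retries are exhausted and dead-job storage is skipped.
    skip-dead-queue
    {:handler :death-handler :action :discard}

    ;; Steps 4-5: retries are exhausted and the job is stored as dead.
    :else
    {:handler :death-handler :action :store-as-dead}))

(next-step {:max-retries 27 :retry-queue "my-retry-queue"} "default" 3)
```

With these inputs the job still has retries left, so the model returns a retry on "my-retry-queue".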

Error Handlers

Since Goose jobs run in the background, it is good practice to integrate error services such as Sentry or Honeybadger as error handlers.

Error & Death handlers are fully-qualified function symbols that accept config, job & exception. They must be configured by Client & will be called during Job failure by Worker.

  • :error-handler-fn-sym Called when a job has failed and will be scheduled for retry
  • :death-handler-fn-sym Called when a job has exhausted retries and won't be executed again

Honeybadger Usage

(ns honeybadger
  (:require
    [goose.client :as c]
    [goose.retry :as retry]
    [goose.worker :as w]
    [honeybadger.core :as hb]))

(defn hb-error-handler
  [cfg job ex]
  (hb/notify cfg ex job))

; Add Honeybadger error handler.
(let [retry-opts (assoc retry/default-opts :error-handler-fn-sym `hb-error-handler)
      client-opts (assoc client-opts :retry-opts retry-opts)]
  (c/perform-async client-opts `my-failing-fn :foo))

; Inject Honeybadger config in Worker.
(let [hb-config {:api-key "d34db33f"
                 :env     "development"}]
  (w/start (assoc worker-opts :error-service-config hb-config)))

Sentry Usage

(ns sentry
  (:require
    [goose.client :as c]
    [goose.retry :as retry]
    [goose.worker :as w]
    [sentry-clj.core :as sentry]))

; Ignore first arg as Sentry is pre-initialized.
(defn sentry-death-handler
  [_ job ex]
  (sentry/send-event
    {:message   (str "Job died: " (:id job))
     :throwable ex}))

; Add Sentry as death handler.
(let [retry-opts (assoc retry/default-opts :death-handler-fn-sym `sentry-death-handler)
      client-opts (assoc client-opts :retry-opts retry-opts)]
  (c/perform-async client-opts `my-dying-fn :foo))

; Init Sentry config.
(sentry/init! "https://public:[email protected]/1")
; No need to inject sentry config in worker.
(w/start w/default-opts)

Different retry queue

To prevent the main queue from getting clogged by failed jobs, a different queue can be configured using the :retry-queue option.

Max Retries

By default, Goose retries a job 27 times; this can be changed with the :max-retries option.

Retry backoff

Goose will retry failures with an exponential backoff using the formula (retry_count ** 4) + 20 + (rand(20) * (retry_count + 1)) (for example 28, 51, 66, 177, ... seconds). Goose will perform 27 retries over approximately 30 days. Assuming new code gets deployed & the bug gets fixed within that time, the job will get automatically retried and successfully processed. After 27 retries, Goose will move that job to the Dead Jobs queue, assuming that it will need manual intervention to work.
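
As a rough check of the "approximately 30 days" figure, the formula can be written out in Clojure. This is a sketch of the formula as stated above, not Goose's own source; the function names are made up for illustration, and it assumes retry_count starts at 0 and that rand(20) yields an integer from 0 to 19.

```clojure
(defn pow4 [n] (* n n n n))

(defn backoff-sec
  "retry-count^4 + 20 + rand(20) * (retry-count + 1), with random jitter."
  [retry-count]
  (+ (pow4 retry-count)
     20
     (* (rand-int 20) (inc retry-count))))

(defn min-backoff-sec
  "The same formula with the random jitter term set to zero."
  [retry-count]
  (+ (pow4 retry-count) 20))

;; Lower bound on the total wait across 27 retries (retry counts 0 to 26).
(def min-total-sec
  (reduce + (map min-backoff-sec (range 27))))

;; Convert seconds to days.
(def min-total-days
  (/ min-total-sec 86400.0))
```

Under these assumptions `min-total-sec` is 2611161 seconds, a little over 30 days before jitter is added.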

The retry delay can be customized using the :retry-delay-sec-fn-sym option, which names a function that takes the retry count and returns a delay in seconds.

Disable dead-jobs

When :skip-dead-queue is set to true, a Job whose retries are exhausted is not stored in the Dead Jobs queue.
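
A minimal configuration sketch for this case, built on `retry/default-opts` as used in the examples on this page. The value of `:max-retries` is only illustrative, and `discard-on-death-opts` is a name chosen for this example.

```clojure
(require '[goose.retry :as retry])

;; Retry a few times, then drop the job instead of keeping it as a dead job.
(def discard-on-death-opts
  (assoc retry/default-opts
         :max-retries     5
         :skip-dead-queue true))
```

These options would then be attached to the client options under :retry-opts, in the same way as the other examples here.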

Dead jobs can be deleted or replayed using the API.

Process Crashes

Sometimes, workers might crash abruptly & in-progress Jobs might not be completed. Such abandoned Jobs are called orphan-jobs & will be picked up by another worker process. Jobs must be idempotent for such scenarios, since an orphaned Job may be executed more than once.
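
One common way to make a job idempotent is to record which units of work have already been handled and to skip repeats. The sketch below is a toy illustration: `processed-orders` and `charge-order!` are hypothetical names, and the in-memory atom only stands in for a durable store. In a real system the marker and the side effect would need to be committed together, for example in one database transaction.

```clojure
;; Hypothetical stand-in for a durable record of completed work.
(defonce processed-orders (atom #{}))

(defn charge-order!
  "Performs the side effect once per order-id; repeated calls are no-ops."
  [order-id]
  ;; swap-vals! returns [old-value new-value], so we can tell atomically
  ;; whether this call was the first to claim order-id.
  (let [[seen-before _] (swap-vals! processed-orders conj order-id)]
    (if (contains? seen-before order-id)
      :already-processed
      (do
        ;; The real side effect (charging a card, sending an email) goes here.
        :charged))))

(charge-order! 42) ; first execution
(charge-order! 42) ; re-execution after a crash is skipped
```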

Usage

(ns error-handling
  (:require
    [goose.client :as c]
    [goose.worker :as w]
    [clojure.tools.logging :as log]))

(def error-service-config {:my :config})
(defn my-error-handler [cfg job ex]
  (log/error cfg job ex))
(defn my-death-handler [cfg job ex]
  (log/error cfg job ex))

(defn my-retry-delay
  [retry-count]
  (+ (* (rand-int 30) (inc retry-count))
     (reduce * (repeat 2 retry-count)))) ; retry-count^2

(let [retry-opts {:max-retries            10
                  :retry-delay-sec-fn-sym `my-retry-delay
                  :retry-queue            "my-retry-queue"
                  :error-handler-fn-sym   `my-error-handler
                  :death-handler-fn-sym   `my-death-handler
                  :skip-dead-queue        false}
      client-opts (assoc client-opts :retry-opts retry-opts)]
  ;; Retry options can be configured for scheduled jobs too.
  (c/perform-in-sec client-opts 300 `my-failing-fn :foo))

(let [worker-opts (assoc worker-opts :error-service-config error-service-config)

      worker (w/start worker-opts)
      failed-jobs-worker (w/start (assoc worker-opts :threads 2
                                                     :queue "my-retry-queue"))])

Previous: Cron Jobs        Next: Monitoring & Alerting