jobs: retry jobs with exponential backoff #66889

Merged 2 commits on Aug 14, 2021

Commits on Aug 13, 2021

  1. jobs: retry jobs with exponential-backoff

    This commit adds a mechanism to retry jobs with exponentially increasing
    delays. It relies on two new columns in the system.jobs table, last_run
    and num_runs. It also adds two cluster settings that control the backoff
    parameters: `jobs.registry.retry.initial_delay` sets the initial delay and
    `jobs.registry.retry.max_delay` sets the maximum delay. Finally, it adds a
    partial index on the jobs table that speeds up the periodic queries the
    registry runs on each node.
    
    Release note (general change): Retries of jobs that fail due to a
    retryable error or a job coordinator failure are now delayed using
    exponential backoff. Before this change, jobs that failed in a retryable
    manner were resumed immediately on a different coordinator. This change
    reduces the impact of repeatedly failing jobs on the cluster. It adds two
    new cluster settings that control this behavior:
    "jobs.registry.retry.initial_delay" and "jobs.registry.retry.max_delay",
    which set the initial delay and the maximum delay between resumptions,
    respectively.
    
    Fixes cockroachdb#44594
    Fixes cockroachdb#65080
    Sajjad Rizvi committed Aug 13, 2021
    be82c0e
  2. jobs: add a test to verify the use of partial index in registry queries

    This commit adds a test to verify that the relevant queries in the
    jobs registry use the new partial index.
    
    Release note: None
    Sajjad Rizvi committed Aug 13, 2021
    fa98ca1