JobManager: add resilience for backend failures #365

Closed
jdries opened this issue Jan 15, 2023 · 1 comment · Fixed by #373

@jdries
Collaborator

jdries commented Jan 15, 2023

Backends suffer from all kinds of intermittent failures that resolve themselves over time. These usually surface as HTTP 50x errors.
Especially for the job manager, it is often better to keep retrying until the backend works again. The Python requests library can be configured to do this.
This makes long-running tasks more resilient.
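The issue proposes handling this at the transport level by configuring requests. As a complementary, library-agnostic illustration of the same idea at the application level, here is a minimal sketch that keeps calling the backend with capped exponential backoff as long as it answers with a transient 50x status. `BackendError` and `call_with_retries` are hypothetical names, not part of requests or any existing client, and the default status set, attempt count and delays are assumptions; the `sleep` parameter is injectable only so the loop can be exercised without actually waiting.

```python
import time
from typing import Callable, Collection, TypeVar

T = TypeVar("T")


class BackendError(Exception):
    """Hypothetical exception carrying the HTTP status code a backend returned."""

    def __init__(self, status_code: int) -> None:
        super().__init__(f"backend returned HTTP {status_code}")
        self.status_code = status_code


def call_with_retries(
    func: Callable[[], T],
    max_attempts: int = 10,
    retry_statuses: Collection[int] = (502, 503, 504),  # assumed transient statuses
    initial_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `func`, retrying with capped exponential backoff on transient backend errors."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    delay = initial_delay
    attempt = 1
    while True:
        try:
            return func()
        except BackendError as exc:
            # Non-transient statuses, or a failure on the last allowed attempt, propagate
            if exc.status_code not in retry_statuses or attempt >= max_attempts:
                raise
        sleep(delay)
        # Double the wait after each failure, but never beyond max_delay
        delay = min(delay * 2, max_delay)
        attempt += 1
```

For example, a job-status poll that fails twice with 503 and then succeeds would return normally after sleeping 1 s and then 2 s, while a 404 (not in `retry_statuses` here) is raised immediately. Raising `max_attempts` and `max_delay` is what lets such a loop ride out outages lasting minutes rather than seconds.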

@jdries
Collaborator Author

jdries commented Jan 20, 2023

FYI, my current approach to adding retries in connection.py:


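        from requests.adapters import HTTPAdapter, Retry
        # `total` takes precedence over the per-category counts, so at most 5 retries happen
        retries = Retry(total=5, read=50, other=50, status=50,
                        backoff_factor=0.1,
                        status_forcelist=[502, 503, 504, 404],
                        # `allowed_methods` (urllib3 >= 1.26) replaces the deprecated
                        # `method_whitelist`, which was removed in urllib3 2.0.
                        # POST is not idempotent, so retrying it may repeat a request.
                        allowed_methods=["HEAD", "GET", "OPTIONS", "POST"])
        # One adapter can be shared by both URL prefixes
        adapter = HTTPAdapter(max_retries=retries)
        self.session.mount('https://', adapter)
        self.session.mount('http://', adapter)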
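To get a rough idea of how long an adapter configured with backoff_factor=0.1 and total=5 keeps trying, the sketch below reproduces the backoff formula described in the urllib3 Retry documentation. `backoff_schedule` is a hypothetical helper, not urllib3 code; it assumes every retry follows a consecutive error and ignores request time and any jitter.

```python
def backoff_schedule(backoff_factor: float, retries: int, backoff_max: float = 120.0) -> list[float]:
    """Seconds slept before each retry, following urllib3's documented backoff formula.

    urllib3 does not sleep before the first retry; before the n-th consecutive
    retry (n >= 2) it sleeps backoff_factor * 2 ** (n - 1), capped at backoff_max
    (120 s by default). Any jitter is ignored in this sketch.
    """
    schedule: list[float] = []
    for n in range(1, retries + 1):
        wait = 0.0 if n == 1 else backoff_factor * 2 ** (n - 1)
        schedule.append(min(wait, backoff_max))
    return schedule


# Values taken from the snippet above: backoff_factor=0.1, total=5
sleeps = backoff_schedule(0.1, 5)
print(sleeps, sum(sleeps))
```

Under these assumptions the sleeps are 0, 0.2, 0.4, 0.8 and 1.6 seconds, about 3 seconds in total, so a backend outage lasting longer than that would still reach the job manager as an error. Covering longer outages would need a larger `total` and/or `backoff_factor`.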
