Retry deps requests made to the hub site #1451

Closed
drewbanin opened this issue May 9, 2019 · 2 comments · Fixed by #1491
Labels
enhancement New feature or request

Comments

@drewbanin
Contributor

drewbanin commented May 9, 2019

Feature

Feature description

About once a month, many dbt deps invocations fail at the same time. The failures are caused by some kind of intermittent error on the hub site host. While it may be worth taking action to understand and improve the hub site's uptime, it's also a good idea to add retries to these requests.

In the registry._get method, dbt should retry any request that fails 1) without producing a response code or 2) with a 5xx response code.
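As a rough illustration of that rule, here is a minimal sketch of the retry decision. The function name `is_retryable` and the convention of passing `None` when no response arrived are assumptions made for this example; they are not taken from dbt's code.

```python
from typing import Optional


def is_retryable(status_code: Optional[int]) -> bool:
    """Return True when a failed hub request is worth retrying.

    Assumption for this sketch: status_code is None when the request
    failed before any response arrived (connection error, timeout, etc.).
    """
    if status_code is None:
        # Case 1: the request failed without producing a response code.
        return True
    # Case 2: the server answered with a 5xx response code.
    return 500 <= status_code <= 599
```

Under this rule a 4xx response, such as a 404 for a package that does not exist, is not retried, since repeating the request would not be expected to change the outcome.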

Since many dbt jobs run at specific wall clock times (like midnight UTC), we should randomize the wait between retries to avoid a thundering herd scenario.

  • After the first failure, dbt should wait 5-10 seconds before retrying.
  • If the request fails again, dbt should wait another 5-10 seconds before retrying.
  • If the third request fails, dbt should raise the resulting exception.
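The schedule above could be sketched roughly as follows. This is an illustration only: `call_with_retries`, `RetryableError`, and the injectable `sleep` parameter are hypothetical names introduced here, and the sketch assumes the caller has already translated "no response" and 5xx failures into `RetryableError`.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RetryableError(Exception):
    """Hypothetical marker for a failure with no response or a 5xx status."""


def call_with_retries(
    fetch: Callable[[], T],
    max_attempts: int = 3,
    min_wait: float = 5.0,
    max_wait: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch up to max_attempts times, waiting a random time between tries."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 1
    while True:
        try:
            return fetch()
        except RetryableError:
            if attempt >= max_attempts:
                # Out of attempts: surface the last exception to the caller.
                raise
        # Random wait so jobs that failed together do not retry together.
        sleep(random.uniform(min_wait, max_wait))
        attempt += 1
```

With the defaults, a request that keeps failing is tried three times with two waits of 5-10 seconds in between, and the third failure is raised. Because each process draws its own wait, jobs that all started at midnight UTC spread their retries over a five-second window instead of hitting the hub site at the same instant.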

@beckjake @cmcarthur you guys have more experience with this class of problem than I do -- is this a reasonable solution? Would you recommend a different approach for the timeouts?

@drewbanin drewbanin added the dependencies Changes to the version of dbt dependencies label May 9, 2019
@beckjake
Contributor

beckjake commented May 9, 2019

That all sounds reasonable enough to me. I assume the fail counter is per-attempt?

@drewbanin drewbanin added enhancement New feature or request and removed dependencies Changes to the version of dbt dependencies labels May 9, 2019
@drewbanin
Contributor Author

Yeah, I think that's right. dbt makes a few types of requests to the hub site:

  • get the index
  • get the versions for a specific package
  • get the contents for a specific version of a package

We should assume that any of these types of queries can fail, and the fail counter is indeed per-attempt/request.
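To make the per-request counter concrete, here is a small simulation. Everything in it is a stand-in: `attempts_needed` and `flaky` are hypothetical helpers, `ConnectionError` simulates a hub outage, and the waits between attempts are left out for brevity.

```python
from typing import Callable, Dict


def attempts_needed(fetch: Callable[[], str], max_attempts: int = 3) -> int:
    """Call fetch until it succeeds; the counter is local to this one request."""
    for attempt in range(1, max_attempts + 1):
        try:
            fetch()
            return attempt
        except ConnectionError:
            if attempt == max_attempts:
                raise
    raise ValueError("max_attempts must be at least 1")


def flaky(failures: int) -> Callable[[], str]:
    """Build a stand-in request that fails `failures` times, then succeeds."""
    remaining = [failures]

    def fetch() -> str:
        if remaining[0] > 0:
            remaining[0] -= 1
            raise ConnectionError("simulated hub outage")
        return "ok"

    return fetch


# One entry per kind of hub request described above.
hub_requests: Dict[str, Callable[[], str]] = {
    "index": flaky(2),
    "package versions": flaky(2),
    "package version contents": flaky(0),
}
attempts = {name: attempts_needed(fetch) for name, fetch in hub_requests.items()}
```

The run sees four failures in total, yet nothing is raised, because each request starts with a fresh budget of three attempts. A single counter shared across the whole dbt deps invocation would have given up partway through the second request.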

@drewbanin drewbanin added this to the Wilt Chamberlain milestone May 29, 2019
beckjake added a commit that referenced this issue May 30, 2019
…ries

add a retry + sleep loop to registry calls (#1451)

2 participants