Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bugfix/issue 3350 snowflake non json response #3365

Merged
merged 7 commits into from
May 26, 2021

Conversation

matt-winkler
Copy link
Contributor

resolves #3350

Description

Runs a while loop for 1 second to validate whether content-type in the oauth response from Snowflake is JSON.

Checklist

  • I have signed the CLA
  • I have run this code in development and it appears to resolve the stated issue
  • This PR includes tests, or tests are not required/relevant for this PR
  • I have updated the CHANGELOG.md and added information about my change to the "dbt next" section.

@cla-bot cla-bot bot added the cla:yes label May 17, 2021
@drewbanin
Copy link
Contributor

yo @matt-winkler appreciate you opening the PR! One thing I wanted to follow up on - do either you or @barberscott understand the root cause of this issue? Scott indicated that this could happen for "a variety of reasons", which is fair and true, but do we feel confident that retrying is going to be a successful strategy? It's possible that this query will fail repeatedly and actually:

  1. This will continue to not work
  2. We will have added a 20 second delay before showing an inscrutable error message

Dropping some code suggestions inline, but did want to better understand if we have a sense of the root cause at this point!

@matt-winkler
Copy link
Contributor Author

Yep that's good feedback @drewbanin - am looking to get something moving here based on the sketch Jeremy provided in the issue. Agree with root causing it further before merging. One comment is that we'd have a 1 second delay rather than 20 with this (50 ms per loop); let me know if I'm mentally off somewhere in this.

@barberscott
Copy link

barberscott commented May 18, 2021

Thoughts:

  • I think we should capture and log the non-JSON response so we can more likely understand root causes of this. I think any http error could cause it (e.g. 502) but we don't know and won't know.
  • Any kind of retry should probably be general cased and have a config setting for it. This is far from the only place we're establishing a database connection and if, for whatever reason, that fails, we should probably have a standard approach to retrying n times before failing, where n is a parameter perhaps on the target in profiles.yml

@drewbanin
Copy link
Contributor

@barberscott yeah! I'm with you. I think it would be ok if we omitted a config for this backoff/retry here - but I also think we should 100% log the status code of the response and the error message that comes back.

@matt-winkler think I skimmed the logic wrong - you're all good :). I'm going to take a pass and drop some comments in the code


result = requests.post(token_url, headers=headers, data=data)

if result.headers.get('content-type') == 'application/json':
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

checking the content-type is totally reasonable, but I think a more robust approach would try/catch the .json() method call. That might look like (pseudo code):

result_json = None
for i in range(max_iter):
  result = requests.post(token_url, headers=headers, data=data)
  try:
    result_json = result.json()
    break
  except ValueError as e:
    message = response.text() # or something like that
    logger.debug(f"Got a non-json response ({result.status}): {e}, message: {message}")
    time.sleep(0.05)

if result_json is None:
  # We'd want to raise an error here, otherwise `result_json['access_token']` below will fail
  pass

What do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right on. The only reason I recommended checking the content-type instead of try/except in my issue comment was from having read about the potential slowness of the exception-catching approach. On further reading/thinking, that should only matter for very large request responses, which this isn't.

@jtcohen6
Copy link
Contributor

Thanks for the help all! This change looks solid to me. I'm glad we're going to debug-log the non-JSON response, so we can try to get to the bottom of this.

@matt-winkler There are just some flake8 (code style) issues, and then I'd be happy to get this in before the next v0.20 prerelease:

plugins/snowflake/dbt/adapters/snowflake/connections.py:8:1: F401 'time.sleep' imported but unused
plugins/snowflake/dbt/adapters/snowflake/connections.py:131:9: E265 block comment should start with '# '
plugins/snowflake/dbt/adapters/snowflake/connections.py:142:100: E501 line too long (104 > 99 characters)
plugins/snowflake/dbt/adapters/snowflake/connections.py:143:17: F821 undefined name 'time'
plugins/snowflake/dbt/adapters/snowflake/connections.py:144:1: W293 blank line contains whitespace
plugins/snowflake/dbt/adapters/snowflake/connections.py:146:50: W291 trailing whitespace

@matt-winkler
Copy link
Contributor Author

Hey @jtcohen6 I updated the code after linting and did a test run using this large dbt project, during which I did not see connection errors. However, I'm not seeing a functional difference between running on the code in develop and in the branch in this PR. Both projects fail at the same point due to a macro config error and not an Oauth connection error, so it would be great if we can get some additional confirmation that this change does address the issue at hand.

@drewbanin
Copy link
Contributor

ok - if we're having a hard time reproducing this one locally, then i think we should just merge this one when we feel good about the change is. A retry & better logging never hurt anyone. Removing myself from review and will let y'all take it from here :)

@jtcohen6 jtcohen6 merged commit 0f018ea into develop May 26, 2021
@jtcohen6 jtcohen6 deleted the bugfix/issue-3350-snowflake-non-json-response branch May 26, 2021 16:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Non-JSON response from Snowflake connection attempt results is not handled
4 participants