Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CDK: SingleUseRefreshTokenOauth2Authenticator update config with access tokens and expiration date #20923

Merged
merged 11 commits into from
Jan 3, 2023

Conversation

alafanechere
Copy link
Contributor

@alafanechere alafanechere commented Dec 29, 2022

What

Closes #20914

Before this PR the SingleUseRefreshTokenOauth2Authenticator eagerly performed access token refresh:

  • In each operation (check, discover, read) a new OAuth flow is started even if the previous access token is still valid.

As synchronous operation, like check on source creation, can't process control message, we can end up with invalid refresh token as the latest one are not persisted to the DB. (more details in https://github.com/airbytehq/airbyte-internal-issues/issues/1260)
To mitigate this problem we try to store the expiration date of the access token in the config and only perform refresh when the expiration date is met.

How

  • Read access_token_expiration_datetime from connector configuration
  • On access token refresh:
    • Update the access_token_value
    • Update the access_token_expiration_datetime

Recommended reading order

class SingleUseRefreshTokenOauth2Authenticator(Oauth2Authenticator):

🚨 User Impact 🚨

The connector using SingleUseRefreshTokenOauth2Authenticator should be reworked:
#7506

@octavia-squidington-iv octavia-squidington-iv added the CDK Connector Development Kit label Dec 29, 2022
@alafanechere alafanechere changed the title CDK: SingleUseRefreshTokenOauth2Authenticator update config with access tokens CDK: SingleUseRefreshTokenOauth2Authenticator update config with access tokens and expiration date Dec 29, 2022
@alafanechere alafanechere marked this pull request as ready for review December 29, 2022 18:05
@alafanechere alafanechere requested a review from a team as a code owner December 29, 2022 18:05
@alafanechere alafanechere requested review from a team and pedroslopez December 29, 2022 18:05
Copy link
Contributor

@pedroslopez pedroslopez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice- main thing I think should be addressed is making sure that the initial access token is loaded from the config.

Should we always expect the expiration date to be part of the config or should we allow connectors to decide to use an an endpoint to retrieve the expiration date of the token?

Also, did you check the APIs for whether they had these kinds of endpoints like you mentioned? If so, I'm curious what the results of that investigation were.

Comment on lines 137 to 139
self._access_token_config_path = access_token_config_path
self._refresh_token_config_path = refresh_token_config_path
self._access_token_expiration_datetime_config_path = access_token_expiration_datetime_config_path
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should also set the initial access_token from the config as part of the initialization - otherwise I believe we wouldn't have an access token and require it to refresh, right?

GitLab was doing this manually:

now = pendulum.now()
self.access_token = access_token
self.set_token_expiry_date(now.add(seconds=access_token_info["expires_in"]))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made the change and added a test to ensure access_token attribute is set on init.

Comment on lines 200 to 204
# TODO alafanechere this will sequentially emit three control messages.
# We should rework the observer/config mutation logic if we want to have atomic config updates in a single control message.
dpath.util.set(self._connector_config, self._access_token_config_path, new_access_token)
dpath.util.set(self._connector_config, self._refresh_token_config_path, new_refresh_token)
dpath.util.set(self._connector_config, self._access_token_expiration_datetime_config_path, new_access_token_expiration_datetime)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note: this means as currently implemented during the sync the platform will perform 3 separate config updates (api calls) for each of these pieces

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I discarded the use of ObservedDict and directly call a function from this method to emit the control message.

@alafanechere
Copy link
Contributor Author

Should we always expect the expiration date to be part of the config or should we allow connectors to decide to use an an endpoint to retrieve the expiration date of the token?

Yes, we should expect the expiration date to be part of the config if the SingleUseRefreshTokenOauth2Authenticator is directly used.
I think that sources having a dedicated endpoint to fetch expiration can override the token_has_expired method.

Also, did you check the APIs for whether they had these kinds of endpoints like you mentioned? If so, I'm curious what the results of that investigation were.

I did but did not find these endpoints for Quickbook, TypeForm, Jira and Pipedrive

@evantahler
Copy link
Contributor

This is good work! Do you happen to know how common it is for oAuth endpoints to return an expires_in / expires_at? If this is common, than storing this timestamp along with the token in the config is the way to go. Otherwise, perhaps we need to continue eagerly refreshing, but not in CHECK and DISCOVER modes (only READ and WRITE)

@alafanechere
Copy link
Contributor Author

Do you happen to know how common it is for oAuth endpoints to return an expires_in / expires_at?

I'm pretty sure returning an expires_in field after an access token refresh is part of the OAuth standard.
I double-checked that it's the case for the new connectors we want to support and they do all return this field in their responses:

✅ Quickbook
✅ Jira
✅ Typeform
✅ Pipedrive

@alafanechere
Copy link
Contributor Author

alafanechere commented Jan 3, 2023

/publish-cdk dry-run=true

🕑 https://github.com/airbytehq/airbyte/actions/runs/3828092811
https://github.com/airbytehq/airbyte/actions/runs/3828092811

@alafanechere
Copy link
Contributor Author

alafanechere commented Jan 3, 2023

/publish-cdk dry-run=false

🕑 https://github.com/airbytehq/airbyte/actions/runs/3828134324
https://github.com/airbytehq/airbyte/actions/runs/3828134324

@alafanechere alafanechere merged commit 254714b into master Jan 3, 2023
@alafanechere alafanechere deleted the augustin/cdk/oauth/lazy-refresh branch January 3, 2023 13:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CDK Connector Development Kit
Projects
None yet
Development

Successfully merging this pull request may close these issues.

CDK: SingleUseRefreshTokenOauth2Authenticator refreshes tokens when necessary
4 participants