Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Github check connection for organizations with large number of re… #8170

Merged
merged 3 commits into from
Nov 22, 2021

Conversation

avida
Copy link
Contributor

@avida avida commented Nov 22, 2021

@github-actions github-actions bot added area/connectors Connector related issues area/documentation Improvements or additions to documentation labels Nov 22, 2021
@avida avida temporarily deployed to more-secrets November 22, 2021 09:34 Inactive
@avida avida requested review from Zirochkaa, yevhenii-ldv, htrueman and sergei-solonitcyn and removed request for Zirochkaa November 22, 2021 09:34
# In case of getting repository list for given organization was
# successfull no need of checking stats for every repository within
# that organization.
repositories, _ = self._generate_repositories(config=config, authenticator=authenticator)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • if the user provides only organization repos (repositories would be empty?), then the check_connection doesn't check anything?
  • if repositories is a very long list of repo names, we still have the same issue?

Wouldn't it be better to limit the for loop at instead?

for stream_slice in repository_stats_stream.stream_slices(sync_mode=SyncMode.full_refresh):

or at least limit to one public and one private from repositories, and maybe one for each organization?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if the user provides only organization repos (repositories would be empty?), then the check_connection doesn't check anything?

No, in this case it would try to get repo list from organization, pls look at unittest there is a case for that.

if repositories is a very long list of repo names, we still have the same issue?

Yes, but this was an issue with organization with a lot of repos within. In case if we get large list of repos it could be from different accounts and organizations and we should check all of them.

Copy link
Contributor Author

@avida avida Nov 22, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't it be better to limit the for loop at instead? or at least limit to one public and one private from repositories, and maybe one for each organization?

In this case if user provide some private repo he has no access, check could falsely success. In this commit I check every single repo and organization (without checking repo within org cause "repo" scope should enable us to access every repo for that org). This fix making minimal sufficient requests to check that further read wont fail.

As an optimization we could group single repo by organization but this defect is not about long list of repos, its about organization with a lot of repos.

organizations = [org.split("/")[0] for org in repositories if org not in repositories_list]
organisation_repos = set()
if organizations:
repos = Repositories(authenticator=authenticator, organizations=organizations)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a comment that states that if we receive data about an organization, then there is no need to check for each of its repositories. And add this link: https://docs.github.com/en/developers/apps/building-oauth-apps/scopes-for-oauth-apps#available-scopes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@avida
Copy link
Contributor Author

avida commented Nov 22, 2021

/publish connector=connectors/source-github

🕑 connectors/source-github https://github.com/airbytehq/airbyte/actions/runs/1490302841
✅ connectors/source-github https://github.com/airbytehq/airbyte/actions/runs/1490302841

@avida avida temporarily deployed to more-secrets November 22, 2021 13:11 Inactive
@jrhizor jrhizor temporarily deployed to more-secrets November 22, 2021 13:12 Inactive
@avida avida merged commit c0cb8f3 into master Nov 22, 2021
@avida avida deleted the drezchykov/github-check branch November 22, 2021 13:34
@avida avida temporarily deployed to more-secrets November 22, 2021 13:36 Inactive
schlattk pushed a commit to schlattk/airbyte that referenced this pull request Jan 4, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/connectors Connector related issues area/documentation Improvements or additions to documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants