Fix Github check connection for organizations with large number of re… #8170
# If the repository list for the given organization was fetched
# successfully, there is no need to check stats for every repository
# within that organization.
repositories, _ = self._generate_repositories(config=config, authenticator=authenticator)
- If the user provides only organization repos (repositories would be empty?), then check_connection doesn't check anything?
- If repositories is a very long list of repo names, we still have the same issue?
Wouldn't it be better to limit this for loop instead?
for stream_slice in repository_stats_stream.stream_slices(sync_mode=SyncMode.full_refresh):
Or at least limit the check to one public and one private repo from repositories, and maybe one for each organization?
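For illustration only, a minimal sketch of what capping that loop could look like. It assumes the slices come from a lazy iterable; `limited_slices` and `fake_stream_slices` are hypothetical names used here as stand-ins and are not part of the connector.

```python
from itertools import islice
from typing import Iterable, Iterator, Mapping


def limited_slices(slices: Iterable[Mapping[str, str]], limit: int) -> Iterator[Mapping[str, str]]:
    """Yield at most `limit` slices without consuming the rest of the iterable."""
    return islice(slices, limit)


def fake_stream_slices(count: int) -> Iterator[Mapping[str, str]]:
    """Hypothetical stand-in for repository_stats_stream.stream_slices(...)."""
    for i in range(count):
        yield {"repository": f"example-org/repo-{i}"}


# Only the first three slices are ever generated and checked.
checked = [s["repository"] for s in limited_slices(fake_stream_slices(10_000), 3)]
```

A cap like this bounds the number of requests, but it also means repositories past the cap are never verified, which is the trade-off discussed in the replies.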
> if the user provides only organization repos (repositories would be empty?), then the check_connection doesn't check anything?
No, in this case it would try to get the repository list from the organization. Please look at the unit tests; there is a case for that.
> if repositories is a very long list of repo names, we still have the same issue?
Yes, but the reported issue was about an organization containing a lot of repos. If we get a large list of individual repos, they could belong to different accounts and organizations, so we should check all of them.
> Wouldn't it be better to limit the for loop at instead? or at least limit to one public and one private from repositories, and maybe one for each organization?
In that case, if the user provides a private repo they have no access to, the check could falsely succeed. In this commit I check every single repo and every organization, without checking the repos within an org, because the "repo" scope should give us access to every repo in that org. This fix makes the minimal sufficient set of requests to verify that a subsequent read won't fail.
As an optimization we could group single repos by organization, but this defect is not about a long list of repos; it is about an organization with a lot of repos.
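A rough sketch of the "minimal sufficient requests" idea, under stated assumptions: configured entries are taken to look like `owner/name` for a single repo and `owner/*` for a whole organization, and `plan_connection_checks` is a hypothetical helper, not the connector's actual code. Each organization is checked once and each explicitly listed repo is checked individually.

```python
from typing import Iterable, List, Tuple


def plan_connection_checks(entries: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split configured entries into organizations and single repositories to check.

    Assumption: "owner/*" denotes a whole organization, "owner/name" a single repo.
    """
    organizations: List[str] = []
    repositories: List[str] = []
    for entry in entries:
        owner, _, name = entry.partition("/")
        if name == "*":
            # One request per organization is enough; skip duplicates.
            if owner not in organizations:
                organizations.append(owner)
        elif entry not in repositories:
            # Explicitly listed repos are always checked individually,
            # so an inaccessible private repo cannot slip through.
            repositories.append(entry)
    return organizations, repositories


orgs, repos = plan_connection_checks(
    ["example-org/*", "example-org/repo-a", "other-user/repo-b", "example-org/*"]
)
```

With this split, the number of requests grows with the number of configured entries rather than with the number of repositories inside each organization.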
organizations = [org.split("/")[0] for org in repositories if org not in repositories_list]
organisation_repos = set()
if organizations:
    repos = Repositories(authenticator=authenticator, organizations=organizations)
Please add a comment that states that if we receive data about an organization, then there is no need to check for each of its repositories. And add this link: https://docs.github.com/en/developers/apps/building-oauth-apps/scopes-for-oauth-apps#available-scopes.
8a83d91 to 768a422
/publish connector=connectors/source-github
Resolves https://github.com/airbytehq/oncall/issues/19