The ProjectDetailView uses the concurrent Python module to parallelise GitHub API requests through github.get_repo_is_private(). This is unnecessary, and it makes the code harder to reason about and maintain (ref #4591). It's also inefficient: opening a separate connection for each item incurs more connection and API endpoint overhead than a single bulk request.
The more typical approach would be for the endpoint to provide a method that does a bulk request with a single API access (possibly paginated/chunked if needed due to a max size limit). If we look in the github module (our thin wrapper around requests and the GitHub API), we can see that functions like get_repos_with_branches and get_repos_with_dates follow this approach with GraphQL API accesses. We can write a new get_repos_with_privacy (or similarly named) function and eliminate this idiosyncratic use of concurrency.
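As a rough illustration of the bulk approach, here is a minimal sketch of what a get_repos_with_privacy function could look like. It assumes all the repos of interest belong to one organisation and that the caller passes in a `run_query` callable that sends a GraphQL query with variables and returns the response's `data` payload. That callable is hypothetical; the real github module may already have its own helper for this, and the existing get_repos_with_branches and get_repos_with_dates functions are the better guide to the actual conventions.

```python
from typing import Any, Callable, Dict, Optional

# Cursor-paginated query for the privacy flag of every repo in an organisation.
# `isPrivate` is a field on Repository in the GitHub GraphQL API.
REPO_PRIVACY_QUERY = """
query reposWithPrivacy($org: String!, $cursor: String) {
  organization(login: $org) {
    repositories(first: 100, after: $cursor) {
      nodes { name isPrivate }
      pageInfo { endCursor hasNextPage }
    }
  }
}
"""

# Hypothetical signature: (query, variables) -> the "data" part of the response.
RunQuery = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def get_repos_with_privacy(org: str, run_query: RunQuery) -> Dict[str, bool]:
    """Return a mapping of repo name to is-private for every repo in `org`."""
    privacy: Dict[str, bool] = {}
    cursor: Optional[str] = None
    while True:
        data = run_query(REPO_PRIVACY_QUERY, {"org": org, "cursor": cursor})
        repos = data["organization"]["repositories"]
        for node in repos["nodes"]:
            privacy[node["name"]] = node["isPrivate"]
        page_info = repos["pageInfo"]
        if not page_info["hasNextPage"]:
            return privacy
        # Ask for the next page, starting after the last repo we received.
        cursor = page_info["endCursor"]
```

The view could then look up each repo's privacy in the returned dict rather than issuing one request per repo, so no thread pool is needed.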
Generally, to avoid over-complication, we should not use concurrency unless we need it. This is the only use of the concurrent module across the entire repo. It looks like it was done for expediency or as an experiment. The other two clients of github.get_repo_is_private() (the WorkspaceDetail and SignOffRepo views) use this function for a single request, which is reasonable.
We might consider refactoring the common GraphQL code in github.py alongside this change, or put that in a separate issue. That refactoring might take inspiration from the approach in the metrics codebase, but note that codebase is a CLI tool, so there may be some differences in desired behaviour.