Expose ROW_BATCH_SIZE variable to Google Sheets connector #15038
Labels: area/connectors (Connector related issues), community, connectors/source/google-sheets, team/connectors-python, type/enhancement (New feature or request)
Tell us about the problem you're trying to solve
Hi all, I am currently exploring Airbyte for our ETL use cases, and while testing the Google Sheets connector as a source I kept running into rate limit problems. For context, some of our Google Sheets have up to 200,000 records in a single sheet.
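To make the scale concrete, here is a small sketch of the request arithmetic. It assumes the connector fetches rows in fixed-size batches with one API read request per batch (the ROW_BATCH_SIZE discussed in the next section). The function names, the batch sizes tried, and the per-minute quota parameter are hypothetical illustrations, not values taken from the connector or from Google's documentation.

```python
def read_requests_needed(total_rows: int, batch_size: int) -> int:
    """Number of read calls needed to fetch `total_rows` rows,
    assuming one API request per batch of `batch_size` rows."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    # Integer ceiling division: a partial final batch still costs one request.
    return (total_rows + batch_size - 1) // batch_size


def min_quota_windows(total_rows: int, batch_size: int, requests_per_minute: int) -> int:
    """Minimum number of one-minute windows the reads must span if at most
    `requests_per_minute` calls are allowed per minute (hypothetical quota)."""
    if requests_per_minute <= 0:
        raise ValueError("requests_per_minute must be positive")
    requests = read_requests_needed(total_rows, batch_size)
    return (requests + requests_per_minute - 1) // requests_per_minute


if __name__ == "__main__":
    rows = 200_000  # sheet size mentioned in this issue
    for batch in (200, 1_000, 10_000):  # illustrative batch sizes only
        print(f"batch={batch}: {read_requests_needed(rows, batch)} requests")
```

Running it prints 1000, 200, and 20 requests respectively: the request count falls in proportion to the batch size, which is why a larger batch size eases pressure on a per-minute quota for sheets this large.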
Describe the solution you’d like
After some digging into the codebase, I found that the chunk size the connector uses when reading rows (called ROW_BATCH_SIZE) is defined as a static value inside the connector. To get around the rate limit issue, I could use a bigger value of ROW_BATCH_SIZE, since reading the same sheet in larger batches takes fewer API requests. So I was thinking: if we expose this variable in the connector config, that would solve my problem without breaking any of the existing flows.
Describe the alternative you’ve considered or used
We have an internal service that currently calls the Google API and pulls all of a sheet's data at once, which does not cause any issues. However, to avoid the overhead of maintaining this service and to make the process more self-serve, we are planning to move to Airbyte.
Additional context
I am using a service account to authenticate with the Google APIs.
Are you willing to submit a PR?
Yes! I have already made (and tested) the changes, but I am unable to push them due to access issues, so I would need help with that as well.
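For reference, the approach proposed in this issue could look roughly like the sketch below. It is not the patch mentioned above and not the connector's actual code: the config key row_batch_size, the functions get_row_batch_size and batch_ranges, the placeholder default of 200, and the assumption that data starts on row 2 below a header row are all hypothetical. What it illustrates is the fallback: a config that omits the new key keeps the current hard-coded behavior, so existing connections are unaffected.

```python
from typing import Any, Iterator, Mapping, Tuple

# Placeholder standing in for the connector's hard-coded ROW_BATCH_SIZE;
# the real value in source-google-sheets may differ.
DEFAULT_ROW_BATCH_SIZE = 200


def get_row_batch_size(config: Mapping[str, Any]) -> int:
    """Read the (hypothetical) `row_batch_size` config key, falling back to
    the existing default so configs without the key behave as before."""
    value = config.get("row_batch_size", DEFAULT_ROW_BATCH_SIZE)
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, int) or value <= 0:
        raise ValueError(f"row_batch_size must be a positive integer, got {value!r}")
    return value


def batch_ranges(total_rows: int, batch_size: int, first_row: int = 2) -> Iterator[Tuple[int, int]]:
    """Yield 1-based, inclusive (start_row, end_row) pairs covering `total_rows`
    data rows, beginning at `first_row` (2 assumes a single header row)."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    start = first_row
    last = first_row + total_rows - 1
    while start <= last:
        end = min(start + batch_size - 1, last)
        yield start, end
        start = end + 1


if __name__ == "__main__":
    print(get_row_batch_size({}))                        # falls back to the placeholder default
    print(get_row_batch_size({"row_batch_size": 5000}))  # user-supplied value
    print(list(batch_ranges(5, 2)))                      # [(2, 3), (4, 5), (6, 6)]
```

Validating the value up front means a bad config fails fast at startup instead of partway through a sync.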