🎉 Destination BigQuery: Use byte-based buffering batching #8174

Closed
4 tasks done
alexandr-shegeda opened this issue Nov 22, 2021 · 0 comments · Fixed by #8199

Comments

@alexandr-shegeda
Contributor

alexandr-shegeda commented Nov 22, 2021

Tell us about the problem you're trying to solve

We need to implement byte-based buffering of records, rather than record-number-based buffering, for the BigQuery destination. This makes the connector much more resilient to pathological data such as very wide rows. The fix should be applied to both the INSERT and Bulk sync modes, wherever it is not already in place.
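To illustrate why a record count is a poor proxy for memory, here is a minimal sketch that measures two batches a count-based limit would treat as identical. It assumes records are buffered as serialized JSON strings and sized by their UTF-8 byte length; the class and method names are hypothetical and not taken from the connector (requires Java 11+ for `String.repeat`).

```java
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical illustration: same record count, very different byte footprint.
public class BatchSizeDemo {

    // Size of one serialized record as it would sit in the buffer (UTF-8 bytes).
    static long recordBytes(String json) {
        return json.getBytes(StandardCharsets.UTF_8).length;
    }

    // Total bytes held by a batch; a record-count limit ignores this entirely.
    static long batchBytes(List<String> batch) {
        long total = 0;
        for (String record : batch) {
            total += recordBytes(record);
        }
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical records: one narrow row and one very wide row.
        String narrow = "{\"id\":1}";
        String wide = "{\"id\":1,\"payload\":\"" + "x".repeat(1_000_000) + "\"}";

        List<String> narrowBatch = List.of(narrow, narrow, narrow);
        List<String> wideBatch = List.of(wide, wide, wide);

        System.out.println(narrowBatch.size() + " records, " + batchBytes(narrowBatch) + " bytes"); // 3 records, 24 bytes
        System.out.println(wideBatch.size() + " records, " + batchBytes(wideBatch) + " bytes");     // 3 records, 3000063 bytes
    }
}
```

Both batches pass a "3 records" check, yet one is about 125,000 times larger than the other; only a byte-based limit bounds memory regardless of row width.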

Describe the solution you’d like

  • find a way to replace MAX_BATCH_SIZE with MAX_BATCH_SIZE_BYTES
  • refactor related functionality
  • cover the changes with proper tests
  • raise a PR and pass it to review
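One possible shape for the change is a buffer that tracks the accumulated byte size of its records and flushes when the next record would exceed a byte limit, so a `MAX_BATCH_SIZE_BYTES` check takes the place of a `MAX_BATCH_SIZE` record-count check. This is a sketch under assumptions: records arrive as serialized strings, the class name `ByteBoundedBuffer` is hypothetical, and the flush step is a plain callback so the same buffer could, in principle, feed either an INSERT writer or a bulk-upload writer.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch: flushes by total byte size instead of by record count.
public class ByteBoundedBuffer {

    private final long maxBatchSizeBytes;
    private final Consumer<List<String>> flusher; // e.g. an INSERT or a bulk-upload writer
    private final List<String> records = new ArrayList<>();
    private long bufferedBytes = 0;

    public ByteBoundedBuffer(long maxBatchSizeBytes, Consumer<List<String>> flusher) {
        if (maxBatchSizeBytes <= 0) {
            throw new IllegalArgumentException("maxBatchSizeBytes must be positive");
        }
        this.maxBatchSizeBytes = maxBatchSizeBytes;
        this.flusher = flusher;
    }

    public void accept(String serializedRecord) {
        long size = serializedRecord.getBytes(StandardCharsets.UTF_8).length;
        // Flush first if this record would push the batch past the byte limit.
        // A single oversized record is never split; it ends up in a batch of its own.
        if (!records.isEmpty() && bufferedBytes + size > maxBatchSizeBytes) {
            flush();
        }
        records.add(serializedRecord);
        bufferedBytes += size;
    }

    // Hands the current batch to the writer and resets the byte counter.
    public void flush() {
        if (records.isEmpty()) {
            return;
        }
        flusher.accept(new ArrayList<>(records));
        records.clear();
        bufferedBytes = 0;
    }

    public static void main(String[] args) {
        long maxBatchSizeBytes = 10; // stands in for MAX_BATCH_SIZE_BYTES; tiny for demonstration
        List<List<String>> batches = new ArrayList<>();
        ByteBoundedBuffer buffer = new ByteBoundedBuffer(maxBatchSizeBytes, batches::add);

        buffer.accept("aaaa");         // 4 bytes buffered
        buffer.accept("bbbb");         // 8 bytes buffered
        buffer.accept("cccc");         // 8 + 4 > 10, so [aaaa, bbbb] is flushed first
        buffer.accept("dddddddddddd"); // 4 + 12 > 10, so [cccc] is flushed; this record is buffered alone
        buffer.flush();                // end of stream: flush the remainder

        System.out.println(batches); // prints [[aaaa, bbbb], [cccc], [dddddddddddd]]
    }
}
```

Passing the writer in as a callback keeps the byte-accounting logic in one place rather than duplicating it per sync mode; a real implementation would also need to decide how record size is estimated (for example, serialized JSON length versus in-memory object size).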

Describe the alternative you’ve considered or used

This PR fixed an important flaw in how JDBC destinations using the INSERT sync mode buffer records: they now use byte-based buffering of records instead of record-number-based buffering.

Additional context


It is important to keep our solution as DRY as possible; otherwise, we may need to implement the same thing repeatedly.

@alexandr-shegeda changed the title from “bigquery” to “Destinations BigQuery: Use byte-based buffering batching” Nov 22, 2021
@alexandr-shegeda changed the title from “Destinations BigQuery: Use byte-based buffering batching” to “🎉 Destination BigQuery: Use byte-based buffering batching” Nov 22, 2021
Projects
No open projects
Archived in project
3 participants