🐛 Make all JDBC destinations (SF, RS, PG, MySQL, MSSQL, Oracle) handle wide rows by using byte-based record buffering #7719
Conversation
/test connector=destination-snowflake
/test connector=destination-redshift
/test connector=destination-postgres
/test connector=destination-mysql
/test connector=destination-postgres
/publish connector=connectors/destination-mssql
/publish connector=connectors/destination-mssql-strict-encrypt
/publish connector=connectors/destination-mysql
What
JDBC destinations, when running in INSERT mode, typically write records by accumulating them in an in-memory queue and flushing them to the destination once a fixed number of records has been buffered. This approach is problematic because it does not take into account how large those records are, which means wide rows can cause trouble (10k 5MB rows has very different memory implications than 10k 4KB rows). The problem is exacerbated because, when flushing records to the destination, the JDBC driver needs to hold a SQL string containing all of the rows in memory (i.e. INSERT xyz INTO table, where xyz is the content of all the rows). So if the queue is about to write N bytes, we actually need a multiple of N bytes of capacity in memory (somewhere between 2x and 3x, as far as I can tell from looking under the JDBC hood). Recently, a user was syncing very wide rows from a Postgres DB, which caused this issue to appear. So this PR changes the buffering strategy to count bytes instead of number of records.
I've confirmed this fix works for the user in question.
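To make the byte-based strategy concrete, here is a minimal sketch (assumed class, method names, and threshold; not the actual BufferedStreamConsumer implementation): records accumulate in memory, their serialized size is tracked, and a flush is triggered once a byte threshold is reached rather than after a fixed record count.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class ByteBasedBuffer {

  // Hypothetical threshold: chosen so that, even with the 2-3x blow-up the JDBC
  // driver adds when it builds the INSERT string, the flush still fits in the heap.
  private static final long MAX_BUFFER_BYTES = 25L * 1024 * 1024;

  private final List<String> buffer = new ArrayList<>();
  private final Consumer<List<String>> flushToDestination;
  private long bufferedBytes = 0;

  public ByteBasedBuffer(final Consumer<List<String>> flushToDestination) {
    this.flushToDestination = flushToDestination;
  }

  // Called once per incoming record (already serialized to a string here for simplicity).
  public void accept(final String serializedRecord) {
    buffer.add(serializedRecord);
    bufferedBytes += serializedRecord.getBytes(StandardCharsets.UTF_8).length;
    if (bufferedBytes >= MAX_BUFFER_BYTES) {
      flush();
    }
  }

  // Flush whatever is buffered, regardless of how many records that happens to be.
  public void flush() {
    if (buffer.isEmpty()) {
      return;
    }
    flushToDestination.accept(new ArrayList<>(buffer));
    buffer.clear();
    bufferedBytes = 0;
  }
}
```

With this shape, 10k tiny rows and a handful of very wide rows both flush at roughly the same memory footprint, which is the point of the change.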
How
There are two changes being made here:
1. Change the record buffering to track bytes instead of record count, flushing once a byte threshold is reached.
2. Limit INSERT statements to at most 10k records, because some destinations like Snowflake have limitations on how many records can be written at a time (Snowflake limits it to 16k); see the sketch below.
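As a rough sketch of the second change (assumed names, not the actual Airbyte code), a flushed buffer can be split into sub-batches of at most 10k records so that each INSERT statement stays under the destination's per-statement row limit:

```java
import java.util.ArrayList;
import java.util.List;

public class InsertChunker {

  // Hypothetical constant: the PR caps each INSERT at 10k records because some
  // destinations limit rows per statement (Snowflake allows at most 16k).
  private static final int MAX_RECORDS_PER_INSERT = 10_000;

  // Split one flushed buffer into sub-batches; each sub-batch becomes its own INSERT.
  public static List<List<String>> chunk(final List<String> records) {
    final List<List<String>> batches = new ArrayList<>();
    for (int i = 0; i < records.size(); i += MAX_RECORDS_PER_INSERT) {
      final int end = Math.min(i + MAX_RECORDS_PER_INSERT, records.size());
      batches.add(new ArrayList<>(records.subList(i, end)));
    }
    return batches;
  }
}
```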
Recommended reading order
BufferedStreamConsumer.java
Pre-merge Checklist
Community member or Airbyter
- Secrets in the connector's spec are annotated with airbyte_secret
- Unit & integration tests pass: ./gradlew :airbyte-integrations:connectors:<name>:integrationTest
- Connector's README.md updated
- Connector's bootstrap.md updated. See description and examples
- Connector's docs/integrations/<source or destination>/<name>.md updated, including changelog. See changelog example
Airbyter
If this is a community PR, the Airbyte engineer reviewing this PR is responsible for the below items.
- The /test connector=connectors/<name> command is passing.
- The connector has been published by running the /publish command described here.