Increase memory allocation for JDBC buffers in source DB connectors #20939

Merged
akashkulk merged 23 commits into master from larger-db-buffer
Jan 27, 2023

akashkulk commented Dec 30, 2022

Work is part of #20417.

  • Follow-up to "Increase Database Source SELECT Batch Size" (#19514), which bumped the share of the source's total memory allocated to JDBC buffers from 50% to 60%.
  • Increases MAX_FETCH_SIZE from 1,000,000 to 2,000,000.
  • Removes the concept of MAX_BUFFER_BYTE_SIZE, which was hard-coded to 1 GB. The memory allocation now has an upper bound of TARGET_BUFFER_SIZE_RATIO * MAX_HEAP_SPACE, so that we can take advantage of the increased memory for source containers.

Tested locally; the change has not caused any additional OOMs.
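
For readers following along, the bound described above could be sketched roughly as follows. This is a minimal illustration, not the code changed in this PR: the class name, the method names, and MIN_FETCH_SIZE are hypothetical, and only the 60% ratio and the 2,000,000 row cap come from the description.

```java
public final class FetchSizeSketch {
    // Values stated in the PR description.
    static final double TARGET_BUFFER_SIZE_RATIO = 0.6;
    static final int MAX_FETCH_SIZE = 2_000_000;
    // Assumed lower bound for illustration only; not taken from the PR.
    static final int MIN_FETCH_SIZE = 1;

    /** Upper bound on JDBC buffer memory: a fixed share of the max heap, with no 1 GB cap. */
    static long targetBufferBytes(long maxHeapBytes) {
        return (long) (maxHeapBytes * TARGET_BUFFER_SIZE_RATIO);
    }

    /** Rows per fetch so that rows * meanRowBytes stays within the buffer budget. */
    static int fetchSize(long maxHeapBytes, double meanRowBytes) {
        if (meanRowBytes <= 0) {
            return MIN_FETCH_SIZE; // no usable estimate yet, stay conservative
        }
        long rows = (long) (targetBufferBytes(maxHeapBytes) / meanRowBytes);
        return (int) Math.max(MIN_FETCH_SIZE, Math.min(MAX_FETCH_SIZE, rows));
    }

    public static void main(String[] args) {
        long maxHeap = Runtime.getRuntime().maxMemory();
        System.out.println("max heap bytes:      " + maxHeap);
        System.out.println("buffer budget bytes: " + targetBufferBytes(maxHeap));
        System.out.println("fetch size @ 1 KiB:  " + fetchSize(maxHeap, 1024.0));
    }
}
```

With small rows the row cap (MAX_FETCH_SIZE) is the limiting factor; with wide rows the heap-relative budget takes over, which is why raising the container memory now raises the effective buffer size.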

akashkulk commented Dec 30, 2022

/test connector=connectors/source-postgres

🕑 connectors/source-postgres https://github.com/airbytehq/airbyte/actions/runs/3809559886
✅ connectors/source-postgres https://github.com/airbytehq/airbyte/actions/runs/3809559886
Python tests coverage:

	 Name                                                 Stmts   Miss  Cover   Missing
	 ----------------------------------------------------------------------------------
	 source_acceptance_test/base.py                          12      4    67%   16-19
	 source_acceptance_test/config.py                       140      5    96%   87, 93, 238, 242-243
	 source_acceptance_test/conftest.py                     208     92    56%   36, 42-44, 49, 54, 77, 83, 89-91, 110, 115-117, 123-125, 131-132, 137-138, 143, 149, 158-167, 173-178, 193, 217, 248, 254, 262-267, 275-280, 288-301, 306-312, 319-330, 337-353
	 source_acceptance_test/plugin.py                        69     25    64%   22-23, 31, 36, 120-140, 144-148
	 source_acceptance_test/tests/test_core.py              402    115    71%   53, 58, 93-104, 109-116, 120-121, 125-126, 308, 346-363, 376-387, 391-396, 402, 435-440, 478-485, 528-530, 533, 598-606, 618-621, 626, 682-683, 689, 692, 728-738, 751-776
	 source_acceptance_test/tests/test_incremental.py       158     14    91%   52-59, 64-77, 240
	 source_acceptance_test/utils/asserts.py                 39      2    95%   62-63
	 source_acceptance_test/utils/common.py                  94     10    89%   16-17, 32-38, 72, 75
	 source_acceptance_test/utils/compare.py                 62     23    63%   21-51, 68, 97-99
	 source_acceptance_test/utils/connector_runner.py       133     33    75%   24-27, 46-47, 50-54, 57-58, 73-75, 78-80, 83-85, 88-90, 93-95, 124-125, 159-161, 208
	 source_acceptance_test/utils/json_schema_helper.py     107     13    88%   30-31, 38, 41, 65-68, 96, 120, 192-194
	 ----------------------------------------------------------------------------------
	 TOTAL                                                 1603    336    79%

Build Passed

Test summary info:

=========================== short test summary info ============================
SKIPPED [1] ../usr/local/lib/python3.9/site-packages/source_acceptance_test/plugin.py:63: Skipping TestConnection.test_check: not found in the config.
SKIPPED [1] ../usr/local/lib/python3.9/site-packages/source_acceptance_test/plugin.py:63: Skipping TestDiscovery.test_discover: not found in the config.
SKIPPED [1] ../usr/local/lib/python3.9/site-packages/source_acceptance_test/plugin.py:63: Skipping TestBasicRead.test_read: not found in the config.
SKIPPED [1] ../usr/local/lib/python3.9/site-packages/source_acceptance_test/plugin.py:63: Skipping TestFullRefresh.test_sequential_reads: not found in the config.
SKIPPED [1] ../usr/local/lib/python3.9/site-packages/source_acceptance_test/plugin.py:63: Skipping TestIncremental.test_two_sequential_reads: not found in the config.
SKIPPED [1] ../usr/local/lib/python3.9/site-packages/source_acceptance_test/tests/test_core.py:94: The previous and actual specifications are identical.
================= 13 passed, 6 skipped, 21 warnings in 17.50s ==================

@akashkulk akashkulk marked this pull request as ready for review December 30, 2022 23:20
@akashkulk

@evantahler / @davinchia I was trying to follow the comments from the previous PR. What's a good way to test this increase?

evantahler commented Dec 31, 2022

> @evantahler / @davinchia I was trying to follow the comments from the previous PR. What's a good way to test this increase?

... I didn't come up with a good way to test this kind of change 😢

I think for now, the best we can do is to publish the change (OSS) and run a few syncs with it on a local or staging K8s cluster and compare new vs old memory consumption for a sync. In the future, we'll be looking into blue/green connector deployments to try this connector out on a few cloud workspaces before rolling it out to everyone.

@davinchia

@akashkulk you should be able to build the image locally, load it into a local Airbyte deployment (either Docker or K8s) and inspect memory usage as the sync runs. This can all be done before we merge this change into OSS.

Alternatively, you can build the connector locally, run the connector's read command and inspect local memory usage. I'm going to find some time over the next few days to draft a local test harness for this. Until then, I would recommend either of these two options.

@evantahler evantahler left a comment

Approving.... but you probably want to test it first

akashkulk commented Jan 4, 2023

> @akashkulk you should be able to build the image locally, load it into a local Airbyte deployment (either Docker or K8s) and inspect memory usage as the sync runs. This can all be done before we merge this change into OSS.
>
> Alternatively, you can build the connector locally, run the connector's read command and inspect local memory usage. I'm going to find some time over the next few days to draft a local test harness for this. Until then, I would recommend either of these two options.

@davinchia What do you recommend for inspecting Docker memory usage in a local Airbyte deployment? I was going to run docker stats and compare memory usage with the original buffer size against the new buffer size, but let me know if there is a better way.
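
As a complement to container-level numbers, heap usage can also be sampled from inside the JVM when running the connector's read command locally. The sketch below is only an illustration under that assumption: the class and method names are hypothetical and nothing like it is part of this PR. It uses the standard java.lang.management API.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

public final class HeapUsageSketch {
    private static final long MB = 1024L * 1024L;

    /** Formats the current heap usage of this JVM as a single log line. */
    static String describeHeap() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        long usedMb = heap.getUsed() / MB;
        // getMax() returns -1 when the maximum heap size is undefined.
        String max = heap.getMax() < 0 ? "undefined" : (heap.getMax() / MB) + "MB";
        return "heap used=" + usedMb + "MB max=" + max;
    }

    public static void main(String[] args) throws InterruptedException {
        // Sample a few times; a real comparison would sample for the whole sync.
        for (int i = 0; i < 3; i++) {
            System.out.println(describeHeap());
            Thread.sleep(1000);
        }
    }
}
```

Comparing the peak "used" value between the old and new buffer settings on the same dataset gives a rough view of how much of the extra heap the JDBC buffers actually consume.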

github-actions bot commented Jan 17, 2023

Airbyte Code Coverage

File Coverage [100%] 🍏
TwoStageSizeEstimator.java 100% 🍏
Total Project Coverage 24%

@akashkulk akashkulk requested a review from a team as a code owner January 18, 2023 18:00

github-actions bot commented Jan 18, 2023

Affected Connector Report

NOTE ⚠️ Changes in this PR affect the following connectors. Make sure to do the following as needed:

  • Run integration tests
  • Bump connector or module version
  • Add changelog
  • Publish the new version

✅ Sources (5)

Connector | Version | Changelog | Publish
source-alloydb | 1.0.36 | |
source-alloydb-strict-encrypt | 1.0.36 | 🔵 (ignored) | 🔵 (ignored)
source-mysql | 1.0.21 | |
source-mysql-strict-encrypt | 1.0.21 | 🔵 (ignored) | 🔵 (ignored)
source-postgres-strict-encrypt | 1.0.41 | 🔵 (ignored) | 🔵 (ignored)
  • See "Actionable Items" below for how to resolve warnings and errors.

✅ Destinations (0)

Connector Version Changelog Publish
  • See "Actionable Items" below for how to resolve warnings and errors.

✅ Other Modules (0)

Actionable Items

(click to expand)

Category | Status | Actionable Item
Version | mismatch | The version of the connector is different from its normal variant. Please bump the version of the connector.
Version | doc not found | The connector does not seem to have a documentation file. This can be normal (e.g. basic connector like source-jdbc is not published or documented). Please double-check to make sure that it is not a bug.
Changelog | doc not found | The connector does not seem to have a documentation file. This can be normal (e.g. basic connector like source-jdbc is not published or documented). Please double-check to make sure that it is not a bug.
Changelog | changelog missing | There is no changelog for the current version of the connector. If you are the author of the current version, please add a changelog.
Publish | not in seed | The connector is not in the seed file (e.g. source_definitions.yaml), so its publication status cannot be checked. This can be normal (e.g. some connectors are cloud-specific, and only listed in the cloud seed file). Please double-check to make sure that it is not a bug.
Publish | diff seed version | The connector exists in the seed file, but the latest version is not listed there. This usually means that the latest version is not published. Please use the /publish command to publish the latest version.

akashkulk commented Jan 27, 2023

/publish connector=connectors/source-postgres-strict-encrypt run-tests=false

🕑 Publishing the following connectors:
connectors/source-postgres-strict-encrypt
https://github.com/airbytehq/airbyte/actions/runs/4021381605


Connector Did it publish? Were definitions generated?
connectors/source-postgres-strict-encrypt

if you have connectors that successfully published but failed definition generation, follow step 4 here ▶️

akashkulk commented Jan 27, 2023

/publish connector=connectors/source-postgres run-tests=false

🕑 Publishing the following connectors:
connectors/source-postgres
https://github.com/airbytehq/airbyte/actions/runs/4021382330


Connector Did it publish? Were definitions generated?
connectors/source-postgres

if you have connectors that successfully published but failed definition generation, follow step 4 here ▶️

akashkulk commented Jan 27, 2023

/publish connector=connectors/source-mysql run-tests=false

🕑 Publishing the following connectors:
connectors/source-mysql
https://github.com/airbytehq/airbyte/actions/runs/4021384699


Connector Did it publish? Were definitions generated?
connectors/source-mysql

if you have connectors that successfully published but failed definition generation, follow step 4 here ▶️

akashkulk commented Jan 27, 2023

/publish connector=connectors/source-mysql-strict-encrypt run-tests=false

🕑 Publishing the following connectors:
connectors/source-mysql-strict-encrypt
https://github.com/airbytehq/airbyte/actions/runs/4021386496


Connector Did it publish? Were definitions generated?
connectors/source-mysql-strict-encrypt

if you have connectors that successfully published but failed definition generation, follow step 4 here ▶️

@akashkulk akashkulk enabled auto-merge (squash) January 27, 2023 04:15
@akashkulk akashkulk merged commit 9ccdeb9 into master Jan 27, 2023
@akashkulk akashkulk deleted the larger-db-buffer branch January 27, 2023 04:50
5 participants