Source Postgres : Fast query for estimate messages #21683

akashkulk · 2023-01-20T23:39:51Z

As part of #20783, trace estimate messages were added to source-postgres.

One of the follow-up items #21499 identified in that PR was to use a fast query to calculate number of rows instead of select count(*) which can be slow. This was initially thought to not be a problem : see #20783 (comment), however we are seeing user reports of this query being slow : #20783 (comment).

In this PR, we migrate to the fast query. Initially, this was held off because invalid results were returned for smaller tables. But thinking more on this, we can skip estimate traces altogether for small tables (the sync will probably complete quickly anyways). This can be determined if the fast table row count estimate is something invalid (e.g. a negative value)

The only open question here is whether we should disable estimate trace messages for non-CDC incremental mode as I can't think of another way to get around this other than to issue a select count(*) where cursor_id > cursor_value query, which would have the same issues as a select count(*)

github-actions · 2023-01-20T23:41:59Z

Affected Connector Report

NOTE ⚠️ Changes in this PR affect the following connectors. Make sure to do the following as needed:

Run integration tests
Bump connector or module version
Add changelog
Publish the new version

✅ Sources (3)

Connector	Version	Changelog	Publish
`source-alloydb`	`1.0.35`	✅	✅
`source-alloydb-strict-encrypt`	`1.0.35`	🔵 (ignored)	🔵 (ignored)
`source-postgres-strict-encrypt`	`1.0.39`	🔵 (ignored)	🔵 (ignored)

See "Actionable Items" below for how to resolve warnings and errors.

✅ Destinations (0)

Connector	Version	Changelog	Publish

See "Actionable Items" below for how to resolve warnings and errors.

✅ Other Modules (0)

Actionable Items

(click to expand)

Category	Status	Actionable Item
Version	❌ mismatch	The version of the connector is different from its normal variant. Please bump the version of the connector.
	⚠ doc not found	The connector does not seem to have a documentation file. This can be normal (e.g. basic connector like `source-jdbc` is not published or documented). Please double-check to make sure that it is not a bug.
Changelog	⚠ doc not found	The connector does not seem to have a documentation file. This can be normal (e.g. basic connector like `source-jdbc` is not published or documented). Please double-check to make sure that it is not a bug.
	❌ changelog missing	There is no chnagelog for the current version of the connector. If you are the author of the current version, please add a changelog.
Publish	⚠ not in seed	The connector is not in the seed file (e.g. `source_definitions.yaml`), so its publication status cannot be checked. This can be normal (e.g. some connectors are cloud-specific, and only listed in the cloud seed file). Please double-check to make sure that it is not a bug.
	❌ diff seed version	The connector exists in the seed file, but the latest version is not listed there. This usually means that the latest version is not published. Please use the `/publish` command to publish the latest version.

akashkulk · 2023-01-20T23:42:35Z

/test connector=connectors/source-postgres-strict-encrypt

🕑 connectors/source-postgres-strict-encrypt https://github.com/airbytehq/airbyte/actions/runs/3972129585
✅ connectors/source-postgres-strict-encrypt https://github.com/airbytehq/airbyte/actions/runs/3972129585
No Python unittests run

Build Passed

Test summary info:

All Passed

akashkulk · 2023-01-20T23:43:13Z

/test connector=connectors/source-postgres

🕑 connectors/source-postgres https://github.com/airbytehq/airbyte/actions/runs/3972132514
✅ connectors/source-postgres https://github.com/airbytehq/airbyte/actions/runs/3972132514
Python tests coverage:

	 Name                                                 Stmts   Miss  Cover   Missing
	 ----------------------------------------------------------------------------------
	 source_acceptance_test/base.py                          12      4    67%   16-19
	 source_acceptance_test/config.py                       141      5    96%   87, 93, 239, 243-244
	 source_acceptance_test/conftest.py                     211     95    55%   36, 42-44, 49, 54, 77, 83, 89-91, 110, 115-117, 123-125, 131-132, 137-138, 143, 149, 158-167, 173-178, 193, 217, 248, 254, 262-267, 275-285, 293-306, 311-317, 324-335, 342-358
	 source_acceptance_test/plugin.py                        69     25    64%   22-23, 31, 36, 120-140, 144-148
	 source_acceptance_test/tests/test_core.py              402    115    71%   53, 58, 93-104, 109-116, 120-121, 125-126, 308, 346-363, 376-387, 391-396, 402, 435-440, 478-485, 528-530, 533, 598-606, 618-621, 626, 682-683, 689, 692, 728-738, 751-776
	 source_acceptance_test/tests/test_incremental.py       160     14    91%   58-65, 70-83, 246
	 source_acceptance_test/utils/asserts.py                 39      2    95%   62-63
	 source_acceptance_test/utils/common.py                  94     10    89%   16-17, 32-38, 72, 75
	 source_acceptance_test/utils/compare.py                 62     23    63%   21-51, 68, 97-99
	 source_acceptance_test/utils/connector_runner.py       133     33    75%   24-27, 46-47, 50-54, 57-58, 73-75, 78-80, 83-85, 88-90, 93-95, 124-125, 159-161, 208
	 source_acceptance_test/utils/json_schema_helper.py     107     13    88%   30-31, 38, 41, 65-68, 96, 120, 192-194
	 ----------------------------------------------------------------------------------
	 TOTAL                                                 1609    339    79%

Build Passed

Test summary info:

=========================== short test summary info ============================
SKIPPED [2] ../usr/local/lib/python3.9/site-packages/source_acceptance_test/tests/test_core.py:94: The previous and actual specifications are identical.
SKIPPED [2] ../usr/local/lib/python3.9/site-packages/source_acceptance_test/tests/test_core.py:377: The previous and actual discovered catalogs are identical.
SKIPPED [2] ../usr/local/lib/python3.9/site-packages/source_acceptance_test/tests/test_incremental.py:22: `future_state` has a bypass reason, skipping.
================== 54 passed, 6 skipped in 717.50s (0:11:57) ===================

github-actions · 2023-01-20T23:51:19Z

Airbyte Code Coverage

There is no coverage information present for the Files changed

Total Project Coverage	26.72%	🍏

evantahler

👍 from me, but a question below

evantahler · 2023-01-21T00:18:30Z

...ource-postgres/src/main/java/io/airbyte/integrations/source/postgres/PostgresQueryUtils.java

-            SELECT (SELECT COUNT(*) FROM %s) AS %s,
+            SELECT (select reltuples::int8 as count from pg_class c JOIN pg_catalog.pg_namespace n ON n.oid=c.relnamespace where nspname='%s' AND relname='%s') AS %s,


Maybe both versions should be kept and done in a try/catch... I wonder if there are certain older versions of postgres that don't have a pg_catalog table?

github-actions · 2023-01-21T00:38:57Z

Platform Test Results

  231 files ±0   231 suites ±0 26m 12s ⏱️ + 1m 4s
1 606 tests ±0 1 595 ✔️ ±0 11 💤 ±0 0 ❌ ±0
1 626 runs ±0 1 615 ✔️ ±0 11 💤 ±0 0 ❌ ±0

Results for commit 9973901. ± Comparison against base commit 6026465.

♻️ This comment has been updated with latest results.

akashkulk · 2023-01-23T00:21:09Z

/publish connector=connectors/source-postgres-strict-encrypt run-tests=false

🕑 Publishing the following connectors:
connectors/source-postgres-strict-encrypt
https://github.com/airbytehq/airbyte/actions/runs/3982262589

Connector	Did it publish?	Were definitions generated?
connectors/source-postgres-strict-encrypt	❌	❌

if you have connectors that successfully published but failed definition generation, follow step 4 here ▶️

akashkulk · 2023-01-23T00:21:31Z

/publish connector=connectors/source-postgres run-tests=false

🕑 Publishing the following connectors:
connectors/source-postgres
https://github.com/airbytehq/airbyte/actions/runs/3982264139

Connector	Did it publish?	Were definitions generated?
connectors/source-postgres	✅	✅

if you have connectors that successfully published but failed definition generation, follow step 4 here ▶️

akashkulk · 2023-01-23T00:35:59Z

/publish connector=connectors/source-postgres-strict-encrypt run-tests=false

🕑 Publishing the following connectors:
connectors/source-postgres-strict-encrypt
https://github.com/airbytehq/airbyte/actions/runs/3982325788

Connector	Did it publish?	Were definitions generated?
connectors/source-postgres-strict-encrypt	✅	❌

if you have connectors that successfully published but failed definition generation, follow step 4 here ▶️

Source Postgres : Fast query for estimate messages

510fac7

octavia-squidington-iv added area/connectors Connector related issues area/documentation Improvements or additions to documentation connectors/source/postgres labels Jan 20, 2023

Update documentation

b84e12a

Merge branch 'master' into fast_query

44b3a9f

akashkulk temporarily deployed to more-secrets January 20, 2023 23:44 — with GitHub Actions Inactive

akashkulk marked this pull request as ready for review January 21, 2023 00:07

akashkulk requested a review from a team as a code owner January 21, 2023 00:07

akashkulk requested review from evantahler and subodh1810 January 21, 2023 00:08

evantahler approved these changes Jan 21, 2023

View reviewed changes

akashkulk requested a review from evantahler January 21, 2023 00:43

evantahler approved these changes Jan 21, 2023

View reviewed changes

akashkulk mentioned this pull request Jan 21, 2023

Source Postgres : Emit estimate trace messages for non-CDC mode #20783

Merged

Merge branch 'master' into fast_query

9973901

akashkulk temporarily deployed to more-secrets January 21, 2023 00:56 — with GitHub Actions Inactive

Merge branch 'master' into fast_query

47846de

akashkulk temporarily deployed to more-secrets January 22, 2023 23:18 — with GitHub Actions Inactive

auto-bump connector version

cbd7c4f

octavia-squidington-iii temporarily deployed to more-secrets January 23, 2023 00:34 — with GitHub Actions Inactive

Update strict-encrypt Dockerfile

3223bac

octavia-squidington-iv added the connectors/source/postgres-strict-encrypt label Jan 23, 2023

akashkulk temporarily deployed to more-secrets January 23, 2023 00:37 — with GitHub Actions Inactive

akashkulk merged commit 79de89d into master Jan 23, 2023

akashkulk deleted the fast_query branch January 23, 2023 14:34

octavia-squidington-iii mentioned this pull request Jan 24, 2023

Bump Airbyte version from 0.40.28 to 0.40.29 #21767

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Source Postgres : Fast query for estimate messages #21683

Source Postgres : Fast query for estimate messages #21683

akashkulk commented Jan 20, 2023

github-actions bot commented Jan 20, 2023 •

edited

Loading

✅ Sources (3)

✅ Destinations (0)

✅ Other Modules (0)

Actionable Items

akashkulk commented Jan 20, 2023 •

edited by github-actions bot

Loading

akashkulk commented Jan 20, 2023 •

edited by github-actions bot

Loading

github-actions bot commented Jan 20, 2023 •

edited

Loading

evantahler left a comment

evantahler Jan 21, 2023

github-actions bot commented Jan 21, 2023 •

edited

Loading

akashkulk commented Jan 23, 2023 •

edited by github-actions bot

Loading

akashkulk commented Jan 23, 2023 •

edited by github-actions bot

Loading

akashkulk commented Jan 23, 2023 •

edited by github-actions bot

Loading

		SELECT (SELECT COUNT(*) FROM %s) AS %s,
		SELECT (select reltuples::int8 as count from pg_class c JOIN pg_catalog.pg_namespace n ON n.oid=c.relnamespace where nspname='%s' AND relname='%s') AS %s,

Source Postgres : Fast query for estimate messages #21683

Source Postgres : Fast query for estimate messages #21683

Conversation

akashkulk commented Jan 20, 2023

github-actions bot commented Jan 20, 2023 • edited Loading

Affected Connector Report

✅ Sources (3)

✅ Destinations (0)

✅ Other Modules (0)

Actionable Items

akashkulk commented Jan 20, 2023 • edited by github-actions bot Loading

Build Passed

akashkulk commented Jan 20, 2023 • edited by github-actions bot Loading

Build Passed

github-actions bot commented Jan 20, 2023 • edited Loading

Airbyte Code Coverage

evantahler left a comment

Choose a reason for hiding this comment

evantahler Jan 21, 2023

Choose a reason for hiding this comment

github-actions bot commented Jan 21, 2023 • edited Loading

Platform Test Results

akashkulk commented Jan 23, 2023 • edited by github-actions bot Loading

akashkulk commented Jan 23, 2023 • edited by github-actions bot Loading

akashkulk commented Jan 23, 2023 • edited by github-actions bot Loading

github-actions bot commented Jan 20, 2023 •

edited

Loading

akashkulk commented Jan 20, 2023 •

edited by github-actions bot

Loading

akashkulk commented Jan 20, 2023 •

edited by github-actions bot

Loading

github-actions bot commented Jan 20, 2023 •

edited

Loading

github-actions bot commented Jan 21, 2023 •

edited

Loading

akashkulk commented Jan 23, 2023 •

edited by github-actions bot

Loading

akashkulk commented Jan 23, 2023 •

edited by github-actions bot

Loading

akashkulk commented Jan 23, 2023 •

edited by github-actions bot

Loading