Refactor BigQuery Destination Integration tests #20851

grishick · 2022-12-23T01:35:39Z

This refactoring reduces code duplication and moves configuration logic from code into config files.
The purpose of this change is to make it easier to add more test cases for configuration variations such as
data location, accounts with various permission combinations, and account impersonation. This change was inspired by the impersonation PR. To run all the tests with impersonated credentials, we would have had to write considerably more code. After this refactoring, we can just add two more credential files (one for a positive test, one for a negative test) and about 10 lines of code.

Part of the change removes BigQueryGcsDestinationTest and replaces it with a GCS-enabled config file (I added the new config files to GSM before submitting this PR).
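To illustrate the idea of moving test variations from code into config files, here is a minimal sketch in plain Java. It is not the code from this PR: the class name, the file names, and the `fakeCheck` stand-in are all hypothetical, and the real tests call the connector's check against BigQuery instead.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Hypothetical sketch: test cases are data (config file name -> expected check result). */
final class ConfigDrivenCheckSketch {

  /** Returns the config files whose check result differs from the expectation. */
  static List<String> findMismatches(Map<String, Boolean> expectedByConfigFile, Predicate<String> check) {
    List<String> mismatches = new ArrayList<>();
    for (Map.Entry<String, Boolean> testCase : expectedByConfigFile.entrySet()) {
      boolean actual = check.test(testCase.getKey());
      if (actual != testCase.getValue()) {
        mismatches.add(testCase.getKey());
      }
    }
    return mismatches;
  }

  public static void main(String[] args) {
    // File names below are made up for illustration.
    Map<String, Boolean> cases = new LinkedHashMap<>();
    cases.put("secrets/credentials-standard.json", true);
    cases.put("secrets/credentials-gcs-staging.json", true);
    // A new variation is one more file plus one more line here.
    cases.put("secrets/credentials-impersonation-positive.json", true);
    cases.put("secrets/credentials-impersonation-negative.json", false);

    // Stand-in for the real check call; it fails only for the negative case.
    Predicate<String> fakeCheck = file -> !file.contains("negative");
    System.out.println("Mismatches: " + findMismatches(cases, fakeCheck));
  }
}
```

With this shape, the test logic stays fixed and only the table of config files grows.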

github-actions · 2022-12-23T01:37:53Z

Affected Connector Report

NOTE ⚠️ Changes in this PR affect the following connectors. Make sure to do the following as needed:

Run integration tests
Bump connector or module version
Add changelog
Publish the new version

✅ Sources (0)

Connector	Version	Changelog	Publish

See "Actionable Items" below for how to resolve warnings and errors.

✅ Destinations (2)

Connector	Version	Changelog	Publish
`destination-bigquery`	`1.2.9`	✅	✅
`destination-bigquery-denormalized`	`1.2.9`	✅	✅

See "Actionable Items" below for how to resolve warnings and errors.

👀 Other Modules (1)

base-normalization

Actionable Items

Category	Status	Actionable Item
Version	❌ mismatch	The version of the connector is different from its normal variant. Please bump the version of the connector.
	⚠ doc not found	The connector does not seem to have a documentation file. This can be normal (e.g. basic connector like `source-jdbc` is not published or documented). Please double-check to make sure that it is not a bug.
Changelog	⚠ doc not found	The connector does not seem to have a documentation file. This can be normal (e.g. basic connector like `source-jdbc` is not published or documented). Please double-check to make sure that it is not a bug.
	❌ changelog missing	There is no changelog for the current version of the connector. If you are the author of the current version, please add a changelog.
Publish	⚠ not in seed	The connector is not in the seed file (e.g. `source_definitions.yaml`), so its publication status cannot be checked. This can be normal (e.g. some connectors are cloud-specific, and only listed in the cloud seed file). Please double-check to make sure that it is not a bug.
	❌ diff seed version	The connector exists in the seed file, but the latest version is not listed there. This usually means that the latest version is not published. Please use the `/publish` command to publish the latest version.

edgao

checking my understanding:

BigQueryDAT - runs the full DAT suite against a standard inserts config
BigQueryGcsDAT - runs the full DAT suite against a GCS staging config. Extends BigQueryDAT for convenience
- Maybe we should have a root-level BigQueryDAT and then inherit BigQueryStandardInsertsDAT + BigQueryGcsDAT from it? It might reduce some code duplication between them in the setup method
BigQueryDestinationTest - verifies a couple different expected success/failure configs with basic behaviors (check, write) but doesn't run the full DAT process against all of them

?

left a couple small comments+questions but overall this makes sense.

edgao · 2022-12-23T17:25:53Z

...egration/java/io/airbyte/integrations/destination/bigquery/BigQueryDestinationTestUtils.java

+    final String tmpConfigAsString = Files.readString(configFile);
+    final JsonNode tmpConfigJson = Jsons.deserialize(tmpConfigAsString);
+    final JsonNode tmpCredentialsJson = tmpConfigJson.get(BigQueryConsts.BIGQUERY_BASIC_CONFIG);
+    Builder<Object, Object> mapBuilder = ImmutableMap.builder();


nitpick: ObjectNode finalConfig = (ObjectNode) Jsons.emptyObject() and build it directly

edgao · 2022-12-23T17:29:47Z

...egration/java/io/airbyte/integrations/destination/bigquery/BigQueryDestinationTestUtils.java

+
+public class BigQueryDestinationTestUtils {
+
+  public static JsonNode createConfig(Path configFile, String datasetId) throws IOException {


not sure I understand - it looks like this is reading a json blob and shuffling some stuff around. Can we just store the final json blob in GSM?

good point. We should make the secrets stored in GSM have the structure that the tests expect and avoid all this extra shuffling

edgao · 2022-12-23T17:38:34Z

...egration/java/io/airbyte/integrations/destination/bigquery/BigQueryDestinationTestUtils.java

+  /**
+   * Remove all the GCS output from the tests.
+   */
+  public static boolean tearDownGcs(AmazonS3 s3Client, JsonNode config, Logger LOGGER) {


nitpick (nonblocking): would be cool to have this live in base-java-s3, but idk if our build.gradle are set up to support this easily

edgao · 2022-12-23T17:40:44Z

...egration/java/io/airbyte/integrations/destination/bigquery/BigQueryDestinationTestUtils.java

+
+    final JsonNode properties = config.get(BigQueryConsts.LOADING_METHOD);
+    final String gcsBucketName = properties.get(BigQueryConsts.GCS_BUCKET_NAME).asText();
+    final String gcs_bucket_path = properties.get(BigQueryConsts.GCS_BUCKET_PATH).asText();


this means that if we ever want to test the gcs_bucket_path = "" scenario, we'll want to set up a new bucket for it - which is probably correct, just want to call it out

is gcs_bucket_path = "" a valid config for staging?

I think so? Not aware of any reason why not
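A small sketch of why an empty `gcs_bucket_path` would call for its own bucket, assuming the teardown removes objects by key prefix (an assumption; the actual cleanup code is not shown in this thread). The class and method names are hypothetical.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical sketch of prefix-based cleanup; not the connector's actual teardown code. */
final class BucketPathPrefixSketch {

  /** Selects the object keys that a prefix-based cleanup would delete. */
  static List<String> keysToDelete(List<String> allKeys, String bucketPath) {
    return allKeys.stream()
        .filter(key -> key.startsWith(bucketPath))
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<String> keys = List.of("test_path/run1/data.avro", "other_team/export.csv");
    // A non-empty path limits cleanup to the test's own objects.
    System.out.println(keysToDelete(keys, "test_path"));
    // Every string starts with "", so an empty path selects the whole bucket.
    System.out.println(keysToDelete(keys, ""));
  }
}
```

Under that assumption, a shared bucket combined with an empty path would have the cleanup touch objects that do not belong to the test.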

edgao · 2022-12-23T17:41:41Z

...ion/java/io/airbyte/integrations/destination/bigquery/BigQueryDestinationAcceptanceTest.java

+    try {
+      dataset = BigQueryDestinationTestUtils.initDataSet(config, bigquery, datasetId);
+    } catch(Exception ex) {
+      //ignore


shouldn't we fail if we can't create the dataset? Or is this to handle expected errors like permissions problems?

good point. I got lost in my own refactoring and left this try-catch behind

edgao · 2022-12-23T17:42:28Z

...ion/java/io/airbyte/integrations/destination/bigquery/BigQueryDestinationAcceptanceTest.java

-                  }
-                }));
+  protected void addShutdownHook() {
+    Runtime.getRuntime().addShutdownHook(new Thread(() -> {


I'd mildly prefer to have this in an @AfterEach method, but this is fine if that would be complicated (also looks like you inherited this from the existing test structure, so feel free to ignore)

Hm.. I thought it was already called after each test, because it overrides tearDown. I'll add the @AfterEach annotation. The shutdown hooks were there before the refactoring and, AFAIK, the reason is to execute cleanup even if the test crashes.

I'm pretty sure junit takes care of running AfterEach methods, even if tests blow up (except maybe in cases where the entire runtime crashes?)

but yeah, don't worry about this if it looks super messy. Sounds like this has been working correctly as a shutdown hook anyway
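One way to keep both mechanisms without cleaning up twice is an idempotent teardown. The sketch below is illustrative only: the class name and the counter are hypothetical, and the counter stands in for deleting the dataset and staged files.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch: a teardown that is safe to call from several places. */
final class IdempotentTearDownSketch {
  private final AtomicBoolean tornDown = new AtomicBoolean(false);
  private final AtomicInteger cleanupRuns = new AtomicInteger();

  /** Can be called from an @AfterEach-style method and from a shutdown hook. */
  void tearDown() {
    // compareAndSet lets only the first caller through.
    if (!tornDown.compareAndSet(false, true)) {
      return;
    }
    cleanupRuns.incrementAndGet(); // stand-in for the real cleanup work
  }

  /** Fallback for runs where the JVM exits before the normal teardown executes. */
  void addShutdownHook() {
    Runtime.getRuntime().addShutdownHook(new Thread(this::tearDown));
  }

  int cleanupRuns() {
    return cleanupRuns.get();
  }

  public static void main(String[] args) {
    IdempotentTearDownSketch test = new IdempotentTearDownSketch();
    test.addShutdownHook();
    test.tearDown();
    test.tearDown(); // second call is a no-op
    System.out.println("cleanup runs: " + test.cleanupRuns());
  }
}
```

Note that shutdown hooks are not guaranteed to run if the JVM is killed forcibly, so neither mechanism covers every crash.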

edgao · 2022-12-23T18:19:11Z

...t-integration/java/io/airbyte/integrations/destination/bigquery/BigQueryDestinationTest.java

+      throw new IllegalStateException("""
+                                      Json config not found. Must provide path to a big query credentials file,
+                                       please add file with creds to
+                                      ../destination-bigquery/secrets/credentials-with-missed-dataset-creation-role.json.""");


Suggested change

../destination-bigquery/secrets/credentials-with-missed-dataset-creation-role.json.""");

<...>/destination-bigquery/secrets/credentials-with-missed-dataset-creation-role.json.""");

since .. might get misinterpreted

grishick · 2022-12-23T19:31:22Z

Looks like we broke DATs for destination-bigquery-denormalized back in November. I opened a separate PR to fix it: #20871

grishick · 2022-12-27T22:44:08Z

/test connector=connectors/destination-bigquery

🕑 connectors/destination-bigquery https://github.com/airbytehq/airbyte/actions/runs/3790371368
✅ connectors/destination-bigquery https://github.com/airbytehq/airbyte/actions/runs/3790371368
Python tests coverage:

Name                                                              Stmts   Miss  Cover
-------------------------------------------------------------------------------------
normalization/transform_config/__init__.py                            2      0   100%
normalization/transform_catalog/reserved_keywords.py                 14      0   100%
normalization/transform_catalog/__init__.py                           2      0   100%
normalization/destination_type.py                                    14      0   100%
normalization/__init__.py                                             4      0   100%
normalization/transform_catalog/destination_name_transformer.py     166      8    95%
normalization/transform_catalog/table_name_registry.py              174     34    80%
normalization/transform_config/transform.py                         189     48    75%
normalization/transform_catalog/utils.py                             51     14    73%
normalization/transform_catalog/dbt_macro.py                         22      7    68%
normalization/transform_catalog/catalog_processor.py                147     80    46%
normalization/transform_catalog/transform.py                         61     38    38%
normalization/transform_catalog/stream_processor.py                 595    400    33%
-------------------------------------------------------------------------------------
TOTAL                                                              1441    629    56%

Build Passed

Test summary info:

All Passed

grishick · 2022-12-28T02:38:17Z

/test connector=connectors/destination-bigquery-denormalized

🕑 connectors/destination-bigquery-denormalized https://github.com/airbytehq/airbyte/actions/runs/3791285212
✅ connectors/destination-bigquery-denormalized https://github.com/airbytehq/airbyte/actions/runs/3791285212
Python tests coverage:

Name                                                              Stmts   Miss  Cover
-------------------------------------------------------------------------------------
normalization/transform_config/__init__.py                            2      0   100%
normalization/transform_catalog/reserved_keywords.py                 14      0   100%
normalization/transform_catalog/__init__.py                           2      0   100%
normalization/destination_type.py                                    14      0   100%
normalization/__init__.py                                             4      0   100%
normalization/transform_catalog/destination_name_transformer.py     166      8    95%
normalization/transform_catalog/table_name_registry.py              174     34    80%
normalization/transform_config/transform.py                         189     48    75%
normalization/transform_catalog/utils.py                             51     14    73%
normalization/transform_catalog/dbt_macro.py                         22      7    68%
normalization/transform_catalog/catalog_processor.py                147     80    46%
normalization/transform_catalog/transform.py                         61     38    38%
normalization/transform_catalog/stream_processor.py                 595    400    33%
-------------------------------------------------------------------------------------
TOTAL                                                              1441    629    56%

Build Passed

Test summary info:

All Passed

etsybaev · 2022-12-28T17:54:44Z

...ation-bigquery/src/main/java/io/airbyte/integrations/destination/bigquery/BigQueryUtils.java

@@ -70,7 +70,7 @@ public class BigQueryUtils {
      DateTimeFormatter.ofPattern("[yyyy][yy]['-']['/']['.'][' '][MMM][MM][M]['-']['/']['.'][' '][dd][d]" +
          "[[' ']['T']HH:mm[':'ss[.][SSSSSS][SSSSS][SSSS][SSS][' '][z][zzz][Z][O][x][XXX][XX][X]]]");
  private static final String USER_AGENT_FORMAT = "%s (GPN: Airbyte)";
-  private static final String CHECK_TEST_DATASET_SUFFIX = "_airbyte_check_stage_tmp_" + System.currentTimeMillis();


Just curious: could removing this timestamp cause issues when tests run simultaneously, e.g. the same tests for different PRs, local runs, etc.?

This shouldn't affect anything, since the timestamp is added back below in checkHasCreateAndDeleteDatasetRole, and CHECK_TEST_DATASET_SUFFIX is only used within that method.

The reason I moved the timestamp from class instantiation time to test execution time is that it was causing name collisions when tests ran too fast.
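The difference between the two approaches can be sketched as follows. This is not the code from BigQueryUtils: the class and method names are hypothetical, and the clock is injected only so the behavior is easy to verify.

```java
import java.util.function.LongSupplier;

/** Hypothetical sketch: suffix computed once per class vs. once per call. */
final class DatasetSuffixSketch {
  // Evaluated once at class initialization, so every call reuses the same value.
  private static final String STATIC_SUFFIX = "_airbyte_check_stage_tmp_" + System.currentTimeMillis();

  static String nameWithStaticSuffix(String datasetId) {
    return datasetId + STATIC_SUFFIX;
  }

  /** Evaluated on every call, so later calls get a later timestamp. */
  static String nameWithFreshSuffix(String datasetId, LongSupplier clock) {
    return datasetId + "_airbyte_check_stage_tmp_" + clock.getAsLong();
  }

  public static void main(String[] args) {
    // The static variant yields the same name for every check in this JVM.
    System.out.println(nameWithStaticSuffix("ds").equals(nameWithStaticSuffix("ds")));
    System.out.println(nameWithFreshSuffix("ds", System::currentTimeMillis));
  }
}
```

Two calls within the same millisecond could still produce the same name with a plain timestamp; whether that matters depends on how the checks are run.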

ryankfu · 2022-12-28T18:35:42Z

.../java/io/airbyte/integrations/destination/bigquery/BigQueryGcsDestinationAcceptanceTest.java

  }

+  protected void tarDownGcs() {


this probably should be tearDownGcs

ryankfu

Overall, looks good and the usage of config files cleans up a lot of the overhead from before too. Thanks for taking on this change

ryankfu · 2022-12-28T18:43:38Z

...egration/java/io/airbyte/integrations/destination/bigquery/BigQueryDestinationTestUtils.java

+    if(s3Client == null) {
+      return;
+    }
+    if(BigQueryUtils.getLoadingMethod(config) != UploadingMethod.GCS) {


This if check can probably be grouped together with the preceding if statement. Also, just to make sure this is clear: is the reason for these empty returns basically to say we shouldn't be logging any warnings? From what I can understand, if these values don't exist, it will log a warning in the try-catch block.
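A sketch of the grouped guard, using stand-in types so it compiles on its own. The enum, the `Object` client parameter, and the boolean return value are assumptions made for illustration, not the signatures used in the PR.

```java
/** Hypothetical sketch: two early-return guards merged into one condition. */
final class TearDownGuardSketch {
  enum UploadingMethod { STANDARD, GCS }

  /** Returns true when GCS cleanup was attempted, false when it was skipped. */
  static boolean tearDownGcs(Object s3Client, UploadingMethod method) {
    // Skip quietly when there is no client or the config does not use GCS staging.
    if (s3Client == null || method != UploadingMethod.GCS) {
      return false;
    }
    // The real cleanup (and its try-catch with a warning log) would go here.
    return true;
  }

  public static void main(String[] args) {
    System.out.println(tearDownGcs(null, UploadingMethod.GCS));
    System.out.println(tearDownGcs(new Object(), UploadingMethod.STANDARD));
    System.out.println(tearDownGcs(new Object(), UploadingMethod.GCS));
  }
}
```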

ryankfu · 2022-12-28T18:52:35Z

...t-integration/java/io/airbyte/integrations/destination/bigquery/BigQueryDestinationTest.java

+      throw new IllegalStateException("""
+                                      Json config not found. Must provide path to a big query credentials file,
+                                       please add file with creds to
+                                      <...>/destination-bigquery/secrets/credentials-with-missed-dataset-creation-role.json.""");


Might be a good place to use the constants in a string format instead of retyping the values. Something like

throw new IllegalStateException(String.format("""
    Json config not found. Must provide path to a big query credentials file,
    please add file with creds to
    <...>/destination-bigquery/%s""", CREDENTIALS_WITH_MISSED_CREATE_DATASET_ROLE_PATH));

That said, it appears these hardcoded strings already exist in the tests, so it's a minor nit and it doesn't change anything functionally. The reason for the nit is that it explicitly shows the reader where the file name came from.

Good catch. I should actually remove all of this copy pasta and just iterate over the array of paths.

grishick · 2022-12-28T19:10:58Z

/test connector=connectors/destination-bigquery

🕑 connectors/destination-bigquery https://github.com/airbytehq/airbyte/actions/runs/3796081975

grishick · 2022-12-28T19:11:16Z

/test connector=connectors/destination-bigquery-denormalized

🕑 connectors/destination-bigquery-denormalized https://github.com/airbytehq/airbyte/actions/runs/3796083138
✅ connectors/destination-bigquery-denormalized https://github.com/airbytehq/airbyte/actions/runs/3796083138
Python tests coverage:

Name                                                              Stmts   Miss  Cover
-------------------------------------------------------------------------------------
normalization/transform_config/__init__.py                            2      0   100%
normalization/transform_catalog/reserved_keywords.py                 14      0   100%
normalization/transform_catalog/__init__.py                           2      0   100%
normalization/destination_type.py                                    14      0   100%
normalization/__init__.py                                             4      0   100%
normalization/transform_catalog/destination_name_transformer.py     166      8    95%
normalization/transform_catalog/table_name_registry.py              174     34    80%
normalization/transform_config/transform.py                         189     48    75%
normalization/transform_catalog/utils.py                             51     14    73%
normalization/transform_catalog/dbt_macro.py                         22      7    68%
normalization/transform_catalog/catalog_processor.py                147     80    46%
normalization/transform_catalog/transform.py                         61     38    38%
normalization/transform_catalog/stream_processor.py                 595    400    33%
-------------------------------------------------------------------------------------
TOTAL                                                              1441    629    56%

Build Passed

Test summary info:

All Passed

grishick · 2022-12-28T20:03:13Z

/test connector=connectors/destination-bigquery

🕑 connectors/destination-bigquery https://github.com/airbytehq/airbyte/actions/runs/3796307446
✅ connectors/destination-bigquery https://github.com/airbytehq/airbyte/actions/runs/3796307446
Python tests coverage:

Name                                                              Stmts   Miss  Cover
-------------------------------------------------------------------------------------
normalization/transform_config/__init__.py                            2      0   100%
normalization/transform_catalog/reserved_keywords.py                 14      0   100%
normalization/transform_catalog/__init__.py                           2      0   100%
normalization/destination_type.py                                    14      0   100%
normalization/__init__.py                                             4      0   100%
normalization/transform_catalog/destination_name_transformer.py     166      8    95%
normalization/transform_catalog/table_name_registry.py              174     34    80%
normalization/transform_config/transform.py                         189     48    75%
normalization/transform_catalog/utils.py                             51     14    73%
normalization/transform_catalog/dbt_macro.py                         22      7    68%
normalization/transform_catalog/catalog_processor.py                147     80    46%
normalization/transform_catalog/transform.py                         61     38    38%
normalization/transform_catalog/stream_processor.py                 595    400    33%
-------------------------------------------------------------------------------------
TOTAL                                                              1441    629    56%

Build Passed

Test summary info:

All Passed

grishick · 2022-12-28T21:36:40Z

/approve-and-merge reason="to unblock other PRs that modify BigQuery tests"

octavia-approvington · 2022-12-28T21:37:31Z

This looks fine!
Merged!

jbfbell · 2022-12-30T21:34:36Z

...t-integration/java/io/airbyte/integrations/destination/bigquery/BigQueryDestinationTest.java

+  public static void beforeAll() throws IOException {
+    for(Path path : ALL_PATHS) {
+      if (!Files.exists(path)) {
+        throw new IllegalStateException(


Small thing - I believe this exception will be thrown on the first file to not exist, then exit the loop. It might be more convenient if all missing files are shown in the error message.

final List<Path> missingPaths = ALL_PATHS.stream()
    .filter(path -> !Files.exists(path))
    .collect(Collectors.toList());
if (!missingPaths.isEmpty()) {
  throw new IllegalStateException(
      // ...
  );
}
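A self-contained sketch of this suggestion, under the assumption that the paths are available as an array or varargs (the thread does not show how ALL_PATHS is declared). The class and method names are hypothetical. Printing absolute, normalized paths also avoids the ambiguity of a relative `..` in the message.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical sketch: report every missing config file in one exception. */
final class MissingConfigFilesSketch {

  /** Collects all paths that do not exist instead of stopping at the first one. */
  static List<Path> findMissing(Path... paths) {
    return Arrays.stream(paths)
        .filter(path -> !Files.exists(path))
        .collect(Collectors.toList());
  }

  /** Throws a single exception that lists every missing file. */
  static void requireAll(Path... paths) {
    List<Path> missing = findMissing(paths);
    if (!missing.isEmpty()) {
      String listing = missing.stream()
          .map(path -> path.toAbsolutePath().normalize().toString())
          .collect(Collectors.joining(System.lineSeparator()));
      throw new IllegalStateException(
          "Config files not found. Please add credential files at:" + System.lineSeparator() + listing);
    }
  }
}
```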

jbfbell · 2022-12-30T21:46:38Z

...t-integration/java/io/airbyte/integrations/destination/bigquery/BigQueryDestinationTest.java

-
-    tornDown = false;
-    addShutdownHook();
+    configs  = new HashMap<String, JsonNode>() {{


Just curious if there was motivation to remove the ImmutableMap.builder() flow here?

jbfbell · 2022-12-30T21:49:07Z

...t-integration/java/io/airbyte/integrations/destination/bigquery/BigQueryDestinationTest.java

+    try {
+      dataset = BigQueryDestinationTestUtils.initDataSet(config, bigquery, datasetId);
+    } catch(Exception ex) {
+      //ignore


I think we should at least log a warning here, otherwise if an exception occurs at this point it would be difficult to debug.

jbfbell · 2022-12-30T22:09:02Z

...t-integration/java/io/airbyte/integrations/destination/bigquery/BigQueryDestinationTest.java

@@ -374,11 +322,17 @@ void testWriteFailure(final DatasetIdResetter resetDatasetId) throws Exception {
  }

  private Set<String> fetchNamesOfTablesInDb() throws InterruptedException {
+    if(dataset == null || bigquery == null) {


I think the two conditionals on lines 325-327 and 333-335 can be combined. The QueryJobConfiguration AFAIK is just building a query rather than executing it. Still a little rusty on my Java but I think an Optional would be good here

final Optional<Dataset> potentialDataset = Optional.ofNullable(dataset);
if (bigquery == null || !potentialDataset.map(Dataset::exists).orElse(false)) {
  return Collections.emptySet();
}

jbfbell

Left a few comments but mostly LGTM

octavia-squidington-iv added the area/connectors (Connector related issues) and connectors/destination/bigquery labels Dec 23, 2022

grishick requested review from edgao and jbfbell December 23, 2022 02:55

grishick mentioned this pull request Dec 23, 2022

Destination-bigquery: add ability to impersonate another GCP account #20788

Closed

edgao approved these changes Dec 23, 2022

grishick force-pushed the greg/refactor-bq-tests branch from a3f7ed2 to e38d902 Compare December 27, 2022 18:40

grishick requested review from suhomud and etsybaev December 27, 2022 23:09

grishick force-pushed the greg/refactor-bq-tests branch from e38d902 to 6b7d29a Compare December 28, 2022 02:38

grishick requested a review from ryankfu December 28, 2022 02:42

etsybaev approved these changes Dec 28, 2022

ryankfu reviewed Dec 28, 2022

grishick added 3 commits December 28, 2022 10:38

More refactoring

7069373

fix typo

e5027e4

grishick force-pushed the greg/refactor-bq-tests branch from 6b7d29a to e5027e4 Compare December 28, 2022 18:39

Change secret file names to avoid conflict with current tests on master

7cd8c94

ryankfu approved these changes Dec 28, 2022

grishick added 2 commits December 28, 2022 11:00

remove copy-pasted credential file paths

34e0842

more copy-pasta reduction

342e2e5

etsybaev mentioned this pull request Dec 28, 2022

Destination-bigquery: to handle new datatypes #20898

Closed

37 tasks

octavia-approvington approved these changes Dec 28, 2022

octavia-approvington merged commit 99335da into master Dec 28, 2022

octavia-approvington deleted the greg/refactor-bq-tests branch December 28, 2022 21:37

github-actions bot mentioned this pull request Dec 29, 2022

Bump helm chart version reference to 0.43.11 #20918

Merged

jbfbell reviewed Dec 30, 2022

This was referenced Jan 6, 2023

Bump Airbyte version from 0.40.26 to 0.40.27 #21092

Closed

Bump Airbyte version from 0.40.26 to 0.40.27 #21121

Closed

This was referenced Jan 6, 2023

Bump Airbyte version from 0.40.26 to 0.40.27 #21130

Closed

Bump Airbyte version from 0.40.26 to 0.40.27 #21135

Merged

sh4sh mentioned this pull request Mar 7, 2023

BigQuery Destination: Proposal to impersonate account on bigquery #15820

Closed

37 tasks

