Redshift Destination: Add SSH Tunnelling Config Option #23523

Merged
jcowanpdx merged 20 commits into master from jeff/add-ssh-tunnel-to-redshift-destination
Mar 7, 2023

@jcowanpdx jcowanpdx commented Feb 27, 2023

Issue: #12131

Acceptance tests pass and exercise both the Insert and S3 loading options, as well as password-based and key-based authentication with the SSH bastion host.

Airbyters: Credentials/configs for acceptance testing against AWS Redshift have been updated in Google Cloud Secret Manager. Be sure to get both config.json and config_staging.json, as well as the new PEM file.

What

Users would like to access their Redshift destination database through an SSH tunnelling config (SSH bastion host), similar to the Postgres, MySQL, and other destinations.

How

This change wraps the Destination constructor in the SSHWrappedDestination, which injects SSH configuration options into the connector configuration spec.
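
As a rough illustration of the wrapping idea described above, the sketch below shows how a wrapper might add a tunnel property to an existing connector spec without modifying the wrapped connector. It is written in Python for brevity and is not the Airbyte implementation, which is Java. The names `with_ssh_tunnel` and `SSH_TUNNEL_PROPERTY`, the `tunnel_method` key, and the schema contents are all assumptions made for this example.

```python
import copy

# Hypothetical tunnel schema fragment, for illustration only.
# The real schema is supplied by Airbyte's SSH tunnel support and may differ.
SSH_TUNNEL_PROPERTY = {
    "type": "object",
    "title": "SSH Tunnel Method",
    "oneOf": [
        {"title": "No Tunnel"},
        {"title": "SSH Key Authentication"},
        {"title": "Password Authentication"},
    ],
}


def with_ssh_tunnel(spec: dict) -> dict:
    """Return a copy of `spec` with a tunnel property added.

    The original spec is left unmodified, mirroring how a wrapper can
    extend a connector's spec without changing the wrapped connector.
    """
    wrapped = copy.deepcopy(spec)
    properties = wrapped.setdefault("connectionSpecification", {}).setdefault("properties", {})
    # "tunnel_method" is an assumed property name for this sketch.
    properties["tunnel_method"] = copy.deepcopy(SSH_TUNNEL_PROPERTY)
    return wrapped


# Hypothetical, heavily trimmed Redshift-style spec.
base_spec = {
    "connectionSpecification": {
        "type": "object",
        "properties": {"host": {"type": "string"}, "port": {"type": "integer"}},
    }
}

wrapped_spec = with_ssh_tunnel(base_spec)
print(sorted(wrapped_spec["connectionSpecification"]["properties"]))
```

Copying the spec rather than mutating it keeps the wrapped connector usable on its own, which is the main appeal of a wrapper over editing each connector's spec by hand.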

Recommended reading order

  1. Check the main classes first.
  2. Read the base integration test class next, followed by the specific implementation classes.

🚨 User Impact 🚨

Are there any breaking changes? What is the end result perceived by the user? If yes, please merge this PR with the 🚨🚨 emoji so changelog authors can further highlight this if needed.

We are not expecting any breaking changes as this is an enabling PR, adding new features and capabilities only.

Pre-merge Checklist

Expand the relevant checklist and delete the others.

New Connector

Community member or Airbyter

  • Community member? Grant edit access to maintainers (instructions)
  • Secrets in the connector's spec are annotated with airbyte_secret
  • Unit & integration tests added and passing. Community members, please provide proof of success locally e.g: screenshot or copy-paste unit, integration, and acceptance test output. To run acceptance tests for a Python connector, follow instructions in the README. For java connectors run ./gradlew :airbyte-integrations:connectors:<name>:integrationTest.
  • Code reviews completed
  • Documentation updated
    • Connector's README.md
    • Connector's bootstrap.md. See description and examples
    • docs/integrations/<source or destination>/<name>.md including changelog. See changelog example
    • docs/integrations/README.md
    • airbyte-integrations/builds.md
  • PR name follows PR naming conventions

Airbyter

If this is a community PR, the Airbyte engineer reviewing this PR is responsible for the below items.

  • Create a non-forked branch based on this PR and test the below items on it
  • Build is successful
  • If new credentials are required for use in CI, add them to GSM. Instructions.
  • /test connector=connectors/<name> command is passing
  • New Connector version released on Dockerhub by running the /publish command described here
  • After the connector is published, connector added to connector index as described here
  • Seed specs have been re-generated by building the platform and committing the changes to the seed spec files, as described here

Updating a connector

Community member or Airbyter

  • Grant edit access to maintainers (instructions)
  • Secrets in the connector's spec are annotated with airbyte_secret
  • Unit & integration tests added and passing. Community members, please provide proof of success locally e.g: screenshot or copy-paste unit, integration, and acceptance test output. To run acceptance tests for a Python connector, follow instructions in the README. For java connectors run ./gradlew :airbyte-integrations:connectors:<name>:integrationTest.
  • Code reviews completed
  • Documentation updated
    • Connector's README.md
    • Connector's bootstrap.md. See description and examples
    • Changelog updated in docs/integrations/<source or destination>/<name>.md including changelog. See changelog example
  • PR name follows PR naming conventions

Airbyter

If this is a community PR, the Airbyte engineer reviewing this PR is responsible for the below items.

  • Create a non-forked branch based on this PR and test the below items on it
  • Build is successful
  • If new credentials are required for use in CI, add them to GSM. Instructions.
  • /test connector=connectors/<name> command is passing
  • New Connector version released on Dockerhub and connector version bumped by running the /publish command described here

Connector Generator

  • Issue acceptance criteria met
  • PR name follows PR naming conventions
  • If adding a new generator, add it to the list of scaffold modules being tested
  • The generator test modules (all connectors with -scaffold in their name) have been updated with the latest scaffold by running ./gradlew :airbyte-integrations:connector-templates:generator:testScaffoldTemplates then checking in your changes
  • Documentation which references the generator is updated as needed

@jcowanpdx jcowanpdx requested a review from a team as a code owner February 27, 2023 23:31

CLAassistant commented Feb 27, 2023

CLA assistant check
All committers have signed the CLA.

github-actions bot commented Feb 27, 2023

Affected Connector Report

NOTE ⚠️ Changes in this PR affect the following connectors. Make sure to do the following as needed:

  • Run integration tests
  • Bump connector or module version
  • Add changelog
  • Publish the new version

✅ Sources (0)

Connector Version Changelog Publish
  • See "Actionable Items" below for how to resolve warnings and errors.

✅ Destinations (1)

Connector Version Changelog Publish
destination-redshift 0.4.0
  • See "Actionable Items" below for how to resolve warnings and errors.

👀 Other Modules (1)

  • base-normalization

Actionable Items

Category: Version
  • mismatch: The version of the connector is different from its normal variant. Please bump the version of the connector.
  • doc not found: The connector does not seem to have a documentation file. This can be normal (e.g. basic connector like source-jdbc is not published or documented). Please double-check to make sure that it is not a bug.

Category: Changelog
  • doc not found: The connector does not seem to have a documentation file. This can be normal (e.g. basic connector like source-jdbc is not published or documented). Please double-check to make sure that it is not a bug.
  • changelog missing: There is no changelog for the current version of the connector. If you are the author of the current version, please add a changelog.

Category: Publish
  • not in seed: The connector is not in the seed file (e.g. source_definitions.yaml), so its publication status cannot be checked. This can be normal (e.g. some connectors are cloud-specific, and only listed in the cloud seed file). Please double-check to make sure that it is not a bug.
  • diff seed version: The connector exists in the seed file, but the latest version is not listed there. This usually means that the latest version is not published. Please use the /publish command to publish the latest version.

@jcowanpdx jcowanpdx linked an issue Feb 28, 2023 that may be closed by this pull request
jcowanpdx and others added 4 commits February 27, 2023 17:20
…com:airbytehq/airbyte into jeff/add-ssh-tunnel-to-redshift-destination

jcowanpdx commented Feb 28, 2023

/test connector=connectors/destination-redshift

🕑 connectors/destination-redshift https://github.com/airbytehq/airbyte/actions/runs/4295284609
❌ connectors/destination-redshift https://github.com/airbytehq/airbyte/actions/runs/4295284609
🐛 https://gradle.com/s/padauifeqyrps

Build Failed

Test summary info:

Could not find result summary

Update the version id.
Touch up the docs.
@octavia-squidington-iii octavia-squidington-iii added the area/documentation Improvements or additions to documentation label Feb 28, 2023

jcowanpdx commented Feb 28, 2023

/test connector=connectors/destination-redshift

🕑 connectors/destination-redshift https://github.com/airbytehq/airbyte/actions/runs/4297402723
❌ connectors/destination-redshift https://github.com/airbytehq/airbyte/actions/runs/4297402723
🐛 https://gradle.com/s/bpcl3cqxudftu

Build Failed

Test summary info:

Could not find result summary

evantahler commented Feb 28, 2023

/test connector=connectors/destination-redshift

🕑 connectors/destination-redshift https://github.com/airbytehq/airbyte/actions/runs/4297646524
✅ connectors/destination-redshift https://github.com/airbytehq/airbyte/actions/runs/4297646524
Python tests coverage:

Name                                                              Stmts   Miss  Cover
-------------------------------------------------------------------------------------
normalization/transform_config/__init__.py                            2      0   100%
normalization/transform_catalog/reserved_keywords.py                 15      0   100%
normalization/transform_catalog/__init__.py                           2      0   100%
normalization/destination_type.py                                    18      0   100%
normalization/__init__.py                                             4      0   100%
normalization/transform_catalog/destination_name_transformer.py     171     10    94%
normalization/transform_catalog/table_name_registry.py              174     34    80%
normalization/transform_config/transform.py                         195     48    75%
normalization/transform_catalog/utils.py                             51     14    73%
normalization/transform_catalog/dbt_macro.py                         22      7    68%
normalization/transform_catalog/catalog_processor.py                147     80    46%
normalization/transform_catalog/transform.py                         65     39    40%
normalization/transform_catalog/stream_processor.py                 595    400    33%
-------------------------------------------------------------------------------------
TOTAL                                                              1461    632    57%

Build Passed

Test summary info:

All Passed

@evantahler evantahler left a comment

Conditionally approved assuming /test passes

@edgao edgao left a comment

one small question around the top-level destination class, otherwise lgtm

@jcowanpdx

Confirmed that normalization will work through the tunnel configuration as well.

jcowanpdx commented Mar 3, 2023

/publish connector=connectors/destination-redshift

🕑 Publishing the following connectors:
connectors/destination-redshift
https://github.com/airbytehq/airbyte/actions/runs/4327523925


Connector Did it publish? Were definitions generated?
connectors/destination-redshift

if you have connectors that successfully published but failed definition generation, follow step 4 here ▶️

grishick commented Mar 4, 2023

Looks like some integration tests are failing to connect to Redshift while executing the publish command. This may be related to either our internal Redshift cluster being overwhelmed or the SSH bastion. At least one of the errors was this (looks more like the problem is with the SSH bastion):

        Caused by:
        java.sql.SQLTransientConnectionException: HikariPool-206 - Connection is not available, request timed out after 60005ms.
            at com.zaxxer.hikari.pool.HikariPool.createTimeoutException(HikariPool.java:696)
            at com.zaxxer.hikari.pool.HikariPool.getConnection(HikariPool.java:181)
            at com.zaxxer.hikari.pool.HikariPool.getConnection(HikariPool.java:146)
            at com.zaxxer.hikari.HikariDataSource.getConnection(HikariDataSource.java:100)
            at org.jooq.impl.DataSourceConnectionProvider.acquire(DataSourceConnectionProvider.java:83)
            ... 96 more

            Caused by:
            java.sql.SQLException: [Amazon](500310) Invalid operation: connection limit "500" exceeded for non-bootstrap users;
                at com.amazon.redshift.client.messages.inbound.ErrorResponse.toErrorException(Unknown Source)
                at com.amazon.redshift.client.InboundDataHandler.read(Unknown Source)
                at com.amazon.support.channels.AbstractSocketChannel.readCallback(Unknown Source)
                at com.amazon.support.channels.TLSSocketChannel.read(Unknown Source)

                Caused by:
                com.amazon.support.exceptions.ErrorException: [Amazon](500310) Invalid operation: connection limit "500" exceeded for non-bootstrap users;
                    at app//com.amazon.redshift.client.messages.inbound.ErrorResponse.toErrorException(Unknown Source)
                    ... 3 more

@jcowanpdx

The error above seems more related to Redshift connection limits, per some docs I'm reading. Raising the limit might not be a great solution, as that requires some back and forth with Amazon. I suspect that either something is leaking connections or we are running these tests in parallel at too high a rate. I'm not sure why the integration tests use so many connections.

jcowanpdx commented Mar 6, 2023

/publish connector=connectors/destination-redshift

🕑 Publishing the following connectors:
connectors/destination-redshift
https://github.com/airbytehq/airbyte/actions/runs/4348047576


Connector Did it publish? Were definitions generated?
connectors/destination-redshift

if you have connectors that successfully published but failed definition generation, follow step 4 here ▶️

@jcowanpdx

The change to running integration tests in parallel, combined with a 4-hour idle session timeout in Redshift, causes idle sessions to accumulate, which leads to resource shortages while the tests run. I have lowered the idle session timeout to 60 in Redshift, and local testing looks promising. Running another /publish to verify.
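
A back-of-the-envelope estimate shows why a long idle timeout matters here. If abandoned sessions stay open until the idle timeout expires, and tests keep running for at least as long as that timeout, the steady-state number of idle sessions is roughly the rate at which sessions are abandoned multiplied by the timeout (Little's law). The 4-hour timeout and the 500-connection limit come from this thread; the rate of 5 sessions per minute and the 1-minute comparison timeout are made-up numbers for illustration, not measurements.

```python
CONNECTION_LIMIT = 500  # from the Redshift error message in this thread


def steady_state_idle_sessions(abandoned_per_minute: float, idle_timeout_minutes: float) -> float:
    """Estimate lingering idle sessions via Little's law (L = rate * time)."""
    return abandoned_per_minute * idle_timeout_minutes


# Assumed rate: parallel tests abandon 5 sessions per minute (hypothetical).
ASSUMED_RATE = 5.0

long_timeout = steady_state_idle_sessions(ASSUMED_RATE, idle_timeout_minutes=4 * 60)
short_timeout = steady_state_idle_sessions(ASSUMED_RATE, idle_timeout_minutes=1)

for label, sessions in (("4-hour timeout", long_timeout), ("1-minute timeout", short_timeout)):
    status = "over" if sessions > CONNECTION_LIMIT else "under"
    print(f"{label}: ~{sessions:.0f} idle sessions ({status} the {CONNECTION_LIMIT} limit)")
```

Under these assumed numbers the long timeout alone is enough to exceed the limit, while a short timeout keeps the idle count small even if sessions are never closed explicitly. Closing connections promptly in the tests would reduce the abandonment rate itself.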

jcowanpdx commented Mar 7, 2023

/publish connector=connectors/destination-redshift

🕑 Publishing the following connectors:
connectors/destination-redshift
https://github.com/airbytehq/airbyte/actions/runs/4349288708


Connector Did it publish? Were definitions generated?
connectors/destination-redshift

if you have connectors that successfully published but failed definition generation, follow step 4 here ▶️

jcowanpdx commented Mar 7, 2023

/test connector=connectors/destination-redshift

🕑 connectors/destination-redshift https://github.com/airbytehq/airbyte/actions/runs/4356472449
✅ connectors/destination-redshift https://github.com/airbytehq/airbyte/actions/runs/4356472449
Python tests coverage:

Name                                                              Stmts   Miss  Cover
-------------------------------------------------------------------------------------
normalization/transform_config/__init__.py                            2      0   100%
normalization/transform_catalog/reserved_keywords.py                 15      0   100%
normalization/transform_catalog/__init__.py                           2      0   100%
normalization/destination_type.py                                    18      0   100%
normalization/__init__.py                                             4      0   100%
normalization/transform_catalog/destination_name_transformer.py     171     10    94%
normalization/transform_catalog/table_name_registry.py              174     34    80%
normalization/transform_config/transform.py                         195     48    75%
normalization/transform_catalog/utils.py                             51     14    73%
normalization/transform_catalog/dbt_macro.py                         22      7    68%
normalization/transform_catalog/catalog_processor.py                147     80    46%
normalization/transform_catalog/transform.py                         65     39    40%
normalization/transform_catalog/stream_processor.py                 595    400    33%
-------------------------------------------------------------------------------------
TOTAL                                                              1461    632    57%

Build Passed

Test summary info:

All Passed

@jcowanpdx jcowanpdx merged commit b6b4203 into master Mar 7, 2023
@jcowanpdx jcowanpdx deleted the jeff/add-ssh-tunnel-to-redshift-destination branch March 7, 2023 17:29
danielduckworth pushed a commit to danielduckworth/airbyte that referenced this pull request Mar 13, 2023
* Redshift Destination: Add SSH Tunnelling Config Option
Issue: airbytehq#12131

User would like to be able to access their Redshift Destination database via an SSH Tunnelling config (SSH Bastion Host), just as in Postgres, etc. destinations.

Acceptance tests pass and exercise both Insert and S3 type loading options as well as password and key based authentication with the SSH bastion host.

Airbyters: Credentials/configs for acceptance testing against AWS Redshift are updated in Google Cloud Secret Manager. Be sure to get both the config.json and config_staging.json as well as the new PEM file

* README.md updates
** Move the pem info into the secrets config files
** Update the version id to 0.4.0
** Touch up the docs
** adding a note about S3 staging NOT using the SSH Tunnel configuration, if provided.
** make base class abstract to prevent double running the same tests

Co-authored-by: Octavia Squidington III <[email protected]>
Labels
area/connectors Connector related issues area/documentation Improvements or additions to documentation connectors/destination/redshift
Development

Successfully merging this pull request may close these issues.

Redshift Destination implement SSH tunneling
7 participants