Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

🎉 Destination Snowflake: Add option to stage encrypted files via S3 #12452

Merged
merged 26 commits into from
May 4, 2022

Conversation

edgao
Copy link
Contributor

@edgao edgao commented Apr 28, 2022

What

We want to be able to (a) write encrypted data to S3, and (b) COPY those encrypted files into Snowflake.

Also, fix destination-snowflake's purge_staging_data option (previously we were always deleting the output files)

How

  1. Create a BlobDecorator abstract class, which represents modifications to how a single S3 (GCS, Azure Storage, etc.) file is uploaded. Implement an AesCbc subclass, which encrypts the file and sets additional metadata.
  2. Modify BlobStorageOperations to accept 0 or more BlobDecorators. Make S3StorageOperations handle those decorators correctly. (GcsStorageOperations inherits that behavior)
  3. Create an EncryptionConfig struct to represent different encryption strategies (currently: NoEncryption and AesCbcEnvelope)
  4. Update destination-snowflake to accept an encryption entry inside its S3 Staging loading method, and pass that configuration down to SnowflakeS3StagingSqlOperations (which then injects the appropriate BlobDecorator into the S3StorageOperations, and adds the encryption key to the COPY command)

Some additional notes:

  • I couldn't find clear documentation on what the encryption format is supposed to look like; the implementation here was reverse-engineered from a Redshift UNLOAD command. I did run a local sync with encryption to Snowflake (see the blob on S3 and the AIRBYTE_DATABASE.AIRBYTE_SCHEMA._airbyte_raw_edgao_customers table in Snowflake)
  • New integration test config is added here. It uses ephemeral keys.
  • I didn't expose this feature in destination-s3, but it would be pretty easy. IMO it would be better to implement AES-GCM if we want to give users the option of encrypting their data though, since that's what the latest AWS SDK uses. (probably not difficult to implement)

Recommended reading order

  1. BlobDecorator, AesCbcEnvelopeEncryptionBlobDecorator, tests
  2. StreamTransferManager, StreamTransferManagerWithMetadata
  3. BlobStorageOperations, S3StorageOperations, GcsStorageOperations
  4. EncryptionConfig, NoEncryption, AesCbcEnvelopeEncryption
  5. spec.json
  6. StagingConsumerFactory, SnowflakeInternalStagingDestination, SnowflakeDestinationTest (this is just the purge_staging_data fix)
  7. SnowflakeS3StagingDestination, SnowflakeS3StagingSqlOperations, SnowflakeS3StagingSqlOperationsTest, SnowflakeS3CopyEncryptedDestinationAcceptanceTest
  8. snowflake.md

Deleting SnowflakeCopyS3Destination because it's no longer used anywhere.

🚨 User Impact 🚨

None. Existing configs will be use NoEncryption.

Pre-merge Checklist

Community member or Airbyter

  • Grant edit access to maintainers (instructions)
  • Secrets in the connector's spec are annotated with airbyte_secret
  • Unit & integration tests added and passing. Community members, please provide proof of success locally e.g: screenshot or copy-paste unit, integration, and acceptance test output. To run acceptance tests for a Python connector, follow instructions in the README. For java connectors run ./gradlew :airbyte-integrations:connectors:<name>:integrationTest.
  • Code reviews completed
  • Documentation updated
    • Connector's README.md
    • Connector's bootstrap.md. See description and examples
    • Changelog updated in docs/integrations/<source or destination>/<name>.md including changelog. See changelog example
  • PR name follows PR naming conventions

Airbyter

If this is a community PR, the Airbyte engineer reviewing this PR is responsible for the below items.

  • Create a non-forked branch based on this PR and test the below items on it
  • Build is successful
  • If new credentials are required for use in CI, add them to GSM. Instructions.
  • /test connector=connectors/<name> command is passing
  • New Connector version released on Dockerhub and connector version bumped by running the /publish command described here

@github-actions github-actions bot added the area/connectors Connector related issues label Apr 28, 2022
@edgao
Copy link
Contributor Author

edgao commented Apr 28, 2022

/test connector=connectors/destination-snowflake

🕑 connectors/destination-snowflake https://github.com/airbytehq/airbyte/actions/runs/2242313134
✅ connectors/destination-snowflake https://github.com/airbytehq/airbyte/actions/runs/2242313134
Python tests coverage:

Name                                                                                                                            Stmts   Miss  Cover
---------------------------------------------------------------------------------------------------------------------------------------------------
normalization/transform_config/__init__.py                                                                                          2      0   100%
normalization/transform_catalog/reserved_keywords.py                                                                               13      0   100%
normalization/transform_catalog/__init__.py                                                                                         2      0   100%
normalization/destination_type.py                                                                                                  13      0   100%
normalization/__init__.py                                                                                                           4      0   100%
/actions-runner/_work/airbyte/airbyte/airbyte-integrations/bases/airbyte-protocol/airbyte_protocol/models/airbyte_protocol.py     124      0   100%
/actions-runner/_work/airbyte/airbyte/airbyte-integrations/bases/airbyte-protocol/airbyte_protocol/models/__init__.py               1      0   100%
/actions-runner/_work/airbyte/airbyte/airbyte-integrations/bases/airbyte-protocol/airbyte_protocol/__init__.py                      2      0   100%
normalization/transform_catalog/destination_name_transformer.py                                                                   155      8    95%
normalization/transform_config/transform.py                                                                                       159     31    81%
normalization/transform_catalog/table_name_registry.py                                                                            174     34    80%
normalization/transform_catalog/utils.py                                                                                           34      7    79%
normalization/transform_catalog/dbt_macro.py                                                                                       22      7    68%
normalization/transform_catalog/catalog_processor.py                                                                              147     80    46%
normalization/transform_catalog/transform.py                                                                                       61     38    38%
normalization/transform_catalog/stream_processor.py                                                                               534    345    35%
---------------------------------------------------------------------------------------------------------------------------------------------------
TOTAL                                                                                                                            1447    550    62%

@edgao edgao force-pushed the edgao/snowflake_encrypted_staging branch from 2b47a2b to 9ae867b Compare April 28, 2022 23:38
@github-actions github-actions bot added the area/documentation Improvements or additions to documentation label Apr 28, 2022
@edgao edgao marked this pull request as ready for review April 28, 2022 23:48
Copy link
Contributor

@tuliren tuliren left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Amazing!

"title": "Key-encrypting key",
"description": "The type of key-encrypting key to use",
"type": "object",
"oneOf": [
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should test the spec in storybook to see if it works. Previously our UI does not support multiple-level oneOf specs very well.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be replaced with a simple text field. When it is not empty, it is a user provided key. When it is empty, it is an ephemeral key.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think nested oneOf work well now (tested in storybook, and also with a local airbyte instance). There's some weirdness with empty objects, but that seems like an actual bug which is also breaking e.g. the Internal Staging option - #12457

but will switch to the text field anyway, it would make the spec simpler to read + parse

dataStream.transferTo(outputStream);
succeeded = true;
} catch (final Exception e) {
LOGGER.error("Failed to load data into storage {}", objectPath, e);
throw new RuntimeException(e);
} finally {
outputStream.close();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if close fails, it won't execute the uploadManager cleanups

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@edgao
Copy link
Contributor Author

edgao commented May 3, 2022

/test connector=connectors/destination-snowflake

🕑 connectors/destination-snowflake https://github.com/airbytehq/airbyte/actions/runs/2265056010
✅ connectors/destination-snowflake https://github.com/airbytehq/airbyte/actions/runs/2265056010
Python tests coverage:

Name                                                                                                                            Stmts   Miss  Cover
---------------------------------------------------------------------------------------------------------------------------------------------------
normalization/transform_config/__init__.py                                                                                          2      0   100%
normalization/transform_catalog/reserved_keywords.py                                                                               13      0   100%
normalization/transform_catalog/__init__.py                                                                                         2      0   100%
normalization/destination_type.py                                                                                                  13      0   100%
normalization/__init__.py                                                                                                           4      0   100%
/actions-runner/_work/airbyte/airbyte/airbyte-integrations/bases/airbyte-protocol/airbyte_protocol/models/airbyte_protocol.py     124      0   100%
/actions-runner/_work/airbyte/airbyte/airbyte-integrations/bases/airbyte-protocol/airbyte_protocol/models/__init__.py               1      0   100%
/actions-runner/_work/airbyte/airbyte/airbyte-integrations/bases/airbyte-protocol/airbyte_protocol/__init__.py                      2      0   100%
normalization/transform_catalog/destination_name_transformer.py                                                                   155      8    95%
normalization/transform_config/transform.py                                                                                       159     31    81%
normalization/transform_catalog/table_name_registry.py                                                                            174     34    80%
normalization/transform_catalog/utils.py                                                                                           34      7    79%
normalization/transform_catalog/dbt_macro.py                                                                                       22      7    68%
normalization/transform_catalog/catalog_processor.py                                                                              147     80    46%
normalization/transform_catalog/transform.py                                                                                       61     38    38%
normalization/transform_catalog/stream_processor.py                                                                               534    345    35%
---------------------------------------------------------------------------------------------------------------------------------------------------
TOTAL                                                                                                                            1447    550    62%

@edgao
Copy link
Contributor Author

edgao commented May 3, 2022

/publish connector=connectors/destination-snowflake

🕑 connectors/destination-snowflake https://github.com/airbytehq/airbyte/actions/runs/2265971371
❌ Failed to publish connectors/destination-snowflake
❌ Couldn't auto-bump version for connectors/destination-snowflake

@edgao
Copy link
Contributor Author

edgao commented May 3, 2022

/publish connector=connectors/destination-snowflake

🕑 connectors/destination-snowflake https://github.com/airbytehq/airbyte/actions/runs/2266439299
❌ Failed to publish connectors/destination-snowflake
❌ Couldn't auto-bump version for connectors/destination-snowflake

@edgao
Copy link
Contributor Author

edgao commented May 4, 2022

/publish connector=connectors/destination-snowflake

🕑 connectors/destination-snowflake https://github.com/airbytehq/airbyte/actions/runs/2267582754
🚀 Successfully published connectors/destination-snowflake
🚀 Auto-bumped version for connectors/destination-snowflake
✅ connectors/destination-snowflake https://github.com/airbytehq/airbyte/actions/runs/2267582754

@octavia-squidington-iii octavia-squidington-iii temporarily deployed to more-secrets May 4, 2022 04:25 Inactive
@octavia-squidington-iii octavia-squidington-iii temporarily deployed to more-secrets May 4, 2022 04:25 Inactive
@edgao edgao merged commit 43470a2 into master May 4, 2022
@edgao edgao deleted the edgao/snowflake_encrypted_staging branch May 4, 2022 14:31
suhomud pushed a commit that referenced this pull request May 23, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/connectors Connector related issues area/documentation Improvements or additions to documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants