
5.1.0

@github-actions github-actions released this 26 Oct 12:36
· 157 commits to master since this release

This release brings SSH tunnel connection recovery to Redshift Loader and makes it possible to disable in-batch natural deduplication in Batch Transformer.

Option to disable in-batch natural deduplication in Batch Transformer

Previously, it wasn't possible to disable in-batch natural deduplication in Batch Transformer. We have found that in-batch natural deduplication affects performance, so we have made it possible to disable it. If duplicate events aren't a problem for you, we suggest disabling deduplication.

It can be disabled by adding the following section to the config:

  "deduplication": {
    # When natural deduplication is disabled, 'synthetic' deduplication needs to be disabled too. 
    "synthetic": {
      "type": "NONE"
    }
    "natural": false
  }
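
The constraint in the comment above (natural deduplication can only be disabled together with synthetic deduplication) is easy to get wrong when editing a config by hand. Below is a minimal sketch of a pre-deployment check, assuming the "deduplication" section has already been parsed into a Python dict. The function name `check_deduplication` is hypothetical and not part of RDB Loader, and the defaults it assumes for missing keys (natural deduplication enabled, synthetic deduplication not "NONE") are assumptions as well.

```python
from typing import Any, Dict, List


def check_deduplication(dedup: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in a parsed 'deduplication' section.

    Hypothetical helper for illustration only, not part of RDB Loader.
    """
    problems: List[str] = []
    # Assumption: natural deduplication stays enabled when the key is absent.
    natural = dedup.get("natural", True)
    # Assumption: a missing 'synthetic' section means synthetic deduplication
    # is still active, i.e. its type is something other than "NONE".
    synthetic_type = dedup.get("synthetic", {}).get("type")
    if natural is False and synthetic_type != "NONE":
        problems.append(
            "natural deduplication is disabled, so synthetic.type must be \"NONE\""
        )
    return problems


# The section from the release notes passes the check.
release_example = {"synthetic": {"type": "NONE"}, "natural": False}
print(check_deduplication(release_example))
```

An empty list means the section respects the constraint. Any other result should be treated as a hint to review the config rather than as an authoritative validation.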

More information about deduplication in Batch Transformer can be found here.
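
To make it clearer what is being switched off, here is a simplified sketch of in-batch natural deduplication. It assumes that natural duplicates are events sharing both `event_id` and `event_fingerprint`, and that the first occurrence in the batch is kept. The real Batch Transformer does not run this code; the function name `dedupe_natural` and the dict-based event shape are illustrative assumptions.

```python
from typing import Any, Dict, List, Set, Tuple


def dedupe_natural(events: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep the first event for each (event_id, event_fingerprint) pair.

    Illustrative only: assumes each event is a dict carrying both fields.
    """
    seen: Set[Tuple[str, str]] = set()
    kept: List[Dict[str, Any]] = []
    for event in events:
        key = (event["event_id"], event["event_fingerprint"])
        if key not in seen:
            seen.add(key)
            kept.append(event)
    return kept


batch = [
    {"event_id": "a", "event_fingerprint": "f1"},
    {"event_id": "a", "event_fingerprint": "f1"},  # natural duplicate, dropped
    {"event_id": "a", "event_fingerprint": "f2"},  # same id, different payload, kept
]
print(len(dedupe_natural(batch)))
```

Tracking every key seen in a batch is extra work per event, which gives some intuition for why skipping this step can help performance when duplicates are acceptable.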

SSH tunnel connection recovery in Redshift Loader

Redshift Loader can connect to a private Redshift cluster through an SSH tunnel. Previously, if the SSH tunnel session was disconnected, the loader had no way to detect it. We added retries around the SSH tunnel connection so that the loader can recover from a dropped session, which makes it more robust.
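
The general pattern of retrying around a tunnel connection can be sketched as follows. This is not the loader's actual implementation (RDB Loader is written in Scala); it is a minimal illustration. The names `TunnelDisconnected` and `with_tunnel_retry`, the number of attempts, and the exponential backoff delays are all assumptions.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TunnelDisconnected(Exception):
    """Hypothetical error signalling that the SSH tunnel session is down."""


def with_tunnel_retry(
    action: Callable[[], T],
    reconnect: Callable[[], None],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `action`, re-establishing the tunnel and retrying if it has dropped."""
    for attempt in range(1, max_attempts + 1):
        try:
            return action()
        except TunnelDisconnected:
            if attempt == max_attempts:
                raise  # out of attempts, surface the failure to the caller
            # Back off exponentially (base_delay, 2 * base_delay, ...) before reconnecting.
            sleep(base_delay * 2 ** (attempt - 1))
            reconnect()
    raise ValueError("max_attempts must be at least 1")
```

Here `action` stands for any operation that goes through the tunnel, and `reconnect` for whatever re-opens the SSH session. Passing `sleep` as a parameter keeps the sketch easy to exercise without real delays.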

Upgrading to 5.1.0

If you are already using a recent version of RDB Loader (3.0.0 or higher), then upgrading to 5.1.0 is as simple as pulling the newest Docker images. No changes are needed to your configuration files.

docker pull snowplow/transformer-kinesis:5.1.0
docker pull snowplow/rdb-loader-redshift:5.1.0
docker pull snowplow/rdb-loader-snowflake:5.1.0
docker pull snowplow/rdb-loader-databricks:5.1.0

The Snowplow docs site has a full guide to running the RDB Loader.

Changelog

  • Transformer Batch: make in-batch natural deduplication optional (#1108)
  • Recover from disconnected SSH tunnel (#1084)