Releases: snowplow/snowplow-rdb-loader

6.0.2

10 Jul 11:43

What's Changed

  • Improve RDB Loader behavior for table without comment
  • Bump Iglu Scala Client to 3.1.1
  • Loader should not call the "list" Iglu endpoint for Snowflake/Databricks tables
  • RDB Redshift Loader: increase atomic field lengths when RDB Loader creates Redshift events table
  • RDB Databricks Loader: remove atomic field lengths when RDB Loader creates Databricks table

6.0.0

19 Mar 09:28

[Redshift-only] New migration mechanism & recovery tables

Previously, Redshift loaders would migrate the shredded table to the latest available schema version. This could lead to a race condition between transformer & loader.

As of 6.0.0, the loader migrates the shredded table to the latest schema version discovered in the shredding_complete payload (rather than the latest existing version). In addition, thanks to the new file hierarchy described below, the loader can issue one COPY statement per schema version, which lets it decide on the exact set of columns.

We are also introducing a new mechanism to prevent the loader from failing when a schema has not been evolved correctly. You can find more information about it here.

[Redshift-only] Monitoring recovery tables

Previous versions printed the table name to stdout. As of 6.0.0, if an event is loaded to a recovery table, the name of that recovery table is printed instead.

If a webhook is configured, recent previous versions used load_succeeded/3-0-0 to report information about a successful load.

As of 6.0.0, the loader uses the load_succeeded/3-0-1 schema, which adds a $.recoveryTableNames key to report the names of the recovery tables loaded in the batch (for breaking schema keys from the shredding_complete payload).
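A webhook consumer could read the new key along the following lines. This is a minimal sketch, not the loader's own code: it assumes the webhook body is a self-describing JSON with the payload under `data`, and the function name `recovery_tables` and the sample table name are hypothetical.

```python
import json


def recovery_tables(message: str) -> list:
    """Return the recovery table names reported in a load_succeeded message.

    Assumption: the payload sits under "data" (self-describing JSON).
    If there is no "data" envelope, the message itself is treated as the payload.
    """
    body = json.loads(message)
    data = body.get("data", body)
    # Messages from 3-0-0 do not carry the key, so default to an empty list.
    return list(data.get("recoveryTableNames", []))


# Hypothetical message, for illustration only.
sample = json.dumps({"data": {"recoveryTableNames": ["example_recovery_table"]}})
print(recovery_tables(sample))
```

Defaulting to an empty list keeps the same consumer usable for messages from older loader versions that still report with 3-0-0.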

[Redshift-only] $.featureFlags.disableMigration configuration

RDB Loader 6.0.0 introduces a new configuration option, $.featureFlags.disableMigration, which takes a list of schema criteria for which migration is disabled.

For the provided schema criteria only, RDB Loader will neither migrate the corresponding shredded table nor create recovery tables for breaking schema versions. The loader will attempt to load to the corresponding shredded table without migrating.

This is useful if you have older schemas with breaking changes and don’t want the loader to apply the new logic to them.
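To illustrate how a list of schema criteria selects schemas, here is a minimal sketch. It assumes the usual Iglu URI shape, `iglu:vendor/name/format/MODEL-REVISION-ADDITION`, with `*` allowed as a wildcard in the version parts of a criterion. The function names and the example schema are hypothetical, and this is not the loader's actual matching code.

```python
import re

# Assumed URI shape; any version part of a criterion may be "*".
_URI = re.compile(
    r"^iglu:(?P<vendor>[^/]+)/(?P<name>[^/]+)/(?P<format>[^/]+)/"
    r"(?P<model>\d+|\*)-(?P<revision>\d+|\*)-(?P<addition>\d+|\*)$"
)


def matches(criterion: str, schema_key: str) -> bool:
    """True if a concrete schema key is covered by a schema criterion."""
    c = _URI.match(criterion)
    k = _URI.match(schema_key)
    if c is None or k is None:
        raise ValueError("not a valid Iglu URI")
    if "*" in (k["model"], k["revision"], k["addition"]):
        raise ValueError("schema key must have a concrete version")
    # Vendor, name and format must be identical.
    if any(c[p] != k[p] for p in ("vendor", "name", "format")):
        return False
    # Each version part must be equal unless the criterion uses a wildcard.
    return all(c[p] in ("*", k[p]) for p in ("model", "revision", "addition"))


def migration_disabled(disable_migration, schema_key: str) -> bool:
    """True if any criterion in the configured list covers the schema key."""
    return any(matches(c, schema_key) for c in disable_migration)


# Hypothetical configuration value for $.featureFlags.disableMigration.
criteria = ["iglu:com.acme/button_click/jsonschema/1-*-*"]
print(migration_disabled(criteria, "iglu:com.acme/button_click/jsonschema/1-0-2"))
```

With this criterion, every 1-x-y version of the hypothetical `com.acme/button_click` schema would be excluded from the new migration logic, while model 2 would not.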

New file hierarchy for shredded events

Until now, both batch and stream transformers have written shredded events using the following scheme:

vendor/name/model

As of 6.0.0, all transformers use the following scheme:

vendor/name/model/revision/addition

which increases granularity of the output, enabling higher precision in downstream usage.
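The difference between the two schemes can be sketched as follows. The `SchemaKey` type and both helper functions are hypothetical names for illustration, and real output paths may carry additional prefixes or key names that the notes above do not show.

```python
from typing import NamedTuple


class SchemaKey(NamedTuple):
    """A schema identity with a full MODEL-REVISION-ADDITION version."""
    vendor: str
    name: str
    model: int
    revision: int
    addition: int


def legacy_path(key: SchemaKey) -> str:
    """Pre-6.0.0 scheme: vendor/name/model."""
    return f"{key.vendor}/{key.name}/{key.model}"


def new_path(key: SchemaKey) -> str:
    """6.0.0 scheme: vendor/name/model/revision/addition."""
    return f"{key.vendor}/{key.name}/{key.model}/{key.revision}/{key.addition}"


v1_0_0 = SchemaKey("com.acme", "button_click", 1, 0, 0)
v1_1_0 = SchemaKey("com.acme", "button_click", 1, 1, 0)

# Both versions collapse into one directory under the old scheme...
print(legacy_path(v1_0_0), legacy_path(v1_1_0))
# ...but land in separate directories under the new one.
print(new_path(v1_0_0), new_path(v1_1_0))
```

Because each full schema version gets its own directory, a downstream consumer can treat every path as having a single, fixed column layout.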

Removal of padding \N char

Transformers write events to S3 to be loaded by Redshift. For the loading command to work, all events at a given path (e.g. com.acme/button_click/1) must follow the same format. A batch, however, may contain events with different versions of a given schema. In particular, events with a newer schema might have new fields not present in the events with an older one.

Previously, transformers solved this problem by formatting all events according to the latest version of the schema and using the \N character in case of missing fields.

As of 6.0.0, there is no need to do that because, as explained above, events using different versions of a schema are written to different paths.
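The following sketch contrasts the two approaches for a single event. It is a simplification under stated assumptions: rows are tab-separated, the column lists and function names are hypothetical, and the exact-layout function assumes the event carries every column of its own schema version.

```python
NULL = "\\N"  # the two-character \N marker used for missing fields


def format_row_padded(fields: dict, latest_columns: list) -> str:
    """Pre-6.0.0 style: lay out every event by the latest schema version,
    filling fields the event does not have with the \\N marker."""
    return "\t".join(str(fields.get(col, NULL)) for col in latest_columns)


def format_row_exact(fields: dict, version_columns: list) -> str:
    """6.0.0 style: lay out the event by its own schema version only,
    so no padding for newer fields is required."""
    return "\t".join(str(fields[col]) for col in version_columns)


# Hypothetical column layouts for two versions of one schema.
columns_1_0_0 = ["id"]
columns_1_1_0 = ["id", "label"]

old_event = {"id": 7}  # written against version 1-0-0

print(repr(format_row_padded(old_event, columns_1_1_0)))
print(repr(format_row_exact(old_event, columns_1_0_0)))
```

In the padded form, the older event has to know about the `label` column it never had. In the exact form, it only needs its own layout, because events of version 1-1-0 are written to a different path.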

New license

Following our recent licensing announcement, RDB Loader is now released under the Snowplow Limited Use License Agreement.

Changelog

  • Bump AWS SDK to 1.12.677 (#1344)
  • Bump commons-compress to 1.26.0 (#1344)
  • Bump nimbus-jose-jwt to 9.37.2 (#1344)
  • Add mandatory SLULA license acceptance flag (#1344)
  • Bump schema-ddl to 0.22.1 (#1342)
  • Bump AWS SDK to 2.23.17 (#1339)
  • pubsub transformer: increase subscriber's awaitTerminatePeriod (#1328)
  • pubsub transformer: Increase default value of minDurationPerAckExtension (#1326)
  • Loader: Fix column names for shredded tables (#1332)
  • Redshift loader: send statsd metrics for recovery tables (#1331)
  • Quote column names in Redshift load statements (#1330)
  • Loader: Report recovery table names in load_succeeded payload (#1318)
  • Loader: Fix table name in COPY logs (#1316)
  • Upgrade schema-ddl to 0.20.0 (#1265)
  • Move to Snowplow Limited Use License (#1345)

5.7.5

14 Mar 11:33

This is a patch release that bumps dependencies to address potential security vulnerabilities.

Changelog

  • Bump zookeeper to 3.7.2 (#1325)
  • Bump aws sdk to 2.21.33 (#1325)
  • Bump jetty-http to 9.4.53.v20231009 (#1325)
  • Bump reactor-netty-http to 1.0.39 (#1325)
  • Use databricks JDBC 2.6.34 (#1325)

5.7.4

10 Oct 22:40

This is a patch release that bumps dependencies to address potential security vulnerabilities.

Changelog

  • Bump snappy-java to 1.1.10.4 (#1313)

5.7.3

07 Sep 13:31

This is a patch release that bumps dependencies to address potential security vulnerabilities.

Changelog

  • Bump jackson-mapper-asl to 1.9.14-atlassian-6 (#1312)
  • Loader: exclude unnecessary hadoop dependencies (#1312)
  • Bump snappy-java to 1.1.10.3 (#1312)
  • Bump jettison to 1.5.4 (#1312)

5.7.1

08 Aug 13:47

A patch release to remove unwanted transitive dependencies, improve tests, and fix minor bugs.

Changelog

  • Lower sensitivity of cats-effect responsiveness warning (#1309)
  • Reduce log level for test suite (#1307)
  • Exclude zookeeper transitive dependency from loaders (#1305)
  • Batch Transformer: make it possible to skip schemas with all transformations (#1300)
  • Bump Snowplow Events Manifest to 0.4.0 (#1303)
  • transformer-kafka: add semi-automatic test scenarios using cloud resources (#1302)

5.7.0

04 Aug 16:28

Add Azure support

This release introduces the changes and assets needed to run RDB Loader with Azure services:

  • Introduce a new transformer-kafka asset that reads events from a Kafka topic and writes transformed events to Azure Blob Storage
  • Update the Loader module to read shredding complete messages from Kafka. The loader also needs to interact with blob storage for the folder monitoring feature, so it can now work with Azure Blob Storage as well.

5.6.3

12 Jul 09:14

Starting with this version, Databricks Loader can work with catalog names that contain non-alphanumeric characters such as hyphens.

We've also bumped a few dependencies to address potential security vulnerabilities.

Changelog

  • Databricks Loader: allow any character in catalog name (#1288)
  • Bump nimbus-jose-jwt to 9.31 (#1291)
  • Bump snappy-java to 1.1.10.1 (#1291)
  • Bump json-smart to 2.4.9 (#1291)

5.6.2

10 Jul 10:05

Fixes a regression which, under rare circumstances, caused exceptions like:

Load failed and will not be retried: [Amazon](500310) Invalid operation: cannot alter column "xyz" of relation "com_example_foo_2", target column size should be different; = SqlState: 0A000: [Amazon](500310) Invalid operation: cannot alter column "xyz" of relation "com_example_foo_2", target column size should be different;

Changelog

  • Fix pattern matching on known exception for alter table failures (#1283)

5.6.1

28 Jun 14:16

A patch release to address small bugs which crept in with the 5.5.x series. These bugs only affect pipelines using SSH tunnels or pipelines sending failed events to Kinesis from the batch transformer.

Changelog

  • Loader: fix "dispatcher is shutdown" error when setting up SSH tunnel (#1278)
  • Batch transformer: use singleton badrows sink (#1274)
  • Batch transformer: custom iterator returning good data only (#1272)
  • Common: replace release-manager with s3-sync-action (#1152)