Commit

bumpversion
ChristopheDuong committed Mar 29, 2022
1 parent 0a3e2c0 commit 3ae0de3
Showing 3 changed files with 19 additions and 16 deletions.
2 changes: 1 addition & 1 deletion airbyte-integrations/connectors/destination-s3/Dockerfile
@@ -16,5 +16,5 @@ ENV APPLICATION destination-s3

COPY --from=build /airbyte /airbyte

-LABEL io.airbyte.version=0.2.12
+LABEL io.airbyte.version=0.2.13
LABEL io.airbyte.name=airbyte/destination-s3
@@ -82,9 +82,9 @@ private static Function<ConfiguredAirbyteStream, WriteConfig> toWriteConfig(
final String streamName = abStream.getName();
final String bucketPath = config.get(BUCKET_PATH_FIELD).asText();
final String customOutputFormat = String.join("/",
-bucketPath,
-config.has(PATH_FORMAT_FIELD) && !config.get(PATH_FORMAT_FIELD).asText().isBlank() ?
-config.get(PATH_FORMAT_FIELD).asText() : S3DestinationConstants.DEFAULT_PATH_FORMAT);
+bucketPath,
+config.has(PATH_FORMAT_FIELD) && !config.get(PATH_FORMAT_FIELD).asText().isBlank() ? config.get(PATH_FORMAT_FIELD).asText()
+: S3DestinationConstants.DEFAULT_PATH_FORMAT);
final String outputBucketPath = storageOperations.getBucketObjectPath(namespace, streamName, SYNC_DATETIME, customOutputFormat);
final DestinationSyncMode syncMode = stream.getDestinationSyncMode();
final WriteConfig writeConfig = new WriteConfig(namespace, streamName, outputBucketPath, syncMode);
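
Note: the changed lines in this hunk appear to differ only in wrapping and indentation, so the logic is unchanged: the bucket path is joined with the configured path format, or with the default format when none is set. A minimal sketch of that fallback, assuming a plain `Map` in place of the connector's JSON config. The class name, the map keys `s3_bucket_path` and `s3_path_format`, and the local `DEFAULT_PATH_FORMAT` constant are illustrative stand-ins, not the connector's actual identifiers; the default value is the one quoted in the s3.md table.

```java
import java.util.Map;

// Hypothetical sketch; not the connector's S3ConsumerFactory.
final class OutputFormatSketch {

  // Stand-in for S3DestinationConstants.DEFAULT_PATH_FORMAT (value taken from the docs).
  static final String DEFAULT_PATH_FORMAT = "${NAMESPACE}/${STREAM_NAME}/";

  private OutputFormatSketch() {}

  // Joins the bucket path with the configured path format, falling back to
  // the default when the format is missing or blank.
  // Assumes the bucket path key is always present.
  static String customOutputFormat(Map<String, String> config) {
    final String bucketPath = config.get("s3_bucket_path");
    final String pathFormat = config.get("s3_path_format");
    final boolean hasFormat = pathFormat != null && !pathFormat.isBlank();
    return String.join("/", bucketPath, hasFormat ? pathFormat : DEFAULT_PATH_FORMAT);
  }

  public static void main(String[] args) {
    System.out.println(customOutputFormat(Map.of(
        "s3_bucket_path", "data_output_path",
        "s3_path_format", "${NAMESPACE}/${STREAM_NAME}/${YEAR}_${MONTH}_${DAY}")));
    System.out.println(customOutputFormat(Map.of("s3_bucket_path", "data_output_path")));
  }
}
```

Pulling the condition into a named boolean keeps the ternary on one line, which sidesteps the wrapping question this hunk is concerned with.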
27 changes: 15 additions & 12 deletions docs/integrations/destinations/s3.md
@@ -22,39 +22,37 @@ Check out common troubleshooting issues for the S3 destination connector on our
| S3 Endpoint | string | URL to S3, If using AWS S3 just leave blank. |
| S3 Bucket Name | string | Name of the bucket to sync data into. |
| S3 Bucket Path | string | Subdirectory under the above bucket to sync the data into. |
| S3 Bucket Format | string | Additional subdirectories format under S3 Bucket Path. Default value is `${NAMESPACE}/${STREAM_NAME}/` and this can be further customized with variables such as `${YEAR}, ${MONTH}, ${DAY}, ${HOUR} etc` referring to the writing datetime. |
| S3 Region | string | See [here](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-regions-availability-zones.html#concepts-available-regions) for all region codes. |
| Access Key ID | string | AWS/Minio credential. |
| Secret Access Key | string | AWS/Minio credential. |
| Format | object | Format specific configuration. See the [spec](/airbyte-integrations/connectors/destination-s3/src/main/resources/spec.json) for details. |

⚠️ Please note that under "Full Refresh Sync" mode, data in the configured bucket and path will be wiped out before each sync. We recommend you to provision a dedicated S3 resource for this sync to prevent unexpected data deletion from misconfiguration. ⚠️

-The full path of the output data is:
+The full path of the output data with S3 path format `${NAMESPACE}/${STREAM_NAME}/${YEAR}_${MONTH}_${DAY}` is:

```text
-<bucket-name>/<sorce-namespace-if-exists>/<stream-name>/<upload-date>-<upload-mills>-<partition-id>.<format-extension>
+<bucket-name>/<source-namespace-if-exists>/<stream-name>/<upload-date>/<partition-uuid>.<format-extension>
```

For example:

```text
-testing_bucket/data_output_path/public/users/2021_01_01_1609541171643_0.csv
-↑              ↑                ↑      ↑     ↑          ↑             ↑ ↑
-|              |                |      |     |          |             | format extension
-|              |                |      |     |          |             partition id
-|              |                |      |     |          upload time in millis
-|              |                |      |     upload date in YYYY-MM-DD
+testing_bucket/data_output_path/public/users/2021_01_01/123e4567-e89b-12d3-a456-426614174000.csv.gz
+↑              ↑                ↑      ↑     ↑          ↑                                    ↑
+|              |                |      |     |          |                                    format extension
+|              |                |      |     |          |
+|              |                |      |     |          uuid
+|              |                |      |     upload date in YYYY_MM_DD
 |              |                |      stream name
 |              |                source namespace (if it exists)
 |              bucket path
 bucket name
```

Please note that the stream name may contain a prefix, if it is configured on the connection.

-The rationales behind this naming pattern are: 1. Each stream has its own directory. 2. The data output files can be sorted by upload time. 3. The upload time composes of a date part and millis part so that it is both readable and unique.
-
-Currently, each data sync will only create one file per stream. In the future, the output file can be partitioned by size. Each partition is identifiable by the partition ID, which is always 0 for now.
+A data sync may create multiple files as the output files can be partitioned by size (targeting a size of 200MB compressed or lower) .

## Output Schema

@@ -133,6 +131,8 @@ With root level normalization, the output CSV is:
| :--- | :--- | :--- | :--- |
| `26d73cde-7eb1-4e1e-b7db-a4c03b4cf206` | 1622135805000 | 123 | `{ "first": "John", "last": "Doe" }` |

+Output CSV files will always be compressed using GZIP compression.

### JSON Lines \(JSONL\)

[Json Lines](https://jsonlines.org/) is a text format with one JSON per line. Each line has a structure as follows:
@@ -173,6 +173,8 @@ They will be like this in the output file:
{ "_airbyte_ab_id": "0a61de1b-9cdd-4455-a739-93572c9a5f20", "_airbyte_emitted_at": "1631948170000", "_airbyte_data": { "user_id": 456, "name": { "first": "Jane", "last": "Roe" } } }
```

+Output JSONL files will always be compressed using GZIP compression.

### Parquet

#### Configuration
@@ -226,6 +228,7 @@ Under the hood, an Airbyte data stream in Json schema is first converted to an A

| Version | Date | Pull Request | Subject |
|:--------| :--- | :--- |:---------------------------------------------------------------------------------------------------------------------------|
+| 0.2.13 | 2022-03-29 | [\#11496](https://github.com/airbytehq/airbyte/pull/11496) | Fix S3 bucket path to be included with S3 bucket format |
| 0.2.12 | 2022-03-28 | [\#11294](https://github.com/airbytehq/airbyte/pull/11294) | Change to serialized buffering strategy to reduce memory consumption |
| 0.2.11 | 2022-03-23 | [\#11173](https://github.com/airbytehq/airbyte/pull/11173) | Added support for AWS Glue crawler |
| 0.2.10 | 2022-03-07 | [\#10856](https://github.com/airbytehq/airbyte/pull/10856) | `check` method now tests for listObjects permissions on the target bucket |
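
Note: the path layout documented in the s3.md changes can be illustrated with a small sketch that substitutes the `${...}` variables and appends a UUID-named part file. This is only an approximation under stated assumptions: the class and method names are hypothetical, only the five variables shown in the docs example are handled, a missing namespace is not treated specially, and the connector's real resolution logic may differ.

```java
import java.time.LocalDate;
import java.util.Locale;
import java.util.UUID;

// Hypothetical sketch; not the connector's S3StorageOperations.
final class S3PathSketch {

  private S3PathSketch() {}

  // Replaces the variables from the docs example with concrete values.
  // String.replace performs literal substitution, so "${...}" needs no escaping.
  static String resolvePrefix(String pathFormat, String namespace, String streamName, LocalDate date) {
    return pathFormat
        .replace("${NAMESPACE}", namespace)
        .replace("${STREAM_NAME}", streamName)
        .replace("${YEAR}", String.format(Locale.ROOT, "%04d", date.getYear()))
        .replace("${MONTH}", String.format(Locale.ROOT, "%02d", date.getMonthValue()))
        .replace("${DAY}", String.format(Locale.ROOT, "%02d", date.getDayOfMonth()));
  }

  // Builds <bucket-name>/<bucket-path>/<resolved-prefix>/<uuid>.<extension>.
  static String fullPath(String bucketName, String bucketPath, String prefix, UUID partId, String extension) {
    return String.join("/", bucketName, bucketPath, prefix, partId + "." + extension);
  }

  public static void main(String[] args) {
    String prefix = resolvePrefix(
        "${NAMESPACE}/${STREAM_NAME}/${YEAR}_${MONTH}_${DAY}",
        "public", "users", LocalDate.of(2021, 1, 1));
    UUID partId = UUID.fromString("123e4567-e89b-12d3-a456-426614174000");
    // Prints the example path from the docs:
    // testing_bucket/data_output_path/public/users/2021_01_01/123e4567-e89b-12d3-a456-426614174000.csv.gz
    System.out.println(fullPath("testing_bucket", "data_output_path", prefix, partId, "csv.gz"));
  }
}
```

Because each part file is named by a UUID rather than a timestamp, files under the same date directory no longer sort by upload time; the date segment of the path format carries the ordering instead.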

1 comment on commit 3ae0de3

@github-actions

SonarQube Report

SonarQube report for Airbyte Connectors Destination S3(#11496)

Measures

| Name | Value | Name | Value | Name | Value |
| :--- | :--- | :--- | :--- | :--- | :--- |
| Lines of Code | 2744 | Duplicated Blocks | 3 | Lines to Cover | 2 |
| Code Smells | 45 | Quality Gate Status | OK | Reliability Rating | A |
| Security Rating | A | Bugs | 1 | Duplicated Lines (%) | 0.0 |
| Coverage | 0.0 | Vulnerabilities | 0 | Blocker Issues | 0 |
| Critical Issues | 3 | Major Issues | 40 | Minor Issues | 3 |

Detected Issues

| Rule | File | Description | Message |
| :--- | :--- | :--- | :--- |
| java:S1172 (MAJOR) | s3/S3ConsumerFactory.java:76 | Unused method parameters should be removed | Remove this unused method parameter "namingResolver". |
| java:S1611 (MINOR) | s3/S3ConsumerFactory.java:152 | Parentheses should be removed from a single lambda input parameter when its type is inferred | Remove the parentheses around the "hasFailed" parameter (sonar.java.source not set. Assuming 8 or greater.) |
| java:S1118 (MAJOR) | s3/SerializedBufferFactory.java:26 | Utility classes should not have public constructors | Add a private constructor to hide the implicit public one. |
| java:S112 (MAJOR) | s3/SerializedBufferFactory.java:71 | Generic exceptions should never be thrown | Define and throw a dedicated exception instead of using a generic one. |
| java:S112 (MAJOR) | avro/AvroSerializedBuffer.java:32 | Generic exceptions should never be thrown | Define and throw a dedicated exception instead of using a generic one. |
| common-java:DuplicatedBlocks (MAJOR) | avro/S3AvroWriter.java | Source files should not have any duplicated blocks | 1 duplicated blocks of code must be removed. |
| common-java:DuplicatedBlocks (MAJOR) | csv/S3CsvWriter.java | Source files should not have any duplicated blocks | 1 duplicated blocks of code must be removed. |
| java:S112 (MAJOR) | jsonl/JsonLSerializedBuffer.java:31 | Generic exceptions should never be thrown | Define and throw a dedicated exception instead of using a generic one. |
| java:S1172 (MAJOR) | jsonl/JsonLSerializedBuffer.java:61 | Unused method parameters should be removed | Remove this unused method parameter "config". |
| common-java:DuplicatedBlocks (MAJOR) | jsonl/S3JsonlWriter.java | Source files should not have any duplicated blocks | 1 duplicated blocks of code must be removed. |
| java:S1118 (MAJOR) | util/StreamTransferManagerHelper.java:12 | Utility classes should not have public constructors | Add a private constructor to hide the implicit public one. |
| java:S3358 (MAJOR) | avro/JsonToAvroSchemaConverter.java:179 | Ternary operators should not be nested | Extract this nested ternary operation into an independent statement. |
| java:S112 (MAJOR) | s3/BlobStorageOperations.java:20 | Generic exceptions should never be thrown | Define and throw a dedicated exception instead of using a generic one. |
| java:S112 (MAJOR) | s3/BlobStorageOperations.java:27 | Generic exceptions should never be thrown | Define and throw a dedicated exception instead of using a generic one. |
| java:S112 (MAJOR) | s3/BlobStorageOperations.java:32 | Generic exceptions should never be thrown | Define and throw a dedicated exception instead of using a generic one. |
| java:S112 (MAJOR) | s3/S3StorageOperations.java:100 | Generic exceptions should never be thrown | Define and throw a dedicated exception instead of using a generic one. |
| java:S112 (MAJOR) | s3/S3StorageOperations.java:121 | Generic exceptions should never be thrown | Define and throw a dedicated exception instead of using a generic one. |
| java:S112 (MAJOR) | s3/S3StorageOperations.java:131 | Generic exceptions should never be thrown | Define and throw a dedicated exception instead of using a generic one. |
| java:S112 (MAJOR) | csv/CsvSerializedBuffer.java:33 | Generic exceptions should never be thrown | Define and throw a dedicated exception instead of using a generic one. |
| java:S112 (MAJOR) | csv/CsvSerializedBuffer.java:47 | Generic exceptions should never be thrown | Define and throw a dedicated exception instead of using a generic one. |
| java:S107 (MAJOR) | s3/S3DestinationConfig.java:80 | Methods should not have too many parameters | Constructor has 8 parameters, which is greater than 7 authorized. |
| java:S6213 (MAJOR) | avro/S3AvroWriter.java:106 | Restricted Identifiers should not be used as Identifiers | Rename this variable to not match a restricted identifier. |
| java:S112 (MAJOR) | s3/S3DestinationConfig.java:175 | Generic exceptions should never be thrown | Define and throw a dedicated exception instead of using a generic one. |
| java:S2259 (MAJOR) | avro/JsonToAvroSchemaConverter.java:164 | Null pointers should not be dereferenced | A "NullPointerException" could be thrown; "properties" is nullable here. |
| java:S3776 (CRITICAL) | avro/JsonToAvroSchemaConverter.java:338 | Cognitive Complexity of methods should not be too high | Refactor this method to reduce its Cognitive Complexity from 22 to the 15 allowed. |
| java:S3252 (CRITICAL) | avro/JsonToAvroSchemaConverter.java:372 | "static" base class members should not be accessed via derived types | Use static access with "!Unknown!" for "Entry". |
| java:S1121 (MAJOR) | avro/JsonToAvroSchemaConverter.java:261 | Assignments should not be made from within sub-expressions | Extract the assignment out of this expression. |
| java:S107 (MAJOR) | s3/S3DestinationConfig.java:60 | Methods should not have too many parameters | Constructor has 9 parameters, which is greater than 7 authorized. |
| java:S107 (MAJOR) | csv/S3CsvWriter.java:40 | Methods should not have too many parameters | Constructor has 9 parameters, which is greater than 7 authorized. |
| java:S3358 (MAJOR) | s3/S3Destination.java:136 | Ternary operators should not be nested | Extract this nested ternary operation into an independent statement. |
| java:S5361 (CRITICAL) | s3/S3Destination.java:137 | "String#replace" should be preferred to "String#replaceAll" | Replace this call to "replaceAll()" by a call to the "replace()" method. |
| java:S1121 (MAJOR) | avro/JsonToAvroSchemaConverter.java:217 | Assignments should not be made from within sub-expressions | Extract the assignment out of this expression. |
| java:S1118 (MAJOR) | avro/AvroConstants.java:10 | Utility classes should not have public constructors | Add a private constructor to hide the implicit public one. |
| java:S125 (MAJOR) | avro/JsonToAvroSchemaConverter.java:209 | Sections of code should not be commented out | This block of commented-out lines of code should be removed. |
| java:S1068 (MAJOR) | jsonl/S3JsonlWriter.java:37 | Unused "private" fields should be removed | Remove this unused "WRITER" private field. |
| java:S1118 (MAJOR) | util/AvroRecordHelper.java:17 | Utility classes should not have public constructors | Add a private constructor to hide the implicit public one. |
| java:S1700 (MAJOR) | avro/JsonSchemaType.java:23 | A field should not duplicate the name of its containing class | Rename field "jsonSchemaType" |
| java:S1118 (MAJOR) | parquet/S3ParquetConstants.java:9 | Utility classes should not have public constructors | Add a private constructor to hide the implicit public one. |
| java:S1118 (MAJOR) | util/S3OutputPathHelper.java:13 | Utility classes should not have public constructors | Add a private constructor to hide the implicit public one. |
| java:S112 (MAJOR) | writer/ProductionWriterFactory.java:59 | Generic exceptions should never be thrown | Define and throw a dedicated exception instead of using a generic one. |
| java:S112 (MAJOR) | writer/S3WriterFactory.java:21 | Generic exceptions should never be thrown | Define and throw a dedicated exception instead of using a generic one. |
| java:S1118 (MAJOR) | s3/S3FormatConfigs.java:16 | Utility classes should not have public constructors | Add a private constructor to hide the implicit public one. |
| java:S112 (MAJOR) | s3/S3FormatConfigs.java:39 | Generic exceptions should never be thrown | Define and throw a dedicated exception instead of using a generic one. |
| java:S1118 (MAJOR) | csv/CsvSheetGenerator.java:25 | Utility classes should not have public constructors | Add a private constructor to hide the implicit public one. |
| java:S2094 (MINOR) | csv/CsvSheetGenerators.java:7 | Classes should not be empty | Remove this empty class, write its code or make it an "interface". |
| java:S1116 (MINOR) | csv/RootLevelFlatteningSheetGenerator.java:25 | Empty statements should be removed | Remove this empty statement. |
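
Most of the findings above fall into a few recurring rules. As a rough illustration of what the suggested fixes look like for three of them (S1118, S112, and S5361), here is a small sketch. The class `PathCleaner`, its `normalize` method, and `InvalidPathException` are hypothetical names made up for this example; none of this is code from the connector.

```java
// Hypothetical utility class; illustrates the shape of the fixes only.
final class PathCleaner {

  // S1118: a private constructor hides the implicit public one.
  private PathCleaner() {}

  // S112: a dedicated exception type instead of a generic RuntimeException or Exception.
  static final class InvalidPathException extends RuntimeException {
    InvalidPathException(String message) {
      super(message);
    }
  }

  // S5361: replace() substitutes a literal string, so no regex is compiled,
  // unlike replaceAll(), which treats its first argument as a pattern.
  static String normalize(String path) {
    if (path == null || path.isBlank()) {
      throw new InvalidPathException("path must not be blank");
    }
    return path.replace("//", "/");
  }

  public static void main(String[] args) {
    System.out.println(normalize("data_output_path//public/users"));
  }
}
```

Callers can then catch `InvalidPathException` specifically rather than a broad exception type, which is the motivation behind S112.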

Coverage (0.0%)

| File | Coverage | File | Coverage |
| :--- | :--- | :--- | :--- |
| src/main/java/io/airbyte/integrations/destination/s3/avro/AvroConstants.java | 0.0 | src/main/java/io/airbyte/integrations/destination/s3/avro/AvroNameTransformer.java | 0.0 |
| src/main/java/io/airbyte/integrations/destination/s3/avro/AvroRecordFactory.java | 0.0 | src/main/java/io/airbyte/integrations/destination/s3/avro/AvroSerializedBuffer.java | 0.0 |
| src/main/java/io/airbyte/integrations/destination/s3/avro/JsonFieldNameUpdater.java | 0.0 | src/main/java/io/airbyte/integrations/destination/s3/avro/JsonSchemaType.java | 0.0 |
| src/main/java/io/airbyte/integrations/destination/s3/avro/JsonToAvroSchemaConverter.java | 0.0 | src/main/java/io/airbyte/integrations/destination/s3/avro/S3AvroFormatConfig.java | 0.0 |
| src/main/java/io/airbyte/integrations/destination/s3/avro/S3AvroWriter.java | 0.0 | src/main/java/io/airbyte/integrations/destination/s3/csv/BaseSheetGenerator.java | 0.0 |
| src/main/java/io/airbyte/integrations/destination/s3/csv/CsvSerializedBuffer.java | 0.0 | src/main/java/io/airbyte/integrations/destination/s3/csv/CsvSheetGenerator.java | 0.0 |
| src/main/java/io/airbyte/integrations/destination/s3/csv/NoFlatteningSheetGenerator.java | 0.0 | src/main/java/io/airbyte/integrations/destination/s3/csv/RootLevelFlatteningSheetGenerator.java | 0.0 |
| src/main/java/io/airbyte/integrations/destination/s3/csv/S3CsvFormatConfig.java | 0.0 | src/main/java/io/airbyte/integrations/destination/s3/csv/S3CsvWriter.java | 0.0 |
| src/main/java/io/airbyte/integrations/destination/s3/csv/StagingDatabaseCsvSheetGenerator.java | 0.0 | src/main/java/io/airbyte/integrations/destination/s3/jsonl/JsonLSerializedBuffer.java | 0.0 |
| src/main/java/io/airbyte/integrations/destination/s3/jsonl/S3JsonlFormatConfig.java | 0.0 | src/main/java/io/airbyte/integrations/destination/s3/jsonl/S3JsonlWriter.java | 0.0 |
| src/main/java/io/airbyte/integrations/destination/s3/parquet/ParquetSerializedBuffer.java | 0.0 | src/main/java/io/airbyte/integrations/destination/s3/parquet/S3ParquetConstants.java | 0.0 |
| src/main/java/io/airbyte/integrations/destination/s3/parquet/S3ParquetFormatConfig.java | 0.0 | src/main/java/io/airbyte/integrations/destination/s3/parquet/S3ParquetWriter.java | 0.0 |
| src/main/java/io/airbyte/integrations/destination/s3/S3ConsumerFactory.java | 0.0 | src/main/java/io/airbyte/integrations/destination/s3/S3Destination.java | 0.0 |
| src/main/java/io/airbyte/integrations/destination/s3/S3DestinationConfig.java | 0.0 | src/main/java/io/airbyte/integrations/destination/s3/S3DestinationConfigFactory.java | 0.0 |
| src/main/java/io/airbyte/integrations/destination/s3/S3DestinationConstants.java | 0.0 | src/main/java/io/airbyte/integrations/destination/s3/S3Format.java | 0.0 |
| src/main/java/io/airbyte/integrations/destination/s3/S3FormatConfig.java | 0.0 | src/main/java/io/airbyte/integrations/destination/s3/S3FormatConfigs.java | 0.0 |
| src/main/java/io/airbyte/integrations/destination/s3/S3StorageOperations.java | 0.0 | src/main/java/io/airbyte/integrations/destination/s3/SerializedBufferFactory.java | 0.0 |
| src/main/java/io/airbyte/integrations/destination/s3/util/AvroRecordHelper.java | 0.0 | src/main/java/io/airbyte/integrations/destination/s3/util/S3NameTransformer.java | 0.0 |
| src/main/java/io/airbyte/integrations/destination/s3/util/S3OutputPathHelper.java | 0.0 | src/main/java/io/airbyte/integrations/destination/s3/util/StreamTransferManagerHelper.java | 0.0 |
| src/main/java/io/airbyte/integrations/destination/s3/WriteConfig.java | 0.0 | src/main/java/io/airbyte/integrations/destination/s3/writer/BaseS3Writer.java | 0.0 |
| src/main/java/io/airbyte/integrations/destination/s3/writer/ProductionWriterFactory.java | 0.0 | | |
