Commit
🎉 New Destination: Google Cloud Storage (#4784)
* Adding Google Cloud Storage as destination
* Removed few comments and amended the version
* Added documentation in docs/integrations/destinations/gcs.md
* Amended gcs.md with the right pull id
* Implemented all the fixes requested by tuliren as per #4329
* Renaming all the files
* Branch aligned to S3 0.1.7 (with Avro and Jsonl). Removed redundant file by making S3 a dependency for GCS
* Removed some additional duplicates between GCS and S3
* Revert changes in the root files
* Revert jdbc files
* Fix package names
* Refactor gcs config
* Format code
* Fix gcs connection
* Format code
* Add acceptance tests
* Fix parquet acceptance test
* Add ci credentials
* Register the connector and update documentations
* Fix typo
* Format code
* Add unit test
* Add comments
* Update readme

Co-authored-by: Sherif A. Nada <[email protected]>
Co-authored-by: Marco Fontana <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: Marco Fontana <[email protected]>
Co-authored-by: Sherif A. Nada <[email protected]>
Showing 40 changed files with 2,787 additions and 5 deletions.
7 changes: 7 additions & 0 deletions
...esources/config/STANDARD_DESTINATION_DEFINITION/ca8f6566-e555-4b40-943a-545bf123117a.json
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,7 @@ | ||
{ | ||
"destinationDefinitionId": "ca8f6566-e555-4b40-943a-545bf123117a", | ||
"name": "Google Cloud Storage (GCS)", | ||
"dockerRepository": "airbyte/destination-gcs", | ||
"dockerImageTag": "0.1.0", | ||
"documentationUrl": "https://docs.airbyte.io/integrations/destinations/gcs" | ||
} |
3 changes: 3 additions & 0 deletions
airbyte-integrations/connectors/destination-gcs/.dockerignore
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,3 @@ | ||
* | ||
!Dockerfile | ||
!build |
11 changes: 11 additions & 0 deletions
airbyte-integrations/connectors/destination-gcs/Dockerfile
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,11 @@ | ||
FROM airbyte/integration-base-java:dev | ||
|
||
WORKDIR /airbyte | ||
ENV APPLICATION destination-gcs | ||
|
||
COPY build/distributions/${APPLICATION}*.tar ${APPLICATION}.tar | ||
|
||
RUN tar xf ${APPLICATION}.tar --strip-components=1 | ||
|
||
LABEL io.airbyte.version=0.1.0 | ||
LABEL io.airbyte.name=airbyte/destination-gcs |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,26 @@ | ||
# Destination Google Cloud Storage (GCS) | ||
|
||
In order to test the GCS destination, you need a Google Cloud Platform account. | ||
|
||
## Community Contributor | ||
|
||
As a community contributor, you can follow these steps to run integration tests. | ||
|
||
- Create a GCS bucket for testing. | ||
- Generate an [HMAC key](https://cloud.google.com/storage/docs/authentication/hmackeys) for the bucket with read and write permissions. Note that only the HMAC key credential is currently supported; more credential types will be added in the future. | ||
- Paste the bucket and key information into the config files under [`./sample_secrets`](./sample_secrets). | ||
- Rename the directory from `sample_secrets` to `secrets`. | ||
- Feel free to modify the config files with different settings in the acceptance test file (e.g. `GcsCsvDestinationAcceptanceTest.java`, method `getFormatConfig`), as long as they follow the schema defined in [spec.json](src/main/resources/spec.json). | ||
|
||
## Airbyte Employee | ||
|
||
- Access the `destination gcs creds` secrets on LastPass, and put them in `sample_secrets/config.json`. | ||
- Rename the directory from `sample_secrets` to `secrets`. | ||
|
||
## Add New Output Format | ||
- Add a new enum in `S3Format`. | ||
- Modify `spec.json` to specify the configuration of this new format. | ||
- Update `S3FormatConfigs` to be able to construct a config for this new format. | ||
- Create a new package under `io.airbyte.integrations.destination.gcs`. | ||
- Implement a new `GcsWriter`. The implementation can extend `BaseGcsWriter`. | ||
- Write an acceptance test for the new output format. The test can extend `GcsDestinationAcceptanceTest`. |
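The writer-related steps above can be sketched in miniature. This is a self-contained illustration, not the connector's actual code: `FormatWriter` is a hypothetical stand-in for the writer contract (its three methods mirror the `initialize`, `write`, and `close(hasFailed)` calls that `GcsConsumer` makes on each writer), and `TsvWriter` is an invented example format that buffers rows in memory instead of uploading them to a bucket.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

public class NewFormatSketch {

  /** Hypothetical stand-in for the writer contract; not the real S3Writer interface. */
  interface FormatWriter {
    void initialize();

    void write(UUID id, String jsonData);

    void close(boolean hasFailed);
  }

  /** Invented example format: one tab-separated line per record, kept in memory. */
  static class TsvWriter implements FormatWriter {

    private final List<String> buffer = new ArrayList<>();
    private final List<String> committed = new ArrayList<>();

    @Override
    public void initialize() {
      // A real writer would prepare its upload target here.
      buffer.clear();
    }

    @Override
    public void write(UUID id, String jsonData) {
      buffer.add(id + "\t" + jsonData);
    }

    @Override
    public void close(boolean hasFailed) {
      // Publish buffered rows only when the sync succeeded; otherwise discard them.
      if (!hasFailed) {
        committed.addAll(buffer);
      }
      buffer.clear();
    }

    List<String> committedRows() {
      return committed;
    }
  }

  public static void main(String[] args) {
    TsvWriter writer = new TsvWriter();
    writer.initialize();
    writer.write(UUID.randomUUID(), "{\"id\": 1}");
    writer.close(false);
    System.out.println("Committed rows: " + writer.committedRows().size());
  }
}
```

In the connector itself, a new writer would additionally need the enum entry, `spec.json` changes, and acceptance test listed above; the sketch covers only the writer lifecycle.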
38 changes: 38 additions & 0 deletions
airbyte-integrations/connectors/destination-gcs/build.gradle
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,38 @@ | ||
plugins { | ||
id 'application' | ||
id 'airbyte-docker' | ||
id 'airbyte-integration-test-java' | ||
} | ||
|
||
application { | ||
mainClass = 'io.airbyte.integrations.destination.gcs.GcsDestination' | ||
} | ||
|
||
dependencies { | ||
implementation project(':airbyte-config:models') | ||
implementation project(':airbyte-protocol:models') | ||
implementation project(':airbyte-integrations:bases:base-java') | ||
implementation project(':airbyte-integrations:connectors:destination-jdbc') | ||
implementation project(':airbyte-integrations:connectors:destination-s3') | ||
implementation files(project(':airbyte-integrations:bases:base-java').airbyteDocker.outputs) | ||
|
||
implementation platform('com.amazonaws:aws-java-sdk-bom:1.12.14') | ||
implementation 'com.google.cloud.bigdataoss:gcs-connector:hadoop3-2.2.1' | ||
|
||
// csv | ||
implementation 'com.amazonaws:aws-java-sdk-s3:1.11.978' | ||
implementation 'org.apache.commons:commons-csv:1.4' | ||
implementation 'com.github.alexmojaki:s3-stream-upload:2.2.2' | ||
|
||
// parquet | ||
implementation group: 'org.apache.hadoop', name: 'hadoop-common', version: '3.3.0' | ||
implementation group: 'org.apache.hadoop', name: 'hadoop-aws', version: '3.3.0' | ||
implementation group: 'org.apache.hadoop', name: 'hadoop-mapreduce-client-core', version: '3.3.0' | ||
implementation group: 'org.apache.parquet', name: 'parquet-avro', version: '1.12.0' | ||
implementation group: 'tech.allegro.schema.json2avro', name: 'converter', version: '0.2.10' | ||
|
||
testImplementation 'org.apache.commons:commons-lang3:3.11' | ||
|
||
integrationTestJavaImplementation project(':airbyte-integrations:bases:standard-destination-test') | ||
integrationTestJavaImplementation project(':airbyte-integrations:connectors:destination-gcs') | ||
} |
10 changes: 10 additions & 0 deletions
airbyte-integrations/connectors/destination-gcs/sample_secrets/config.json
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,10 @@ | ||
{ | ||
"gcs_bucket_name": "<bucket-name>", | ||
"gcs_bucket_path": "integration-test", | ||
"gcs_bucket_region": "<region>", | ||
"credential": { | ||
"credential_type": "HMAC_KEY", | ||
"hmac_key_access_id": "<access-id>", | ||
"hmac_key_secret": "<secret>" | ||
} | ||
} |
119 changes: 119 additions & 0 deletions
...rs/destination-gcs/src/main/java/io/airbyte/integrations/destination/gcs/GcsConsumer.java
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,119 @@ | ||
/* | ||
* MIT License | ||
* | ||
* Copyright (c) 2020 Airbyte | ||
* | ||
* Permission is hereby granted, free of charge, to any person obtaining a copy | ||
* of this software and associated documentation files (the "Software"), to deal | ||
* in the Software without restriction, including without limitation the rights | ||
* to use, copy, modify, merge, publish, distribute, sublicense, and/or sell | ||
* copies of the Software, and to permit persons to whom the Software is | ||
* furnished to do so, subject to the following conditions: | ||
* | ||
* The above copyright notice and this permission notice shall be included in all | ||
* copies or substantial portions of the Software. | ||
* | ||
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR | ||
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, | ||
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE | ||
* AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER | ||
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, | ||
* OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE | ||
* SOFTWARE. | ||
*/ | ||
|
||
package io.airbyte.integrations.destination.gcs; | ||
|
||
import com.amazonaws.services.s3.AmazonS3; | ||
import io.airbyte.commons.json.Jsons; | ||
import io.airbyte.integrations.base.AirbyteStreamNameNamespacePair; | ||
import io.airbyte.integrations.base.FailureTrackingAirbyteMessageConsumer; | ||
import io.airbyte.integrations.destination.gcs.writer.GcsWriterFactory; | ||
import io.airbyte.integrations.destination.s3.writer.S3Writer; | ||
import io.airbyte.protocol.models.AirbyteMessage; | ||
import io.airbyte.protocol.models.AirbyteMessage.Type; | ||
import io.airbyte.protocol.models.AirbyteRecordMessage; | ||
import io.airbyte.protocol.models.AirbyteStream; | ||
import io.airbyte.protocol.models.ConfiguredAirbyteCatalog; | ||
import io.airbyte.protocol.models.ConfiguredAirbyteStream; | ||
import java.sql.Timestamp; | ||
import java.util.HashMap; | ||
import java.util.Map; | ||
import java.util.UUID; | ||
import java.util.function.Consumer; | ||
|
||
public class GcsConsumer extends FailureTrackingAirbyteMessageConsumer { | ||
|
||
private final GcsDestinationConfig gcsDestinationConfig; | ||
private final ConfiguredAirbyteCatalog configuredCatalog; | ||
private final GcsWriterFactory writerFactory; | ||
private final Consumer<AirbyteMessage> outputRecordCollector; | ||
private final Map<AirbyteStreamNameNamespacePair, S3Writer> streamNameAndNamespaceToWriters; | ||
|
||
private AirbyteMessage lastStateMessage = null; | ||
|
||
public GcsConsumer(GcsDestinationConfig gcsDestinationConfig, | ||
ConfiguredAirbyteCatalog configuredCatalog, | ||
GcsWriterFactory writerFactory, | ||
Consumer<AirbyteMessage> outputRecordCollector) { | ||
this.gcsDestinationConfig = gcsDestinationConfig; | ||
this.configuredCatalog = configuredCatalog; | ||
this.writerFactory = writerFactory; | ||
this.outputRecordCollector = outputRecordCollector; | ||
this.streamNameAndNamespaceToWriters = new HashMap<>(configuredCatalog.getStreams().size()); | ||
} | ||
|
||
@Override | ||
protected void startTracked() throws Exception { | ||
AmazonS3 s3Client = GcsS3Helper.getGcsS3Client(gcsDestinationConfig); | ||
|
||
Timestamp uploadTimestamp = new Timestamp(System.currentTimeMillis()); | ||
|
||
for (ConfiguredAirbyteStream configuredStream : configuredCatalog.getStreams()) { | ||
S3Writer writer = writerFactory | ||
.create(gcsDestinationConfig, s3Client, configuredStream, uploadTimestamp); | ||
writer.initialize(); | ||
|
||
AirbyteStream stream = configuredStream.getStream(); | ||
AirbyteStreamNameNamespacePair streamNamePair = AirbyteStreamNameNamespacePair | ||
.fromAirbyteSteam(stream); | ||
streamNameAndNamespaceToWriters.put(streamNamePair, writer); | ||
} | ||
} | ||
|
||
@Override | ||
protected void acceptTracked(AirbyteMessage airbyteMessage) throws Exception { | ||
if (airbyteMessage.getType() == Type.STATE) { | ||
this.lastStateMessage = airbyteMessage; | ||
return; | ||
} else if (airbyteMessage.getType() != Type.RECORD) { | ||
return; | ||
} | ||
|
||
AirbyteRecordMessage recordMessage = airbyteMessage.getRecord(); | ||
AirbyteStreamNameNamespacePair pair = AirbyteStreamNameNamespacePair | ||
.fromRecordMessage(recordMessage); | ||
|
||
if (!streamNameAndNamespaceToWriters.containsKey(pair)) { | ||
throw new IllegalArgumentException( | ||
String.format( | ||
"Message contained record from a stream that was not in the catalog. \ncatalog: %s , \nmessage: %s", | ||
Jsons.serialize(configuredCatalog), Jsons.serialize(recordMessage))); | ||
} | ||
|
||
UUID id = UUID.randomUUID(); | ||
streamNameAndNamespaceToWriters.get(pair).write(id, recordMessage); | ||
} | ||
|
||
@Override | ||
protected void close(boolean hasFailed) throws Exception { | ||
for (S3Writer handler : streamNameAndNamespaceToWriters.values()) { | ||
handler.close(hasFailed); | ||
} | ||
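// Gcs stream uploader is all or nothing if a failure happens in the destination. | ||
// Emit state only on success, and only if a state message was actually received. | ||
if (!hasFailed && lastStateMessage != null) { | ||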
outputRecordCollector.accept(lastStateMessage); | ||
} | ||
} | ||
|
||
} |
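The consumer above follows a simple pattern: route each record to the sink registered for its stream, remember only the most recent state message, and emit that state only if the sync closes without failure. A minimal sketch of that pattern follows, using plain strings in place of Airbyte messages. `StateGate`, its method names, and the string-based types are illustrative assumptions, and the guard that skips emission when no state was ever received is a choice made for this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

public class StateGate {

  // One in-memory sink per configured stream (stand-in for a per-stream writer).
  private final Map<String, List<String>> recordsByStream = new HashMap<>();
  private final Consumer<String> stateCollector;
  private String lastState = null;

  public StateGate(List<String> streams, Consumer<String> stateCollector) {
    for (String stream : streams) {
      recordsByStream.put(stream, new ArrayList<>());
    }
    this.stateCollector = stateCollector;
  }

  /** Keeps only the most recent state; nothing is emitted yet. */
  public void acceptState(String state) {
    lastState = state;
  }

  /** Routes a record to its stream's sink, rejecting streams outside the catalog. */
  public void acceptRecord(String stream, String data) {
    List<String> sink = recordsByStream.get(stream);
    if (sink == null) {
      throw new IllegalArgumentException("Record from a stream that is not in the catalog: " + stream);
    }
    sink.add(data);
  }

  /** Emits the last state only when the sync succeeded and a state exists. */
  public void close(boolean hasFailed) {
    if (!hasFailed && lastState != null) {
      stateCollector.accept(lastState);
    }
  }

  public int recordCount(String stream) {
    return recordsByStream.getOrDefault(stream, Collections.emptyList()).size();
  }

  public static void main(String[] args) {
    List<String> emitted = new ArrayList<>();
    StateGate gate = new StateGate(Arrays.asList("users"), emitted::add);
    gate.acceptRecord("users", "{\"id\": 1}");
    gate.acceptState("state-1");
    gate.acceptState("state-2");
    gate.close(false);
    System.out.println("Emitted states: " + emitted);
  }
}
```

Holding the state back until `close` means a downstream reader never sees a checkpoint for data that was not fully written.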
76 changes: 76 additions & 0 deletions
...destination-gcs/src/main/java/io/airbyte/integrations/destination/gcs/GcsDestination.java
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,76 @@ | ||
/* | ||
* MIT License | ||
* | ||
* Copyright (c) 2020 Airbyte | ||
* | ||
* Permission is hereby granted, free of charge, to any person obtaining a copy | ||
* of this software and associated documentation files (the "Software"), to deal | ||
* in the Software without restriction, including without limitation the rights | ||
* to use, copy, modify, merge, publish, distribute, sublicense, and/or sell | ||
* copies of the Software, and to permit persons to whom the Software is | ||
* furnished to do so, subject to the following conditions: | ||
* | ||
* The above copyright notice and this permission notice shall be included in all | ||
* copies or substantial portions of the Software. | ||
* | ||
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR | ||
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, | ||
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE | ||
* AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER | ||
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, | ||
* OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE | ||
* SOFTWARE. | ||
*/ | ||
|
||
package io.airbyte.integrations.destination.gcs; | ||
|
||
import com.amazonaws.services.s3.AmazonS3; | ||
import com.fasterxml.jackson.databind.JsonNode; | ||
import io.airbyte.integrations.BaseConnector; | ||
import io.airbyte.integrations.base.AirbyteMessageConsumer; | ||
import io.airbyte.integrations.base.Destination; | ||
import io.airbyte.integrations.base.IntegrationRunner; | ||
import io.airbyte.integrations.destination.gcs.writer.GcsWriterFactory; | ||
import io.airbyte.integrations.destination.gcs.writer.ProductionWriterFactory; | ||
import io.airbyte.protocol.models.AirbyteConnectionStatus; | ||
import io.airbyte.protocol.models.AirbyteConnectionStatus.Status; | ||
import io.airbyte.protocol.models.AirbyteMessage; | ||
import io.airbyte.protocol.models.ConfiguredAirbyteCatalog; | ||
import java.util.function.Consumer; | ||
import org.slf4j.Logger; | ||
import org.slf4j.LoggerFactory; | ||
|
||
public class GcsDestination extends BaseConnector implements Destination { | ||
|
||
private static final Logger LOGGER = LoggerFactory.getLogger(GcsDestination.class); | ||
|
||
public static void main(String[] args) throws Exception { | ||
new IntegrationRunner(new GcsDestination()).run(args); | ||
} | ||
|
||
@Override | ||
public AirbyteConnectionStatus check(JsonNode config) { | ||
try { | ||
GcsDestinationConfig destinationConfig = GcsDestinationConfig.getGcsDestinationConfig(config); | ||
AmazonS3 s3Client = GcsS3Helper.getGcsS3Client(destinationConfig); | ||
s3Client.putObject(destinationConfig.getBucketName(), "test", "check-content"); | ||
s3Client.deleteObject(destinationConfig.getBucketName(), "test"); | ||
return new AirbyteConnectionStatus().withStatus(Status.SUCCEEDED); | ||
} catch (Exception e) { | ||
LOGGER.error("Exception attempting to access the Gcs bucket: {}", e.getMessage()); | ||
return new AirbyteConnectionStatus() | ||
.withStatus(AirbyteConnectionStatus.Status.FAILED) | ||
.withMessage("Could not connect to the Gcs bucket with the provided configuration. \n" + e | ||
.getMessage()); | ||
} | ||
} | ||
|
||
@Override | ||
public AirbyteMessageConsumer getConsumer(JsonNode config, | ||
ConfiguredAirbyteCatalog configuredCatalog, | ||
Consumer<AirbyteMessage> outputRecordCollector) { | ||
GcsWriterFactory formatterFactory = new ProductionWriterFactory(); | ||
return new GcsConsumer(GcsDestinationConfig.getGcsDestinationConfig(config), configuredCatalog, formatterFactory, outputRecordCollector); | ||
} | ||
|
||
} |
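The `check` method above verifies the configuration by writing a small object and deleting it again, and it reports any exception as a failed status. The sketch below isolates that probe pattern. `ObjectStore`, `InMemoryStore`, and `ConnectionProbe` are hypothetical stand-ins so the example runs without a bucket; they are not part of the AWS SDK or of Airbyte, and the string result replaces `AirbyteConnectionStatus` for brevity.

```java
import java.util.HashMap;
import java.util.Map;

public class ConnectionProbe {

  /** Hypothetical minimal storage interface; the real code uses an AmazonS3 client. */
  interface ObjectStore {
    void putObject(String bucket, String key, String content);

    void deleteObject(String bucket, String key);
  }

  /** In-memory stand-in that can simulate a bucket without write permission. */
  static class InMemoryStore implements ObjectStore {

    private final Map<String, String> objects = new HashMap<>();
    private final boolean writable;

    InMemoryStore(boolean writable) {
      this.writable = writable;
    }

    @Override
    public void putObject(String bucket, String key, String content) {
      if (!writable) {
        throw new IllegalStateException("access denied");
      }
      objects.put(bucket + "/" + key, content);
    }

    @Override
    public void deleteObject(String bucket, String key) {
      objects.remove(bucket + "/" + key);
    }

    int size() {
      return objects.size();
    }
  }

  /** Writes and deletes a probe object; returns "SUCCEEDED" or "FAILED: <reason>". */
  static String check(ObjectStore store, String bucket) {
    try {
      store.putObject(bucket, "test", "check-content");
      store.deleteObject(bucket, "test");
      return "SUCCEEDED";
    } catch (Exception e) {
      return "FAILED: " + e.getMessage();
    }
  }

  public static void main(String[] args) {
    System.out.println(check(new InMemoryStore(true), "my-bucket"));
    System.out.println(check(new InMemoryStore(false), "my-bucket"));
  }
}
```

A write-then-delete probe exercises both permissions the destination needs and leaves no object behind when it succeeds.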