# Destination doc and warning updates (airbytehq#20110)

* Doc updates

* Bigquery Denormalized

* bump faker for change

* ignore missing strict-encrypt connectors from ci check

* Apply suggestions from code review

Co-authored-by: Augustin <[email protected]>

* Fix MD titles

Co-authored-by: Augustin <[email protected]>
evantahler and alafanechere authored Dec 6, 2022
1 parent 779f275 commit 92ad0fd
Showing 16 changed files with 391 additions and 323 deletions.
1 change: 1 addition & 0 deletions airbyte-integrations/connectors/source-faker/main.py
```python
#
# Copyright (c) 2022 Airbyte, Inc., all rights reserved.
#
#


import sys
```
3 changes: 3 additions & 0 deletions docs/integrations/destinations/bigquery-denormalized.md
# BigQuery Denormalized

See [destinations/bigquery](/integrations/destinations/bigquery)
52 changes: 26 additions & 26 deletions docs/integrations/destinations/cassandra.md
# Cassandra

## Prerequisites

- For Airbyte Open Source users using the [Postgres](https://docs.airbyte.com/integrations/sources/postgres) source connector, [upgrade](https://docs.airbyte.com/operator-guides/upgrading-airbyte/) your Airbyte platform to version `v0.40.0-alpha` or newer and upgrade your Cassandra connector to version `0.1.3` or newer.

## Sync overview

### Output schema

The incoming Airbyte data is structured in keyspaces and tables and is partitioned and replicated across different nodes
in the cluster. This connector maps an incoming `stream` to a Cassandra `table` and a `namespace` to a
Cassandra `keyspace`. Fields in the Airbyte message become different columns in the Cassandra tables. Each table will
contain the following columns (see the sketch after the list).

- `_airbyte_ab_id`: a randomly generated UUID used as the partition key.
- `_airbyte_emitted_at`: a timestamp representing when the event was received from the data source.
- `_airbyte_data`: a JSON text representing the extracted data.
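
A minimal sketch of that table shape using the DataStax `cassandra-driver` (the connector creates its tables itself during a sync; the keyspace, table name, and credentials here are hypothetical):

```python
# Sketch of the table shape this connector produces, not the connector's
# own code. Keyspace, table name, and credentials are hypothetical.
from cassandra.auth import PlainTextAuthProvider
from cassandra.cluster import Cluster

auth = PlainTextAuthProvider(username="airbyte", password="secret")
cluster = Cluster(["127.0.0.1"], port=9042, auth_provider=auth)
session = cluster.connect()

session.execute("""
    CREATE KEYSPACE IF NOT EXISTS airbyte
    WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}
""")

# The three columns described above: a UUID partition key, the emission
# timestamp, and the raw record as JSON text.
session.execute("""
    CREATE TABLE IF NOT EXISTS airbyte.users_stream (
        _airbyte_ab_id uuid PRIMARY KEY,
        _airbyte_emitted_at timestamp,
        _airbyte_data text
    )
""")
```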

### Features

| Feature                       | Support | Notes                                                                                         |
| :---------------------------- | :-----: | :-------------------------------------------------------------------------------------------- |
| Full Refresh Sync             |   ✅    | Warning: this mode deletes all previously synced data in the configured Cassandra table.      |
| Incremental - Append Sync     |   ✅    |                                                                                               |
| Incremental - Deduped History |   ❌    | As this connector does not support dbt, we don't support this sync mode on this destination.  |
| Namespaces                    |   ✅    | Namespace will be used as part of the table name.                                             |

### Performance considerations

… data from the connector.

### Requirements

- The driver is compatible with _Cassandra >= 2.1_
- Configuration
  - Keyspace [default keyspace to use when writing data]
  - Username [authentication username]
  - Password [authentication password]
  - Address [cluster address]
  - Port [default: 9042]
  - Datacenter [optional] [default: datacenter1]
  - Replication [optional] [default: 1]

## Changelog

| Version | Date | Pull Request | Subject |
| :------ | :--------- | :------------------------------------------------------- | :------------------------------------- |
| 0.1.4   | 2022-08-23 | [15894](https://github.com/airbytehq/airbyte/pull/15894) | Replace batch insert with async method |
9 changes: 9 additions & 0 deletions docs/integrations/destinations/csv.md
# CSV Destination

The Airbyte Destination for CSV files.

## Changelog

| Version | Date | Pull Request | Subject |
| :------ | :--------- | :------------------------------------------------------- | :----------- |
| 0.2.10  | 2022-08-08 | [13932](https://github.com/airbytehq/airbyte/pull/13932) | Bump version |
# Azure Blob Storage

The Airbyte Destination for [Microsoft Azure Blob Storage](https://azure.microsoft.com/en-us/products/storage/blobs/).

## Changelog

| Version | Date | Pull Request | Subject |
| :------ | :--------- | :------------------------------------------------------- | :----------- |
| 0.1.6   | 2022-08-08 | [15412](https://github.com/airbytehq/airbyte/pull/15412) | Bump version |
9 changes: 9 additions & 0 deletions docs/integrations/destinations/dev-null.md
# Dev Null Destination

The Airbyte `dev-null` Destination. This destination is for testing and debugging only.

## Changelog

| Version | Date | Pull Request | Subject |
| :------ | :--------- | :------------------------------------------------------- | :----------- |
| 0.2.7   | 2022-08-08 | [13932](https://github.com/airbytehq/airbyte/pull/13932) | Bump version |
51 changes: 30 additions & 21 deletions docs/integrations/destinations/doris.md
# Doris

destination-doris is a destination implemented on top of [Apache Doris stream load](https://doris.apache.org/docs/dev/data-operate/import/import-way/stream-load-manual); it supports batch rollback and uses HTTP/HTTPS PUT requests.

## Sync overview

### Output schema

Each stream will be output into its own table in Doris. Each table will contain 3 columns:

- `_airbyte_ab_id`: a UUID assigned by Airbyte to each event that is processed. The column type in Doris is `VARCHAR(40)`.
- `_airbyte_emitted_at`: a timestamp representing when the event was pulled from the data source. The column type in Doris is `BIGINT`.
- `_airbyte_data`: a JSON blob containing the event data. The column type in Doris is `String`.

### Features

This connector supports the following features:

| Feature                                 | Supported?(Yes/No) | Notes                        |
| :-------------------------------------- | :----------------- | :--------------------------- |
| Full Refresh Sync                       | Yes                |                              |
| Incremental - Append Sync               | Yes                |                              |
| Incremental - Deduped History           | No                 | Planned for a future release |
| For databases, WAL/Logical replication  | Yes                |                              |

### Performance considerations

Batch writes are performed; many small records may impact performance.
Importing multiple tables will generate multiple [Doris stream load](https://doris.apache.org/docs/dev/data-operate/import/import-way/stream-load-manual) transactions, which should be split as much as possible. A sketch of a single stream-load transaction follows.
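
A minimal sketch of one such stream load with the Python `requests` library, assuming the standard `_stream_load` endpoint on the FE HTTP port; the host, credentials, database, and table names are hypothetical:

```python
# Sketch of a single Doris stream load: an HTTP PUT against the FE HTTP
# port. Host, credentials, database, and table names are hypothetical.
import uuid
import requests

rows = '{"id": 1, "name": "jane"}\n{"id": 2, "name": "john"}'
response = requests.put(
    "http://doris-fe:8030/api/airbyte_db/users_stream/_stream_load",
    data=rows.encode("utf-8"),
    auth=("airbyte", "secret"),
    headers={
        "label": f"airbyte-{uuid.uuid4()}",  # a unique label lets Doris deduplicate retries
        "format": "json",
        "read_json_by_line": "true",
        "Expect": "100-continue",
    },
)
# Note: the FE may redirect the PUT to a backend node; a production client
# must preserve credentials across that redirect (curl's --location-trusted).
print(response.json())  # "Status": "Success" indicates a committed transaction
```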

## Getting started

### Requirements

To use the Doris destination, you'll need:

- A Doris server version 0.14 or above
- Make sure your Doris FE HTTP port can be accessed by Airbyte.
- Make sure your Doris database host can be accessed by Airbyte.
- Make sure your Doris user has read/write permissions on the target tables.

### Target Database and tables

You will need to choose a database that will be used to store data synced from Airbyte.
You also need to prepare the target tables in advance, and keep their column names and column order matching the incoming data as closely as possible (see the sketch below).
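
As a sketch of preparing such a table (hypothetical host, credentials, and names; Doris accepts DDL over its MySQL-compatible query port, so `pymysql` works):

```python
# Sketch of preparing a target table over Doris's MySQL-compatible query
# port. Host, credentials, database, and table names are hypothetical.
import pymysql

connection = pymysql.connect(
    host="doris-fe", port=9030, user="airbyte", password="secret", database="airbyte_db"
)
with connection.cursor() as cursor:
    # Columns mirror the output schema above; the distribution clause is
    # required by Doris and the bucket count here is an arbitrary example.
    cursor.execute("""
        CREATE TABLE IF NOT EXISTS users_stream (
            _airbyte_ab_id VARCHAR(40),
            _airbyte_emitted_at BIGINT,
            _airbyte_data STRING
        )
        DISTRIBUTED BY HASH(_airbyte_ab_id) BUCKETS 10
        PROPERTIES ("replication_num" = "1")
    """)
connection.close()
```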

### Set up the access parameters
- **Host**
- **HttpPort**
- **QueryPort**
- **Username**
- **Password**
- **Database**

## Changelog

| Version | Date | Pull Request | Subject |
| :------ | :--------- | :------------------------------------------------------- | :------------- |
| 0.1.0   | 2022-11-14 | [17884](https://github.com/airbytehq/airbyte/pull/17884) | Initial Commit |
48 changes: 25 additions & 23 deletions docs/integrations/destinations/iceberg.md
… in the cluster. This connector maps an incoming `stream` to an Iceberg `table` and an Iceberg `database`. Fields in the Airbyte message become different columns in the Iceberg tables. Each table will contain the following columns.

- `_airbyte_ab_id`: a randomly generated UUID.
- `_airbyte_emitted_at`: a timestamp representing when the event was received from the data source.
- `_airbyte_data`: a JSON text representing the extracted data.

### Features

This connector supports the following features:

| Feature                       | Supported?(Yes/No) | Notes |
| :---------------------------- | :----------------- | :---- |
| Full Refresh Sync             | ✅                 |       |
| Incremental Sync              | ✅                 |       |
| Replicate Incremental Deletes | ❌                 |       |
| SSH Tunnel Support            | ❌                 |       |

### Performance considerations

Every ten thousand pieces of incoming Airbyte data in a stream (we call it a batch) produce one data file (Parquet/Avro) in an Iceberg table. This batch size can be configured by the `Data file flushing batch size` property.
As the quantity of Iceberg data files grows, it causes an unnecessary amount of metadata and less efficient queries from file open costs.
Iceberg provides a data file compaction action to improve this case; you can read more about compaction [HERE](https://iceberg.apache.org/docs/latest/maintenance/#compact-data-files).
This connector also provides an auto-compact action when a stream closes, via the `Auto compact data files` property, and you can specify the target size of the compacted Iceberg data files (see the sketch below).
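
For reference, a manual compaction can be run with Spark's `rewrite_data_files` procedure; a sketch assuming the matching `iceberg-spark-runtime` jar is on the classpath and using hypothetical catalog and table names:

```python
# Sketch of manually compacting an Iceberg table with the Spark
# rewrite_data_files procedure. Catalog and table names are hypothetical,
# and the matching iceberg-spark-runtime jar must be on the classpath.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("iceberg-compaction")
    .config("spark.sql.extensions",
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    .config("spark.sql.catalog.my_catalog", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.my_catalog.type", "hive")
    .getOrCreate()
)

# Rewrites many small data files into files close to the target size,
# reducing metadata overhead and per-file open costs at query time.
spark.sql("""
    CALL my_catalog.system.rewrite_data_files(
        table => 'airbyte_db.users_stream',
        options => map('target-file-size-bytes', '536870912')
    )
""")
```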

## Getting started

### Requirements

- **Iceberg catalog**: Iceberg uses a `catalog` to manage tables. This connector already supports:
  - [HiveCatalog](https://iceberg.apache.org/docs/latest/hive/#global-hive-catalog) connects to a **Hive metastore**
    to keep track of Iceberg tables.
  - [HadoopCatalog](https://iceberg.apache.org/docs/latest/java-api-quickstart/#using-a-hadoop-catalog) doesn’t need
    to connect to a Hive MetaStore, but can only be used with **HDFS or similar file systems** that support atomic
    rename. For `HadoopCatalog`, this connector uses the **Storage Config** (S3 or HDFS) to manage Iceberg tables.
  - [JdbcCatalog](https://iceberg.apache.org/docs/latest/jdbc/) uses a table in a relational database to manage
    Iceberg tables through JDBC. So far, this connector supports **PostgreSQL** only.
- **Storage medium** means where the Iceberg data files are stored. So far, this connector supports **S3/S3N/S3A**
  object storage only.

## Changelog

| Version | Date       | Pull Request                                             | Subject        |
| :------ | :--------- | :------------------------------------------------------- | :------------- |
| 0.1.0   | 2022-11-01 | [18836](https://github.com/airbytehq/airbyte/pull/18836) | Initial Commit |