Add high level overview to normalization doc. #6445
@@ -50,6 +44,24 @@ The [normalization rules](basic-normalization.md#Rules) are _not_ configurable.
Airbyte places the JSON blob version of your data in a table called `_airbyte_raw_<stream name>`. If basic normalization is turned on, it will place a separate copy of the data in a table called `<stream name>`. Under the hood, Airbyte uses dbt, which means that the data only ingresses into the datastore one time. The normalization happens as a query within the datastore. This implementation avoids extra network time and costs.
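As a rough illustration of the raw-table and normalized-table split described above, here is a minimal sketch using an in-memory SQLite database as a stand-in for the destination. The stream name `users`, the column name `_airbyte_data`, and the `json_field` helper are assumptions made for this example; this is not the SQL that Airbyte or dbt actually generates.

```python
import json
import sqlite3


def json_field(blob, key):
    """Return one top-level field from a JSON blob (None if the key is absent)."""
    return json.loads(blob).get(key)


def build_demo():
    conn = sqlite3.connect(":memory:")
    # Hypothetical helper registered as a SQL function, so the extraction
    # runs as part of a query inside the datastore.
    conn.create_function("json_field", 2, json_field)

    # Raw table: one JSON blob per record, loaded into the datastore once.
    conn.execute("CREATE TABLE _airbyte_raw_users (_airbyte_data TEXT)")
    records = [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Grace"}]
    conn.executemany(
        "INSERT INTO _airbyte_raw_users VALUES (?)",
        [(json.dumps(r),) for r in records],
    )

    # Normalization: a query over the raw table that materializes a
    # separate, column-oriented copy named after the stream.
    conn.execute(
        "CREATE TABLE users AS "
        "SELECT json_field(_airbyte_data, 'id') AS id, "
        "       json_field(_airbyte_data, 'name') AS name "
        "FROM _airbyte_raw_users"
    )
    return conn


conn = build_demo()
rows = conn.execute("SELECT id, name FROM users ORDER BY id").fetchall()
raw_count = conn.execute("SELECT COUNT(*) FROM _airbyte_raw_users").fetchone()[0]
print(rows)
```

The raw table is left untouched after the normalized copy is created, so the normalized table could be rebuilt from it without loading the source data again.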
## Why does Airbyte have Basic Normalization?
At its core, Airbyte is geared to handle the EL \(Extract Load\) steps of an ELT process. In Airbyte's dialect, these steps are also referred to as "Source" and "Destination".
I think it would be helpful to explain why the raw table exists, since that is something we get a lot of questions about.
e.g. (you can word it better) A core tenet of the ELT approach is that the E and L steps mutate the data as little as possible. By getting a copy of the unmodified data into the destination, we reduce the need for resending data in the future, because the "original" data is already in the destination. If you change your mind on how you want to materialize the data, Airbyte can use the untouched raw version that is already in the destination to do it and doesn't need to resend anything.
(Of course, we do actually resend data in a lot of cases right now, but aspirationally this is what we are going for and why we adhere to this philosophy.)
This absolutely makes sense and I think it's good to explain why it exists. I've included a short explanation on the philosophy.
Abhi Vaidyanatha seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account. You have signed the CLA already but the status is still pending? Let us recheck it.
Main Changes
Make the Basic Normalization doc a little more readable to first-time deployers.