first draft of diagram + extract, normalize, load description #84

Merged 3 commits on Nov 24, 2022

docs/website/docs/architecture.md (22 additions, 1 deletion)
@@ -4,8 +4,29 @@ sidebar_position: 4

# Architecture

`dlt` automatically turns JSON returned by any [source](./glossary.md#source) (e.g. an API)
into a live dataset stored in the [destination](./glossary.md#destination) of your choice
(e.g. Google BigQuery). It does this by first [extracting](./architecture.md#extract) the JSON data,
then [normalizing](./architecture.md#normalize) it to a schema, and finally [loading](./architecture.md#load)
it into that destination.

![architecture-diagram](/img/architecture-diagram.png)

## Extract

Your Python script requests data from an API or a similar [source](./glossary.md#source). Once the data
is received, the script parses the JSON and passes it to `dlt` as input, which then normalizes it.
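
As a rough illustration of this step, the sketch below parses a JSON response body and yields one record per item using only the Python standard library. The `extract` helper and the sample payload are hypothetical; this is not `dlt`'s API, only an example of the kind of parsed input a script hands to `dlt`.

```python
import json
from typing import Any, Dict, Iterator


def extract(raw_json: str) -> Iterator[Dict[str, Any]]:
    """Parse a JSON payload and yield records (hypothetical helper, not part of dlt).

    A top-level array yields one record per element; a top-level object
    is treated as a single record.
    """
    payload = json.loads(raw_json)
    if isinstance(payload, list):
        yield from payload
    else:
        yield payload


# In a real script the body would come from an HTTP call, for example
# urllib.request.urlopen(url).read().decode("utf-8"); a literal keeps this offline.
response_body = (
    '[{"id": 1, "user": {"name": "Ada"}, "orders": [{"sku": "A1"}]},'
    ' {"id": 2, "user": {"name": "Bob"}, "orders": []}]'
)
records = list(extract(response_body))
print(records[0]["user"]["name"])  # Ada
```

The records keep their nested structure at this point; flattening them is left to the normalization step.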

## Normalize

The configurable normalization engine in `dlt` recursively unpacks this nested JSON structure into
relational tables (e.g. inferring data types and linking tables to create parent-child relationships),
making it ready to be loaded. This process produces a [schema](./glossary.md#schema), which
automatically evolves to accommodate future changes in the source data (e.g. new fields or tables).
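
The recursive unpacking can be pictured with the standard-library sketch below. It is not `dlt`'s normalizer: the child-table naming (`<parent>__<field>`), the `_row_id`/`_parent_id` linking columns, and the rule that a column's type comes from its first non-null value are all assumptions made for illustration.

```python
from typing import Any, Dict, List, Optional

Row = Dict[str, Any]


def _flatten(obj: Row, prefix: str, flat: Row, children: Dict[str, list]) -> None:
    """Flatten nested objects into columns; set lists aside as child tables."""
    for key, value in obj.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            _flatten(value, f"{name}__", flat, children)
        elif isinstance(value, list):
            children[name] = value
        else:
            flat[name] = value


def normalize(table: str, rows: List[Row], tables: Optional[Dict[str, List[Row]]] = None,
              parent_id: Optional[str] = None) -> Dict[str, List[Row]]:
    """Recursively unpack nested rows into flat, linked tables (hypothetical helper)."""
    if tables is None:
        tables = {}
    out = tables.setdefault(table, [])
    for row in rows:
        flat: Row = {}
        children: Dict[str, list] = {}
        _flatten(row, "", flat, children)
        flat["_row_id"] = f"{table}:{len(out)}"  # position-based id, illustration only
        if parent_id is not None:
            flat["_parent_id"] = parent_id  # link back to the parent row
        out.append(flat)
        for field, items in children.items():
            # Wrap scalars so every child row is a dict: "a" -> {"value": "a"}.
            child_rows = [item if isinstance(item, dict) else {"value": item} for item in items]
            normalize(f"{table}__{field}", child_rows, tables, flat["_row_id"])
    return tables


def infer_schema(rows: List[Row]) -> Dict[str, str]:
    """Map each column to the type name of its first non-null value."""
    schema: Dict[str, str] = {}
    for row in rows:
        for col, value in row.items():
            if value is not None:
                schema.setdefault(col, type(value).__name__)
    return schema


def evolve(schema: Dict[str, str], rows: List[Row]) -> Dict[str, str]:
    """Add columns that appear in new rows; existing column types are kept."""
    for col, type_name in infer_schema(rows).items():
        schema.setdefault(col, type_name)
    return schema


data = [
    {"id": 1, "user": {"name": "Ada", "age": 36}, "orders": [{"sku": "A1", "qty": 2}, {"sku": "B2", "qty": 1}]},
    {"id": 2, "user": {"name": "Bob", "age": 41}, "orders": []},
]
tables = normalize("customers", data)
schema = infer_schema(tables["customers"])

# A later batch introduces a new "email" field; the schema gains a column.
later = normalize("customers", [{"id": 3, "user": {"name": "Cy", "age": 29}, "email": "cy@example.com"}])
schema = evolve(schema, later["customers"])
print(sorted(tables))  # ['customers', 'customers__orders']
print(schema["email"])  # str
```

In this sketch a column's type is fixed by the first value seen, so a conflicting later value is silently ignored by `infer_schema`; a production normalizer would need an explicit policy for such conflicts.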

## Load

The data is then loaded into your chosen [destination](./glossary.md#destination). `dlt` uses configurable,
idempotent, atomic loads that ensure the data arrives there safely. For example, you don't need to worry about
the size of the data you are loading, and if the process is interrupted, it is safe to retry: a retried load
does not leave duplicate or partial data behind.
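
To show what idempotent, atomic loading means in practice, here is a sketch that uses SQLite (through Python's `sqlite3` module) as a stand-in destination. It assumes each row carries a unique `_row_id` key: `INSERT OR REPLACE` on that key makes a retry overwrite rows rather than duplicate them, and running the batch in one transaction means a failed batch leaves nothing behind. The `load` helper is hypothetical and is not how `dlt` implements its destinations.

```python
import sqlite3
from typing import Any, Dict, List


def load(conn: sqlite3.Connection, table: str, rows: List[Dict[str, Any]]) -> None:
    """Load rows so that retrying the same batch is safe (hypothetical helper).

    Assumes every row has a unique "_row_id" key, used as the primary key.
    """
    if not rows:
        return
    columns = sorted({col for row in rows for col in row})
    col_list = ", ".join(f'"{c}"' for c in columns)
    # Create the table on first use; IF NOT EXISTS makes this step repeatable.
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({col_list}, PRIMARY KEY ("_row_id"))')
    # Add any columns that appeared since the table was created.
    existing = {info[1] for info in conn.execute(f'PRAGMA table_info("{table}")')}
    for col in columns:
        if col not in existing:
            conn.execute(f'ALTER TABLE "{table}" ADD COLUMN "{col}"')
    placeholders = ", ".join("?" for _ in columns)
    # `with conn` commits if the block succeeds and rolls back if it raises,
    # so the rows of a batch are either all written or none are.
    with conn:
        conn.executemany(
            f'INSERT OR REPLACE INTO "{table}" ({col_list}) VALUES ({placeholders})',
            [tuple(row.get(col) for col in columns) for row in rows],
        )


conn = sqlite3.connect(":memory:")
batch = [
    {"_row_id": "customers:0", "id": 1, "name": "Ada"},
    {"_row_id": "customers:1", "id": 2, "name": "Bob"},
]
load(conn, "customers", batch)
load(conn, "customers", batch)  # simulated retry: rows are replaced, not duplicated
count = conn.execute('SELECT COUNT(*) FROM "customers"').fetchone()[0]

# A batch that fails partway (a dict cannot be bound as an SQLite value)
# is rolled back, so its first row is not left behind either.
bad_batch = [
    {"_row_id": "customers:2", "id": 3, "name": "Cy"},
    {"_row_id": "customers:3", "id": 4, "name": {"not": "scalar"}},
]
try:
    load(conn, "customers", bad_batch)
except sqlite3.Error:
    pass
count_after_failure = conn.execute('SELECT COUNT(*) FROM "customers"').fetchone()[0]
print(count, count_after_failure)  # 2 2
```

Keying writes on a stable row identifier is what makes the retry harmless here; without such a key, a replayed batch would simply be appended a second time.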

docs/website/docusaurus.config.js (3 additions, 2 deletions)
@@ -13,11 +13,12 @@ const config = {
onBrokenLinks: 'throw',
onBrokenMarkdownLinks: 'warn',
favicon: 'img/favicon.ico',
staticDirectories: ['public', 'static'],

// GitHub pages deployment config.
// If you aren't using GitHub pages, you don't need these.
organizationName: 'dltHub', // Usually your GitHub org/user name.
projectName: 'dlt', // Usually your repo name.

// Even if you don't use internationalization, you can use this field to set useful
// metadata like html lang. For example, if your site is Chinese, you may want
Binary file added docs/website/static/img/architecture-diagram.png