Merge pull request #84 from scale-vector/architecture
first draft of diagram + extract, normalize, load description
TyDunn authored Nov 24, 2022
2 parents 6e00834 + 792a482 commit 2987299
Showing 3 changed files with 25 additions and 3 deletions.
23 changes: 22 additions & 1 deletion docs/website/docs/architecture.md
@@ -4,8 +4,29 @@ sidebar_position: 4

# Architecture

`dlt` automatically turns JSON returned by any [source](./glossary.md#source) (e.g. an API)
into a live dataset stored in the [destination](./glossary.md#destination) of your choice
(e.g. Google BigQuery). It does this by first [extracting](./architecture.md#extract) the JSON data,
then [normalizing](./architecture.md#normalize) it to a schema, and finally [loading](./architecture.md#load)
it into that destination.

![architecture-diagram](/img/architecture-diagram.png)

## Extract

The Python script requests data from an API or a similar [source](./glossary.md#source). Once this data
is received, the script parses the JSON and provides it to `dlt` as input, which then normalizes that data.
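
As a minimal sketch of the extract step, the snippet below parses a JSON response body with the Python standard library. The sample payload, the `items` key, and the `extract` helper are assumptions made for illustration and are not part of `dlt`; a real script would fetch the body from the source over HTTP and then hand the parsed records to `dlt`.

```python
import json
from typing import Any, Dict, List

# Hypothetical payload standing in for an API response body.
RAW_RESPONSE = '{"items": [{"id": 1, "name": "alice"}, {"id": 2, "name": "bob"}]}'


def extract(raw: str) -> List[Dict[str, Any]]:
    """Parse a JSON response body and return the records it contains."""
    payload = json.loads(raw)
    # Assumption: the records live under an "items" key.
    return payload["items"]


records = extract(RAW_RESPONSE)
print(len(records))  # prints 2
```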

## Normalize

The configurable normalization engine in `dlt` recursively unpacks the nested JSON structure into
relational tables (inferring data types, linking tables to create parent-child relationships,
and so on), making it ready to be loaded. This creates a [schema](./glossary.md#schema), which
automatically evolves to accommodate future changes in the source data (e.g. new fields or tables).
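
To illustrate the idea (this is not `dlt`'s actual implementation), here is a minimal sketch of recursively unpacking one nested record into relational tables and inferring a simple schema from the result. The `_id` and `_parent_id` column names, the `__` separator, and the helper functions are assumptions chosen for this example.

```python
from itertools import count
from typing import Any, Dict, Iterator, List, Optional

Row = Dict[str, Any]
Tables = Dict[str, List[Row]]


def normalize(record: Row, table: str, tables: Tables,
              ids: Iterator[int], parent_id: Optional[int] = None) -> None:
    """Write one record into `table`, spilling nested lists into child tables."""
    row: Row = {"_id": next(ids)}
    if parent_id is not None:
        row["_parent_id"] = parent_id  # link back to the parent row
    tables.setdefault(table, []).append(row)
    _flatten(record, "", row, table, tables, ids)


def _flatten(obj: Row, prefix: str, row: Row, table: str,
             tables: Tables, ids: Iterator[int]) -> None:
    for key, value in obj.items():
        column = f"{prefix}{key}"
        if isinstance(value, dict):
            # Nested objects become prefixed columns on the same row.
            _flatten(value, f"{column}__", row, table, tables, ids)
        elif isinstance(value, list):
            # Lists become rows in a child table linked to this row.
            for item in value:
                child = item if isinstance(item, dict) else {"value": item}
                normalize(child, f"{table}__{column}", tables, ids, row["_id"])
        else:
            row[column] = value


def infer_schema(tables: Tables) -> Dict[str, Dict[str, str]]:
    """Record the Python type name of the first value seen for each column."""
    schema: Dict[str, Dict[str, str]] = {}
    for name, rows in tables.items():
        columns = schema.setdefault(name, {})
        for row in rows:
            for column, value in row.items():
                columns.setdefault(column, type(value).__name__)
    return schema


tables: Tables = {}
user = {"id": 1, "name": "alice", "address": {"city": "Berlin"},
        "orders": [{"sku": "a"}, {"sku": "b"}]}
normalize(user, "users", tables, count(1))
schema = infer_schema(tables)
```

Here `tables` ends up with a `users` table (including an `address__city` column) and a `users__orders` child table whose rows point back to the user through `_parent_id`. A record with a previously unseen field would simply add a column to the inferred schema, which is the kind of evolution described above.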

## Load

The data is then loaded into your chosen [destination](./glossary.md#destination). `dlt` uses configurable,
idempotent, atomic loads that ensure data safely ends up there. For example, you don't need to worry about
the size of the data you are loading, and if the process is interrupted, it is safe to retry without creating
errors.
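
One way to picture an idempotent, atomic load is the sketch below, which uses SQLite from the Python standard library as a stand-in destination. The `_loads` bookkeeping table, the `load_id` values, the `users` table, and `load_package` are all hypothetical; `dlt`'s own loading mechanism may work differently.

```python
import sqlite3
from typing import Any, Dict, List


def load_package(conn: sqlite3.Connection, load_id: str,
                 rows: List[Dict[str, Any]]) -> bool:
    """Load rows under load_id; return False if that load was already committed."""
    conn.execute("CREATE TABLE IF NOT EXISTS _loads (load_id TEXT PRIMARY KEY)")
    conn.execute("CREATE TABLE IF NOT EXISTS users (id INTEGER PRIMARY KEY, name TEXT)")
    # One transaction: committed on success, rolled back if anything raises.
    with conn:
        done = conn.execute(
            "SELECT 1 FROM _loads WHERE load_id = ?", (load_id,)
        ).fetchone()
        if done:
            return False  # idempotent: a retry of a finished load is a no-op
        conn.executemany("INSERT INTO users (id, name) VALUES (:id, :name)", rows)
        conn.execute("INSERT INTO _loads (load_id) VALUES (?)", (load_id,))
    return True


conn = sqlite3.connect(":memory:")
rows = [{"id": 1, "name": "alice"}, {"id": 2, "name": "bob"}]
first = load_package(conn, "load-001", rows)  # loads both rows
retry = load_package(conn, "load-001", rows)  # skipped, already committed
```

Because the rows and the `_loads` marker are written in the same transaction, an interrupted load leaves neither behind, so running the same package again cannot produce duplicates or a half-loaded table.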
5 changes: 3 additions & 2 deletions docs/website/docusaurus.config.js
@@ -13,11 +13,12 @@ const config = {
   onBrokenLinks: 'throw',
   onBrokenMarkdownLinks: 'warn',
   favicon: 'img/favicon.ico',
+  staticDirectories: ['public', 'static'],

   // GitHub pages deployment config.
   // If you aren't using GitHub pages, you don't need these.
-  organizationName: 'facebook', // Usually your GitHub org/user name.
-  projectName: 'docusaurus', // Usually your repo name.
+  organizationName: 'dltHub', // Usually your GitHub org/user name.
+  projectName: 'dlt', // Usually your repo name.

   // Even if you don't use internalization, you can use this field to set useful
   // metadata like html lang. For example, if your site is Chinese, you may want
Binary file added docs/website/static/img/architecture-diagram.png