Project Lagoon

Project Lagoon is a data productization system, designed to simplify and isolate the process of preparing and delivering large datasets to external applications.

Description

Project Lagoon aims to create a reusable, automated system for transforming raw application data into high-quality data assets.

This project uses a micro lake architecture, combining automation, wiring, and abstraction into a cost-effective, modular solution. Designed to be deployable to any AWS account, a Lagoon includes a data lake (Iceberg), automated ingestion (Spark), data pipelines (dbt), and orchestration (Dagster).

Drop raw data files into your S3 bucket, create your dbt models, and produce high-quality, production-ready datasets in minutes.
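
As an illustration of the "drop raw data files in your S3 bucket" step, here is a minimal Python sketch using boto3. The README does not specify a bucket name or key layout, so the `raw/<table>/<filename>` layout, the `table` argument, and both function names are assumptions made for this example, not part of Project Lagoon.

```python
from pathlib import Path


def build_raw_key(local_path: str, table: str, prefix: str = "raw") -> str:
    """Build an S3 object key of the form <prefix>/<table>/<filename>.

    The layout is hypothetical; use whatever layout your Lagoon expects.
    """
    return f"{prefix.strip('/')}/{table}/{Path(local_path).name}"


def upload_raw_file(local_path: str, bucket: str, table: str) -> str:
    """Upload one raw data file to S3 and return the key it was written to."""
    # Imported lazily so build_raw_key can be used without boto3 installed.
    import boto3

    key = build_raw_key(local_path, table)
    boto3.client("s3").upload_file(local_path, bucket, key)
    return key
```

Calling `upload_raw_file("orders.parquet", "<your bucket>", "orders")` would place the file under `raw/orders/` in that bucket, using the credentials of your active AWS profile.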

Modules

  • deploy: Automated deployment of pipeline changes to the orchestration system.
  • ingest: Spark automation to upsert raw data (JSON, Avro, Parquet, ORC, CSV, XML) into Iceberg tables.
  • initialize: CLI tool for deploying and updating your Lagoon.
  • orchestr: dbt data pipeline orchestration and execution.
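
The ingest module accepts six raw file formats. As a rough illustration of how incoming files could be routed by format, the sketch below maps a file extension to one of the formats listed above. This is not how the ingest module is known to work; the `SUPPORTED_FORMATS` table and `detect_format` function are hypothetical names introduced for this example.

```python
from pathlib import Path
from typing import Optional

# Formats listed for the ingest module, keyed by an assumed file extension.
SUPPORTED_FORMATS = {
    ".json": "json",
    ".avro": "avro",
    ".parquet": "parquet",
    ".orc": "orc",
    ".csv": "csv",
    ".xml": "xml",
}


def detect_format(file_name: str) -> Optional[str]:
    """Return the format name for a file, or None if it is not supported."""
    # Lowercase the suffix so "events.JSON" and "events.json" match alike.
    return SUPPORTED_FORMATS.get(Path(file_name).suffix.lower())
```

A check like this can be run before uploading, so that unsupported files are caught locally rather than after they land in the bucket.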

Get Started

To run Project Lagoon, download the latest release binary for your platform and execute it, passing the name of your AWS profile:

./<binary> --profile <AWS PROFILE>