Project Lagoon

Project Lagoon is a data productization system that simplifies and isolates the process of preparing and delivering large datasets to external applications.

Description

Project Lagoon aims to create a reusable, automated system for transforming raw application data into high-quality assets.

The project uses a micro lake architecture, combining automation, wiring, and abstraction into a cost-effective, modular solution. Designed to be deployable to any AWS account, a Lagoon includes a data lake (Iceberg), automated ingestion (Spark), data pipelines (dbt), and orchestration (Dagster).

Just drop raw data files in your S3 bucket and create your dbt models to produce high-quality, production-ready datasets in minutes.

Modules

deploy: Automated deployment of pipeline changes to the orchestration system.
ingest: Spark automation that upserts raw data (JSON, Avro, Parquet, ORC, CSV, XML) into Iceberg tables.
initialize: CLI tool for deploying and updating your Lagoon.
orchestr: Orchestration and execution of dbt data pipelines.
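
For readers unfamiliar with the term, the upsert behavior mentioned for the ingest module can be illustrated with a small, self-contained Python sketch. This is not Lagoon's Spark implementation; it only shows the semantics (update rows whose key already exists, insert the rest). The `upsert` function, the `id` key column, and the sample records are all hypothetical, and whole-row replacement on a key match is an assumption.

```python
from typing import Any, Dict, Iterable

Record = Dict[str, Any]


def upsert(table: Dict[Any, Record], incoming: Iterable[Record], key: str = "id") -> Dict[Any, Record]:
    """Return a new table with the incoming records merged in by key.

    Hypothetical illustration only: a record whose key already exists
    replaces the stored row; a record with a new key is inserted.
    """
    merged = dict(table)  # shallow copy so the original table is left untouched
    for record in incoming:
        merged[record[key]] = record
    return merged


# Existing table contents, indexed by the (assumed) key column "id".
existing = {1: {"id": 1, "name": "alpha"}}

# A new batch of raw records: one update (id 1) and one insert (id 2).
batch = [{"id": 1, "name": "alpha-v2"}, {"id": 2, "name": "beta"}]

result = upsert(existing, batch)
```

After the call, `result` holds the updated row for `id` 1 and the new row for `id` 2, while `existing` is unchanged.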

Get Started

To use Project Lagoon, download the latest release binary for your platform and run it, passing your AWS profile:

./<binary> --profile <AWS PROFILE>
