Create consumption-ready data. Aggregate raw application data in an Iceberg data lake, transform with dbt, and enforce compliance requirements. Use with mytiki.com to monetize your new data assets.

Project Lagoon

Project Lagoon is a data productization system, designed to simplify and isolate the process of preparing and delivering large datasets to external applications.

Description

Project Lagoon aims to create a reusable, automated system for transforming raw application data into high-quality assets.

This project leverages a micro lake architecture, with automation, wiring, and abstraction to create a cost-effective, modular solution. Designed to be deployable to any AWS account, a Lagoon includes a data lake (Iceberg), automated ingestion (Spark), data pipelines (dbt), and orchestration (Dagster).

Just drop raw data files into your S3 bucket and create your dbt models to produce high-quality, production-ready datasets in minutes.
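As a rough sketch, a dbt model over an ingested table might look like the following. The `raw.events` source and the column names are hypothetical placeholders, not part of this project; substitute the tables your Lagoon actually ingests.

```sql
-- models/marts/daily_event_counts.sql
-- Hypothetical example: aggregate an ingested "events" table by day.
select
    event_date,
    count(*) as event_count
from {{ source('raw', 'events') }}
group by event_date
```

Once the model is in place, the orchestration layer (Dagster) runs it as part of the pipeline.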

Modules

  • deploy: Automated deployment of pipeline changes to orchestration system.
  • ingest: Spark automation to upsert raw data (JSON, Avro, Parquet, ORC, CSV, XML) into Iceberg tables.
  • initialize: CLI tool for deploying and updating your Lagoon.
  • orchestr: dbt data pipeline orchestration and execution.
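To clarify what the ingest module's "upsert" means: rows whose key matches an existing record replace it, while new keys are appended. Below is a toy, pure-Python illustration of those semantics only; it is not the project's Spark/Iceberg code, and the `upsert` helper and field names are hypothetical.

```python
# Toy illustration of upsert semantics (hypothetical helper,
# not Lagoon's Spark code). The table is keyed by a unique id.
def upsert(table: dict, rows: list, key: str) -> dict:
    """Merge incoming rows into table, matching on `key`."""
    merged = dict(table)
    for row in rows:
        # Update the record if the key exists, insert it otherwise.
        merged[row[key]] = row
    return merged

existing = {1: {"id": 1, "status": "old"}}
incoming = [{"id": 1, "status": "new"}, {"id": 2, "status": "new"}]
result = upsert(existing, incoming, "id")
```

In the real module, Spark performs the equivalent merge into Iceberg tables for each supported raw format.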

Get Started

To run Project Lagoon, download the latest release binary for your platform and run it:

./<binary> --profile <AWS PROFILE>
