data pipeline for blockchain data

bachkaxyz/bread

🍞 BREAD: Blockchain Read, Extract, Analyze, Display 🍞

BREAD is a data pipeline for blockchain data. Currently built for Tendermint-based blockchains, it reads raw data from nodes, parses it into an easily consumable format, and loads it into a database, from which it can be served to a frontend or API (both coming soon).

Core Components

  • extract.py: A script that reads tx and block data.
  • parse.py: A script to parse the JSON data into a relational format and save as Parquet files.
  • DuckDB: An in-process analytical database that serves as the engine for data processing and analysis.
  • dbt: data build tool, used to transform the loaded data into models that are easier to understand and analyze.
  • Prefect: Orchestration tool that is used to schedule and manage the data pipeline.
  • duckdbt: A package that lets multiple clients query DuckDB concurrently through Buena Vista, a PostgreSQL proxy server.
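
To illustrate what "parse the JSON data into a relational format" can mean, here is a minimal sketch. It is not the code in parse.py: the function name `flatten_block`, the column names, and the sample document (loosely shaped like a Tendermint RPC block response) are all assumptions made for illustration.

```python
import json

# Hypothetical sample, loosely shaped like a Tendermint RPC block response.
RAW_BLOCK = json.loads("""
{
  "result": {
    "block": {
      "header": {"height": "101", "time": "2023-01-01T00:00:00Z", "chain_id": "example-1"},
      "data": {"txs": ["dHgx", "dHgy"]}
    }
  }
}
""")


def flatten_block(raw):
    """Split one nested block document into a block row and its tx rows."""
    block = raw["result"]["block"]
    header = block["header"]
    height = int(header["height"])  # the sample stores the height as a string
    txs = block["data"]["txs"]

    # One row per block, holding only scalar columns.
    block_row = {
        "height": height,
        "chain_id": header["chain_id"],
        "time": header["time"],
        "tx_count": len(txs),
    }
    # One row per transaction, linked back to its block by height.
    tx_rows = [
        {"height": height, "tx_index": i, "tx_base64": tx}
        for i, tx in enumerate(txs)
    ]
    return block_row, tx_rows


block_row, tx_rows = flatten_block(RAW_BLOCK)
```

Flat rows like these map directly onto table columns, so they could be written to Parquet with a separate library and then queried from DuckDB; that step is left out here to keep the sketch dependency-free.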

Getting Started

  1. Clone the Repository: Clone this repository to your local machine.
  2. Load Submodules: If the duckdbt folder is empty, cd into your local repository and run git submodule update --init to load the submodules.
  3. Set up Environment Variables: Set up the necessary environment variables in a .env file. This includes the network for the blockchain data.
  4. Build and Run the Docker Container: Use the provided Makefile command, make up, to build and run the Docker container.
  5. Explore: Run make bash to open a shell inside the container. You can also access a query interface at http://localhost:8080/#.
  6. Get Data or Run Pipeline: Run make pipeline to pull, parse, and ingest data into DuckDB.
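
For step 3, the sketch below shows the general KEY=VALUE shape of a .env file and one way to read it. The variable name `NETWORK`, its value, and the helper `parse_env` are hypothetical; check the repository for the variables the pipeline actually expects.

```python
def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


# Hypothetical file contents; the real variable names may differ.
SAMPLE_ENV = """
# Which chain to pull data from
NETWORK=example-network
"""

settings = parse_env(SAMPLE_ENV)
```

In a Docker-based setup the container tooling normally reads the .env file itself, so a parser like this is only meant to show the file format.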