This project demonstrates how to set up a Dockerized example/development Apache Druid cluster.
The cluster consists of the following components:
- S3-compatible object storage (MinIO) for deep storage
- PostgreSQL for metadata storage
- ZooKeeper for internal service discovery, coordination, and leader election
- Apache Druid platform:
  - Middle Manager to handle the ingestion of data into the cluster
  - Historical to handle the storage and querying of "historical" data
  - Broker to receive queries from external clients
  - Coordinator to assign segments to Historical nodes
  - Overlord to assign ingestion tasks to Middle Managers and to coordinate segment publishing
  - Router to provide a unified API gateway in front of Brokers, Overlords, and Coordinators
Build the Docker image with:
make image
or by using docker-compose
docker-compose build
You can also specify the version of Druid to build, for example:
make DRUID_VERSION=0.14.1-incubating image
or by using docker-compose
docker-compose build --build-arg ARG_DRUID_VERSION=0.14.1-incubating
Start the cluster with:
docker-compose up
or, to run it in the background:
docker-compose up -d
After a while, the Druid console should be available at http://localhost:8888
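Since the console only becomes reachable "after a while", a small retry helper can make scripts wait for it. The following is a minimal sketch, not part of this repository: `retry_until` is a hypothetical helper name, and it simply re-runs any command until it succeeds or the attempts run out.

```shell
# retry_until ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Re-runs COMMAND until it exits with status 0 or ATTEMPTS is exhausted.
# Hypothetical helper for illustration; written in POSIX sh.
retry_until() {
  _attempts="$1"
  _delay="$2"
  shift 2
  _i=1
  while [ "$_i" -le "$_attempts" ]; do
    # Discard the command's output; only its exit status matters here.
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    # Do not sleep after the final failed attempt.
    if [ "$_i" -lt "$_attempts" ]; then
      sleep "$_delay"
    fi
    _i=$((_i + 1))
  done
  return 1
}
```

For example, `retry_until 30 10 curl -fsS http://localhost:8888/status/health` would poll for up to about five minutes. This assumes the process listening on port 8888 serves Druid's standard `/status/health` endpoint; adjust the URL if your setup differs.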
As example data, we use a subset of the NYC Taxi & Limousine Commission Trip Record Data, specifically the months 2015-01 through 2015-03.
Load the data into Druid with:
cd dataset
./03-load_to_druid.sh
Please note that you can download data for different months and change the sample size by adjusting the parameters of ./dataset/01-download.sh and ./dataset/02-create_sample_tripdata.sh.
The schema of the dataset and the indexing task are defined in ./dataset/yellow_tripdata-index.json
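If you want to experiment with a modified spec, an indexing task can also be submitted by hand through Druid's task API. The sketch below is an illustration under assumptions, not necessarily what ./dataset/03-load_to_druid.sh does: `submit_task` is a hypothetical helper, and it assumes the Router at http://localhost:8888 proxies the Overlord endpoint `/druid/indexer/v1/task`.

```shell
# submit_task SPEC_FILE [BASE_URL]
# POSTs an indexing spec to the Druid task endpoint.
# Hypothetical helper; BASE_URL defaults to the Router address (assumption).
submit_task() {
  _spec="$1"
  _base="${2:-http://localhost:8888}"
  # Fail early with a distinct status if the spec file is missing.
  if [ ! -f "$_spec" ]; then
    echo "spec file not found: $_spec" >&2
    return 2
  fi
  # -f makes curl return a non-zero status on HTTP errors.
  curl -fsS -X POST \
    -H 'Content-Type: application/json' \
    --data-binary "@$_spec" \
    "$_base/druid/indexer/v1/task"
}
```

Usage, from the repository root: `submit_task ./dataset/yellow_tripdata-index.json`. On success, Druid responds with a JSON object containing the ID of the created task.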
...enjoy :)