In this project we construct a streaming event pipeline around Apache Kafka and its ecosystem. Using public data from the Chicago Transit Authority, we build an event pipeline around Kafka that allows us to simulate and display the status of trains in real time.
Arrivals and Turnstiles -> Producers that emit train arrival and turnstile events into our Kafka cluster. An arrival event indicates that a train has arrived at a particular station, while a turnstile event indicates that a passenger has entered the station.
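As a sketch of the kind of payload an arrival producer might emit (the field names here are illustrative assumptions, not the project's exact schema):

```python
import json
from datetime import datetime

def make_arrival_event(station_id, train_id, direction, line):
    """Build an arrival event payload; field names are illustrative."""
    return {
        "station_id": station_id,
        "train_id": train_id,
        "direction": direction,
        "line": line,
        "timestamp": datetime.utcnow().isoformat(),
    }

# The producer would serialize this before producing it to a Kafka topic,
# e.g. confluent_kafka.Producer.produce(topic, value=json.dumps(event))
event = make_arrival_event(40530, "BL401", "b", "blue")
print(json.dumps(event))
```

A turnstile event would follow the same pattern with a station identifier and a timestamp.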
Weather -> A producer that periodically emits weather data to the Kafka cluster via the Confluent REST Proxy.
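The REST Proxy expects JSON-encoded messages wrapped in a `records` envelope. A minimal sketch of building such a payload (the topic name and weather fields are assumptions):

```python
import json

def weather_payload(temperature, status):
    """Wrap a weather reading in the envelope the Confluent REST Proxy
    expects for JSON topics: {"records": [{"value": ...}]}."""
    return {"records": [{"value": {"temperature": temperature, "status": status}}]}

payload = weather_payload(72.5, "sunny")
# The producer would POST this with Content-Type
# "application/vnd.kafka.json.v2+json" to the proxy's topic endpoint,
# e.g. http://localhost:8082/topics/<weather-topic> (topic name assumed).
print(json.dumps(payload))
```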
PostgreSQL and Kafka Connect -> A JDBC source connector that extracts station data from PostgreSQL and pushes it to the Kafka cluster.
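A JDBC source connector configuration along these lines would poll the stations table (the connector name, connection details, and column names here are assumptions for illustration):

```json
{
  "name": "stations",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
    "connection.url": "jdbc:postgresql://postgres:5432/cta",
    "connection.user": "cta_admin",
    "connection.password": "chicago",
    "table.whitelist": "stations",
    "mode": "incrementing",
    "incrementing.column.name": "stop_id",
    "topic.prefix": "connect-"
  }
}
```

This configuration would be POSTed to the Kafka Connect REST API (the `/connectors` endpoint).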
Kafka status server -> Consumes data from Kafka topics and displays it on the UI.
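Conceptually, the status server routes each consumed message into an in-memory model the UI reads from. A plain-Python sketch of that routing (topic names and fields are assumptions):

```python
def handle_message(topic, value, stations):
    """Update the in-memory station model from one consumed message.
    Topic names and field names here are illustrative assumptions."""
    if topic.startswith("arrivals"):
        stations.setdefault(value["station_id"], {})["last_train"] = value["train_id"]
    elif topic == "turnstiles":
        st = stations.setdefault(value["station_id"], {})
        st["entries"] = st.get("entries", 0) + 1
    return stations

stations = {}
handle_message("arrivals.blue", {"station_id": 40530, "train_id": "BL401"}, stations)
handle_message("turnstiles", {"station_id": 40530}, stations)
print(stations)
```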
- Docker (I used the Bitnami Kafka image available here)
- Python 3.7
First, make sure all the services are up and running. For Docker, use:
docker-compose up
Docker-Compose will take 3-5 minutes to start, depending on your hardware. Once Docker-Compose is ready, verify the services are running by connecting to them using the Docker URLs provided below:
You also need to install the Python requirements. Use the commands below to create a virtual environment and install them:
cd producers
virtualenv venv
. venv/bin/activate
pip install -r requirements.txt
Do the same for the consumers; set up the environment as below:
cd consumers
virtualenv venv
. venv/bin/activate
pip install -r requirements.txt
Run the producers using simulation.py in the producers folder:
python simulation.py
Run the Faust Stream Processing Application:
cd consumers
faust -A faust_stream worker -l info
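The Faust worker's core job here is transforming raw station records into a simpler shape for downstream consumers. A plain-Python sketch of that transformation (the boolean line columns and output fields are assumptions; the actual app wraps this logic in a Faust agent):

```python
def transform_station(station):
    """Collapse the red/blue/green boolean columns of a station record
    into a single 'line' field (field names are assumptions)."""
    if station.get("red"):
        line = "red"
    elif station.get("blue"):
        line = "blue"
    elif station.get("green"):
        line = "green"
    else:
        line = "unknown"
    return {
        "station_id": station["station_id"],
        "station_name": station["station_name"],
        "order": station["order"],
        "line": line,
    }

out = transform_station(
    {"station_id": 1, "station_name": "Addison", "order": 7,
     "red": False, "blue": True, "green": False}
)
print(out)
```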
Run the KSQL consumer:
cd consumers
python ksql.py
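A KSQL consumer like this typically submits DDL statements to the KSQL server's REST API. A sketch of the kind of statement it might send (the topic, table, and column names are assumptions for illustration):

```python
# KSQL statements to aggregate turnstile events per station; the topic and
# table names below are assumptions, not the project's exact definitions.
KSQL_STATEMENT = """
CREATE TABLE turnstile (
    station_id INT,
    station_name VARCHAR
) WITH (
    KAFKA_TOPIC='turnstiles',
    VALUE_FORMAT='JSON',
    KEY='station_id'
);

CREATE TABLE turnstile_summary
WITH (VALUE_FORMAT='JSON') AS
    SELECT station_id, COUNT(station_id) AS count
    FROM turnstile
    GROUP BY station_id;
"""

# This would be POSTed to the KSQL server's /ksql endpoint,
# e.g. http://localhost:8088/ksql, with a JSON body {"ksql": KSQL_STATEMENT}.
print(KSQL_STATEMENT)
```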
To run the consumer server:
cd consumers
python server.py
Confluent Python Client Documentation
Confluent Python Client Usage and Examples
REST Proxy API Reference
Kafka Connect JDBC Source Connector Configuration Options