
A real-time event pipeline around Kafka Ecosystem for Chicago Transit Authority.


san089/Optimizing-Public-Transportation


Optimizing-Public-Transportation

Architecture

[Architecture diagram]

Overview

In this project we construct a streaming event pipeline around Apache Kafka and its ecosystem. Using public datasets from the Chicago Transit Authority, we built an event pipeline around Kafka that simulates and displays the status of trains in real time.

Arrival and Turnstiles -> Producers that emit train arrival and turnstile events to our Kafka cluster. An arrival indicates that a train has arrived at a particular station, while a turnstile event indicates that a passenger has entered the station.
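
As a rough sketch of what such a producer looks like (the topic name, field names, and broker address below are illustrative assumptions, not the project's actual schema):

```python
import json
import time


def arrival_event(station_id: int, train_id: str, direction: str) -> dict:
    """Build an arrival event payload (field names are illustrative)."""
    return {
        "station_id": station_id,
        "train_id": train_id,
        "direction": direction,
        "timestamp": int(time.time()),
    }


def produce_arrival(topic: str, event: dict) -> None:
    """Send one arrival event to Kafka.

    Requires the confluent-kafka package and a running broker.
    """
    from confluent_kafka import Producer

    producer = Producer({"bootstrap.servers": "localhost:9092"})
    producer.produce(topic, value=json.dumps(event).encode("utf-8"))
    producer.flush()
```

A turnstile producer follows the same pattern with a different payload.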

Weather -> A producer that periodically emits weather data to the Kafka cluster via the Confluent REST Proxy.
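
Producing through the REST Proxy is a plain HTTP POST. A minimal sketch using only the standard library (the record fields and proxy URL are assumptions):

```python
import json


def weather_records(temp_f: float, status: str) -> dict:
    """Request body for the REST Proxy's produce endpoint: a list of records."""
    return {"records": [{"value": {"temperature": temp_f, "status": status}}]}


def post_weather(rest_proxy_url: str, topic: str, body: dict) -> int:
    """POST weather data to a topic via the REST Proxy; returns the HTTP status."""
    from urllib.request import Request, urlopen

    req = Request(
        f"{rest_proxy_url}/topics/{topic}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/vnd.kafka.json.v2+json"},
    )
    with urlopen(req) as resp:
        return resp.status
```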

PostgreSQL and Kafka Connect -> Extracts station data from PostgreSQL and pushes it to the Kafka cluster.
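
A Kafka Connect JDBC source is driven by a JSON config registered with the Connect REST API. A sketch of what such a config might look like (connection URL, column names, and topic prefix are hypothetical):

```python
# Hypothetical JDBC source connector config; adjust the connection URL,
# table, and incrementing column to match your database.
stations_connector = {
    "name": "stations",
    "config": {
        "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
        "connection.url": "jdbc:postgresql://localhost:5432/cta",
        "mode": "incrementing",
        "incrementing.column.name": "stop_id",
        "table.whitelist": "stations",
        "topic.prefix": "connect-",
        "poll.interval.ms": "60000",
    },
}

# Registered by POSTing the JSON to the Connect REST API, e.g.:
#   curl -X POST -H "Content-Type: application/json" \
#        --data @stations.json http://localhost:8083/connectors
```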

Kafka status server -> Consumes data from Kafka topics and displays it on the UI.
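
The consuming side is a poll loop that decodes each message and hands it to the UI layer. A minimal sketch (group id, topic, and broker address are assumptions):

```python
import json


def decode_message(raw: bytes) -> dict:
    """Decode a JSON-encoded Kafka message value."""
    return json.loads(raw.decode("utf-8"))


def consume(topic: str) -> None:
    """Poll a topic forever (requires confluent-kafka and a running broker)."""
    from confluent_kafka import Consumer

    consumer = Consumer(
        {
            "bootstrap.servers": "localhost:9092",
            "group.id": "status-server",
            "auto.offset.reset": "earliest",
        }
    )
    consumer.subscribe([topic])
    try:
        while True:
            msg = consumer.poll(1.0)
            if msg is None or msg.error():
                continue
            event = decode_message(msg.value())
            # hand `event` off to the UI layer here
    finally:
        consumer.close()
```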

Results

Environment

  • Docker (the bitnami Kafka image was used)
  • Python 3.7

Running and Testing

First, make sure all the services are up and running. For Docker, use:

docker-compose up

Docker Compose will take 3-5 minutes to start, depending on your hardware. Once it is ready, verify the services are running by connecting to them at the Docker URLs provided below:

You also need to install the Python requirements. Use the commands below to create a virtual environment and install them:

  1. cd producers
  2. virtualenv venv
  3. . venv/bin/activate
  4. pip install -r requirements.txt

Do the same for the consumers:

  1. cd consumers
  2. virtualenv venv
  3. . venv/bin/activate
  4. pip install -r requirements.txt

Running Simulation

Run the producers using simulation.py in the producers folder:

cd producers
python simulation.py

Run the Faust Stream Processing Application:

cd consumers
faust -A faust_stream worker -l info
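
A sketch of what the Faust application might contain (the stream name, topic names, and boolean line columns below are illustrative assumptions, not the project's actual faust_stream module):

```python
def line_color(red: bool, blue: bool, green: bool) -> str:
    """Collapse boolean line columns into a single `line` field."""
    if red:
        return "red"
    if blue:
        return "blue"
    if green:
        return "green"
    return "unknown"


def build_app():
    """Wire the transformation into a Faust app (requires the faust package)."""
    import faust

    app = faust.App("stations-stream", broker="kafka://localhost:9092")
    in_topic = app.topic("connect-stations")
    out_topic = app.topic("stations.transformed")

    @app.agent(in_topic)
    async def transform(stations):
        async for station in stations:
            station["line"] = line_color(
                station.get("red", False),
                station.get("blue", False),
                station.get("green", False),
            )
            await out_topic.send(value=station)

    return app
```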

Run the KSQL consumer:

cd consumers
python ksql.py
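
The KSQL script essentially submits DDL statements to the KSQL server's REST endpoint. A sketch (the table schema, topic name, and server URL are assumptions):

```python
import json

KSQL_URL = "http://localhost:8088"  # assumed default KSQL server port


def turnstile_table_ddl(topic: str = "turnstile") -> str:
    """Build a KSQL CREATE TABLE statement (schema is illustrative)."""
    return (
        "CREATE TABLE turnstile ("
        "station_id BIGINT, station_name VARCHAR"
        f") WITH (KAFKA_TOPIC='{topic}', VALUE_FORMAT='JSON', KEY='station_id');"
    )


def execute(statement: str) -> int:
    """POST a statement to the KSQL REST API; returns the HTTP status."""
    from urllib.request import Request, urlopen

    req = Request(
        f"{KSQL_URL}/ksql",
        data=json.dumps({"ksql": statement}).encode("utf-8"),
        headers={"Content-Type": "application/vnd.ksql.v1+json"},
    )
    with urlopen(req) as resp:
        return resp.status
```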

To run the consumer server:

cd consumers
python server.py

Resources

  • Confluent Python Client Documentation
  • Confluent Python Client Usage and Examples
  • REST Proxy API Reference
  • Kafka Connect JDBC Source Connector Configuration Options