SavOK/YelpTime


Sales Time

Insight Data Engineering Project

The app recommends which potential customer a door-to-door salesperson should visit next. Because selling time is limited, the salesperson wants to know which customer can be reached fastest. The app suggests the next location to visit based on the mode of transportation and real-time road conditions.

Table of Contents

Data Flow
Data Source
Approach

Installation and Usage

Data Flow

[Figure: data flow diagram]

Data Source

The list of licensed businesses comes from Data.gov.

Approach

  • Process data with PySpark (clean and normalize)
  • Store data in Postgres
  • Use PostGIS to index spatial data (location)
  • Dash UI to interact with data

Cleaning Data

  • Raw data is stored in S3 as 51 CSV files
  • Rows without location coordinates, address, or industry description (NAICS number) are removed
  • If the number of employees is provided, it is grouped into bins (0-10, 10-100, 100-500, ...)
  • If the sales value is provided, it is grouped into bins (0-1000, 1000-10000, 100000-500000, ...)
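
The cleaning rules above can be sketched in plain Python. This is only an illustration of the logic, not the project's PySpark code: the column names in `REQUIRED_FIELDS` are hypothetical, the bins are assumed to be half-open (a value of exactly 10 falls into 10-100), and the open-ended last bin is an assumption because the list of bins is elided.

```python
from bisect import bisect_right

# Bin edges taken from the employee example above; the open-ended
# last bin ("500+") is an assumption, since the list ends with "...".
EMPLOYEE_EDGES = [0, 10, 100, 500]

# Hypothetical column names; the real CSV headers may differ.
REQUIRED_FIELDS = ("latitude", "longitude", "address", "naics")


def is_complete(row):
    """Keep a row only if every required field is present and non-empty."""
    return all(row.get(field) not in (None, "") for field in REQUIRED_FIELDS)


def bin_label(value, edges):
    """Return a label such as '10-100' for the bin containing value.

    Bins are treated as half-open [low, high), so a boundary value
    goes to the upper bin. Missing or below-range values give None.
    """
    if value is None or value < edges[0]:
        return None
    i = bisect_right(edges, value)
    if i == len(edges):
        return f"{edges[-1]}+"
    return f"{edges[i - 1]}-{edges[i]}"
```

For example, `bin_label(42, EMPLOYEE_EDGES)` falls between the edges 10 and 100. In PySpark the same rules would typically be expressed as a filter on the required columns followed by a conditional column expression.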

Storing Data

  • Cleaned and normalized data is stored in Postgres (figure: database schema)
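
As a rough illustration of the spatial index mentioned under Approach, the sketch below holds two SQL statements as Python strings: one creates a GiST index and one finds rows within a radius using `ST_DWithin`. The table name, column names, and the `nearby_params` helper are hypothetical; the project's real schema and index definition live in src/SQL and src/SQLScripts/create_index.sql and may look different.

```python
# Hypothetical table and column names; the project's actual schema may differ.
TABLE = "location"
GEOG_COLUMN = "geog"  # assumed to be a PostGIS geography(Point, 4326) column

# A GiST index lets PostGIS answer distance queries without scanning every row.
CREATE_INDEX_SQL = (
    f"CREATE INDEX IF NOT EXISTS idx_{TABLE}_{GEOG_COLUMN} "
    f"ON {TABLE} USING GIST ({GEOG_COLUMN});"
)

# ST_DWithin on geography values takes its distance in meters.
# ST_MakePoint expects (x, y), which means (longitude, latitude).
NEARBY_SQL = (
    f"SELECT id FROM {TABLE} "
    f"WHERE ST_DWithin({GEOG_COLUMN}, "
    "ST_SetSRID(ST_MakePoint(%(lon)s, %(lat)s), 4326)::geography, %(meters)s);"
)


def nearby_params(lat, lon, radius_km):
    """Build the parameter dict for NEARBY_SQL (psycopg2 named-parameter style)."""
    return {"lat": lat, "lon": lon, "meters": radius_km * 1000.0}
```

With a psycopg2 cursor, the query would be run as `cursor.execute(NEARBY_SQL, nearby_params(lat, lon, radius_km))`.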

Calculating Route Time

  • Route time is calculated with the Here API
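
The selection step can be sketched as follows: compute a travel time to each candidate, keep those within the time radius, and list the fastest first. The project obtains travel times from the Here API; to keep this example self-contained, `estimate_minutes` is a stand-in that divides straight-line distance by an assumed average speed, so it ignores roads and traffic. All function names and speed values here are illustrative assumptions.

```python
import math

# Assumed average speeds in km/h, for illustration only.
SPEED_KMH = {"walk": 5.0, "bike": 15.0, "car": 40.0}


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def estimate_minutes(origin, dest, mode):
    """Stand-in for a routing-service call: straight-line distance / speed."""
    return haversine_km(origin, dest) / SPEED_KMH[mode] * 60.0


def next_customers(origin, customers, mode, max_minutes,
                   travel_time=estimate_minutes):
    """Return (minutes, name) pairs reachable within max_minutes, fastest first."""
    timed = [(travel_time(origin, loc, mode), name)
             for name, loc in customers.items()]
    return sorted(t for t in timed if t[0] <= max_minutes)
```

Passing a different `travel_time` callable, for example one that wraps a routing-service request, swaps the estimate for real route times without changing the ranking logic.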

UI Dash App

  • State options
  • Business type options
  • Transportation mode
  • Time radius
  • Starting location

Installation and Usage

IMPORTANT: Before doing anything else, set up config.py.

Requirements

boto3, dash, GeoAlchemy2, gunicorn, pandas, psycopg2-binary, requests, SQLAlchemy

Project tree

./
├── LICENSE
├── README.md
├── config.py
├── data
│   └── 6-digit_2017_Codes.csv
└── src
    ├── APIs
    │   ├── HereAPI.py
    │   └── YelpAPI.py
    ├── AirFlow
    │   └── UpdateDataSchedule.py
    ├── SQL
    │   ├── AssociationTables.py
    │   ├── BusinessTable.py
    │   ├── CategoryTable.py
    │   ├── LocationTable.py
    │   ├── MainTable.py
    │   ├── __init__.py
    │   └── base.py
    ├── SQLScripts
    │   └── create_index.sql
    ├── assets
    │   ├── base.css
    │   └── style.css
    ├── config.py
    ├── help_functions_app.py
    ├── main_app.py
    └── pyspark_clean_data.py
