taxi-trips-predictions

Description

The task is to create a machine learning model that predicts the number of taxi trips in the next hour for each of Chicago's community areas. Such a model could help balance taxi driver workload across locations. This is a time series forecasting task.

The datasets and their descriptions are available at the following links:
https://data.cityofchicago.org/Transportation/Taxi-Trips-2022/npd7-ywjz
https://data.cityofchicago.org/Transportation/Taxi-Trips-2023/e55j-2ewb
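To make the prediction target concrete, the sketch below shows one way raw trip records could be aggregated into hourly trip counts per community area with pandas. It is an illustration, not the code from the notebook: the function name `hourly_trip_counts` is hypothetical, and the column names `Trip Start Timestamp` and `Pickup Community Area` are assumptions about the dataset schema that should be checked against the dataset descriptions linked above.

```python
import pandas as pd


def hourly_trip_counts(
    trips,
    time_col="Trip Start Timestamp",   # assumed column name
    area_col="Pickup Community Area",  # assumed column name
):
    """Count trips per community area and per hour of trip start."""
    df = trips[[time_col, area_col]].dropna().copy()
    # Truncate each timestamp to the start of its hour.
    df["hour"] = pd.to_datetime(df[time_col]).dt.floor("h")
    # One row per (area, hour) pair with the number of trips started in it.
    return (
        df.groupby([area_col, "hour"])
        .size()
        .rename("trip_count")
        .reset_index()
    )


# Tiny made-up sample, only to show the shape of the result.
sample = pd.DataFrame(
    {
        "Trip Start Timestamp": [
            "2023-01-01 00:15:00",
            "2023-01-01 00:45:00",
            "2023-01-01 01:00:00",
            "2023-01-01 00:30:00",
        ],
        "Pickup Community Area": [8, 8, 8, 32],
    }
)
counts = hourly_trip_counts(sample)
print(counts)
```

Hours with no trips produce no row in this sketch, so a real pipeline would likely need to reindex each area onto a complete hourly range before building features.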

This repository contains:

A notebook with a machine learning model that predicts the count of taxi trips.
The corresponding code for generating time series features from a pandas DataFrame.
A shell script that creates a cluster of Docker containers to run the PySpark code.
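As a rough idea of what time series feature generation from a pandas DataFrame can look like, here is a minimal sketch that adds lag and rolling-mean features per community area. It is not the code in `lib`: the function name `add_lag_features`, the column names `area`, `hour`, and `trip_count`, and the chosen lags are all hypothetical, and the sketch assumes one row per area and hour with no missing hours.

```python
import pandas as pd


def add_lag_features(counts, group_col="area", target_col="trip_count", lags=(1, 2, 24)):
    """Add per-area lag features and a 3-hour rolling mean of past counts."""
    out = counts.sort_values([group_col, "hour"]).copy()
    grouped = out.groupby(group_col)[target_col]
    for lag in lags:
        # Value of the target `lag` rows (hours) earlier within the same area.
        out[f"{target_col}_lag_{lag}"] = grouped.shift(lag)
    # Mean of the previous three hours; shift(1) keeps the current hour out
    # of its own feature, which would otherwise leak the target.
    out[f"{target_col}_roll_mean_3"] = grouped.transform(
        lambda s: s.shift(1).rolling(3).mean()
    )
    return out


# Made-up hourly counts for a single area.
demo = pd.DataFrame(
    {
        "area": [8] * 5,
        "hour": pd.date_range("2023-01-01", periods=5, freq="h"),
        "trip_count": [1, 2, 3, 4, 5],
    }
)
features = add_lag_features(demo, lags=(1, 2))
print(features)
```

Rows at the start of each area's history contain NaN in the lag columns; gradient boosting libraries such as LightGBM and CatBoost can handle missing values natively, but dropping those rows is also a common choice.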

Setup (from command line)

Running the notebook requires Docker. Once it is installed, run:

$ sh start_local_cluster.sh

Open the local cluster at the address printed in the terminal window. Then add the SPARK_MASTER_IP value printed in the terminal to cell 8 of the .ipynb file.
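The exact contents of cell 8 are not reproduced here, so the following is only a sketch of how a SPARK_MASTER_IP value is typically turned into a Spark session. The helper names `spark_master_url` and `build_session` are hypothetical, the IP address is a placeholder, and port 7077 is the Spark standalone default, which is an assumption about how `start_local_cluster.sh` configures the master.

```python
def spark_master_url(master_ip, port=7077):
    """Build a standalone-mode master URL (7077 is Spark's default master port)."""
    return f"spark://{master_ip}:{port}"


def build_session(master_ip, app_name="taxi-trips-predictions"):
    """Create (or reuse) a SparkSession connected to the given master."""
    # Imported lazily so the URL helper works without PySpark installed.
    from pyspark.sql import SparkSession

    return (
        SparkSession.builder
        .master(spark_master_url(master_ip))
        .appName(app_name)
        .getOrCreate()
    )


# Placeholder value; replace it with the address printed by the shell script.
SPARK_MASTER_IP = "172.18.0.2"
url = spark_master_url(SPARK_MASTER_IP)
print(url)
# spark = build_session(SPARK_MASTER_IP)  # requires the local cluster to be running
```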

Tools used

Docker
PySpark
Pandas
LightGBM
CatBoost
