This analysis of Chicago's Divvy bikeshare program measures the availability of "unlimited ride" classic bikes as well as availability of open docks.
Divvy promises paid annual members (at $131/year) "unlimited 45-minute rides on classic bikes." But when these classic (i.e., non-electric) bikes aren't available, riders have to either pay $0.17 per minute to use an electric bike or look for a classic bike at an adjacent station.
At the end of each ride, classic bike riders also rely on the availability of open docks at their intended destination station. When no docks are available, riders have to find an adjacent station further from their intended destination. This may also cause the rider to exceed the 45-minute ride limit, incurring $0.17/minute charges.
This repo acquires station status every hour, looking for the following problems at each timepoint:
- no classic bikes available
- no open docks available
The initial analysis looks at hourly station status on November 17, 2023. To set the stage for longer-term analysis, I've also been requesting and saving hourly data from 11/26/23 to date (currently 1/7/24).
My key findings thus far are as follows. Of 709 classic Divvy stations:
- 534 stations (75%) had one or more classic bikes at all times.
- 121 stations (17%) had no classic bikes at least 15% of the time.
- 61 stations (9%) had no classic bikes at least 30% of the time.
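
The thresholds above can be reproduced from hourly observations roughly as follows. This is a minimal sketch rather than the notebook's actual code: it assumes each observation is a dict with `station_id` and `n_classic` keys, and the function names `share_of_time_without_classic` and `count_at_or_above` are hypothetical.

```python
from collections import defaultdict


def share_of_time_without_classic(rows):
    """Map each station_id to the fraction of its observations with zero classic bikes."""
    totals = defaultdict(int)
    empties = defaultdict(int)
    for row in rows:
        totals[row["station_id"]] += 1
        if row["n_classic"] == 0:
            empties[row["station_id"]] += 1
    return {station_id: empties[station_id] / totals[station_id] for station_id in totals}


def count_at_or_above(shares, threshold):
    """Count stations whose no-classic share meets or exceeds the threshold."""
    return sum(1 for share in shares.values() if share >= threshold)
```

With this, `count_at_or_above(shares, 0.15)` and `count_at_or_above(shares, 0.30)` would correspond to the "at least 15%" and "at least 30%" figures, and stations with a share of exactly 0 are those that had a classic bike at every timepoint.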
To look at the geographical distribution of problematic stations, I mapped the results in Flourish.
Stations with frequent shortages of classic bikes appear to be concentrated in the Loop, the Near West Side, and the Near North Side.
Data Source | Description |
---|---|
List of JSON Feeds | List of all Divvy System Feeds |
Station Types | Station Type (classic, lightweight) for all stations, as requested by FOIA S061504-112023 submitted to the Chicago Department of Transportation |
Vehicle Type Lookup | Lookup table identifying vehicle_type_id 1=classic, 2=electric, 3=scooter |
Station Info | Latitude and Longitude for each station |
Station Status | # of bikes of each type available at time of API request |
Chicago Community Geographic Boundaries | shapefile for Chicago community areas |
Extract- a GitHub workflow initiates an API request every hour using the script api-request.py
Transform- api-request.py saves most fields as-is, but also extracts the following nested fields from the JSON dictionary prior to saving data:
- timestamp = last_reported time for each station, converted to hours/minutes in Central Time
- n_classic = # of classic bikes (nested within vehicle_types_available dictionary, with vehicle_type_id = 1)
- n_electric = # of electric bikes (vehicle_type_id = 2)
- n_scooters = # of scooters (vehicle_type_id = 3)
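
The extraction of nested fields might look something like the sketch below. It is not the code in api-request.py: the function name `flatten_station`, the `tz` parameter, and the `HH:MM` output format are assumptions, as is the record layout (a `last_reported` POSIX timestamp and a `vehicle_types_available` list of `vehicle_type_id`/`count` entries, as in GBFS-style feeds). The sketch also drops the nested list after flattening, which is a choice made here for illustration.

```python
from datetime import datetime
from zoneinfo import ZoneInfo

# Column names follow the list above; IDs are compared as strings
# because feeds may report them as either "1" or 1.
VEHICLE_COLUMNS = {"1": "n_classic", "2": "n_electric", "3": "n_scooters"}


def flatten_station(station, tz=None):
    """Flatten one station-status record into a single-level row."""
    if tz is None:
        tz = ZoneInfo("America/Chicago")  # Central Time, DST-aware
    # Keep every top-level field except the nested list.
    row = {key: value for key, value in station.items()
           if key != "vehicle_types_available"}
    # Default each count to 0 in case a vehicle type is missing.
    row.update({column: 0 for column in VEHICLE_COLUMNS.values()})
    for entry in station.get("vehicle_types_available", []):
        column = VEHICLE_COLUMNS.get(str(entry["vehicle_type_id"]))
        if column is not None:
            row[column] = entry["count"]
    # Convert the POSIX timestamp to hours/minutes in the target zone.
    reported = datetime.fromtimestamp(station["last_reported"], tz=tz)
    row["timestamp"] = reported.strftime("%H:%M")
    return row
```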
I used the following Jupyter Notebooks for analyzing data:
02-prepare-dataset-for-analysis.ipynb- this accomplishes the following:
- merges station info (community area, GPS location, station type) into station status data to create one analytic dataset
- calculates key metrics: is_no_classic and is_no_docks
- removes all "public racks" (i.e. non-stations)
- combines hourly data into one file
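
As a rough illustration of the merge, filter, and metric steps above, the following sketch uses plain dicts instead of the notebook's actual tooling. The function name `prepare_rows`, the `"public rack"` value for `station_type`, and the `num_docks_available` field name are assumptions made for the example, not confirmed details of the dataset.

```python
def prepare_rows(status_rows, station_info):
    """Merge station info into status rows, drop non-stations, add metrics.

    status_rows:  list of dicts with station_id, n_classic, num_docks_available
    station_info: dict mapping station_id -> dict of static fields
                  (e.g. station_type, community area, latitude/longitude)
    """
    prepared = []
    for row in status_rows:
        info = station_info.get(row["station_id"])
        # Skip rows with no matching station info and rows for public racks.
        if info is None or info["station_type"] == "public rack":
            continue
        merged = {**row, **info}
        merged["is_no_classic"] = row["n_classic"] == 0
        merged["is_no_docks"] = row["num_docks_available"] == 0
        prepared.append(merged)
    return prepared
```

Running this over each hourly file and concatenating the resulting lists would give one combined analytic dataset.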
Solved
- I originally tried to request data every 15 minutes, but GitHub seemed to be unexpectedly throttling my requests. I scaled back to requesting data every hour.
- Initial analysis suggested a prevalence of stations with 0% classic bike availability on the northwest and southwest sides. Visiting some of these stations, I realized that most of these "problematic" stations were actually lightweight stations with no docks, only a post for tying up electric bikes. Information about station type is not provided in any of Divvy's data, but I was able to get this info via a FOIA request.
Because the City's open data portal has no station status data newer than 9/10/22, I've been requesting data via API and saving it in this repo.
- Review the city's contracts with Lyft and their subcontracted operator, to set the stage for comparing outcomes vs. promised contractual metrics. New York City's comptroller report (November 2023) seems to provide an excellent template for bikeshare accountability.
- Analyze data over a longer time period to assess enduring trends. While it should be quick and easy to merge additional datasets, analysis would be complicated by the fact that Divvy scales back its fleet and eliminates seasonal staff over the winter (from December 1).
- Incorporate metrics on total bike availability to show how often riders have no options (i.e. no electric bikes AND no classic bikes) vs. no affordable options.
- Summarize bike availability by community area.
- Data pipeline improvements
  - Automate data transformations and calculations to facilitate long-term analysis. This should only be done once the process and accountability metrics are firmly established.
  - Minor adjustments to timestamp tracking- 1) revise timestamp in filename suffix to 24-hour format (i.e. 13:00 vs. 1 PM) to facilitate sorting in visual interfaces, 2) record timestamp as time_reported, and determine time_retrieved during request
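
For the filename change, one possible approach is sketched below. The function name `status_filename`, the `station-status` prefix, and the `.csv` extension are hypothetical; the point is only that a zero-padded 24-hour suffix sorts chronologically as plain text, whereas a 12-hour suffix such as "1 PM" sorts before "9 AM".

```python
from datetime import datetime


def status_filename(retrieved, prefix="station-status"):
    """Build a filename whose suffix sorts chronologically as plain text."""
    # %H is the zero-padded 24-hour clock, so 09:00 sorts before 13:00.
    return f"{prefix}-{retrieved:%Y-%m-%d-%H%M}.csv"
```

For example, a request made at 1 PM on 1/7/24 would yield `station-status-2024-01-07-1300.csv`.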
This repository is organized into the following folders:
- data- static station info (GPS, station type), as well as hourly station status requested by API
- notebooks- Python code for data exploration, development of data acquisition script, and data transformation
- results- Data analysis summaries
- scripts- includes api-request.py, the script run each hour to extract data from Divvy's API and transform it for analysis