Predicting prices of used BMW cars

Source

Source code for this project is available on GitHub.

Problem statement

Background

  • Cars are used throughout the world
  • Big resale market (due to cost and durability)
  • Many consumers have no clear idea about car prices
  • Makes navigating the market and negotiating with car dealers difficult

Goal

  • Predict resale prices of cars based on historic data
    • Target variable is continuous
    • Will use R-squared (R^2) metric
    • This should be close to 1
  • Make predictions available to consumers

Data

Source

  • Provided by Datacamp
  • No details about collection known

Features

| Feature      | Description          | Type        |
|--------------+----------------------+-------------|
| price        | Price in USD         | numerical   |
| year         | Production year      | numerical   |
| mileage      | Distance driven      | numerical   |
| tax          | Road tax             | numerical   |
| mpg          | Miles per gallon     | numerical   |
| engineSize   | Size of engine       | numerical   |
| model        | Car model            | categorical |
| transmission | Type of transmission | categorical |
| fuelType     | Fuel type            | categorical |

Which features are the most important?

Simple data model

file:figures/data_model1.svg

Full data model

file:figures/data_model3.svg

Exploring the data

Year and mileage

file:figures/price_of_year_mileage.png

Car model

file:figures/price_of_model.svg

Transmission

file:figures/price_of_transmission.svg

Tax, mpg and engine size

file:figures/price_of_tax_mpg_enginesize.png

Fuel type

file:figures/price_of_fueltype.svg

Predictive model

Linear model

file:figures/mileage_fit.png

Feature selection

| Last added feature | Mean R^2 test score |
|--------------------+---------------------|
| mileage            |            0.543242 |
| year               |            0.643062 |
| model              |            0.885855 |
| engineSize         |            0.918769 |
| transmission       |            0.924562 |
| fuelType           |            0.925534 |
| mpg                |            0.928286 |
| tax                |            0.928287 |

Feature selection

  • Include: mileage, year, car model, engine size and transmission.
  • Exclude: fuel type, mpg and tax.
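
One way to read the score table is to look at how much each added feature improves the mean R^2 test score. The sketch below only does arithmetic on the scores reported above; the function name `marginal_gains` is hypothetical, and the slides do not state that this is the criterion the author applied.

```python
# Cumulative mean R^2 test scores, copied from the feature selection table.
CUMULATIVE_SCORES = [
    ("mileage", 0.543242), ("year", 0.643062), ("model", 0.885855),
    ("engineSize", 0.918769), ("transmission", 0.924562),
    ("fuelType", 0.925534), ("mpg", 0.928286), ("tax", 0.928287),
]

def marginal_gains(cumulative_scores):
    """Return (feature, gain) pairs; the first gain is measured from R^2 = 0."""
    gains = []
    previous = 0.0
    for feature, score in cumulative_scores:
        gains.append((feature, score - previous))
        previous = score
    return gains

for feature, gain in marginal_gains(CUMULATIVE_SCORES):
    print(f"{feature:<12} +{gain:.6f}")
```

The three excluded features (fuel type, mpg and tax) together add less than 0.004 to the score.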

Parameter interpretation

| observable    | 10^coef | 10^coef - 1 |
|---------------+---------+-------------|
| year          |   1.106 |         11% |
| engineSize    |   1.206 |         21% |
| 10000*mileage |   0.941 |         -6% |
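
The 10^coef column suggests that the model is linear in log10(price), so each coefficient acts as a multiplicative factor on the price. The sketch below illustrates that reading. The log10 form is an assumption inferred from the column headers, not stated on the slide, and the helper names are hypothetical.

```python
import math

# Factors (10^coef) copied from the parameter interpretation table.
FACTORS = {"year": 1.106, "engineSize": 1.206, "10000*mileage": 0.941}

def percent_change(factor):
    """Relative price change for a one-unit increase, in percent (10^coef - 1)."""
    return (factor - 1.0) * 100.0

def combined_factor(factor, units):
    """Factors compound multiplicatively: 10^(coef * units) equals factor ** units."""
    coef = math.log10(factor)
    return 10 ** (coef * units)

for name, factor in FACTORS.items():
    print(f"{name}: {percent_change(factor):+.1f}% per unit")

# Example: a car three years newer, all else equal.
print(f"3 years newer: x{combined_factor(FACTORS['year'], 3):.3f}")
```

Under this reading, effects compound rather than add: three years correspond to a factor of 1.106 cubed, not to 3 times 11%.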

Parameter interpretation

Price relative to “Automatic”

| transmission | 10^coef | 10^coef - 1 |
|--------------+---------+-------------|
| Manual       |   0.913 |         -9% |
| Semi-Auto    |    1.02 |          2% |

Parameter interpretation

Price relative to “1 Series”

| model    | 10^coef | 10^coef - 1 |
|----------+---------+-------------|
| 2 Series |   1.027 |          3% |
| 3 Series |   1.131 |         13% |
| 4 Series |   1.151 |         15% |
| 5 Series |   1.228 |         23% |
| 6 Series |   1.302 |         30% |

Web interface prototype

Web interface prototype

file:figures/web_page_screenshot.png

Conclusion

  • Built a linear model for predicting resale prices of BMW cars
  • Works fairly well (mean R^2 test score of about 0.92 with the selected features)
  • Model coefficients are explainable
  • Demonstrated web interface prototype

Going forward

Follow up with data collection team

  • Suspicious values in mpg, engine size and tax

If more accuracy is required

  • More complex model might help
  • But risk of overfitting and less explainability

Web interface

  • Improve design of web front end
  • Ensure scalability of back end depending on expected usage

Thank you for your attention

Any questions?

Additional background

Metric

R-squared (R^2)

  • A single number, at most 1
  • Measures how well the model describes the data
  • The closer to 1 the better
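
For reference, R^2 can be computed directly from observed and predicted values using the standard definition, 1 minus the ratio of residual to total sum of squares. The prices below are made up for illustration and are not taken from the dataset.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot

prices = [10_000, 15_000, 20_000, 30_000]  # made-up illustrative prices
mean_only = [sum(prices) / len(prices)] * len(prices)

print(r_squared(prices, prices))     # perfect predictions → 1.0
print(r_squared(prices, mean_only))  # always predicting the mean → 0.0
```

A model that does no better than always predicting the average price scores 0, and a perfect model scores 1.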

Data

Data model 2

file:figures/data_model2.svg

Predictive model

Additional assumption

  • All car prices fall at the same rate with age and mileage, independent of car model and other factors
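
Written out, this assumption corresponds to a model with no interaction terms. The form below is a sketch inferred from the 10^coef columns in the parameter tables, using the five selected features; the symbols (beta, m for car model, t for transmission) are notation introduced here, not taken from the slides.

```latex
\log_{10}(\text{price}) =
    \beta_0
    + \beta_{\text{year}} \cdot \text{year}
    + \beta_{\text{mileage}} \cdot \text{mileage}
    + \beta_{\text{engineSize}} \cdot \text{engineSize}
    + \beta_{\text{model}}[m]
    + \beta_{\text{transmission}}[t]
```

Because the car model only enters through the additive term for m, the slopes for year and mileage are shared by all models.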

Parameter interpretation

Price relative to “1 Series”

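| model    | 10^coef | 10^coef - 1 |
|----------+---------+-------------|
| 2 Series |   1.027 |          3% |
| 3 Series |   1.131 |         13% |
| 4 Series |   1.151 |         15% |
| 5 Series |   1.228 |         23% |
| 6 Series |   1.302 |         30% |
| 7 Series |   1.542 |         54% |
| 8 Series |    2.07 |        107% |
| X1       |   1.162 |         16% |
| X2       |   1.204 |         20% |
| X3       |   1.435 |         44% |
| X4       |   1.492 |         49% |
| X5       |   1.762 |         76% |
| X6       |   1.791 |         79% |
| X7       |   2.382 |        138% |
| M2       |   1.488 |         49% |
| M3       |   2.183 |        118% |
| M4       |   1.672 |         67% |
| M5       |   1.754 |         75% |
| Z4       |   1.259 |         26% |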
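
Since every factor is expressed relative to the "1 Series", any two models can be compared by dividing their factors. This is a small sketch using a subset of the values above; `relative_price` is a hypothetical helper, and the 1.0 entry for the baseline is implied by the table rather than listed in it.

```python
# Factors (10^coef) relative to "1 Series", copied from the table (subset).
# The baseline itself has factor 1.0 by construction.
MODEL_FACTORS = {"1 Series": 1.0, "3 Series": 1.131, "5 Series": 1.228, "X5": 1.762}

def relative_price(model, baseline):
    """Expected price of `model` relative to `baseline`, all other features equal."""
    return MODEL_FACTORS[model] / MODEL_FACTORS[baseline]

print(f"X5 vs 3 Series: x{relative_price('X5', '3 Series'):.2f}")
```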

90% Prediction interval

  • 90% of car prices expected to be within this interval
  • Indicates model uncertainty

Example:

  • Predicted price (p): $10,000
  • Relative half-width (h): 25%
  • 90% of cars between p/(1+h) and p*(1+h), that is from $8,000 to $12,500
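
The interval in the example can be computed as follows. The function name `prediction_interval` is hypothetical. The multiplicative form, dividing and multiplying by (1 + h), comes from the slide.

```python
def prediction_interval(predicted_price, relative_half_width):
    """Multiplicative interval: (p / (1 + h), p * (1 + h))."""
    scale = 1.0 + relative_half_width
    return predicted_price / scale, predicted_price * scale

low, high = prediction_interval(10_000, 0.25)
print(f"${low:,.0f} to ${high:,.0f}")  # $8,000 to $12,500
```

The interval is symmetric in ratio rather than in dollars: the lower bound is 20% below the prediction and the upper bound is 25% above it.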

90% Prediction interval with partial data

| Last added feature | Relative half-width |
|--------------------+---------------------|
| mileage            |                 70% |
| model              |                 41% |
| year               |                 30% |
| engineSize         |                 25% |
| transmission       |                 24% |