Source code for this project is available on GitHub.
- Cars are used throughout the world
- Big resale market (due to cost and durability)
- Many consumers have no clear idea about car prices
- Makes navigating the market and negotiating with car dealers difficult
- Predict resale prices of cars based on historic data
- Target variable is continuous
- Will use R-squared (R^2) metric
- This should be close to 1
- Make predictions available to consumers
- Provided by Datacamp
- No details known about how the data was collected
Feature | Description | Type |
---|---|---|
price | Price in USD | numerical |
year | Production year | numerical |
mileage | Distance driven | numerical |
tax | Road tax | numerical |
mpg | Miles per gallon | numerical |
engineSize | Size of engine | numerical |
Feature | Description | Type |
---|---|---|
model | Car model | categorical |
transmission | Type of transmission | categorical |
fuelType | Fuel type | categorical |
file:figures/data_model1.svg
file:figures/data_model3.svg
file:figures/price_of_year_mileage.png
file:figures/price_of_model.svg
file:figures/price_of_transmission.svg
file:figures/price_of_tax_mpg_enginesize.png
file:figures/price_of_fueltype.svg
file:figures/mileage_fit.png
Last added feature | Mean R^2 test score |
---|---|
mileage | 0.543242 |
year | 0.643062 |
model | 0.885855 |
engineSize | 0.918769 |
transmission | 0.924562 |
fuelType | 0.925534 |
mpg | 0.928286 |
tax | 0.928287 |
- Include: mileage, year, car model, engine size and transmission.
- Exclude: fuel type, mpg and tax.
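One way to read the score table is through the gain in mean R^2 that each added feature brings. The sketch below recomputes those gains from the reported scores and keeps features, in the order they were added, until a gain falls below a cutoff. The cutoff `MIN_GAIN = 0.005` is a hypothetical value chosen for illustration; the rule actually used for the include/exclude decision is not stated here.

```python
# Mean R^2 test scores after adding each feature, copied from the score table.
scores = [
    ("mileage", 0.543242),
    ("year", 0.643062),
    ("model", 0.885855),
    ("engineSize", 0.918769),
    ("transmission", 0.924562),
    ("fuelType", 0.925534),
    ("mpg", 0.928286),
    ("tax", 0.928287),
]


def gains(scores):
    """Return (feature, gain) pairs; the first gain is measured from 0."""
    result = []
    previous = 0.0
    for name, score in scores:
        result.append((name, score - previous))
        previous = score
    return result


def select(scores, min_gain):
    """Keep features in order until the first gain below min_gain."""
    kept = []
    for name, gain in gains(scores):
        if gain < min_gain:
            break
        kept.append(name)
    return kept


MIN_GAIN = 0.005  # hypothetical cutoff, not taken from the source

for name, gain in gains(scores):
    print(f"{name:<12} {gain:+.6f}")
print(select(scores, MIN_GAIN))
```

With this cutoff the selection matches the five included features. Adding mpg gains slightly more than adding fuelType (about 0.003), but both stay below the cutoff.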
observable | 10^coef | 10^coef - 1 |
---|---|---|
year | 1.106 | 11% |
engineSize | 1.206 | 21% |
10000*mileage | 0.941 | -6% |
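The `10^coef` column suggests that the model was fitted to log10(price), so each coefficient acts as a multiplicative factor on the price. Under that assumption, the sketch below turns the factors from the table into percentage changes and combines them into a price ratio between two otherwise identical cars. The names `percent_change` and `price_ratio` are illustrative and not taken from the project code.

```python
# Multiplicative factors (10^coef) copied from the table above.
FACTOR_YEAR = 1.106     # per production year
FACTOR_ENGINE = 1.206   # per unit of engineSize
FACTOR_MILEAGE = 0.941  # per 10,000 units of mileage


def percent_change(factor):
    """Convert a multiplicative factor into a rounded percentage change."""
    return round((factor - 1) * 100)


def price_ratio(d_year=0, d_engine=0.0, d_mileage=0.0):
    """Expected price ratio between two cars that differ only by the given
    amounts in production year, engine size and mileage."""
    return (
        FACTOR_YEAR ** d_year
        * FACTOR_ENGINE ** d_engine
        * FACTOR_MILEAGE ** (d_mileage / 10_000)
    )


print(percent_change(FACTOR_YEAR))     # 11
print(percent_change(FACTOR_ENGINE))   # 21
print(percent_change(FACTOR_MILEAGE))  # -6

# A car two years older with 20,000 more on the odometer:
print(price_ratio(d_year=-2, d_mileage=20_000))  # roughly 0.72
```

Because the factors multiply, the effects compound: two years of age and 20,000 of extra mileage together lower the expected price by roughly 28%.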
Price relative to “Automatic”
transmission | 10^coef | 10^coef - 1 |
---|---|---|
Manual | 0.913 | -9% |
Semi-Auto | 1.02 | 2% |
Price relative to “1 Series”
model | 10^coef | 10^coef - 1 |
---|---|---|
2 Series | 1.027 | 3% |
3 Series | 1.13 | 13% |
4 Series | 1.151 | 15% |
5 Series | 1.228 | 23% |
6 Series | 1.302 | 30% |
… | … | … |
file:figures/web_page_screenshot.png
- Built a linear model for predicting resale prices of BMW cars
- Works fairly well (mean R^2 test score of about 0.92 with the five included features)
- Model coefficients are explainable
- Demonstrated web interface prototype
Follow up with data collection team
- Suspicious values in mpg, engine size and tax
If more accuracy is required
- More complex model might help
- But risk of overfitting and less explainability
Web interface
- Improve design of web front end
- Ensure scalability of back end depending on expected usage
Any questions?
- R^2 is a single number
- It measures how well the model describes the data
- The closer to 1, the better
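For reference, R^2 is usually defined as 1 minus the ratio of the residual sum of squares to the total sum of squares. A minimal sketch in plain Python, assuming this standard definition is the one behind the scores reported in this deck:

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


print(r_squared([1, 2, 3], [1, 2, 3]))  # 1.0, perfect predictions
print(r_squared([1, 2, 3], [2, 2, 2]))  # 0.0, no better than the mean
```

Predictions that are worse than always guessing the mean give a negative value, so 1 is the upper bound but 0 is not a lower bound.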
file:figures/data_model2.svg
- All car prices fall at the same rate with age and mileage, independent of car model and other factors
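This assumption can be written out as an equation. The form below is a sketch inferred from the `10^coef` tables, assuming the model is linear in log10(price) with no interaction terms; the symbol names are illustrative.

```latex
\log_{10}(\text{price}) = \beta_0
  + \beta_{\text{year}} \cdot \text{year}
  + \beta_{\text{mileage}} \cdot \text{mileage}
  + \beta_{\text{engine}} \cdot \text{engineSize}
  + \gamma_{\text{model}}
  + \delta_{\text{transmission}}
```

The slopes for year and mileage carry no model index, so every car model shares them. The model and transmission terms only shift the whole price curve up or down by a constant factor.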
Price relative to “1 Series”
model | 10^coef | 10^coef - 1 |
---|---|---|
2 Series | 1.027 | 3% |
3 Series | 1.13 | 13% |
4 Series | 1.151 | 15% |
5 Series | 1.228 | 23% |
6 Series | 1.302 | 30% |
7 Series | 1.542 | 54% |
8 Series | 2.07 | 107% |
X1 | 1.162 | 16% |
X2 | 1.204 | 20% |
X3 | 1.435 | 44% |
X4 | 1.492 | 49% |
X5 | 1.762 | 76% |
X6 | 1.791 | 79% |
X7 | 2.382 | 138% |
M2 | 1.488 | 49% |
M3 | 2.183 | 118% |
M4 | 1.672 | 67% |
M5 | 1.754 | 75% |
Z4 | 1.259 | 26% |
- 90% of car prices expected to be within this interval
- Indicates model uncertainty
Example:
- Predicted price (p): $10,000
- Relative half-width (h): 25%
- 90% of cars between p/(1+h) and p*(1+h), that is from $8,000 to $12,500
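The interval in this example can be reproduced in a few lines. The function name `price_interval` is illustrative; the formula is the one given in the example above.

```python
def price_interval(predicted, half_width):
    """Return (low, high) bounds: predicted / (1 + h) and predicted * (1 + h)."""
    return predicted / (1 + half_width), predicted * (1 + half_width)


low, high = price_interval(10_000, 0.25)
print(low, high)  # 8000.0 12500.0
```

The bounds are symmetric as ratios rather than in dollars: the upper bound lies $2,500 above the prediction, the lower bound $2,000 below it.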
Last added feature | Relative half-width |
---|---|
mileage | 70% |
model | 41% |
year | 30% |
engineSize | 25% |
transmission | 24% |