Bulldozer Price Prediction Notebook Update (column order) #62

mrdbourke · 2023-08-25T01:40:37Z

mrdbourke
Aug 25, 2023
Maintainer

Problem Example

As of Scikit-Learn 1.2, the columns (features) a model predicts on must match the columns (features) it was fit on.

This applies to both the names of the columns and their order.

For example, if the training columns are:

X_train.columns = ["col_1", "col_2", "col_3"]

And the testing columns are:

X_test.columns = ["col_1", "col_3", "col_2"]

Running model.fit() on the training data and then model.predict() on the testing data will raise a ValueError, because the feature names are not in the same order as they were during fit.

To fix this, you can change the order of the test columns to match the order of the training columns.
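As a rough illustration of the problem and the fix, here is a minimal sketch using the column names from the example above. The values, the target `y_train`, and the choice of `RandomForestRegressor` are assumptions made up for the demo, not taken from the notebook. On Scikit-Learn versions before 1.2 the mismatched call may only warn rather than error, so the `try`/`except` handles both cases.

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Toy training data (made-up values; column names follow the example above)
X_train = pd.DataFrame({
    "col_1": [1, 2, 3, 4],
    "col_2": [10, 20, 30, 40],
    "col_3": [5, 6, 7, 8],
})
y_train = [1.0, 2.0, 3.0, 4.0]

# Same columns as X_train, but in a different order
X_test = X_train[["col_1", "col_3", "col_2"]]

model = RandomForestRegressor(n_estimators=10, random_state=42)
model.fit(X_train, y_train)

# Predicting on mismatched column order is expected to fail on Scikit-Learn 1.2+
try:
    model.predict(X_test)
except ValueError as error:
    print(f"Prediction failed: {error}")

# Fix: reorder the test columns to match the training columns
X_test_fixed = X_test[X_train.columns]
preds = model.predict(X_test_fixed)
```

After the reorder, `X_test_fixed` has its columns in the same order as `X_train`, so `predict()` runs without complaint.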

For example, in end-to-end-bluebook-bulldozer-price-regression.ipynb, our training columns undergo a fair bit of modification.

Code Fix

To make sure the test columns line up with the training columns, we can run:

# Match column order from X_train to df_test (to predict on columns, they should be in the same order they were fit on)
df_test = df_test[X_train.columns]

This line will make sure the columns of df_test match the order of the columns of X_train.

And then:

# Make predictions on the test dataset using the best model
test_preds = ideal_model.predict(df_test)
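One thing worth knowing about the reordering line: indexing a DataFrame with a list of column names also drops any extra columns, and raises a KeyError if a training column is missing from the test data. If you'd like a clearer message in that case, a small helper along these lines could work. Note that `align_columns` is a hypothetical name used for illustration only; it is not part of the notebook, pandas, or Scikit-Learn.

```python
import pandas as pd


def align_columns(df, reference_columns):
    """Return df with exactly the reference columns, in the reference order.

    Hypothetical helper for illustration (not from the notebook).
    """
    reference_columns = list(reference_columns)

    # Report missing columns up front rather than failing mid-indexing
    missing = [col for col in reference_columns if col not in df.columns]
    if missing:
        raise KeyError(f"Columns missing from the test data: {missing}")

    # Selecting by list reorders the columns and drops any extras
    return df[reference_columns]


# Toy example (made-up values): shuffled order plus one extra column
train_columns = ["col_1", "col_2", "col_3"]
df_test_example = pd.DataFrame({
    "col_3": [7, 8],
    "extra_col": [0, 0],
    "col_1": [1, 2],
    "col_2": [3, 4],
})

aligned = align_columns(df_test_example, train_columns)
print(list(aligned.columns))
```

In the notebook's terms, this would be called as `align_columns(df_test, X_train.columns)` before making predictions.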

--

A big thank you to @arpadikuma for the pull request to update this (see #61).

This code has been added to end-to-end-bluebook-bulldozer-price-regression.ipynb in the section "Preprocessing the test data".
