fix regression with set_output in scikit-learn < 1.4 #1122
Merged
When its transformer is a scikit-learn transformer that exposes `set_output`, `OnEachColumn` calls `set_output(transform="pandas")` or `set_output(transform="polars")` so that the output comes back as the correct dataframe type.
In older scikit-learn versions (< 1.4), `set_output(transform="polars")` is not supported, so `OnEachColumn` calls `set_output(transform="pandas")` instead and then converts the result itself with `polars.from_pandas`.

Or at least it's supposed to: when making a small change in #973 to handle exceptions that `set_output` may raise, I forgot to copy the "else" part of the "if scikit-learn < 1.4" block, so it now always uses "polars", even in scikit-learn versions that don't support it.
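The intended decision for older scikit-learn versions can be sketched as a small helper. This is a hypothetical illustration, not the actual `OnEachColumn` code: the name `resolve_output_flavor` is made up, and versions are passed as `(major, minor)` tuples for simplicity (real code would parse `sklearn.__version__` with a proper version parser).

```python
def resolve_output_flavor(sklearn_version, dataframe_module):
    """Hypothetical helper: decide what to pass to set_output(transform=...).

    Returns (flavor, convert_to_polars), where convert_to_polars tells the
    caller to convert the pandas result afterwards (e.g. polars.from_pandas).
    """
    if dataframe_module == "polars" and sklearn_version < (1, 4):
        # scikit-learn < 1.4 cannot produce polars output:
        # request pandas now and convert afterwards.
        return "pandas", True
    # Recent scikit-learn (or pandas input): request the module directly.
    return dataframe_module, False


print(resolve_output_flavor((1, 3), "polars"))  # ('pandas', True)
print(resolve_output_flavor((1, 4), "polars"))  # ('polars', False)
```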
This PR fixes the regression.
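The regression and its fix can be illustrated with a stand-in transformer, assuming a hypothetical `OldSklearnTransformer` whose `set_output` rejects `"polars"` the way a scikit-learn < 1.4 estimator would. For simplicity it raises `ValueError` inside `set_output`; in real scikit-learn versions the exact point and type of failure may differ. The functions `configure_output_buggy` and `configure_output_fixed` are also hypothetical and only mirror the shape of the missing-branch bug described above.

```python
class OldSklearnTransformer:
    """Hypothetical stand-in for a scikit-learn < 1.4 transformer:
    set_output accepts "pandas" but not "polars"."""

    def set_output(self, *, transform=None):
        if transform not in (None, "default", "pandas"):
            raise ValueError(f"Unsupported transform={transform!r}")
        self.transform_output = transform
        return self


def configure_output_buggy(transformer, sklearn_version, dataframe_module):
    # Regression: the branch for old versions is lost, so the requested
    # module ("polars") is passed through even where it is unsupported.
    transformer.set_output(transform=dataframe_module)
    return False  # never converts afterwards


def configure_output_fixed(transformer, sklearn_version, dataframe_module):
    # Fix: both branches of the "scikit-learn < 1.4" check are kept.
    if dataframe_module == "polars" and sklearn_version < (1, 4):
        transformer.set_output(transform="pandas")
        return True  # caller converts, e.g. with polars.from_pandas
    transformer.set_output(transform=dataframe_module)
    return False


t = OldSklearnTransformer()
try:
    configure_output_buggy(t, (1, 3), "polars")
except ValueError as exc:
    print("buggy:", exc)
needs_conversion = configure_output_fixed(t, (1, 3), "polars")
print(t.transform_output, needs_conversion)  # pandas True
```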