convert mixed column types to string #623

Merged
merged 9 commits into from
Jul 17, 2023

Contributor

@LeoGrin LeoGrin commented Jun 30, 2023

Fixes #622, which was caused by a column with mixed types.

@LeoGrin LeoGrin changed the title convert mixed types to string convert mixed column types to string Jun 30, 2023
Member

@jovan-stojanovic jovan-stojanovic left a comment

Thanks!


def test_mixed_types():
    # TODO: datetime/str mixed types
    # don't work
Member

Is this TODO fixed?

Contributor Author

No, it seems to require a bit more work, and I was wondering whether such mixed types could really appear in the wild.
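For illustration only, here is one way a datetime/str column could arise in pandas. The data and the names `parsed`, `raw` and `mixed` are hypothetical and not taken from the road_safety dataset or from the test suite; this only shows what such a column looks like, not how the encoder handles it.

```python
import pandas as pd

# Hypothetical data: one batch already parsed as datetimes,
# one batch left as raw strings (e.g. a sentinel value).
parsed = pd.Series(pd.to_datetime(["2023-06-30", "2023-07-17"]))
raw = pd.Series(["unknown"])

# Concatenating the two falls back to an object column
# that holds both Timestamp and str values.
mixed = pd.concat([parsed, raw], ignore_index=True)

# Names of the Python types actually stored in the column.
kinds = sorted({type(v).__name__ for v in mixed})

# Casting to str gives every entry the same Python type.
as_text = mixed.astype(str)
```

Whether this pattern is common in real datasets is exactly the open question above; the sketch only shows that pandas does not prevent it.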

@GaelVaroquaux
Member

GaelVaroquaux commented Jul 17, 2023 via email

@@ -477,6 +478,11 @@ def _auto_cast(self, X: pd.DataFrame) -> pd.DataFrame:
                X[col] = X[col].astype(np.float64)
                X[col].fillna(value=np.nan, inplace=True)

        # if object, first convert to string to avoid mixed types
        for col in X.columns:
            if pd.api.types.is_object_dtype(X[col]):
Member

Is this necessary, or can we just look at X.dtypes (at least when X is a dataframe)? I suspect that X.dtypes is less costly than is_object_dtype. It's also more of a first-level API, so I suspect it is more likely to generalize across dataframe implementations.
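A minimal sketch of the two checks being compared, assuming a plain pandas DataFrame. The helper names `object_columns_via_dtypes`, `object_columns_via_api` and `cast_object_columns_to_str` are hypothetical and are not the code merged in this PR; the `astype(str)` cast is an assumption based on the PR title, since the hunk above is cut off before the body of the `if`.

```python
import pandas as pd


def object_columns_via_dtypes(X: pd.DataFrame) -> list:
    """List object columns from a single look at X.dtypes."""
    return [col for col, dtype in X.dtypes.items() if dtype == object]


def object_columns_via_api(X: pd.DataFrame) -> list:
    """List object columns with one is_object_dtype call per column."""
    return [col for col in X.columns if pd.api.types.is_object_dtype(X[col])]


def cast_object_columns_to_str(X: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of X in which every object column is cast to str."""
    X = X.copy()
    for col in object_columns_via_dtypes(X):
        X[col] = X[col].astype(str)
    return X


# Hypothetical frame: one mixed int/str/float column, one float column.
X = pd.DataFrame({"mixed": [1, "a", 2.5], "num": [1.0, 2.0, 3.0]})
out = cast_object_columns_to_str(X)
```

On this frame both helpers select the same column, so the choice between them comes down to cost and portability, which this sketch does not measure.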

Member

@GaelVaroquaux GaelVaroquaux left a comment

LGTM. Merging

@GaelVaroquaux GaelVaroquaux merged commit 1d910c7 into skrub-data:main Jul 17, 2023
17 checks passed
LeoGrin added a commit that referenced this pull request Jul 20, 2023
(Sorry, I think I messed up when applying a suggestion for #623)
Development

Successfully merging this pull request may close these issues.

DatetimeEncoder fail on road_safety dataset
4 participants