Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added DropNullColumn transformer to remove columns that contain only nulls #1115
base: main
Are you sure you want to change the base?
Added DropNullColumn transformer to remove columns that contain only nulls #1115
Changes from 23 commits
bee630f
ccc9a02
d3f9c90
9d42b95
f249982
a1caf39
b0e3235
90be825
55764a8
c8fdaaa
0cdc0bd
34c0095
430c8e3
80bd408
e2ca33f
4dbba09
7d6f8ce
4771d18
f0b521a
ea9893b
c73db7e
09cf9c7
4e4f255
754e2ef
acafac6
4b0aa1c
35f8909
b4e419f
75f1110
e499dc1
c296829
ee6b7b5
92210b7
4cad44a
836a636
4ec95d6
3c25b84
8638516
e70f513
6083567
a543044
92f5430
24b18ba
62ef9d6
53cb8bd
7af96ca
11908b3
2499a37
548b792
58feaed
5a6539c
36c46d4
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
Check warning on line 842 in skrub/_dataframe/_common.py
Codecov / codecov/patch
skrub/_dataframe/_common.py#L842
Check warning on line 844 in skrub/_dataframe/_common.py
Codecov / codecov/patch
skrub/_dataframe/_common.py#L844
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure I understand the comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it was a leftover from the previous version, I rewrote the comments
Check warning on line 847 in skrub/_dataframe/_common.py
Codecov / codecov/patch
skrub/_dataframe/_common.py#L847
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want it to be
True
by default ?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That should be discussed with others I think
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I vote for true by default -- there's nothing we can learn from a completely empty column.
if it is False by default, I think it should be set to True in the
tabular_learner
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"N/A"
tonull
it will be dropped. also it's not important but your transformer never raises a RejectColumn exception soallow_reject
has no effect you don't need it here and can leave the defaultThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added it after CleanNullStrings, but I think I did it in an ugly way, maybe it can be fixed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure how to write this check so that it works with either pandas or polars
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could use
df_module
as a fixture in the test by adding it to the arguments, then comparing series instead of numpy arrays:There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Test would look like
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This also circumvents that depending on the version of pandas, null values are not treated the same