fix: drop_nulls with subset of length>1 #1090

Merged on Sep 28, 2024 (2 commits)

@FBruzzesi FBruzzesi commented Sep 28, 2024

What type of PR is this? (check all applicable)

  • 💾 Refactor
  • ✨ Feature
  • 🐛 Bug Fix
  • 🔧 Optimization
  • 📝 Documentation
  • ✅ Test
  • 🐳 Other

Related issues

Checklist

  • Code follows style guide (ruff)
  • Tests added
  • Documented the changes

If you have comments or can explain your changes, please do so below.

This might not be the permanent fix for the issue; consider it a hotfix for now.

@FBruzzesi FBruzzesi changed the title from "fix: DataFrame.drop_nulls with subset of length>1" to "fix: drop_nulls with subset of length>1" on Sep 28, 2024
@github-actions github-actions bot added the fix label Sep 28, 2024

@MarcoGorelli MarcoGorelli left a comment

thanks for spotting this

i think the issue is just that any_horizontal should always return a single column, so we could do

             root_names=combine_root_names(parsed_exprs),
-            output_names=parsed_exprs[0]._output_names,
+            output_names=parsed_exprs[0]._output_names[:1],
         )
 
     def any_horizontal(self, *exprs: IntoPandasLikeExpr) -> PandasLikeExpr:
@@ -262,7 +262,7 @@ class PandasLikeNamespace:
             depth=max(x._depth for x in parsed_exprs) + 1,
             function_name="any_horizontal",
             root_names=combine_root_names(parsed_exprs),
-            output_names=parsed_exprs[0]._output_names,
+            output_names=parsed_exprs[0]._output_names[:1],
         )

and that should fix it without needing to do other changes

@FBruzzesi

parsed_exprs[0]._output_names can easily be None though.

Do you like:

- output_names=parsed_exprs[0]._output_names,
+ output_names=(parsed_exprs[0]._output_names or [None])[:1],
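To make the concern concrete, here is a minimal sketch in plain Python. It does not import narwhals; `output_names` is only a stand-in for `parsed_exprs[0]._output_names`, and both function names are hypothetical. It shows that the bare slice raises on `None`, while the `or [None]` fallback returns a one-element list.

```python
from __future__ import annotations


def first_name_plain(output_names: list[str] | None) -> list[str]:
    # Bare slice: works for a list, raises TypeError when output_names is None.
    return output_names[:1]


def first_name_with_fallback(output_names: list[str] | None) -> list[str | None]:
    # `or [None]` substitutes a one-element list when output_names is None
    # (or empty), so the slice is always valid.
    return (output_names or [None])[:1]
```

Note that the fallback variant returns `[None]` rather than `None` when no output names are available, so callers would need to handle a list containing `None`.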

@MarcoGorelli

MarcoGorelli commented Sep 28, 2024

ah you're right - let's make a function then, just like we have for combine_root_names one line above? in there we can explain that we follow the right-hand-rule like Polars does

@FBruzzesi

> right-hand-rule like Polars does

left-most?

> let's make a function then

Not sure I follow 100%. combine_root_names seems generic-ish.

For this (horizontal) reduction I would expect to have:

  • left-most output name if available
  • otherwise? left most root name?

@MarcoGorelli

yup, left-most, sorry :)

so parsed_exprs[0]._output_names[:1] if parsed_exprs[0]._output_names is not None else None
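The expression above could be wrapped in a small helper, as suggested earlier in the thread. The sketch below is an assumption about how that might look, not the code that was merged: `FakeExpr` and `leftmost_output_names` are hypothetical names, and `FakeExpr` models only the `_output_names` attribute.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class FakeExpr:
    """Hypothetical stand-in for a parsed expression; only the attribute used here."""

    _output_names: list[str] | None


def leftmost_output_names(parsed_exprs: list[FakeExpr]) -> list[str] | None:
    # Horizontal reductions yield a single column, so keep only the first
    # output name of the left-most expression. If that expression has no
    # known output names, propagate None instead of slicing.
    names = parsed_exprs[0]._output_names
    return names[:1] if names is not None else None


# Left-most expression has two output names: only the first is kept.
print(leftmost_output_names([FakeExpr(["a", "b"]), FakeExpr(["c"])]))
# Left-most expression has no output names: None is propagated.
print(leftmost_output_names([FakeExpr(None), FakeExpr(["c"])]))
```

Unlike the `or [None]` variant, this returns `None` itself when the names are unknown, which keeps the "no output names" case distinct from a one-element list.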

@MarcoGorelli MarcoGorelli left a comment

love this

thanks @FBruzzesi !

@MarcoGorelli MarcoGorelli merged commit 3015eff into main Sep 28, 2024
25 checks passed
@FBruzzesi FBruzzesi deleted the fix/drop-nulls-hotfix branch September 28, 2024 18:05
Successfully merging this pull request may close these issues.

[Bug]: Run into safety assertion on reuse_series_implementation for drop_nulls