feat: Enable parse of 0/1 as bool in csv
#18504
base: main
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@ Coverage Diff @@
## main #18504 +/- ##
==========================================
- Coverage 79.87% 79.86% -0.02%
==========================================
Files 1501 1501
Lines 202032 202032
Branches 2868 2868
==========================================
- Hits 161370 161349 -21
- Misses 40115 40136 +21
Partials 547 547
☔ View full report in Codecov by Sentry.
This adds extra branches to the csv parser, making it slower for the default cases. This should be added in the casting logic instead.
@ritchie46 I'm not sure how to deal with this. If the user supplies the schema via schema override, then the field is specified as boolean and the boolean parser gets used. We could ignore booleans in the schema override and make sure we cast back later, but that feels messy. Did you have another solution in mind?
Would it be possible to make this even more flexible? I am dealing with boolean data in CSVs that is coded as Y and N. Maybe the dtype could be specified as
@jakob-keller can you not just do something like:
pl.scan_csv("file.csv").with_columns(
    pl.col("bool_col").replace_strict({"Y": True, "N": False})
)
Sure, that's what I am doing right now. And the same goes for 0/1, I guess.
#18502