feat: Enable parse of `0`/`1` as bool in csv #18504

mcrumiller · 2024-09-01T16:02:32Z

codecov · 2024-09-01T16:34:32Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 79.86%. Comparing base (4dc90a9) to head (97c5484).

Additional details and impacted files

@@            Coverage Diff             @@
##             main   #18504      +/-   ##
==========================================
- Coverage   79.87%   79.86%   -0.02%     
==========================================
  Files        1501     1501              
  Lines      202032   202032              
  Branches     2868     2868              
==========================================
- Hits       161370   161349      -21     
- Misses      40115    40136      +21     
  Partials      547      547

ritchie46 · 2024-09-02T07:20:06Z

This adds extra branches to the csv parser, making it slower for the default cases. This should be added in the casting logic instead.

mcrumiller · 2024-09-02T21:06:04Z

> This adds extra branches to the csv parser, making it slower for the default cases. This should be added in the casting logic instead.

@ritchie46 I'm not sure how to deal with this. If the user supplies the schema via schema override then the field is specified as boolean, and the boolean parser gets used. We could ignore booleans in the schema override and make sure we cast back later, but that feels messy. Did you have another solution in mind?

jakob-keller · 2024-09-03T13:22:08Z

Would it be possible to make this even more flexible? I am dealing with boolean data in CSVs that is coded as Y and N.

Maybe the dtype could be specified as `pl.Boolean(false="N", true="Y")`, if that makes any sense at all.

mcrumiller · 2024-09-03T13:33:32Z

@jakob-keller can you not just do something like:

pl.scan_csv("file.csv").with_columns(
    pl.col("bool_col").replace_strict({"Y": True, "N": False})
)

jakob-keller · 2024-09-03T13:38:05Z

> @jakob-keller can you not just do something like:
>
> pl.scan_csv("file.csv").with_columns(
>     pl.col("bool_col").replace_strict({"Y": True, "N": False})
> )

Sure, that's what I am doing right now. And the same goes for 0/1, I guess.

Allow parse 0/1 as bool

1358c00

github-actions bot added the enhancement, python, and rust labels Sep 1, 2024

Format

97c5484

mcrumiller marked this pull request as ready for review September 1, 2024 16:40

mcrumiller requested review from ritchie46, c-peters, alexander-beedie, MarcoGorelli, reswqa and orlp as code owners September 1, 2024 16:40

mcrumiller marked this pull request as draft September 2, 2024 17:07
