Parse 0 and 1 as Bool if asked for explicitly #760

Closed
kescobo opened this issue Oct 28, 2020 · 3 comments · Fixed by #764

Comments

@kescobo
Contributor

kescobo commented Oct 28, 2020

On Julia master with CSV#master:

shell> cat test.csv
a,b
1.0,0
2.0,1
julia> CSV.File("test.csv", types=[Float64, Bool])
┌ Warning: thread = 1 warning: error parsing Bool around row = 2, col = 2: "0
│ ", error=INVALID: SENTINEL | NEWLINE | INVALID_DELIMITER 
└ @ CSV ~/.julia/packages/CSV/WixoJ/src/file.jl:630
┌ Warning: thread = 1 warning: error parsing Bool around row = 3, col = 2: "1", error=INVALID: SENTINEL | EOF | INVALID_DELIMITER 
└ @ CSV ~/.julia/packages/CSV/WixoJ/src/file.jl:630
2-element CSV.File{false}:
 CSV.Row: (a = 1.0, b = missing)
 CSV.Row: (a = 2.0, b = missing)

julia> versioninfo()
Julia Version 1.6.0-DEV.1342
Commit 6312bfec43 (2020-10-27 14:55 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)

Given that

julia> Bool(1)
true

julia> Bool(0)
false

It seems like this should work.
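
For comparison, the same 0/1 to Bool conversion can be done after the column has been parsed as Int, using only Base Julia. This is a minimal sketch of a possible workaround, not a description of what CSV.jl does internally; the names `raw`, `as_bool`, and `rejected` are illustrative only.

```julia
# Hypothetical column values, as they would look after being parsed as Int.
raw = [0, 1, 1, 0]

# Broadcasting the Bool constructor converts 0 to false and 1 to true.
as_bool = Bool.(raw)

# Any other integer is rejected with an InexactError rather than coerced.
rejected = try
    Bool(2)
    false
catch err
    err isa InexactError
end
```

The strictness of `Bool(2)` is part of why an explicit request for Bool on a 0/1 column looks unambiguous: values outside 0 and 1 would still be errors, not silent conversions.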


This is perhaps a separate issue (or just me misreading the docstring), but I thought typemap might work in this case, even though it isn't a general solution (a file might have legitimate Int and Bool columns). Instead, I got this:

julia> CSV.File("test.csv", typemap=Dict(Int=>Bool))
2-element CSV.File{false}:
 CSV.Row: (a = 1.0, b = "0")
 CSV.Row: (a = 2.0, b = "1")
@quinnj
Member

quinnj commented Oct 28, 2020

You can do this like CSV.File(file; truestrings=["1"], falsestrings=["0"])

@kescobo
Contributor Author

kescobo commented Oct 28, 2020

Seems to also require specifying the column type:

shell> cat test.csv
a,b
1,0
2,1

julia> f1 = CSV.File("test.csv"; truestrings=["1"], falsestrings=["0"])
2-element CSV.File{false}:
 CSV.Row: (a = 1, b = 0)
 CSV.Row: (a = 2, b = 1)

julia> first(f1).b |> typeof
Int64

julia> f2 = CSV.File("test.csv"; types=[Int,Bool], truestrings=["1"], falsestrings=["0"])
2-element CSV.File{false}:
 CSV.Row: (a = 1, b = false)
 CSV.Row: (a = 2, b = true)

Which I guess is good, since it prevents the 1 in the first column from being parsed as true. But is there a reason this can't be the default for columns that are explicitly declared Bool? (To be clear, I don't think a column containing only 0 and 1 should be parsed as Bool implicitly.)

@quinnj
Member

quinnj commented Oct 29, 2020

Yeah, it seems like just adding "0" and "1" to the default truestrings/falsestrings would solve this; mind making a PR?
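
To illustrate the idea, here is a minimal sketch of a token lookup whose default lists include "1" and "0". Everything in it is hypothetical: `TRUE_STRINGS`, `FALSE_STRINGS`, and `parse_bool_token` are made-up names, the default lists are assumed, and this is not how CSV.jl implements Bool parsing.

```julia
# Hypothetical defaults; the actual lists used by CSV.jl may differ.
const TRUE_STRINGS = ["true", "True", "TRUE", "1"]
const FALSE_STRINGS = ["false", "False", "FALSE", "0"]

# Return true or false for a recognized token, or missing otherwise.
# Intended only for columns the caller has explicitly declared as Bool.
function parse_bool_token(token::AbstractString;
                          truestrings=TRUE_STRINGS,
                          falsestrings=FALSE_STRINGS)
    token in truestrings && return true
    token in falsestrings && return false
    return missing
end

column = ["0", "1", "yes"]
parsed = [parse_bool_token(t) for t in column]
```

Because the lookup would only run for columns declared Bool, an undeclared column of 0s and 1s would still be inferred as Int, which matches the behavior asked for above.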
