Do not edit "N/A", "NA", and similar entries **by default**. #1077

Closed
sadish-d opened this issue Mar 26, 2023 · 3 comments

@sadish-d

I just wanted to preemptively request that DataFrames.jl not change "N/A", "NA", and similar entries to missing or NaN by default, as Python's pandas currently does. In pandas, if I import a CSV file and export it with default settings, I do not necessarily get the same file back (it exports the index, rewrites NA-like values, ...). It is very annoying.

@bkamins bkamins transferred this issue from JuliaData/DataFrames.jl Mar 26, 2023
@bkamins
Member

bkamins commented Mar 26, 2023

This issue is related to CSV.jl, not DataFrames.jl, so I transferred it.

What you are asking for is controlled by the missingstring keyword argument. You can read about it here: https://csv.juliadata.org/stable/reading.html#missingstring.
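A minimal sketch of how `missingstring` changes parsing, assuming CSV.jl and DataFrames.jl are installed. The inline data and the list of sentinels are hypothetical, chosen only to illustrate the default versus an explicit opt-in:

```julia
using CSV, DataFrames

# Hypothetical inline data; IOBuffer lets CSV.jl read from a string.
data = """
name,value
a,1
b,NA
c,
"""

# Default (missingstring=""): only the empty field becomes `missing`.
# "NA" stays a literal string, so the whole column is parsed as strings.
df_default = CSV.read(IOBuffer(data), DataFrame)

# Opting in to pandas-like behavior by listing sentinels explicitly:
# "" and "NA" both become `missing`, so the column parses as integers.
df_na = CSV.read(IOBuffer(data), DataFrame; missingstring=["", "NA", "N/A", "NULL"])
```

Passing a vector makes several strings count as missing; leaving the keyword out keeps the single empty-string sentinel.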

If this is clear and sufficient for you, I propose closing this issue.

@sadish-d
Author

To be clear, I did NOT want anything to change in DataFrames.jl. In fact, I was preemptively requesting that you not change the default behavior. (I'm not sure what a good way to do that is; maybe I should have posted on Discourse instead.) The current default in CSV.jl, missingstring="", is how it should be. In pandas, the default behavior is roughly the equivalent of missingstring=["", "NA", "N/A", "NULL", "NaN", ...], which is very annoying.

I am not sure right now whether importing and then exporting a CSV file with CSV.jl changes the data. If it does, then yes, that would be an issue worth addressing.
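One way to check this is a small round trip: read a sample with default settings, write it back with `CSV.write`, and compare. This is a sketch assuming CSV.jl and DataFrames.jl are installed; the sample is hypothetical, and a clean result on one file does not rule out differences elsewhere (for example, a value like `1.10` is parsed as a Float64 and written back as `1.1`).

```julia
using CSV, DataFrames

# Hypothetical sample mixing an NA-like string with genuinely empty fields.
original = """
string,value1,mis1
a,1,
b,NA,
"""

# Read with defaults (missingstring=""), then write back to an in-memory buffer.
df = CSV.read(IOBuffer(original), DataFrame)
out = IOBuffer()
CSV.write(out, df)  # a DataFrame has no index, so no extra column is written
roundtrip = String(take!(out))

# Inspect the result: "NA" should come back as text, missing as an empty field.
println(roundtrip)
println(roundtrip == original)
```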

I will close this issue.

(Also, I'm new to Julia, and it has been great to see your discussions, videos, and work in general, @bkamins. Thank you.)

@sadish-d
Author

I tested this CSV file with CSV.jl, and it seems to work as I would expect. It treats an empty field as missing, whether it is written as "" or as two consecutive commas. "NA", NA, and similar values are treated as strings.

```
string,value1,value2,value3,value4,value5,value6,mis1,mis2,num
a,1,1,1,1,1,1,,"",1.1
b,NA,"NA",N/A,"N/A",NULL,"NULL",,"",1.2
```
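The test above can be reproduced in a few lines, assuming CSV.jl and DataFrames.jl are installed. The data is inlined as a string rather than read from a file, and the `@show` calls only print what was parsed:

```julia
using CSV, DataFrames

# The test file from the comment, inlined as a string.
data = """
string,value1,value2,value3,value4,value5,value6,mis1,mis2,num
a,1,1,1,1,1,1,,"",1.1
b,NA,"NA",N/A,"N/A",NULL,"NULL",,"",1.2
"""

df = CSV.read(IOBuffer(data), DataFrame)

# With the default missingstring="", NA-like values are kept as strings.
@show df.value1 df.value3 df.value5
# mis1 (two consecutive commas) is empty in both rows, so it is all missing;
# per the observation above, the quoted "" in mis2 is read as missing too.
@show df.mis1 df.mis2
# num contains only numbers, so it is detected as Float64.
@show eltype(df.num)
```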
