na.strings="0" is not permitted in fread #2927

Closed
msgoussi opened this issue Jun 11, 2018 · 12 comments · Fixed by #5196

@msgoussi

When I use fread, I get this error (NAstring <<0>> is recognized as type boolean, this is not permitted)
na.strings = c("", "-", "_", "..", "...", "--", "**", "" ,
"n/a", "n.a.", "#VALUE!", "0", "Inf", "-Inf", "NAN", "r", "e")

If I remove "0" from na.strings, fread does not raise the error.

However, cells that contain "r" or "e" are not converted to NA, and their columns are read as character.

Please advise.

@MichaelChirico
Member

Please include your data.table package version & system info, and the file itself, if you can.

@HughParsonage
Member

data.table::fread("a,b,c\n0,1,2", na.strings = "0", verbose = TRUE)
#> Input contains a \n or is "". Taking this to be text input (not a filename)
#> [01] Check arguments
#>   Using 12 threads (omp_get_max_threads()=12, nth=12)
#> Error in data.table::fread("a,b,c\n0,1,2", na.strings = "0", verbose = TRUE):
#>   freadMain: NAstring <<0>> is recognized as type boolean, this is not permitted.

@msgoussi
Author

msgoussi commented Jun 11, 2018

Sys.info()
          sysname           release           version          nodename
        "Windows"        ">= 8 x64"      "build 9200" "IRT-310677-Z440"
          machine             login              user    effective_user
         "x86-64"          "310677"          "310677"          "310677"

data.table package version: 1.11.4
The file is too large to share; it is almost 1.44 GB.

@MichaelChirico
Member

Thanks @HughParsonage, darn, that looks no bueno.

I guess there should be some interaction with logical01 but there doesn't appear to be.

@jangorecki jangorecki added this to the 1.11.6 milestone Jun 11, 2018
@msgoussi
Author

Is it a bug?

@jangorecki
Member

@msgoussi yes, you can work around it with something like

data.table::fread("a,b,c\n0,1,2")[a==0, a:=NA]

@msgoussi
Author

msgoussi commented Jun 11, 2018

If I have 500 columns and need to clean them by treating the following strings as NA: c("", "-", "_", "..", "...", "--", "**", "" ,
"n/a", "n.a.", "#VALUE!", "0", "Inf", "-Inf", "NAN", "r", "e"), the workaround will not look good. This is my opinion.
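
For a wide table, the single-column workaround can be written as a loop over all columns instead of one expression per column. The following is a minimal sketch, not a fix for the underlying bug: it assumes the file is read without "0" in na.strings, and the names na_values and hits are illustrative only. Matching is done on each value's string form, so a numeric 0 matches "0".

```r
library(data.table)

# Small stand-in for the real file; "0" is deliberately NOT passed to na.strings.
dt <- fread(text = "a,b,c\n0,1,2\n3,0,r")

# Strings to treat as missing after reading (illustrative subset).
na_values <- c("0", "r")

# Walk every column and blank out matching cells by reference.
for (j in seq_along(dt)) {
  hits <- which(as.character(dt[[j]]) %in% na_values)
  if (length(hits) > 0L) set(dt, i = hits, j = j, value = NA)
}

print(dt)
```

Columns keep the type fread detected, so a column that was read as character because of "r" or "e" stays character after the cleanup.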

@MichaelChirico
Member

MichaelChirico commented Jun 11, 2018 via email

@mattdowle mattdowle modified the milestones: 1.11.6, 1.12.0 Sep 20, 2018
@mattdowle mattdowle modified the milestones: 1.12.0, 1.12.2 Jan 11, 2019
@mattdowle mattdowle changed the title from "na.strings in fread" to "na.strings="0" is not permitted in fread" Jan 11, 2019
@mattdowle mattdowle removed this from the 1.12.2 milestone Jan 14, 2019
@ribailey

ribailey commented Jul 3, 2020

I have an R package on GitHub, ribailey/gghybrid, which includes a function to read genomic data files with potentially millions of columns. PLINK, one of the main programs for producing these input files, always codes missing data as zero. I use fread within my function to read in the data and declare missing values, so the bug described here means I can't read the most common file type people might want to use. Has there been any progress on this? Many thanks, Richard.

@ben-schwen
Member

Removing || strcmp(ch,"1")==0 || strcmp(ch,"0")==0 from fread.c seems like an option to "fix" this.

This breaks only one test case, which explicitly tests na.strings = '1'.

The interaction with logical01 also seems legitimate.

data.table::fread("a,b,c\n0,1,2\n1,0,2", na.strings = "0", logical01=T)

        a      b     c
   <lgcl> <lgcl> <int>
1:     NA   TRUE     2
2:   TRUE     NA     2

@shrektan
Member

I think the expected behavior is that logical01=TRUE and na.strings = "0" should not be allowed at the same time. If this is agreed, I'm happy to file a PR for it.

@MichaelChirico
Member

Agreed @shrektan. Nor na.strings = "1", though I guess that's a pretty obscure use case.
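
The rule agreed on here can be illustrated as an argument check. This is only a sketch of the proposed behavior in plain R: check_fread_args is a hypothetical helper, not a data.table function, and it is not taken from the PR that closed this issue.

```r
# Hypothetical helper illustrating the proposed rule: reject "0" or "1" in
# na.strings only when logical01 = TRUE, since those strings would then be
# ambiguous between a logical value and a missing value.
check_fread_args <- function(na.strings, logical01) {
  ambiguous <- intersect(na.strings, c("0", "1"))
  if (isTRUE(logical01) && length(ambiguous) > 0L) {
    stop("na.strings contains ", paste(dQuote(ambiguous, FALSE), collapse = ", "),
         " which conflicts with logical01 = TRUE")
  }
  invisible(TRUE)
}

check_fread_args(na.strings = "0", logical01 = FALSE)  # passes silently
```

Calling it with na.strings = "0" and logical01 = TRUE raises an error, which mirrors the combination the comments above propose to disallow.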
