na.strings="0" is not permitted in fread #2927
Please include your |
data.table::fread("a,b,c\n0,1,2", na.strings = "0", verbose = TRUE)
#> Input contains a \n or is "". Taking this to be text input (not a filename)
#> [01] Check arguments
#> Using 12 threads (omp_get_max_threads()=12, nth=12)
#> Error in data.table::fread("a,b,c\n0,1,2", na.strings = "0", verbose = TRUE):
#> freadMain: NAstring <<0>> is recognized as type boolean, this is not permitted. |
Sys.info() data.table package: 1.11.4 |
Thanks @HughParsonage, darn that looks no bueno. I guess there should be some interaction with |
Is it a bug? |
@msgoussi yes, you can workaround it with something like data.table::fread("a,b,c\n0,1,2")[a==0, a:=NA] |
If I have 500 columns and I need to clean columns and consider the following strings, c("", "-", "_", "..", "...", "--", "**", "" , |
you're correct, but until the bug is fixed, you can at least get rid of all
the other NA strings with fread, then only replace 0 in the code.
…On Tue, Jun 12, 2018, 5:36 AM msgoussi ***@***.***> wrote:
If I have 500 columns and I need to clean columns and consider the
following strings, c("", "-", "_", "..", "...", "--", "**", "" ,
"n/a", "n.a.", "#VALUE!", "0", "Inf", "-Inf", "NAN", "r", "e"), as NA, the
workaround will not look good. This is my opinion.
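For a file with many columns, the suggestion above (let fread handle every NA marker except "0", then replace the zeros afterwards) can be sketched as a loop over columns rather than one statement per column. This is only a minimal sketch: the sample text, the shortened marker list, and the variable names are illustrative assumptions, not something taken from the thread.

```r
library(data.table)

# Hypothetical sample input; column names and values are illustrative only.
txt <- "a,b,c\n0,x,2\n3,n/a,0\n5,-,7"

# NA markers passed to fread, deliberately leaving out "0",
# which fread rejects in the version reported in this issue.
na_markers <- c("-", "n/a")

dt <- fread(txt, na.strings = na_markers)

# Replace the remaining zeros with NA, column by column, by reference.
for (j in seq_along(dt)) {
  zero_rows <- which(dt[[j]] == 0)  # which() drops rows that are already NA
  if (length(zero_rows)) set(dt, i = zero_rows, j = j, value = NA)
}
```

Note that `== 0` also matches the string "0" in character columns and 0.0 in double columns, which may or may not be what a given dataset needs.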
I have an R package on GitHub, ribailey/gghybrid, which includes a function to read genomic data files with potentially millions of columns. One of the main software tools for producing these input files, PLINK, always codes missing data as zero. I use fread within my function to read in the data and declare missing values, so the bug described here means I can't read in the most common file type people might want to use. Has there been any progress on this? Many thanks, Richard. |
Removing Only breaks 1 test case which is explicitly testing for na.strings = '1'. Interaction with
I think the expected behavior is that we should not allow |
Agreed @shrektan. Nor |
When I use fread with the following na.strings, I get this error: NAstring <<0>> is recognized as type boolean, this is not permitted.
na.strings = c("", "-", "_", "..", "...", "--", "**", "" ,
"n/a", "n.a.", "#VALUE!", "0", "Inf", "-Inf", "NAN", "r", "e")
If I remove "0" from na.strings, fread does not raise the error.
However, cells that contain "r" or "e" are not converted to NA, and their columns are read as character.
Please advise.