fread reads in empty fields as logical NA #1159

Closed
eantonya opened this issue May 22, 2015 · 6 comments

Comments

@eantonya
Contributor

The following results in very unpleasant behavior if you read and subsequently write a csv:

fread('a,b\n1,')
#   a  b
#1: 1 NA

Second column is read in as a logical column, which means empty values get converted to NA. Those columns should instead be read in as character.

@arunsrinivasan
Member

Why should it be read as character by default? Why not specify colClasses = .. instead?

@eantonya
Contributor Author

Because an empty column does not fit in a logical type without loss of information. After reading it in, you can't know anymore if you had an all NA column in your csv or all empty.

colClasses is not a good solution imo, as it requires you knowing too much info before even reading the file, which I don't think should be necessary.
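
For readers following along, here is a minimal sketch of the colClasses workaround being discussed, assuming data.table is installed. The variable names are illustrative only, and the detected type in the first call may differ between data.table versions, so no output is shown.

```r
library(data.table)

# Default type detection on the example from this issue.
dt_default <- fread("a,b\n1,")
print(sapply(dt_default, class))

# Workaround: force column b to character via colClasses.
# Note that this requires knowing the column name before reading the file.
dt_forced <- fread("a,b\n1,", colClasses = list(character = "b"))
print(sapply(dt_forced, class))
```

The second call illustrates the objection above: the caller has to know which columns might be empty ahead of time.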

@arunsrinivasan
Member

I see. Then presence of any empty element should result in a character column?

fread("a,b\n1,\n2,NA") 
fread("a,b\n1,\n2,5")

In the last case, I guess you'd agree it makes sense to have it as integer column, even though (strictly speaking) that is also loss of information...?

@eantonya
Contributor Author

I didn't consider that, but I think you're right and presence of any empty element should result in a character column by default (at least as long as NA's are written as "NA" by default, which they are for write.csv).

It feels like na.strings should be used somehow to make fread read in that empty element as either character or smth else in your examples, but I can't seem to be able to make that work (I've never used na.strings before tbh).

@arunsrinivasan
Member

Thanks, I get it now. na.strings = "" would convert all "" to NA. So the current functionality seems to implicitly assume na.strings = "" (whereas the default value is "NA"). That might be a way to look at this issue.
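
A small sketch of the na.strings idea, assuming data.table is installed. The sample input and variable names are made up for illustration, and how the default call treats the empty field depends on the data.table version in use, so the results are printed instead of stated.

```r
library(data.table)

# Hypothetical input: column b has one empty field and one non-empty string.
csv <- "a,b\n1,\n2,x"

# Default na.strings: check whether the empty field comes back as NA.
dt_default <- fread(csv)
print(is.na(dt_default$b))

# Explicitly ask for empty fields to be treated as missing.
dt_blank_na <- fread(csv, na.strings = "")
print(is.na(dt_blank_na$b))
```

Comparing the two printed vectors shows whether the installed version already behaves as if na.strings = "" were set.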

@mattdowle
Member

mattdowle commented Mar 3, 2018

I'm hoping PR #2652 resolves this one.
(Aside: An all-empty column is read as type logical because that's the lowest type. The thinking behind that is type-bumping, which always bumps upwards. Now in dev, an automatic reread happens to ensure absolutely no loss, for example where '000' was read as 0L but, after the bump, should have been read as character. That used to be a warning and is now an automatic re-read.)
