Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow coercion for NA input to fifelse #4277

Closed
MichaelChirico opened this issue Mar 4, 2020 · 12 comments · Fixed by #4289
Closed

Allow coercion for NA input to fifelse #4277

MichaelChirico opened this issue Mar 4, 2020 · 12 comments · Fixed by #4289
Assignees
Milestone

Comments

@MichaelChirico
Copy link
Member

fifelse(c(TRUE, FALSE), NA, 1L)
# Error in fifelse(c(TRUE, FALSE), NA, 1L) : 
#   'yes' is of type logical but 'no' is of type integer. Please make sure that both arguments have the same type.

that typeof(NA) is logical I think is an implementaton detail we can obscure from users. we should coerce NA to NA_integer_ in this case.

@shrektan
Copy link
Member

shrektan commented Mar 4, 2020

I'm happy to file a PR if you haven't started yet...

@MichaelChirico
Copy link
Member Author

I haven't! please do

@2005m
Copy link
Contributor

2005m commented Mar 4, 2020

Just my view, but I don't think it is a good idea to i/obscure things like this to users ii/break consistency however small it is iii/introduce exception. User must be fully aware of what is going on when they use a function.

@shrektan
Copy link
Member

shrektan commented Mar 4, 2020

The thing is, in my opinion, we need to provide an interface that's easy for ordinary R users.

Unfortunately, many R users just don't know much about programming, don't use GitHub, don't know there's a difference between NA and NA_real_/integer_. Yes, they should be aware of this differences. However, the base R does this coercion everywhere, if 'data.table' do not, they may just think it's a bug of data.table and stop using it - and this is the least I want.

In conclusion, I think we'd better support it as well, especially considering there's no penalty for having this (the performance impact I believe is minimal and the maintenance should be fine).

@2005m
Copy link
Contributor

2005m commented Mar 4, 2020

So you would rather let them be ignorant instead of educating them via useful, clear and helpful error messages?
Just to give you a counter example, dplyr::if_else does not do it. Just quoting it because a lot of people (around here included) consider that package as a reference. (Just to be clear it is not my case)

@jangorecki
Copy link
Member

jangorecki commented Mar 4, 2020

NA is quite different thing, I would rather prefer to warn on using 1 where 1L is meant to be used than NAs. For example in Julia lang, missing value does not have a type, this kind behavior is partially used in base, where NA type is automatically adjusted. I think it is good, to be fair, NA does not need to have a type from the logical perspective, it is only an R implementation that made it relevant.

@2005m
Copy link
Contributor

2005m commented Mar 4, 2020

I agree on the 1 and 1L but not on NA as it is clearly defined in the R documentation as "a logical constant of length 1" and you clearly do not have a NA_logical_ . Saying that NA type is sometimes automatically adjusted in base R is not a valid argument because a lot of things in base R completely lack of consistency. ifelse is a shocking example which also explains why there are various implementations...

@shrektan
Copy link
Member

shrektan commented Mar 4, 2020

Yes, I agree that base R is lack of consistency so I have to admit my argument doesn't stand. 😢

However, despite the documentation that NA is "a logical constant of length 1", I'm afraid NA is indeed treated as a sort of generic missing value in many cases... For example, fcase() has default=NA where NA is treated as a generic... I believe this argument stands considering the fact that NA is notNA_logical_. It has no suffix like NA_real_...

@jangorecki
Copy link
Member

In data.table logical NA is coerced to required type in majority of our features/functions. So having a policy about educating users on NA type sensitive usage would not be consistent. If there is no performance cost of it, then for me it is clearly better to make it easier for users. The way it could be, as for example in julia.

@MichaelChirico
Copy link
Member Author

My 2 cents...

We're package devs for a high-level language. The point of a high-level language is for end users to be able to write simpler code & not need to know every last detail of the programs they're writing (if they do, more power to them, of course!). The burden of keeping track of these finer points is firmly in the hands of the package devs who can do a bunch of stuff "under the rug" to keep the user-facing house in order.

Not really a scientific poll but a point of reference nevertheless:

https://twitter.com/michael_chirico/status/1235175842034020352

Less than half of users following #rstats who responded knew off the bat that NA is logical. And I think that's fine! Yes, we could catch this and point this out to users, but honestly I don't see how this makes them any better off for all but a tiny set of "power users" -- that I know NA is logical in R by heart will not help me stop an epidemic, or design a new machine translation tool, or empower a charity.

The code we could write to say "hey, don't you meal NA_real_?" could just as easily be used to just silently convert NA to NA_real_.

Somewhat oblique case in point: Instagram auto-lints PRs for many common syntax inconsistencies:

https://instagram-engineering.com/static-analysis-at-scale-an-instagram-story-8f498ab71a0c

They've decided that rather than blocking new code because of style inconsistencies or other such minor things, they'd rather automatically fix them with a bot and allow engineers to focus their head space on less tedious things.

@ColeMiller1
Copy link
Contributor

Looks like dplyr is going this route, too:

tidyverse/dplyr#5106 (comment)

TBH, when I was a R newbie I was always surprised by this type of error. SQL just has NULL - why wouldn't NA coerce to whatever it needs to be.

@matthewgson
Copy link

matthewgson commented May 16, 2020

I think including some tips on the error would be nice for end-users like me. This would be enough to prevent confusion at the least.

For example,

Error in fifelse(condition, NA, 'Nah') : 
  'yes' is of type logical but 'no' is of type of character. Please make sure that both arguments have the same type. 
Do you mean fifelse(condition, NA_character_, 'Nah')?

@mattdowle mattdowle added this to the 1.14.1 milestone May 14, 2021
@jangorecki jangorecki modified the milestones: 1.14.9, 1.15.0 Oct 29, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants