Inconsistent logical indexing behaviour #758

Closed
mchen402 opened this issue Aug 6, 2014 · 3 comments
mchen402 commented Aug 6, 2014

When the i index contains a mix of T and F, the result surprisingly has no fewer rows than the original data.table:

data.table(x = 1:2)[c(F, T), list(x, y = 3:4)]   # 2 rows returned
##    x y
## 1: 2 3
## 2: 2 4

I would have expected

##    x y
## 1: 2 4

This is at odds with data.frame intuition:

data.frame(x = 1:2)[c(F, T), c("x", "x")]  # 1 row returned
##   x x.1
## 2 2   2

Edge-cases

Moreover, the case where every element of i is F does not even yield a valid result, which makes this an annoying edge case to deal with:

data.table(x = 1:2)[c(F, F), list(x, y = 3:4)]
## Error in if (mn%%n[i] != 0) warning("Item ", i, " is of size ", n[i],  : 
##   missing value where TRUE/FALSE needed

I would have expected

## Empty data.table (0 rows) of 2 cols: x,y

The other edge case, where every element of i is T, does work as expected:

data.table(x = 1:2)[c(T, T), list(x, y = 3:4)]
##    x y
## 1: 1 3
## 2: 2 4

Is there any explanation behind this behaviour?

@arunsrinivasan
Member

Your data.frame example selects columns (the same column twice), so it only adds a column with the same number of rows; it never adds rows. The two operations are not equivalent. You can use transform to get an approximately equivalent operation, which behaves more or less identically:

transform(data.frame(x=1:2)[c(F,T), , drop=FALSE], y=3:4)
#   x y
# 1 2 3
# 2 2 4
# Warning message:
# In data.frame(list(x = 2L), y = 3:4) :
#   row names were found from a short variable and have been discarded

transform(data.frame(x=1:2)[c(F,F), ], y=3:4)
# Error in data.frame(list(X_data = integer(0)), y = 3:4) : 
#   arguments imply differing number of rows: 0, 2
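
The messages above show that transform ends up calling data.frame(), so the same two cases can be reproduced with data.frame() directly. This is a minimal base R sketch; the variable names are only illustrative:

```r
# A length-1 column is recycled to match the length-2 column.
recycled <- data.frame(x = 2L, y = 3:4)
print(nrow(recycled))

# A length-0 column cannot be recycled, so data.frame() signals an error.
# tryCatch returns the condition object instead of stopping the script.
failed <- tryCatch(
  data.frame(x = integer(0), y = 3:4),
  error = function(e) e
)
print(inherits(failed, "error"))
```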

@arunsrinivasan
Member

@tunaaa,

There are two things here:

  1. The order of operations in DT[i, j]: i is evaluated first, and then j, not the other way around. So, in the first case, after the row subset using c(FALSE,TRUE), it's left with:
#    x
# 1: 2

Then list(x, y=3:4) is evaluated, where the shorter column is automatically recycled to fit the longest column's length.

  2. For the same reason, in the second case, after the subset, x has length 0 (integer(0)), and could therefore be recycled to fit the length of 2, with the value NA. But this resulted in an error because of an invalid condition check. I'll fix this (after checking in with Matt).

data.table always tries to recycle columns automatically to fit the longest column, and warns if the recycling leaves a remainder.
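
The recycling rule, and the likely reason the condition check in the reported error fails, can be sketched in base R. The helper recycle_to_longest below is hypothetical and is not data.table's implementation; filling a zero-length column with NA is an assumption taken from the description above. The key point is that a modulo by a zero length yields NA, which if() cannot evaluate.

```r
# Modulo by zero is NA for integers, so `if (mn %% n[i] != 0)` fails
# with "missing value where TRUE/FALSE needed" when n[i] is 0.
zero_mod <- 2L %% 0L
print(is.na(zero_mod))

# Hypothetical helper (not data.table's code): recycle every column
# to the length of the longest one.
recycle_to_longest <- function(cols) {
  n  <- lengths(cols)
  mn <- max(n)
  for (i in seq_along(cols)) {
    if (n[i] == 0L) {
      # Assumption: a zero-length column is padded with NA of its own type.
      cols[[i]] <- cols[[i]][seq_len(mn)]
    } else {
      # Safe to take the modulo here because n[i] is non-zero.
      if (mn %% n[i] != 0L) {
        warning("Item ", i, " is of size ", n[i],
                " and leaves a remainder when recycled to ", mn)
      }
      cols[[i]] <- rep_len(cols[[i]], mn)
    }
  }
  cols
}

mixed <- recycle_to_longest(list(x = 2L, y = 3:4))
empty <- recycle_to_longest(list(x = integer(0), y = 3:4))
print(mixed)
print(empty)
```

Handling the zero-length case before the modulo is one way to avoid the NA condition; whether the result should then be NA-filled or an empty table is a separate design decision.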

@arunsrinivasan arunsrinivasan self-assigned this Sep 25, 2014
@arunsrinivasan arunsrinivasan added this to the v1.9.6 milestone Sep 25, 2014
@arunsrinivasan
Member

To fix - only the case:

data.table(x = 1:2)[c(F, F), list(x, y = 3:4)]

which should result in

## Empty data.table (0 rows) of 2 cols: x,y
