Should be able to refer to i's .SD during a join. #935

arunsrinivasan · 2014-11-07T19:07:03Z

set.seed(45L)
dt1= data.table(x=rep(1:2, each=10), y=sample(8,20,TRUE), key="x")
dt2 = data.table(x=1:2, y1=1:2, y2=3:4, y3=5:6, y4=7:8, key="x")

Now if I'd like to join and for each match get the count of values y >= y_i. Using by=.EACHI this is quite simple:

dt1[dt2, .(sum(y>=y1), sum(y>=y2), sum(y>=y3), sum(y>=y4)), by=.EACHI]

Having a i.SD would allow usage of:

dt1[dt2, lapply(i.SD, function(x) sum(y > x)), by=.EACHI]

The text was updated successfully, but these errors were encountered:

matthieugomez · 2014-11-09T19:07:29Z

That's be nice. i.SD would also be useful if one wants to merge in place dt1[dt2, :=] and keeps dt2 columns

MichaelChirico · 2015-09-03T14:42:03Z

Adding my hat to the ring for this, as it would help solve a problem I encountered recently, basically what @matthieugomez mentioned--

Namely, updating many columns by reference during a join operation, especially when the inner table may have hard-to-predict column names.

That is, DT1[DT2, (names(DT2) := mget(paste0("i.", names(DT2)))] works (though looks clunky), but this approach doesn't work if we're doing something like:

DT1[DT2[,(summarize some variables),by=list(other_variables)],
        (what_to_use?):=(hard/inconvenient to predict/control names),on=other_variables]

At least with i.SD, we'd only have to keep track of the number of variables to be joined.

Here's an example:

trts <- LETTERS[1:20]

DT1 <- data.table(trts, location = 1:20)

set.seed(1023)
nn <- 10000L
DT2 <- data.table(id = sample(1:25, nn, T),
                  trts = sample(trts, nn, T),
                  val1 = runif(nn),
                  val2 = (runif(nn) > .2))

DT1[dcast(DT2[, .(sum(val1), mean(val2)), by = .(id, trts)],
          trts ~ id, value.var = c("V1", "V2")), on = "trts"]

It's easy to merge, but not to update DT1 by reference.

aryoda · 2016-05-04T09:00:58Z

Any news on that FR?

Since the i.col and x.col syntax is now supported in joins since 1.9.7 this may be a good code base for implementing x.SD and i.SD variables?

Allow x's cols to be referred to using 'x.' prefix

I'd love to see this feature in the data.table!

UweBlock · 2017-10-29T17:46:39Z

There is a related question on SO: How can I access all columns of i when using .EACHI in data.table

MichaelChirico · 2019-04-11T10:57:10Z

A bit strange that get works on 'i.col' strings but eval does not (as pointed out in #1180).

This answer is very clunky; revisit when this is closed

jangorecki · 2020-04-08T22:40:30Z

I spotted it is already possible to refer to i's .SD, but using .iSD name rather than i.SD, and only when using by=.EACHI. But it doesn't really give any meaningful answer.

d1 = data.table(id=1:3)
d2 = data.table(id=2:4, v1=5:7, v2=6:8)

d1[d2, .iSD, on="id", by=.EACHI]
#Empty data.table (0 rows and 3 cols): id,v1,v2

d1[d2, .SD, on="id", by=.EACHI]
#Empty data.table (0 rows and 3 cols): id,v1,v2

ColeMiller1 · 2020-07-14T02:01:51Z

Adding to Jan's observation, we can trick jisvars and jiscols by just including quote(mget) in a {...} expression:

library(data.table)
set.seed(45L)
dt1= data.table(x=rep(1:2, each=10), y=sample(8,20,TRUE), key="x")
dt2 = data.table(x=1:2, y1=1:2, y2=3:4, y3=5:6, y4=7:8, key="x")

dt1[dt2, .(sum(y>=y1), sum(y>=y2), sum(y>=y3), sum(y>=y4)), by=.EACHI]
#>    x V1 V2 V3 V4
#> 1: 1 10  9  5  3
#> 2: 2  9  9  4  1
dt1[dt2, {quote(mget)
  lapply(.iSD, function(x) sum(y >= x))}, by=.EACHI]
#>    x  x y1 y2 y3 y4
#> 1: 1 10 10  9  5  3
#> 2: 2  9  9  9  4  1

Here, I also have the x column allocated but it's pretty close. So, it seems like this would be easy to implement if we used some of what is done with .SD.

UweBlock · 2020-07-14T11:01:58Z

For the sake of completeness, we can get the same result as in ColeMiler1's comment with

dt1[dt2, lapply(mget(names(dt2)), function(x) sum(y >= x)), by = .EACHI]

or

dt1[dt2, lapply(mget(names(.iSD)), function(x) sum(y >= x)), by = .EACHI]

   x  x y1 y2 y3 y4
1: 1 10 10  9  5  3
2: 2  9  9  9  4  1

BTW, we can get rid of the redundant second x column by

dt1[dt2, lapply(mget(setdiff(names(.iSD), names(.BY))), function(x) sum(y >= x)), by=.EACHI]
   x y1 y2 y3 y4
1: 1 10  9  5  3
2: 2  9  9  4  1

data.table version 1.12.9 used.

ColeMiller1 · 2020-07-14T11:23:46Z

@UweBlock that seems more logical 😀

I assume .iSD would by default remove the joining columns. I think the other question is whether .iSD is always allocated or whether it depends on NSE detecting the use of get, mget, eval, or .iSD. To be consistent with .SD memory allocation approach, I assume we would use NSE.

arunsrinivasan added the feature request label Nov 7, 2014

arunsrinivasan mentioned this issue Jun 17, 2015

eval doesn't work on join columns #1180

Open

arunsrinivasan added this to the v2.0.0 milestone Jun 12, 2016

mattdowle removed this from the Candidate milestone May 10, 2018

MichaelChirico mentioned this issue Dec 6, 2018

Need an easier way to in-place merge multiple columns #3184

Closed

MichaelChirico mentioned this issue Oct 15, 2019

Master list of most-requested issues #3189

Open

76 tasks

jangorecki mentioned this issue Nov 29, 2019

Convenience method for join+insertion operation #4080

Closed

MichaelChirico added the High label May 30, 2020

jangorecki added joins and removed High labels Jun 3, 2020

MichaelChirico added the top request One of our most-requested issues label Apr 14, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Should be able to refer to i's .SD during a join. #935

Should be able to refer to i's .SD during a join. #935

arunsrinivasan commented Nov 7, 2014

matthieugomez commented Nov 9, 2014

MichaelChirico commented Sep 3, 2015

aryoda commented May 4, 2016 •

edited

Loading

UweBlock commented Oct 29, 2017

MichaelChirico commented Apr 11, 2019

jangorecki commented Apr 8, 2020 •

edited

Loading

ColeMiller1 commented Jul 14, 2020

UweBlock commented Jul 14, 2020 •

edited

Loading

ColeMiller1 commented Jul 14, 2020

Should be able to refer to i's .SD during a join. #935

Should be able to refer to i's .SD during a join. #935

Comments

arunsrinivasan commented Nov 7, 2014

matthieugomez commented Nov 9, 2014

MichaelChirico commented Sep 3, 2015

aryoda commented May 4, 2016 • edited Loading

UweBlock commented Oct 29, 2017

MichaelChirico commented Apr 11, 2019

jangorecki commented Apr 8, 2020 • edited Loading

ColeMiller1 commented Jul 14, 2020

UweBlock commented Jul 14, 2020 • edited Loading

ColeMiller1 commented Jul 14, 2020

aryoda commented May 4, 2016 •

edited

Loading

jangorecki commented Apr 8, 2020 •

edited

Loading

UweBlock commented Jul 14, 2020 •

edited

Loading