Support for on-the-fly columns in on #1639

MichaelChirico · 2016-04-10T20:02:11Z

The ability to create columns on the fly in the on argument, like we can in j or by, would be appreciated.

As a usage example, this would have simplified my answer to the following self-join / moving-average-type question:

http://stackoverflow.com/a/36534824/3576984

Reproduced here:

require(data.table)
date <- as.Date(c('2012-01-01','2012-01-01','2012-01-01',
                  '2012-01-02','2012-01-02','2012-01-03',
                  '2012-01-04','2012-01-05','2012-01-05',
                  '2012-01-06','2012-01-06','2012-01-06'))
email <- c('[email protected]', '[email protected]','[email protected]',
           '[email protected]', '[email protected]','[email protected]',
           '[email protected]','[email protected]','[email protected]',
           '[email protected]','[email protected]','[email protected]')
dt <- data.table(date, email)

dt[dt[ , .(date3=date, date2 = date - 2, email)], 
   on = .(date >= date2, date<=date3), 
   allow.cartesian = TRUE
   ][ , .(count = uniqueN(email)), 
      by = .(date = date + 2)]

Would have been prettier / easier to parse as:

dt[dt, on = .(date >= date-2, date<=date), allow.cartesian = TRUE
   ][ , .(count = uniqueN(email)), by = date]

jangorecki · 2016-04-10T22:07:22Z

looks like a duplicate of #625

aryoda · 2016-05-04T08:07:32Z

Big vote for this enhancement, since it makes the syntax of data.table shorter, more intuitive and more SQL-like.

arunsrinivasan · 2016-06-25T14:59:09Z

When done, update:

http://stackoverflow.com/q/37978658/559784
and
https://stackoverflow.com/q/48282827/559784

Henrik-P · 2018-03-02T11:03:41Z

Because Arun started a list of SO posts which may be updated, I thought I could add some more that I stumbled over:

How to merge dataframes of unequal length based on time with buffer intervals in R?

Data.Table non-equi join with arithmetic operations

subset data.frame base on a time interval + or - list of dates

franknarf1 · 2018-10-20T06:41:50Z

To update: https://stackoverflow.com/q/52901099

jaapwalhout · 2018-11-30T15:10:21Z

To update: https://stackoverflow.com/q/53559119/2204410
(or close as a duplicate with one of the links above)

UweBlock · 2019-06-18T07:30:22Z

To update: inner_join() with range of values for one of the keys (year)

ColeMiller1 · 2020-02-22T20:05:20Z

What would the API be? The difficult part is that users may want to use the column newly created in the on condition. The evaluation would be easier if we could get users to use a helper function whenever they want to access the new column in j.

dt[dt, 
   on = .(date >= list(past_date = date-2), 
          date <= date), 
   allow.cartesian = TRUE]

The alternative would be to do as SQL does and have the evaluation happen twice: once in on, and then again in j, where users repeat the expression:

dt[dt,
   on = .(date >= date-2,
          date<=date), 
   .(x.date, past_date = i.date - 2),
   allow.cartesian = TRUE]
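
For comparison, both variants can be emulated today by materializing the bound as a real column in i before joining. The following is only a minimal sketch under stated assumptions: the data is made up for illustration, and the names i and window_end are introduced here and do not appear elsewhere in this thread.

```r
library(data.table)

# Illustrative data only (assumption: not taken from the original post)
dt <- data.table(
  date  = as.Date("2012-01-01") + c(0, 0, 1, 2, 2, 4),
  email = c("a", "b", "a", "c", "a", "b")
)

# on cannot compute date - 2 itself, so add the lower bound to a copy used as i
i <- copy(dt)[, past_date := date - 2]

# Non-equi self-join: keep rows of x whose date lies in [i.past_date, i.date].
# The x. and i. prefixes in j pick the original values from either side.
res <- dt[i,
          on = .(date >= past_date, date <= date),
          .(x.date, past_date = i.past_date, window_end = i.date, email = x.email),
          allow.cartesian = TRUE]

# Distinct emails seen in each 3-day window, keyed by the window's end date
counts <- res[, .(count = uniqueN(email)), by = .(date = window_end)]
print(counts)
```

With on-the-fly columns, the copy and the past_date column would no longer be needed; the expression date - 2 would be written directly in on.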

UweBlock · 2020-03-08T10:52:58Z

To update: https://stackoverflow.com/a/60586692/3817004

jangorecki · 2020-06-27T20:47:12Z

AFAIR bmerge already does shallow copies of the data, so adding new columns there would probably be the easiest way to achieve this. We just need proper handling of is.call for the LHS and RHS of each element of the on argument in the internal .parse_on, and then to use that in bmerge. The trickiest part may be handling column names well. Eventually we could allow computed columns to be named, the same as we allow in the by argument: .(date2 = date >= date-2, date<=date).
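
The is.call check described above can be illustrated with base R alone. This is a rough sketch, not data.table's internal .parse_on: the helper split_on and the temporary column name .on_rhs1 are hypothetical, and a plain data.frame stands in for the shallow copy that bmerge would work on.

```r
# Hypothetical helper (assumption: NOT data.table's internal .parse_on).
# Splits each element of an on expression into operator, LHS and RHS, and
# flags a side as "computed" when it is a call rather than a bare column name.
split_on <- function(on_expr) {
  ops   <- c("==", "<", "<=", ">", ">=")
  conds <- as.list(on_expr)[-1L]            # drop the leading `.` symbol
  lapply(conds, function(cond) {
    stopifnot(is.call(cond))                # bare column names are not handled here
    op <- as.character(cond[[1L]])
    stopifnot(op %in% ops)
    list(op = op,
         lhs = cond[[2L]],
         rhs = cond[[3L]],
         lhs_computed = is.call(cond[[2L]]),
         rhs_computed = is.call(cond[[3L]]))
  })
}

parsed <- split_on(quote(.(date >= date - 2, date <= date)))

# A computed side can then be evaluated against the table and stored as a
# temporary column, which the join would use in place of the expression.
i <- data.frame(date = as.Date(c("2012-01-03", "2012-01-05")))
if (parsed[[1L]]$rhs_computed) {
  i$.on_rhs1 <- eval(parsed[[1L]]$rhs, i)   # date - 2, evaluated within i
}
print(i)
```

Because as.list keeps argument names, a named element such as date2 = date >= date-2 would arrive in the parsed list under the name date2, which is one possible place to pick up a user-supplied column name.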

MichaelChirico mentioned this issue Apr 10, 2016: binary search extensions to <, <=, >, >= #1452 (Closed, 5 tasks)

arunsrinivasan added the enhancement label Apr 11, 2016

arunsrinivasan added this to the v2.0.0 milestone Jul 5, 2016

arunsrinivasan added the High label Aug 24, 2016

arunsrinivasan self-assigned this Aug 24, 2016

mattdowle removed this from the Candidate milestone May 10, 2018

franknarf1 mentioned this issue Oct 3, 2018: 'on' clause fails on white spaces around operators and on operators in variable names #3092 (Closed)

MichaelChirico mentioned this issue Dec 6, 2018: Master list of most-requested issues #3189 (Open, 76 tasks)

MichaelChirico mentioned this issue Feb 7, 2019: [Feature request] Allow on to work with intermediate/calculated columns #3367 (Closed)

MichaelChirico mentioned this issue Aug 19, 2019: [Request] join on=.(sign(v) = v_sign) #2203 (Closed)

MichaelChirico mentioned this issue Jan 16, 2020: Merge while Ignoring Case #4181 (Closed)

MichaelChirico mentioned this issue Mar 6, 2020: Expressions in non-equi join #4283 (Closed)

jangorecki mentioned this issue Apr 5, 2020: [R-Forge #2186] joins on arbitrary functions #625 (Closed)

jangorecki added the joins label Apr 6, 2020

MichaelChirico added the top request label (One of our most-requested issues) and removed the High label Jun 7, 2020