
Commit

Closes #1148. CJ gains 'unique' argument. Default FALSE atm.
arunsrinivasan committed May 28, 2015
1 parent 3d021f9 commit 7f271b6
Showing 4 changed files with 33 additions and 11 deletions.
R/setkey.R: 3 changes (2 additions, 1 deletion)
@@ -292,12 +292,13 @@ SJ <- function(...) {

# TO DO?: Use the CJ list() replication method for SJ (inside as.data.table.list?, #2109) too to avoid alloc.col

-CJ <- function(..., sorted = TRUE)
+CJ <- function(..., sorted = TRUE, unique = FALSE)
{
# Pass in a list of unique values, e.g. ids and dates
# Cross Join will then produce a join table with the combination of all values (cross product).
# The last vector is varied the quickest in the table, so dates should be last for roll for example
l = list(...)
+if (unique) l = lapply(l, unique)

# using rep.int instead of rep speeds things up considerably (but attributes are dropped).
j = lapply(l, class) # changed "vapply" to avoid errors with "ordered" "factor" input
README.md: 2 changes (2 additions, 0 deletions)
@@ -65,6 +65,8 @@

22. `merge.data.table` now has new arguments `by.x` and `by.y`. Closes [#637](https://github.com/Rdatatable/data.table/issues/637). Thanks to @NelloBlaser.

+23. `CJ` gains logical `unique` argument with default `FALSE`. If `TRUE`, unique values of vectors are automatically computed and used. This is convenient, for example, `DT[CJ(a, b, c, unique=TRUE)]` instead of doing `DT[CJ(unique(a), unique(b), unique(c))]`. Ultimately, `unique = TRUE` will be default. Closes [#1148](https://github.com/Rdatatable/data.table/issues/1148).

#### BUG FIXES

1. `if (TRUE) DT[,LHS:=RHS]` no longer prints, [#869](https://github.com/Rdatatable/data.table/issues/869). Tests added. To get this to work we've had to live with one downside: if a `:=` is used inside a function with no `DT[]` before the end of the function, then the next time `DT` is typed at the prompt, nothing will be printed. A repeated `DT` will print. To avoid this: include a `DT[]` after the last `:=` in your function. If that is not possible (e.g., it's not a function you can change) then `print(DT)` and `DT[]` at the prompt are guaranteed to print. As before, adding an extra `[]` on the end of `:=` query is a recommended idiom to update and then print; e.g. `> DT[,foo:=3L][]`. Thanks to Jureiss for reporting.
inst/tests/tests.Rraw: 5 changes (5 additions, 0 deletions)
@@ -6410,6 +6410,11 @@ ans1 = merge(d1, d2, by.x = "x1", by.y = "x2")
ans2 = setkey(setDT(merge.data.frame(d1, d2, by.x = key(d1), by.y = key(d2))), x1)
test(1524, ans1, ans2)

+# 'unique =' argument for CJ, #1148
+x = c(1, 2, 1)
+y = c(5, 8, 8, 4)
+test(1525, CJ(x, y, unique=TRUE), CJ(c(1,2), c(4,5,8)))

##########################


man/J.Rd: 34 changes (24 additions, 10 deletions)
@@ -4,26 +4,35 @@
\alias{SJ}
\title{ Creates a Join data table }
\description{
-Creates a data.table to be passed in as the i to a [.data.table join.
+Creates a \code{data.table} to be passed in as the \code{i} to a \code{[.data.table} join.
}

\usage{
-# DT[J(...)] # J() only for use inside DT[...].
-SJ(...) # DT[SJ(...)]
-CJ(..., sorted = TRUE) # DT[CJ(...)]
+# DT[J(...)] # J() only for use inside DT[...].
+SJ(...) # DT[SJ(...)]
+CJ(..., sorted = TRUE, unique = FALSE) # DT[CJ(...)]
}

\arguments{
\item{\dots}{ Each argument is a vector. Generally each vector is the same length but if they are not then usual silent repitition is applied. }
\item{sorted}{ logical. Should the input order be retained?}
+\item{unique}{ logical. When \code{TRUE}, only unique values of each vectors are used (automatically). }
}
\details{
-\code{SJ} and \code{CJ} are convenience functions for creating a data.table in the context of a data.table 'query' on \code{x}.
-\code{x[data.table(id)]} is the same as \code{x[J(id)]} but the latter is more readable. Identical alternatives are \code{x[list(id)]} and \code{x[.(id)]}.
-\code{x} must have a key when passing in a join table as the \code{i}. See \code{\link{[.data.table}}
+\code{SJ} and \code{CJ} are convenience functions for creating a data.table in the context of a data.table 'query' on \code{x}.
+
+\code{x[data.table(id)]} is the same as \code{x[J(id)]} but the latter is more readable. Identical alternatives are \code{x[list(id)]} and \code{x[.(id)]}.
+
+\code{x} must have a key when passing in a join table as the \code{i}. See \code{\link{[.data.table}}
}
\value{
-J : the same result as calling list. J is a direct alias for list but results in clearer more readable code.
-SJ : (S)orted (J)oin. The same value as J() but additionally setkey() is called on all the columns in the order they were passed in to SJ. For efficiency, to invoke a binary merge rather than a repeated binary full search for each row of \code{i}.
-CJ : (C)ross (J)oin. A data.table is formed from the cross product of the vectors. For example, 10 ids, and 100 dates, CJ returns a 1000 row table containing all the dates for all the ids. It gains \code{sorted}, which by default is TRUE for backwards compatibility. FALSE retains input order.
+\itemize{
+\code{J} : the same result as calling list. J is a direct alias for list but results in clearer more readable code.
+
+\code{SJ} : (S)orted (J)oin. The same value as J() but additionally setkey() is called on all the columns in the order they were passed in to SJ. For efficiency, to invoke a binary merge rather than a repeated binary full search for each row of \code{i}.
+
+\code{CJ} : (C)ross (J)oin. A data.table is formed from the cross product of the vectors. For example, 10 ids, and 100 dates, CJ returns a 1000 row table containing all the dates for all the ids. It gains \code{sorted}, which by default is TRUE for backwards compatibility. FALSE retains input order.
+}
}
\seealso{ \code{\link{data.table}}, \code{\link{test.data.table}} }
\examples{
@@ -37,6 +46,11 @@ DT[list("b")] # same
CJ(c(5,NA,1), c(1,3,2)) # sorted and keyed data.table
do.call(CJ, list(c(5,NA,1), c(1,3,2))) # same as above
CJ(c(5,NA,1), c(1,3,2), sorted=FALSE) # same order as input, unkeyed
+# use for 'unique=' argument
+x = c(1,1,2)
+y = c(4,6,4)
+CJ(x, y, unique=TRUE) # unique(x) and unique(y) are computed automatically

}
\keyword{ data }

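For readers without data.table at hand, the base-R sketch below illustrates the behaviour described in the README entry and exercised by test 1525: with `unique = TRUE` and the default `sorted = TRUE`, each input vector is deduplicated and sorted before the cross product is formed, and the last vector varies fastest. `cj_unique_sketch()` is a hypothetical helper written only for this illustration and is not part of data.table. It returns a plain `data.frame` with columns named `V1`, `V2`, ... (a naming choice of the sketch) and does not set a key the way `CJ` does.

```r
# Hypothetical helper, NOT data.table's implementation: emulates the rows that
# CJ(..., sorted = TRUE, unique = TRUE) is described as returning, as a data.frame.
cj_unique_sketch <- function(...) {
  # unique = TRUE: deduplicate each input vector first.
  # sorted = TRUE: sort each vector; na.last = FALSE keeps NAs, placing them first.
  l <- lapply(list(...), function(v) sort(unique(v), na.last = FALSE))
  # expand.grid() varies its FIRST argument fastest, whereas CJ varies the LAST
  # vector fastest, so expand the reversed list and then reverse the columns back.
  g <- expand.grid(rev(l), KEEP.OUT.ATTRS = FALSE, stringsAsFactors = FALSE)
  g <- g[rev(seq_along(g))]
  names(g) <- paste0("V", seq_along(g))
  g
}

# Same inputs as test 1525
x <- c(1, 2, 1)
y <- c(5, 8, 8, 4)
cj_unique_sketch(x, y)
# 6 rows: V1 = 1, 1, 1, 2, 2, 2 and V2 = 4, 5, 8, 4, 5, 8
```

These rows are the ones test 1525 expects from `CJ(x, y, unique=TRUE)`, namely those of `CJ(c(1,2), c(4,5,8))`. The `rev()` round trip exists only because `expand.grid()` orders its output the opposite way from `CJ`.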