Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closes #3639 -- proper handling of CPLXSXP in dogroups #3689

Merged
merged 15 commits into from
Jul 16, 2019
Merged
2 changes: 2 additions & 0 deletions NEWS.md
Original file line number Diff line number Diff line change
Expand Up @@ -192,6 +192,8 @@

24. `column not found` could incorrectly occur in rare non-equi-join cases, [#3635](https://github.com/Rdatatable/data.table/issues/3635). Thanks to @UweBlock for the report.

25. Complex columns used in `j` during grouping would get mangled, [#3639](https://github.com/Rdatatable/data.table/issues/3639). A related bug prevented assigning complex values using `:=` except for full-column plonks. We still do not support grouping `by` a complex column; please file a feature request if you would use this in your own work. Thanks to @eliocamp for filing the bug report.

#### NOTES

1. `rbindlist`'s `use.names="check"` now emits its message for automatic column names (`"V[0-9]+"`) too, [#3484](https://github.com/Rdatatable/data.table/pull/3484). See news item 5 of v1.12.2 below.
Expand Down
1 change: 0 additions & 1 deletion R/data.table.R
Original file line number Diff line number Diff line change
Expand Up @@ -889,7 +889,6 @@ replace_order = function(isub, verbose, env) {
} # else maybe a call to transform or something which returns a list.
av = all.vars(jsub,TRUE) # TRUE fixes bug #1294 which didn't see b in j=fns[[b]](c)
use.I = ".I" %chin% av
# browser()
if (any(c(".SD","eval","get","mget") %chin% av)) {
if (missing(.SDcols)) {
# here we need to use 'dupdiff' instead of 'setdiff'. Ex: setdiff(c("x", "x"), NULL) will give 'x'.
Expand Down
22 changes: 22 additions & 0 deletions inst/tests/tests.Rraw
Original file line number Diff line number Diff line change
Expand Up @@ -15200,6 +15200,28 @@ d2 = data.table(r = 1:5, s = seq(0L, 20L, 5L))
test(2062.1, d1[d2, on = .(a <= s, b >= s), j = .SD], ans<-data.table(a=INT(0,5,10,10,15,20), b=INT(0,5,10,10,15,20)))
test(2062.2, d1[d2, on = .(a <= s, b >= s)][, .(a, b)], ans)

# #3639 -- complex values in grouping
set.seed(42)
DT = CJ(x = 1:10, a = c("a", "b"), b = 1:2)
DT[ , z := complex(rnorm(1:.N), rnorm(1:.N))]
## can simplify this test after #1444
test(2063.1, all.equal(setkey(copy(DT), NULL), DT[, .(x = x, z = z), by = .(a, b)][order(x, a, b)], ignore.col.order = TRUE))
test(2063.2, DT[ , base::sum(z), by = a], data.table(a = c('a', 'b'), V1 = c(5.0582228485073+0i, -1.8644229822705+0i)))
test(2063.3, DT[ , sum(Mod(z)), by = b], data.table(b = 1:2, V1 = c(16.031422657932, 13.533483145656)))
## mimicking test 171.3 for coverage
x = data.table(A=c(25L,85L,25L,25L,85L), B=c("a","a","b","c","c"), z=0:4 + (4:0)*1i)
test(2063.4, x[ , data.table(A, z)[A==25, z] + data.table(A, z)[A==85, z], by=B],
data.table(B = c('a', 'c'), V1 = c(1, 7) + (c(7, 1))*1i))
## mimicking test 771 for coverage
a = data.table(first=1:6, third=c(1i,1,1i,3,3i,4), key="first")
b = data.table(first=c(3,4,4,5,6,7,8), second=1:7, key="first")
test(2063.5, b[ , third:=a[b, third, by=.EACHI]], error="Supplied 2 items to be assigned to 7 items of")

# also works for assignment, as noted in #3690
DT[ , z_sum := base::sum(z), by = .(a, b)]
test(2063.6, DT[ , z_sum := base::sum(z), by = .(a, b)][1:3, z_sum],
c(1.8791864549242+0i, 3.17903639358309+0i, -4.18868631527035+0i))
test(2063.7, DT[1L, z_sum := 1i][1L, z_sum], 1i)

###################################
# Add new tests above this line #
Expand Down
14 changes: 14 additions & 0 deletions src/assign.c
Original file line number Diff line number Diff line change
@@ -1,6 +1,7 @@
#include "data.table.h"
#include <Rdefines.h>
#include <Rmath.h>
#include <complex.h>

static void finalizer(SEXP p)
{
Expand Down Expand Up @@ -943,6 +944,15 @@ const char *memrecycle(SEXP target, SEXP where, int start, int len, SEXP source)
td[w-1] = sd[i&mask];
}
} break;
case CPLXSXP: {
double complex *td = (double complex *)COMPLEX(target);
const double complex *sd = (double complex *)COMPLEX(source);
for (int i=0; i<len; i++) {
const int w = wd[i];
if (w<1) continue;
td[w-1] = sd[i&mask];
}
} break;
case STRSXP : {
const SEXP *sd = STRING_PTR(source);
for (int i=0; i<len; i++) {
Expand Down Expand Up @@ -1002,6 +1012,10 @@ void writeNA(SEXP v, const int from, const int n)
for (int i=from; i<=to; ++i) vd[i] = NA_REAL;
}
} break;
case CPLXSXP: {
double complex *vd = (double complex *)COMPLEX(v);
for (int i=from; i<=to; ++i) vd[i] = NA_REAL + NA_REAL*I;
} break;
case STRSXP :
// character columns are initialized with blank string (""). So replace the all-"" with all-NA_character_
// Since "" and NA_character_ are global constants in R, it should be ok to not use SET_STRING_ELT here. But use it anyway for safety (revisit if proved slow)
Expand Down
6 changes: 6 additions & 0 deletions src/data.table.h
Original file line number Diff line number Diff line change
Expand Up @@ -63,6 +63,12 @@ typedef R_xlen_t RLEN;
#define ALTREP(x) 0 // for R<3.5.0, see issue #2866 and grep for "ALTREP" to see comments where it's used
#endif

// for complex type support; copied from r-source/src/main/Rcomplex.h
#if defined(__GNUC__) && (defined(__sun__) || defined(__hpux__) || defined(Win32))
# undef I
# define I (__extension__ 1.0iF)
#endif

// init.c
SEXP char_integer64;
SEXP char_ITime;
Expand Down
54 changes: 42 additions & 12 deletions src/dogroups.c
Original file line number Diff line number Diff line change
Expand Up @@ -2,12 +2,13 @@
#include <Rdefines.h>
#include <fcntl.h>
#include <time.h>
#include <complex.h>

SEXP dogroups(SEXP dt, SEXP dtcols, SEXP groups, SEXP grpcols, SEXP jiscols, SEXP xjiscols, SEXP grporder, SEXP order, SEXP starts, SEXP lens, SEXP jexp, SEXP env, SEXP lhs, SEXP newnames, SEXP on, SEXP verbose)
{
R_len_t rownum, ngrp, nrowgroups, njval=0, ngrpcols, ansloc=0, maxn, estn=-1, thisansloc, grpn, thislen, igrp, origIlen=0, origSDnrow=0;
int protecti=0;
SEXP names, names2, xknames, bynames, dtnames, ans=NULL, jval, thiscol, BY, N, I, GRP, iSD, xSD, rownames, s, RHS, listwrap, target, source, tmp;
SEXP names, names2, xknames, bynames, dtnames, ans=NULL, jval, thiscol, BY, N, DOTI, GRP, iSD, xSD, rownames, s, RHS, listwrap, target, source, tmp;
Rboolean wasvector, firstalloc=FALSE, NullWarnDone=FALSE;
clock_t tstart=0, tblock[10]={0}; int nblock[10]={0};

Expand Down Expand Up @@ -52,7 +53,8 @@ SEXP dogroups(SEXP dt, SEXP dtcols, SEXP groups, SEXP grpcols, SEXP jiscols, SEX
for (R_len_t i=0; i<n; ++i) {
if (ilens[i] > maxGrpSize) maxGrpSize = ilens[i];
}
defineVar(install(".I"), I = PROTECT(allocVector(INTSXP, maxGrpSize)), env); protecti++;
// #3639 introduced complex.h, which defines I, so use DOTI to differentiate
defineVar(install(".I"), DOTI = PROTECT(allocVector(INTSXP, maxGrpSize)), env); protecti++;
R_LockBinding(install(".I"), env);

dtnames = PROTECT(getAttrib(dt, R_NamesSymbol)); protecti++; // added here to fix #4990 - `:=` did not issue recycling warning during "by"
Expand All @@ -76,7 +78,7 @@ SEXP dogroups(SEXP dt, SEXP dtcols, SEXP groups, SEXP grpcols, SEXP jiscols, SEX
copyMostAttrib(VECTOR_ELT(dt,INTEGER(dtcols)[i]-1), VECTOR_ELT(SDall,i)); // not names, otherwise test 778 would fail
}

origIlen = length(I); // test 762 has length(I)==1 but nrow(SD)==0
origIlen = length(DOTI); // test 762 has length(DOTI)==1 but nrow(SD)==0
if (length(SDall)) origSDnrow = length(VECTOR_ELT(SDall, 0));

xknames = getAttrib(xSD, R_NamesSymbol);
Expand Down Expand Up @@ -143,6 +145,11 @@ SEXP dogroups(SEXP dt, SEXP dtcols, SEXP groups, SEXP grpcols, SEXP jiscols, SEX
case REALSXP :
REAL(VECTOR_ELT(SDall,j))[0] = NA_REAL;
break;
case CPLXSXP :
MichaelChirico marked this conversation as resolved.
Show resolved Hide resolved
// no NA_COMPLEX; have to set r & i parts to NA_REAL individually
COMPLEX(VECTOR_ELT(SDall, j))[0].r = NA_REAL;
COMPLEX(VECTOR_ELT(SDall, j))[0].i = NA_REAL;
break;
case STRSXP :
SET_STRING_ELT(VECTOR_ELT(SDall,j),0,NA_STRING);
break;
Expand All @@ -154,8 +161,8 @@ SEXP dogroups(SEXP dt, SEXP dtcols, SEXP groups, SEXP grpcols, SEXP jiscols, SEX
}
}
grpn = 1; // it may not be 1 e.g. test 722. TODO: revisit.
SETLENGTH(I, grpn);
INTEGER(I)[0] = 0;
SETLENGTH(DOTI, grpn);
INTEGER(DOTI)[0] = 0;
for (int j=0; j<length(xSD); ++j) {
switch (TYPEOF(VECTOR_ELT(xSD, j))) {
case LGLSXP :
Expand All @@ -167,6 +174,10 @@ SEXP dogroups(SEXP dt, SEXP dtcols, SEXP groups, SEXP grpcols, SEXP jiscols, SEX
case REALSXP :
REAL(VECTOR_ELT(xSD,j))[0] = NA_REAL;
break;
case CPLXSXP :
COMPLEX(VECTOR_ELT(xSD, j))[0].r = NA_REAL;
COMPLEX(VECTOR_ELT(xSD, j))[0].i = NA_REAL;
break;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC this is coming from a join, so we won't be able to cover it yet.

Minimal example to trigger the INTSXP branch of this switch (adapted test 1283.3):

dt1 <- data.table(a=rep(1:2, each=2), c=c(7L, 2L, 4L, 8L))
dt2 <- data.table(b=rep(2:3), c=c(2L, 6L), d=c(20L, 8L))
dt1[dt2, on = c(a = 'b'), .(a=a), by=.EACHI]

I believe it's because j uses a column from the LHS of on.

But joins aren't available for complex yet...

case STRSXP :
SET_STRING_ELT(VECTOR_ELT(xSD,j),0,NA_STRING);
break;
Expand All @@ -179,8 +190,8 @@ SEXP dogroups(SEXP dt, SEXP dtcols, SEXP groups, SEXP grpcols, SEXP jiscols, SEX
}
} else {
if (LOGICAL(verbose)[0]) tstart = clock();
SETLENGTH(I, grpn);
int *iI = INTEGER(I);
SETLENGTH(DOTI, grpn);
int *iI = INTEGER(DOTI);
if (LENGTH(order)==0) {
if (grpn) rownum = istarts[i]-1; else rownum = -1; // not ternary to pass strict-barrier
for (int j=0; j<grpn; ++j) iI[j] = rownum+j+1;
Expand Down Expand Up @@ -214,13 +225,21 @@ SEXP dogroups(SEXP dt, SEXP dtcols, SEXP groups, SEXP grpcols, SEXP jiscols, SEX
rownum = iI[k]-1;
td[k] = sd[rownum]; // on 32bit copies pointers too
}
} else { // size 8
} else if (size==8) {
double *td = REAL(target);
const double *sd = REAL(source);
for (int k=0; k<grpn; ++k) {
rownum = iI[k]-1;
td[k] = sd[rownum]; // on 64bit copies pointers too
}
} else { // size 16
// #3634 -- CPLXSXP columns have size 16
double complex *td = (double complex *)COMPLEX(target);
const double complex *sd = (double complex *)COMPLEX(source);
for (int k=0; k<grpn; ++k) {
rownum = iI[k]-1;
td[k] = sd[rownum];
}
}
}
if (LOGICAL(verbose)[0]) { tblock[1] += clock()-tstart; nblock[1]++; }
Expand Down Expand Up @@ -302,7 +321,7 @@ SEXP dogroups(SEXP dt, SEXP dtcols, SEXP groups, SEXP grpcols, SEXP jiscols, SEX
}
memrecycle(target, order, INTEGER(starts)[i]-1, grpn, RHS); // length mismatch checked above for all jval columns before starting to add any new columns
copyMostAttrib(RHS, target); // not names, otherwise test 778 would fail.
/* OLD FIX: commented now. The fix below resulted in segfault on factor columns because I dint set the "levels"
/* OLD FIX: commented now. The fix below resulted in segfault on factor columns because I didn't set the "levels"
Instead of fixing that, I just removed setting class if it's factor. Not appropriate fix.
Correct fix of copying all attributes (except names) added above. Now, everything should be alright.
Test 1144 (#5104) will provide the right output now. Modified accordingly.
Expand Down Expand Up @@ -382,14 +401,20 @@ SEXP dogroups(SEXP dt, SEXP dtcols, SEXP groups, SEXP grpcols, SEXP jiscols, SEX
for (int j=0; j<ngrpcols; ++j) {
target = VECTOR_ELT(ans,j);
source = VECTOR_ELT(groups, INTEGER(grpcols)[j]-1); // target and source the same type by construction above
if (SIZEOF(target)==4) {
int tsize = SIZEOF(target);
if (tsize==4) {
int *td = INTEGER(target);
int *sd = INTEGER(source);
for (int r=0; r<maxn; ++r) td[ansloc+r] = sd[igrp];
} else {
} else if (tsize==8) {
double *td = REAL(target);
double *sd = REAL(source);
for (int r=0; r<maxn; ++r) td[ansloc+r] = sd[igrp];
} else {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this one is in the ngrpcols loop, so covering it requires being able to group by complex, which we can't (yet)

// #3634 -- CPLXSXP columns have size 16
double complex *td = (double complex *)COMPLEX(target);
double complex *sd = (double complex *)COMPLEX(source);
for (int r=0; r<maxn; ++r) td[ansloc+r] = sd[igrp];
MichaelChirico marked this conversation as resolved.
Show resolved Hide resolved
}
// Shouldn't need SET_* to age objects here since groups, TO DO revisit.
}
MichaelChirico marked this conversation as resolved.
Show resolved Hide resolved
Expand All @@ -415,6 +440,11 @@ SEXP dogroups(SEXP dt, SEXP dtcols, SEXP groups, SEXP grpcols, SEXP jiscols, SEX
double *td = REAL(target)+thisansloc;
for (int r=0; r<maxn; ++r) td[r] = NA_REAL;
} break;
case CPLXSXP : {
double complex *td = (double complex *)(COMPLEX(target) + thisansloc);
for (int r=0; r<maxn; ++r) { td[r] = NA_REAL + NA_REAL*I; }
MichaelChirico marked this conversation as resolved.
Show resolved Hide resolved
//for (int r=0; r<,maxn; ++r) { creal(td[r]) = NA_REAL; cimag(td[r]) = NA_REAL; }
} break;
case STRSXP :
for (int r=0; r<maxn; ++r) SET_STRING_ELT(target,thisansloc+r,NA_STRING);
break;
Expand Down Expand Up @@ -452,7 +482,7 @@ SEXP dogroups(SEXP dt, SEXP dtcols, SEXP groups, SEXP grpcols, SEXP jiscols, SEX
} else ans = R_NilValue;
// Now reset length of .SD columns and .I to length of largest group, otherwise leak if the last group is smaller (often is).
for (int j=0; j<length(SDall); ++j) SETLENGTH(VECTOR_ELT(SDall,j), origSDnrow);
SETLENGTH(I, origIlen);
SETLENGTH(DOTI, origIlen);
if (LOGICAL(verbose)[0]) {
if (nblock[0] && nblock[1]) error("Internal error: block 0 [%d] and block 1 [%d] have both run", nblock[0], nblock[1]); // # nocov
int w = nblock[1]>0;
Expand Down