Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closes #3639 -- proper handling of CPLXSXP in dogroups #3689

Merged
merged 15 commits into from
Jul 16, 2019
Merged

Conversation

MichaelChirico
Copy link
Member

@MichaelChirico MichaelChirico commented Jul 10, 2019

#3639

Haven't quite figured out the ins and outs of CPLXSXP and COMPLEX() usage yet but I think this gets at the kernel of the issue -- dogroups assumes SIZEOF is either 4 or 8, not so for CPLXSXP (16). Will come back to this.

FWIW grep -Er "==\s?4" src only turns up reorder and dogroups, so I think this and #3688 should cover this class of issue.

@MichaelChirico MichaelChirico force-pushed the cplx_dogroups branch 2 times, most recently from fd4b465 to c17bb81 Compare July 10, 2019 06:40
remove debugging helpers

remove debugging helpers

remove debugging helpers
@codecov
Copy link

codecov bot commented Jul 10, 2019

Codecov Report

Merging #3689 into master will decrease coverage by 0.02%.
The diff coverage is 85%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #3689      +/-   ##
==========================================
- Coverage   98.25%   98.23%   -0.03%     
==========================================
  Files          69       69              
  Lines       13088    13099      +11     
==========================================
+ Hits        12860    12868       +8     
- Misses        228      231       +3
Impacted Files Coverage Δ
R/data.table.R 97.87% <ø> (ø) ⬆️
src/dogroups.c 92.42% <85%> (-0.68%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4e3868b...0288c5f. Read the comment docs.

@codecov
Copy link

codecov bot commented Jul 10, 2019

Codecov Report

Merging #3689 into master will decrease coverage by 0.01%.
The diff coverage is 83.87%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #3689      +/-   ##
==========================================
- Coverage   98.25%   98.24%   -0.02%     
==========================================
  Files          69       69              
  Lines       13094    13122      +28     
==========================================
+ Hits        12866    12892      +26     
- Misses        228      230       +2
Impacted Files Coverage Δ
R/data.table.R 97.87% <ø> (ø) ⬆️
src/subset.c 100% <ø> (ø) ⬆️
src/init.c 100% <100%> (ø) ⬆️
src/assign.c 100% <100%> (ø) ⬆️
src/dogroups.c 92.85% <73.68%> (-0.25%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d3e146d...5258421. Read the comment docs.

src/dogroups.c Outdated Show resolved Hide resolved
@MichaelChirico MichaelChirico changed the title WIP: initial take at #3639 -- proper handling of CPLXSXP in dogroups Closes #3639 -- proper handling of CPLXSXP in dogroups Jul 10, 2019
@MichaelChirico
Copy link
Member Author

The code is working, just need to improve coverage (any ideas?)

src/dogroups.c Outdated Show resolved Hide resolved
double *td = REAL(target);
double *sd = REAL(source);
for (int r=0; r<maxn; ++r) td[ansloc+r] = sd[igrp];
} else {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this one is in the ngrpcols loop, so covering it requires being able to group by complex, which we can't (yet)

src/dogroups.c Outdated
case CPLXSXP :
COMPLEX(VECTOR_ELT(xSD, j))[0].r = NA_REAL;
COMPLEX(VECTOR_ELT(xSD, j))[0].i = NA_REAL;
break;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC this is coming from a join, so we won't be able to cover it yet.

Minimal example to trigger the INTSXP branch of this switch (adapted test 1283.3):

dt1 <- data.table(a=rep(1:2, each=2), c=c(7L, 2L, 4L, 8L))
dt2 <- data.table(b=rep(2:3), c=c(2L, 6L), d=c(20L, 8L))
dt1[dt2, on = c(a = 'b'), .(a=a), by=.EACHI]

I believe it's because j uses a column from the LHS of on.

But joins aren't available for complex yet...

@MichaelChirico
Copy link
Member Author

MichaelChirico commented Jul 11, 2019

OK. The remaining codecov problems I think are outside the scope of this PR (see comments). so I guess we should just wait to merge this one until after #3701, at which point we can add tests here.

Will also file a coverage commit for dogroups overall while I'm in here.

DT3 = DT3[sample(nrow(DT3))]
DT1[ , x_num := match(x, letters) + .1]
DT3[ , x_num := match(x, letters) + .1]
test(1540.36, DT1[DT3, .(x_num), by=.EACHI, on=c(x_num="x_num")])
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

strangely enough, I couldn't get this test to hit the LGLSXP branch:

DT1[ , x_lgl := x > = 'a']
DT3[ , x_lgl := x > 'a']
DT1[DT3, .(x_lgl), by=.EACHI, on=c(x_lgl="x_lgl")]

doesn't do it...

case CPLXSXP :
COMPLEX(VECTOR_ELT(xSD, j))[0].r = NA_REAL;
COMPLEX(VECTOR_ELT(xSD, j))[0].i = NA_REAL;
break;
case STRSXP :
SET_STRING_ELT(VECTOR_ELT(xSD,j),0,NA_STRING);
break;
case VECSXP :
SET_VECTOR_ELT(VECTOR_ELT(xSD,j),0,R_NilValue);
break;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we nocov this branch (or set it to error even?)

IIUC hitting this branch would require first joining on VECSXP column in on, I don't think we have any plan to support list-to-list joins?

Copy link
Member

@mattdowle mattdowle Jul 16, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't no cov it otherwise it'll drop off our todo list. I'd say don't error either because something somewhere might be using it. I'd say just leave it as-is for now until it can be covered at a later date.

src/assign.c Outdated
@@ -1002,6 +1011,11 @@ void writeNA(SEXP v, const int from, const int n)
for (int i=from; i<=to; ++i) vd[i] = NA_REAL;
}
} break;
case CPLXSXP: {
Rcomplex *vd = COMPLEX(v);
Rcomplex NA_CPLX = {NA_REAL, NA_REAL};
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a bit strange why we have to define this ourselves but I see it done like this in a few places in r-source (e.g. subset.c)

Copy link
Member

@mattdowle mattdowle Jul 16, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I see 3 in src/main/subset.c too. Those are the only occurence of the string "NA_C" (to catch both NA_COMPLEX, NA_CPLX, and anything similar starting with C for complex) in the entire source tree of R. So since it's so rarely used, I guess it just hasn't been moved into a header. Which seems reasonable. More to the point, because it needs to be a variable, it can't be plain compiled constant like the other NA_* definitions.

@mattdowle
Copy link
Member

Have we heard from many users requesting complex support?
I'll try and get through these PRs asap, and hopefully they are quick, but it's adding complexity (pun intended) when another approach could be just to not support complex yet until other higher value things are done first.
If the idea is to pick off something where you can learn the code base, that's a good idea, and I'm ok to merge them in on that basis. Just hoping these 6 PRs are quick and they don't cause unintended effects.

Some links to users and fields requesting complex support please, would keep me happy that it's worth doing.

@eliocamp
Copy link
Contributor

I'm one happy data.table user that uses complex numbers frequently and would be even happier if complex support is enhanced 😄. Perusing github I see there are some issues addressing it:

Not a lot (understandably), but non-zero.

@mattdowle mattdowle added this to the 1.12.4 milestone Jul 16, 2019
@mattdowle
Copy link
Member

mattdowle commented Jul 16, 2019

@eliocamp Great, thanks. I looked through them. It looks to me as if it's programmers checking that all types are supported for completeness reasons, rather than users in the real-world actually needing complex number support. I did find one field mentioned in those issues though :

  • signal processing, especially for the analysis of power spectra.

What's your field that you use complex numbers in?

@eliocamp
Copy link
Contributor

Meteorology, but I use some signal processing methods to analyse fields. Complex Principal Component Analysis, for example.

@mattdowle
Copy link
Member

Good to know! Are there any articles that show how meteorology, signal processing and/or complex-PCA need complex numbers in the context of data.table-like operations like grouping? I did complex number theory in undergraduate applied math and much enjoyed it, but I've never used complex numbers since then.

@mattdowle
Copy link
Member

All looks good. I realize the diff coverage is under target but that's ok, because as Michael wrote, other PRs will cover those in future.

@mattdowle mattdowle merged commit 714e84f into master Jul 16, 2019
@mattdowle mattdowle deleted the cplx_dogroups branch July 16, 2019 22:37
@MichaelChirico
Copy link
Member Author

pick off something where you can learn the code base

in fact that's the main thinking. I imagine there are some use cases around Fourier series running around but yea, I haven't come across it myself in a while either

@@ -77,7 +77,6 @@ static void subsetVectorRaw(SEXP ans, SEXP source, SEXP idx, const bool anyNA)
case CPLXSXP : {
Rcomplex *sp = COMPLEX(source);
Rcomplex *ap = COMPLEX(ans);
Rcomplex NA_CPLX = { NA_REAL, NA_REAL };
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looking at it again, would just inlining = (Rcomplex){NA_REAL, NA_REAL} do the trick?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think (Rcomplex){NA_REAL, NA_REAL} will work. Because Rcomplex is a struct. The {NA_REAL, NA_REAL} is special syntax just for initialization of a variable/struct.

@eliocamp
Copy link
Contributor

Good to know! Are there any articles that show how meteorology, signal processing and/or complex-PCA need complex numbers in the context of data.table-like operations like grouping? I did complex number theory in undergraduate applied math and much enjoyed it, but I've never used complex numbers since then.

I've never read anything specific about data.table. Here's a paper about the method: https://journals.ametsoc.org/doi/abs/10.1175/1520-0450%281984%29023%3C1660%3ACPCATA%3E2.0.CO%3B2

This is the code I'm using now. It gets the Hilbert transform of the data and then does an EOF (Empirical Orthogonal Functions --what we meteorologists call PCA) for each season, dataset and height level. Because of #3639 I had to use the workaround of separating the real an imaginary parts and then joining them together after the grouping was done.

ceof <-  data %>%
  .[, c("R", "I") := ReIm(spectral::analyticFunction(hgt.minus)),
    by = .(lat, season, year, dataset, lev)] %>% 
  .[, {
    d <- copy(.SD)[, hgt := R + I*1i]
    list(ceof = list(
      EOF(hgt ~ year | lon + lat, n = 1:2, 
          B = 5000, suffix = "cPC", data = d)
    ))
  }, by = .(dataset, season, lev)]

EOF() (a function I created and available in the metR package) returns a list of data.tables that are the matrices of the Singular Value Decomposition in "long" format. So ceof is a data.table with a column names "ceof" (yeah, not very smart naming convention) that that holds said list of data.tables, one for each combination of dataset, season and height level.

@mattdowle
Copy link
Member

Thanks @eliocamp, helps a lot! Continuing here: #3701 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants