columns selection helper for .SDcols, j expr #4174
Since It would be good to bring
Names that are within the data.table
Quoted names that are within the data.table
numbers
character vector char_cols = c("V1", "V3")
character vector V1 = c("V1", "V3")
int_cols = c(1,3)
lgl_cols = c(TRUE, TRUE, FALSE, FALSE, FALSE)
lgl_cols = c(TRUE, FALSE)
patterns
function
calls; char_cols = c("V1", "V3")
@ColeMiller1 yes, tables are useful, good roadmap for unit tests.
*/
SEXP replace_dot_alias(SEXP x) {
  if (isLanguage(x) && !isFunction(CAR(x))) {
    if (CAR(x) == sym_bquote) // handling `.` inside bquote, #1912
nice (though a bit surprising) that == works here!
Maybe because it is not STRSXP but CHARSXP
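A note on why the == comparison can work: in R's C API the head of a call such as bquote(...) is normally a symbol, and symbols are interned, so looking up the same name twice yields the same pointer and pointer equality is sufficient. The sketch below is a toy model in plain C, without R's headers. The names toy_symbol and toy_install are hypothetical and this is not R's implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy symbol table: a hypothetical stand-in for an interning table. */
#define TOY_MAX_SYMBOLS 64
#define TOY_MAX_NAME 32

typedef struct { char name[TOY_MAX_NAME]; } toy_symbol;

static toy_symbol toy_table[TOY_MAX_SYMBOLS];
static size_t toy_count = 0;

/* Return the unique symbol for `name`, creating it on first use.
   Returns NULL if the name is too long or the table is full. */
const toy_symbol *toy_install(const char *name) {
  for (size_t i = 0; i < toy_count; i++)
    if (strcmp(toy_table[i].name, name) == 0) return &toy_table[i];
  if (toy_count == TOY_MAX_SYMBOLS || strlen(name) >= TOY_MAX_NAME) return NULL;
  strcpy(toy_table[toy_count].name, name);
  return &toy_table[toy_count++];
}
```

Because every lookup of a given name returns the same address, comparing two toy_symbol pointers with == answers "same name?" without a string comparison.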
just documenting current behaviour of peeling from
> x[, c("V1","V3")]
      V1    V3
   <int> <int>
1:     1     3
> x[, ((c("V1","V3")))]
[1] "V1" "V3"
> x[, .SD, .SDcols=c("V1","V3")]
      V1    V3
   <int> <int>
1:     1     3
> x[, .SD, .SDcols=((c("V1","V3")))]
      V1    V3
   <int> <int>
1:     1     3
I think it makes sense to keep it like that so extra
R -q -e 'cc("colselect.Rraw")' can be used for spotting regressions when making changes to address new tests.
Current behaviour of handling peeling from
Just documenting it here.
R/data.table.R
Outdated
@@ -213,8 +203,12 @@ replace_dot_alias = function(e) {
  av = NULL
  jsub = NULL
  if (!missing(j)) {
    colselect = getOption("datatable.colselect", FALSE)
How long would this be an option?
I will write more on that when the PR is more mature. Unfortunately it has to be an option: there are so many corner cases that it is very unlikely we can address them while maintaining backward compatibility.
are being used. That was changed recently for consistency to data.frame methods.
\item{by}{\code{character}, \code{integer} or \code{logical} vector indicating
which combinations of columns from \code{x} to use for uniqueness checks. By
default all columns are being used. That was changed recently for consistency to
probably we can either remove the last sentence or at least "recently" since it's from >3 years ago:
  }
  e
}
replace_dot_alias = function(e) .Call(Creplace_dot_aliasR, e)
may as well move this to wrappers.R
This PR attempts to implement C helper function to investigate
cc("colselect.Rraw")
options(datatable.colselect=TRUE); cc("tests.Rraw")
We could isolate the logic into an R helper function instead. Then it would definitely be less complex. On the other hand, a C implementation gives a fine-grained definition of the column selection logic, which is very valuable. Keep in mind that speed is not a concern, as we don't operate on data.
Was the .SDcols mode far enough along? If so, I can work on pushing that portion of the code to C. Otherwise, I will push an R solution - it's a lot easier for me to test and develop. P.S. All the tests and work you did were super impressive.
    peeled = true;
  }
  SEXP sym_brackets = install("{");
  while (isLanguage(expr) && CAR(expr)==sym_brackets && length(expr)==2) {
@jangorecki Is isLanguage equivalent to is.call()? Also, why do the brackets need the length(expr) == 2 check? Is that to only peel one layer? Also, I'm not sure about the etiquette on closed PRs or whether you even receive notifications.
here's the source for is.call
https://github.com/wch/r-source/blob/trunk/src/main/coerce.c#L2072-L2073
case 300: /* is.call */
LOGICAL0(ans)[0] = TYPEOF(CAR(args)) == LANGSXP;
almost identical to isLanguage:
https://github.com/wch/r-source/blob/trunk/src/include/Rinlinedfuns.h#L903-L906
INLINE_FUN Rboolean isLanguage(SEXP s)
{
return (s == R_NilValue || TYPEOF(s) == LANGSXP);
}
So it appears that, for how we set up our functions, TYPEOF(s) == LANGSXP would be the equivalent of is.call(), since I believe args is normally a list for the C arguments. I am asking because I am interested in a more direct %iscall%. Thanks!
Only { needs the length check, because quote({1; 2}) has length 3 and should not be peeled. ( will always be length 2.
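The peeling rule described in this thread can be sketched on a toy expression tree in plain C, without R's headers. The type toy_expr and the functions toy_length and toy_peel are hypothetical names for illustration only, and the single loop here is a simplification rather than the PR's actual code. The sketch assumes well-formed input, where a ( call always carries exactly one argument.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy expression node. For a call, `head` is the function name and `args`
   holds the arguments. For a leaf, `is_call` is 0 and `head` is its text. */
typedef struct toy_expr {
  const char *head;
  int is_call;
  size_t nargs;
  const struct toy_expr *args[4];
} toy_expr;

/* Mirrors length() on an R call: the function itself plus its arguments. */
static size_t toy_length(const toy_expr *e) {
  return e->is_call ? e->nargs + 1 : 1;
}

/* Strip redundant wrappers: every `(` layer, and every `{` layer that
   wraps exactly one expression. A `{` with several statements is kept. */
const toy_expr *toy_peel(const toy_expr *e) {
  for (;;) {
    if (e->is_call && strcmp(e->head, "(") == 0) {
      e = e->args[0]; /* `(` is assumed to have exactly one argument */
      continue;
    }
    if (e->is_call && strcmp(e->head, "{") == 0 && toy_length(e) == 2) {
      e = e->args[0];
      continue;
    }
    return e;
  }
}
```

With this rule, ({1}) peels down to 1, while {1; 2} is returned unchanged because its length is 3.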
  SET_TAG(CDDDR(lapply), install("pos"));
  SEXP env = eval(lapply, rho);
  SEXP evaleval = lang3(install("substitute"), expr, env);
  SEXP cols = eval(eval(evaleval, rho), rho); // huh...
SEXP cols = eval(eval(evaleval, rho), rho); // huh...
+1.
So, I understand we need to replace c(..cols1, ..cols2); did you get as far as c(..cols1, lapply(.SD, sum))? I see a little further on you reference the clash between the two modes.
I don't think so, it already worked like this.
In general, this works towards a more modular [.data.table (#852), pushing it down to C step by step. In particular, this PR should address #4115 (#2178), #4004, #2069, #2059 and probably some others as well... #4231, #4235
Still very much WIP, pushing now because @ColeMiller1 is also developing around those issues so good to not duplicate the same work.
Note that I will be offline this week, so take your time giving feedback.
There is one inconsistency that we have to work out:
j now supports column selection by a single symbol, an unquoted column name.
.SDcols now supports column selection by a single symbol, a function name.
Those cannot be easily combined: if a data.table has a column with the same name as the function the user is trying to use, then the function will be ignored (eventually the .. prefix will help here). The tricky part is that to select a column by symbol we do not want to evaluate the symbol, but to check whether the symbol is a function we have to evaluate it. Let's leave it for now. Once the PR is more mature I will come back to that issue with an illustrative example.
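To make the clash concrete, here is a toy resolver in plain C, without R's headers. The names toy_kind, toy_contains and toy_resolve are hypothetical, and the handling of the .. prefix is an assumption about how it might eventually disambiguate, not the PR's behaviour. A symbol is first matched against column names without being evaluated, so a column shadows a function of the same name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum { TOY_COLUMN, TOY_FUNCTION, TOY_UNKNOWN } toy_kind;

/* Return 1 if `name` occurs in the array `set` of length `n`, else 0. */
static int toy_contains(const char *name, const char *const *set, size_t n) {
  for (size_t i = 0; i < n; i++)
    if (strcmp(set[i], name) == 0) return 1;
  return 0;
}

/* Decide how a single symbol would be interpreted for column selection. */
toy_kind toy_resolve(const char *sym,
                     const char *const *cols, size_t ncols,
                     const char *const *funs, size_t nfuns) {
  /* Assumed rule: a ".." prefix skips the column lookup entirely. */
  if (strncmp(sym, "..", 2) == 0)
    return toy_contains(sym + 2, funs, nfuns) ? TOY_FUNCTION : TOY_UNKNOWN;
  /* Column names are matched without evaluating the symbol, so they win. */
  if (toy_contains(sym, cols, ncols)) return TOY_COLUMN;
  /* Only now is the symbol looked up as a function. */
  if (toy_contains(sym, funs, nfuns)) return TOY_FUNCTION;
  return TOY_UNKNOWN;
}
```

In this model, a table with a column named mean makes the symbol mean resolve to the column, and the function is reachable only through the prefixed form.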