Fixing crash when attempting to join on character(0) #4272
@@ -21,8 +21,8 @@ merge.data.table = function(x, y, by = NULL, by.x = NULL, by.y = NULL, all = FAL
if (!missing(by) && !missing(by.x))
warning("Supplied both `by` and `by.x/by.y`. `by` argument will be ignored.")
if (!is.null(by.x)) {
if ( !is.character(by.x) || !is.character(by.y))
stop("A non-empty vector of column names are required for `by.x` and `by.y`.")
if (length(by.x) == 0L || !is.character(by.x) || !is.character(by.y))
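For readers wondering why the extra length test is needed: a zero-length character vector still passes `is.character()`, so the old condition never reaches `stop()`. A minimal base R sketch of the two conditions from this hunk follows; `old_guard` and `new_guard` are hypothetical helper names used only for illustration, not data.table functions.

```r
# Old and new guard conditions from the diff, wrapped in hypothetical helpers
# (old_guard / new_guard are illustrative names, not part of data.table).
old_guard = function(by.x, by.y) {
  !is.character(by.x) || !is.character(by.y)
}
new_guard = function(by.x, by.y) {
  length(by.x) == 0L || !is.character(by.x) || !is.character(by.y)
}

empty = character(0)

is.character(empty)      # TRUE: a zero-length character vector is still character
old_guard(empty, empty)  # FALSE: the old condition would not trigger stop()
new_guard(empty, empty)  # TRUE: the new condition rejects the empty input
new_guard("id", "id")    # FALSE: ordinary column names still pass
```

Only `by.x` gets the length test here, which matches the remark in the PR description that checking either side is sufficient because the lengths are compared for equality elsewhere.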
aha! I was just looking at this code yesterday and something looked funny but I didn't bother stress testing it. nice catch!
@@ -3031,7 +3031,7 @@ isReallyReal = function(x) {
onsub = as.call(c(quote(c), onsub))
}
on = eval(onsub, parent.frame(2L), parent.frame(2L))
if (!is.character(on))
if (length(on) == 0L || !is.character(on))
yes, perfect. we also shouldn't have gotten to checking by.x&by.y separately in the first place because here by.x=by.y so simply by should be used
I'm not quite sure what you mean. At this point we're not checking separately if we come through merge. merge sets by=by.x and then later calls y[x, on=by]. If we don't check in merge we catch it here but this is the point where it gets caught when using x[y] syntax.
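To make the layering concrete, here is a rough base R sketch of the flow described above. It is not the data.table implementation: `parse_on_sketch`, `join_sketch`, `merge_sketch` and the error messages are all hypothetical stand-ins, the join itself is delegated to base `merge()` on data.frames, and the renaming that real `merge` does when `by.x` and `by.y` differ is left out.

```r
# Hypothetical stand-in for the on-parsing step behind x[y, on = ...].
parse_on_sketch = function(on) {
  if (length(on) == 0L || !is.character(on))
    stop("join layer: 'on' must be a non-empty character vector")
  on
}

# Hypothetical stand-in for the x[y, on = ...] path; uses base merge() for the join.
join_sketch = function(x, y, on) {
  on = parse_on_sketch(on)
  merge(x, y, by = on)
}

# Hypothetical stand-in for merge: validate by.x/by.y, set by = by.x, then join.
merge_sketch = function(x, y, by.x, by.y) {
  if (length(by.x) == 0L || !is.character(by.x) || !is.character(by.y))
    stop("merge layer: non-empty column names required for by.x and by.y")
  by = by.x
  join_sketch(x, y, on = by)
}

# Helper that returns the error message, or "no error" if the call succeeds.
msg_of = function(expr) {
  tryCatch({ expr; "no error" }, error = function(e) conditionMessage(e))
}

x = data.frame(id = 1:3, a = c("p", "q", "r"))
y = data.frame(id = 2:4, b = c(TRUE, FALSE, TRUE))

m_merge = msg_of(merge_sketch(x, y, character(0), character(0)))  # caught in the merge layer
m_join  = msg_of(join_sketch(x, y, character(0)))                 # caught in the join layer
joined  = merge_sketch(x, y, "id", "id")                          # valid input joins normally
```

In this sketch, dropping the check from `merge_sketch` would still stop the empty input in `parse_on_sketch`, just with a message that talks about `on` rather than `by.x`/`by.y`.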
(I would've been really mad if you had pushed a fix yesterday.)
Codecov Report
@@           Coverage Diff           @@
##           master    #4272   +/-   ##
=======================================
  Coverage   99.60%   99.60%
=======================================
  Files          73       73
  Lines       14027    14029    +2
=======================================
+ Hits        13972    13974    +2
  Misses         55       55
Could you please fix that in bmerge.c? I just ran into that problem using internal functions. Segfaults are pretty severe issues that should be eliminated, not only from the exported API, but in general.
I'll have a look at it, but it may take a bit before I get the chance to actually write and test the fix. It should just be the same length check, only in the C function, followed by raising an internal error. I assume you're calling bmerge directly?
Yes, somewhere around
in SEXP bmerge, a check that those are non-zero length should do. If you remove your current fixes, you can easily reach that point with your unit tests. It would probably be good to handle this in a single place.
Now also closes #4499. I opted not to raise an error there, in order to pass 2126.1 and 2126.2 and to be consistent with the behavior expected there. I do think there is an argument to be made for all those cases to be an error, or for joins with empty data.tables to return an empty data.table. The current behavior is close-ish, though. I also think it's better to leave the argument checks for joins with
Thanks for incorporating my feedback. It should be safe to put it into the coming release.
Thanks @tlapak! I've invited you to be a project member; please accept using the button that should appear on your GitHub projects or profile page. That way, in future you can create branches in the main project directly. I'll add you to the contributors list as well in a follow-up commit (easier for me than pushing to your fork).
Attempting to join or merge on character(0) currently crashes R in two out of three possible cases, at least on Windows.

Turns out that merge checks the length of by but does not check the length of by.x or by.y (either is sufficient as the equality is checked). Likewise, [.data.table, or rather .parse_on, doesn't check the length of on. I have added the checks as well as tests for all three cases.

(Actually, only checking in .parse_on would be sufficient to prevent the crash, but this way produces a more useful error message when using merge.)

I have also taken the liberty of making a grammar fix to the relevant error message of merge; I hope that is acceptable.

Now also closes #4499