-
-
Notifications
You must be signed in to change notification settings - Fork 211
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement push_back and push_front for DataFrame. #1099
Conversation
This does not enforce that the vectors are the same length (existing package doesn't either). |
The new version uses |
I still feel this is a bit inelegant as it only solves the problem for data frames, but I guess it is also class-specific how attributes should change when they are combined, so it might be better that any class that has a problem implements the appropriate fix like this. I'm not sure if any other classes suffer from this problem. |
How does anybody else feel about changing the names of these verbs? |
|
My thoughts:
I agree it'd be nice to fix this in general, but it seems hard to find the right solution that would work "in general" so it's okay to keep things scoped more narrowly here for now. |
Happy to be majority-vote overruled but I still feel quite strongly that |
My main concern is that If this were a clean-room rebuild of the class, I would definitely be on board though. |
I agreee with that point @kevinushey. That is indeed more important. Having the aliases is then a good compromise. |
Thanks for your patience. Rcpp 1.0.5 is now on CRAN, I am currently looking into #1098 and will turn to this afterwards. Likely in a day or two. |
This explicitly defines push_back and push_back methods for DataFrame which copies row.names and class attributes from the original object. This fixes #1094 where these operations converted the object to a list. Tests for this behavior are also implemented.
Make use of the fact set__() already does a conversion, and just use that. This has the benefit that it works with push_back(DataFrame), as well as push_back(Vector), and also handles adding vectors of different lengths properly.
I rebased this relative to the updated main branch and pushed the update. I have a local (not-yet committed) change adding ChangeLog and setting this as 1.0.5.2 and am now running reverse depends. We will know more by tomorrow this time. |
Hi @waltersom and apologies for taking a moment---but I first wanted to get 1.0.5 out of the door, and then test the PR by @kevinushey which came in just before yours. I have now done a full run, and in this case the full test paid us back. Package lidR and metacart fail under your PR. Both show a message of 'Could not convert using R function: as.data.frame.'. Could you take another look at the PR? Many thanks! |
No worries about the time, it's good to get it right, and there's no great rush. I'll have a look at why they might get that error. |
Ok, I have some minimum examples which exhibit these problems; library(Rcpp)
f <- cppFunction('DataFrame d(){
DataFrame d = DataFrame::create();
NumericVector v1(5);
NumericVector v2(5);
DataFrame d1 = DataFrame::create(_["a"]=v1, _["b"] = v2);
d.push_back(v1);
SETLENGTH(wrap(v1), 2);
d.push_back(v2);
return d;
}')
f()
#> Error in f(): Could not convert using R function: as.data.frame.
g <- cppFunction('DataFrame d(){
DataFrame d = DataFrame::create();
NumericVector v1(5);
NumericVector v2(6);
d.push_back(v1);
d.push_back(v2);
return d;
}')
g()
#> Error in g(): Could not convert using R function: as.data.frame. Created on 2020-07-09 by the reprex package (v0.3.0) Using the pre-PR code, I get library(Rcpp)
f <- cppFunction('DataFrame d(){
DataFrame d = DataFrame::create();
NumericVector v1(5);
NumericVector v2(5);
DataFrame d1 = DataFrame::create(_["a"]=v1, _["b"] = v2);
d.push_back(v1);
SETLENGTH(wrap(v1), 2);
d.push_back(v2);
return d;
}')
f()
#> [[1]]
#> [1] 0 0
#>
#> [[2]]
#> [1] 0 0 0 0 0
as.data.frame(f())
#> Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, : arguments imply differing number of rows: 2, 5
g <- cppFunction('DataFrame d(){
DataFrame d = DataFrame::create();
NumericVector v1(5);
NumericVector v2(6);
d.push_back(v1);
d.push_back(v2);
return d;
}')
g()
#> [[1]]
#> [1] 0 0 0 0 0
#>
#> [[2]]
#> [1] 0 0 0 0 0 0
as.data.frame(g())
#> Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, : arguments imply differing number of rows: 5, 6 Created on 2020-07-09 by the reprex package (v0.3.0) My feeling is that both packages are constructing invalid dataframes, and probably getting them fixed would be good. The fix for both is easy, and just involves changing a DataFrame to a List (which is essentially what it was before, after a push_back). |
That is possible but we would probably have to find their code [and as I type this you post it, will study.] |
@eddelbuettel I have a fix for |
Do you have a handle on why their length differ between columns? That seems unlikely to be on purpose. [ And I don't recall from the top of my head whether we "help" by recycling, my instinct is to say we do not. ] |
The number of unique elements in different columns in x1 may be different, which leads to attempting to add vectors of different length into res. Previously, `push_back` on a DataFrame changed the DataFrame to a list, which is being addressed in RcppCore/Rcpp#1099. This change, combined with attempting to add vectors of different lengths, leads to an error.
My fixes are at https://github.com/waltersom/lidR/tree/replace-dataframe-with-list and https://github.com/waltersom/metacart/tree/replace-dataframe-list. For For Our |
Nice work. I can't look at it end -- end of day / evening here. There usual process is to propose a change to the affected package to let it build with the existing (i.e. 1.0.5) and proposed Rcpp (where it currently only works with existing). That is usually done by email to them, politely asking for a fix in two weeks or so. Once done we can change Rcpp itself. You've done so much already so let me be outrageous and ask directly: Would you be able to take care of preparing the patches and emails, i.e. drive all that from your end? |
I was already preparing a PR for lidR (r-lidar/lidR#358), and am about to look at metacart next. In both cases, the changes should work both with or without the change to Rcpp. It is only that once this (#1099) is merged, they break without the changes. |
You put that well to |
The number of unique elements in different columns in x1 may be different, which leads to attempting to add vectors of different length into res. Previously, `push_back` on a DataFrame changed the DataFrame to a list, which is being addressed in RcppCore/Rcpp#1099. This change, combined with attempting to add vectors of different lengths, leads to an error.
The number of unique elements in different columns in x1 may be different, which leads to attempting to add vectors of different length into res. Previously, `push_back` on a DataFrame changed the DataFrame to a list, which is being addressed in RcppCore/Rcpp#1099. This change, combined with attempting to add vectors of different lengths, leads to an error.
The number of unique elements in different columns in x1 may be different, which leads to attempting to add vectors of different length into res. Previously, `push_back` on a DataFrame changed the DataFrame to a list, which is being addressed in RcppCore/Rcpp#1099. This change, combined with attempting to add vectors of different lengths, leads to an error.
I have also emailed the metacart maintainer, so hopefully at some point
soon those two packages will be updated.
…On Thu, 9 Jul 2020 at 15:07, Dirk Eddelbuettel ***@***.***> wrote:
You put that well to lidR in its 357 and 358. The fact is that
Rcpp::DataFrame is *not* one of our strongest classes and I personally
have always grown lists---precisely what you propose here---because
DataFrame is half baked. Your PR will make it better so this kicks a few
cans down the right road!
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#1099 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/AA7W2M5BEJR5HSGU5JGFS6DR2UX6DANCNFSM4ONCHN6Q>
.
|
One thing these packages failing highlights is that the error message when attempting to construct an invalid |
A Food for thought: maybe make it even more complex and add an option() to fall back to old behavior? Or do we think that is a bad idea, and that we should enforce the same length now? |
With this commit, if after a push_back the row count of the DataFrame would be invalid, a warning is generated. The type is then not coerced to a DataFrame, but left as a list. In future, this could instead `stop()`.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Edit: Whoops, mean to comment/approve the last commit only. Oh well ;-)
I have implemented a function But, thinking about it, let me check what happens if one of the columns has zero length... |
Actually, using |
I guess you noticed
|
ChangeLog
Outdated
@@ -17,6 +17,13 @@ | |||
* inst/NEWS.Rd: Idem | |||
* vignettes/rmd/Rcpp.bib: Idem | |||
* inst/bib/Rcpp.bib: Idem | |||
2020-07-01 Walter Somerville <[email protected]> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be
- in chronological order
- with proper empty lines
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just sent a trivial commit back to you for that so make sure you pull to your box.
This triggers the warning when adding a zero-length object to a DataFrame. It is not a valid DataFrame, so calls `warning()`, and doesn't call `as.data.frame()`.
Ok, I have corrected that failing test. Thanks for doing the changelog. |
lidR maintainer here. The fix has been merged in Please keep me updated when you plan to release this change. I won't release a new version of lidR right now because I already made several releases recently. When you will be ready for a release of |
Salut Jean-Romain It works the other way around. We cannot upload until you upload as our now-stricter-in-this-PR code breaks your package, and CRAN will not admit "breaking" packages. We are not in a great hurry as we just released, and but the point also is to not have this linger. Could you update within, say, two weeks which is the common timeframe CRAN itself often gives? |
I know it works the other way. This is why I suggested to tell me when you plan to release Rcpp so I can release my package first. I believed that the common time-frame given by the CRAN was more like 1 month but I can release in two weeks if mandatory. |
Rcpp will likely not have a release for many months [1], but that is no reason to push this off "forever". I apologize if we were unclear, we did not mean to put a gun to your head and scream "update NOW" but as we depend upon your cooperation here it would help us if you did it "sooner rather than later". Again, it does obviously not have to be this week. But everybody is busy and it helps to commit to a) action and b) a date. So if you could say "sure, but this is summer so give me three weeks" it would help us too. [1] But planning is one thing. The possible need for a hot-fix bug release is always a possibility too. |
No problem. It will definitively be done within a month maximum and I won't delay that forever. |
@waltersom the last commits of yours and mine got out of sync so you needed a merge -- I just cleaned that up with squash rebase and force-push. To catch you would need to reset your end a few commits and then pull. |
@Jean-Romain Thanks. If and when you do please let us know here so that we can proceed too. |
I have just heard from the maintainer of metacart that he has submitted a
new version to CRAN that won't fail with our changes.
…On Fri, Jul 10, 2020, 02:25 Dirk Eddelbuettel ***@***.***> wrote:
@Jean-Romain <https://github.com/Jean-Romain> Thanks. If and when you do
please let us know here so that we can proceed too.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#1099 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/AA7W2M6FOHZEOGTMK4RZVMLR2XHODANCNFSM4ONCHN6Q>
.
|
Just tested metacart 2.0-3, already on CRAN, and as expected it passes. Nice work! |
Hi @Jean-Romain The last |
The new version is now on CRAN. |
@Jean-Romain Thank you! I will proceed with Rcpp tests and changes then clearing the way for this PR. |
Thanks everybody for their patience, of course @waltersom for writing the PR in the first place, and to @jclaramunt for very promptly updating Reverse depends checks revealed no issue so this now goes into the main branch. Yay!!! And again, thanks to all for help! |
Thanks again for everybody's patience. This is now 'live' in Rcpp 1.0.6 which just got to CRAN under our bi-annual release cycle. |
This explicitly defines push_back and push_back methods for DataFrame
which copies row.names and class attributes from the original object.
This fixes #1094 where these operations converted the object to a list.
Tests for this behavior are also implemented.
Checklist
R CMD check
still passes all tests