Add collect_hub function #18
Codecov Report
Attention: Patch coverage is
@@            Coverage Diff            @@
##             main      #18     +/-  ##
=========================================
+ Coverage   86.98%   87.23%   +0.25%
=========================================
  Files           9       10      +1
  Lines         676      705     +29
=========================================
+ Hits          588      615     +27
- Misses         88       90      +2
The macos-latest build is failing because of the temporary issue described in #15. I'm not sure what to do about it. I could fix it by installing arrow from the Apache R Universe in the workflow. That's not where most users install from, though, and we would have to swap it back to CRAN later. Happy to hear people's thoughts.
…ent/handle-null-taskids Replace all null task id properties with required = NA
Leaving this here for other reviewers who might want to fetch this feature branch and give it a spin locally.
dplyr::collect() 😢
collect_hub() 😀
Thanks, @annakrystalli. Glad we're trying to handle the hard bits on behalf of users!
One or two inline notes, but nothing that would prevent rolling out this improvement.
@@ -5,6 +5,8 @@ on:
    branches: [main, master]
  pull_request:
    branches: [main, master]
  schedule:
Makes sense! Would love to get to a place where these kinds of small operational changes can be in separate PRs so we can get 'em merged in without waiting for review of new features.
vignettes/articles/connect_hub.Rmd
Outdated
@@ -54,12 +54,26 @@ hub_con

To access data from a hub connection you can use dplyr verbs and construct querying pipelines.

You can use `dplyr`'s `collect()` function:
Now that we have collect_hub(), is there any reason someone would want to use dplyr collect()?
As someone with less R proficiency than many folks on the team, I'm left wondering what to do when presented with multiple options like this. Is it worth recommending a default?
collect_hub() is mainly a wrapper around dplyr::collect() (a very well-known tidyverse function) with some extras. Which one to use depends on what they are doing with their data next. There is no reason they must use collect_hub(); it just conveniently outputs a model_out_tbl, which many downstream hubverse package functions expect.
I've refactored the article a bit to bring more attention to the benefits of collect_hub() and also used it in the connect_hub() examples, but ultimately collect() will work just as well. It just might need an extra step to coerce the data to a model_out_tbl if it is used in downstream hubverse functionality.
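For anyone comparing the two locally, here is a minimal sketch of the difference described above. It assumes the feature branch is installed as the hubData package, that as_model_out_tbl() is available from hubUtils, and that the example hub bundled with hubUtils under testhubs/simple can be used; those package locations and the hub path are assumptions for illustration, not something stated in this thread.

```r
library(hubData)

# Assumption: an example hub shipped with hubUtils; swap in your own hub path.
hub_path <- system.file("testhubs/simple", package = "hubUtils")
hub_con <- connect_hub(hub_path)

# Plain dplyr::collect() materialises the query as an ordinary tibble.
via_collect <- hub_con |> dplyr::collect()

# collect_hub() runs the same query but returns a model_out_tbl where possible.
via_collect_hub <- hub_con |> collect_hub()

# The extra step needed after dplyr::collect() before passing the data
# to downstream functions that expect a model_out_tbl.
coerced <- hubUtils::as_model_out_tbl(via_collect)

class(via_collect)
class(via_collect_hub)
class(coerced)
```

Any dplyr verbs (filter(), select() and so on) can go between hub_con and the collect step in either pipeline; only the final call differs.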
…mphasize key features of tools
This PR adds a collect_hub() function which wraps dplyr::collect() but also converts the output to a model_out_tbl class object by default where possible. The function also accepts additional arguments that can be passed to as_model_out_tbl().
This PR resolves:
connect_hub, default arrow install yields Error: This build of the arrow package does not support Datasets #15
I've also modified the R CMD check workflow to run nightly so that we can promptly pick up any issues arising from upgrades in dependencies. Let's test it out and I can roll it out to all our mature packages when ready.
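For reference, a nightly trigger in a GitHub Actions workflow generally looks like the fragment below. This is a sketch only: the cron time is an assumed value, since the actual schedule added in this PR is not shown in the conversation, and the push trigger is assumed from the surrounding diff context.

```yaml
on:
  push:
    branches: [main, master]
  pull_request:
    branches: [main, master]
  schedule:
    # Assumed value for illustration: run once a day at 03:00 UTC.
    - cron: "0 3 * * *"
```

Scheduled runs execute against the default branch, so a failure there points at a dependency change rather than at new code in a PR.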