-
Notifications
You must be signed in to change notification settings - Fork 17
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Initial try on tar_download #9
Conversation
Should I not be using |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR, @noamross! I really appreciate your help with this archetype, especially because you are one of the folks who is probably going to use it the most. I agree with most of what you sketched out. I think the issues in my comments and some new tests will get us most of the way there.
@@ -32,7 +32,8 @@ Suggests: | |||
digest (>= 0.6.25), | |||
knitr (>= 1.28), | |||
rmarkdown (>= 2.1), | |||
testthat (>= 2.3.2) | |||
testthat (>= 2.3.2), | |||
curl |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you add a minimum version for curl
?
stop_on_no_internet = FALSE, | ||
... | ||
) { | ||
if(!requireNamespace("curl", quietly = TRUE)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you use tarchetypes::assert_package("curl", "tar_download() requires the the package 'curl' to be installed")
?
stop("tar_download() requires the the package 'curl' to be installed") | ||
} | ||
handle <- handle %||% curl::new_handle() | ||
tar_change( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As you pointed out, we should use tar_change_pair()
.
destdir = ".", | ||
handle = NULL, | ||
stop_on_no_internet = FALSE, | ||
... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's consider making these arguments formal. For other archetypes, I found this useful so I could change the defaults in a way that users notice.
#' @export | ||
tar_download_file <- function(url, destfile, destdir, stop_on_no_internet = FALSE, handle = curl::new_handle()) | ||
{ | ||
if (!curl::has_internet()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Other archetypes might require an internet connection, so it may be nice to put this in a new assert_internet()
in R/utils_assert.R
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your assertion patterns don't currently allow for "warning only" options, which I want here. (The use-case being that in the absence of a connection the workflow should be able to continue with current versions of files.) Do you want to incorporate that into your assertion setup or should I just do something conditional here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. How about a validate_internet()
utility?
validate_internet <- function(assert_internet = FALSE) {
trn(assert_internet, try_cancel_internet(), assert_internet())
}
try_cancel_internet <- function(msg = NULL) {
if (!curl::has_internet()) {
warn_validate("no internet")
tar_cancel()
}
}
warn_validate <- function(...) {
warning(warning_validate(...))
}
warning_validate <- function(...) {
structure(
list(message = paste0(..., collapse = ""), call = NULL),
class = c(
"condition_validate",
"condition_tarchetypes",
"warning",
"condition"
)
)
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still need to test for internet to determine whether to cancel the target. Should this return a logical value?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Honestly for me atm this seems like a way to turn 5 lines of readable code into 20 harder-to-understand lines.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just updated the code above to include a call to tar_cancel()
.
There are more functions, but they help us avoid nested if/else logic, and I think each function might be reusable for other archetypes. And warn_validate()
and warning_validate()
support custom conditions for warnings, which make exception handling and testing nicer.
stop("No internet. Cannot check url: ", url) | ||
else | ||
warning("No internet. Cannot check url: ", url) | ||
tar_cancel() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we namespace calls to targets
functions?
@@ -0,0 +1,104 @@ | |||
#' @title Download a file from a remote source, checking for changes | |||
#' first | |||
#' @description Create a target that downnloads a file if it has changed since |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just flagging this for proofreading when it comes time to polish things up.
Exactly, the NSE issues you faced are the exact reason for |
tar_download <- function( | ||
name, | ||
url, | ||
destfile = basename(url), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need both destdir
and destfile
arguments? What about a single path
argument that defaults to basename(url)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason I like this split, is that, for me, it's a common pattern to download a lot of files from different sources to a data/
or downloads/
(or data/downloads
), so this makes an easy pattern to iterate over and conceptually I separate where I put something from it's name. I don't feel strongly about it but a lot of functions split these concepts (e.g., rmarkdown::render).
destfile = basename(url), | ||
destdir = ".", | ||
handle = NULL, | ||
stop_on_no_internet = FALSE, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe assert_internet
instead of stop_on_no_internet
?
Also, please feel free to add yourself as a contributor in the |
On reflection, I think we might be better off thinking about input URLs as formats: ropensci/targets#154. |
Though if we do move to a format instead of an archetype, I'm not sure what to do about the |
Yup, |
@noamross, after some reflection and refactoring, I decided to allow custom tar_target(
url,
"https://httpbin.org/etag/test",
format = "url",
resources = list(handle = curl::new_handle(...))
) So I think the functionality in the current PR is now covered directly in I built URLs directly into list(
tar_files(urls, rep("https://httpbin.org/etag/test", 2), format = "url"),
tar_target(downloads, download.file(urls), pattern = map(urls))
) Combine with some HPC, this could make parallel downloads a lot easier. |
Sorry, I didn't mean to close your PR @noamross. I don't know why that happened, and I don't know why I can't reopen it. |
I definitely don't remember closing it, especially not within the last few days, and I 100% meant to keep it open because of the alternative direction available to |
Just to say I would appreciate having this in {tarchetypes} - I almost started implementing my own before discovering this. By now this is one of the few archetypes/helpers I miss and would use frequently, ideally in conjunction with the new |
Prework
Summary
This is an initial try at
tar_download()
, which builds on top oftar_change()
.tar_download()
creates a file target which will be downloaded if the remote URL's modified time or eTag is updated.This is a draft as I'm still working through a bug understanding some NSE issues, but I thought I would show progress so far for comment.
Related GitHub issues and pull requests
Checklist