-
Notifications
You must be signed in to change notification settings - Fork 17
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Initial try on tar_download #9
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,104 @@ | ||
#' @title Download a file from a remote source, checking for changes | ||
#' first | ||
#' @description Create a target that downnloads a file if it has changed since | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Just flagging this for proofreading when it comes time to polish things up. |
||
#' the last attempt, based responds to a change | ||
#' in an arbitrary value. If the value changes, the target reruns. | ||
#' @details `tar_download()` creates a pair of targets, one upstream | ||
#' and one downstream, usingn `tar_change()`. always runs and returns | ||
#' header information (eTag and modified time, if available), to check | ||
#' if the remote value has changed. This header gets referenced in the | ||
#' downstream target, which causes the downstream target to download the | ||
#' if the header information changes. The downstream target has "file" format | ||
#' and stores the location of te downloaded file The behavior is cancelled if | ||
#' `cue` is `tar_cue(depend = FALSE)` or `tar_cue(mode = "never")`. | ||
#' @export | ||
#' @return A list of two targets, one upstream and one downstream. | ||
#' The upstream one checks url headers, and the downstream one | ||
#' downloads the file to it. See the examples for details. | ||
#' @param url The URL to fetch the file from | ||
#' @param destfile Whether to invoke tidy evaluation | ||
#' @param destdir the directory to download the file to | ||
#' @param stop_on_no_internet Whether to fail in the absence of an internet connection. | ||
#' Default is `FALSE` and only produces a warning | ||
#' @param handle a handle created by `curl::new_handle()` to use when fetching the file | ||
#' or checking headers. Useful for authentication. | ||
#' @param ... other parameters passed to `tar_change()` | ||
#' @examples | ||
#' \dontrun{ | ||
#' | ||
# targets::tar_dir({ | ||
# targets::tar_script({ | ||
# library(tarchetypes) | ||
# tar_pipeline( | ||
# tar_download(gitfile, url = "https://github.com/wlandau/tarchetypes/archive/master.zip") | ||
# ) | ||
# }) | ||
# targets::tar_make(callr_function = NULL) | ||
# }) | ||
#' } | ||
tar_download <- function( | ||
name, | ||
url, | ||
destfile = basename(url), | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Do we need both There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The reason I like this split, is that, for me, it's a common pattern to download a lot of files from different sources to a |
||
destdir = ".", | ||
handle = NULL, | ||
stop_on_no_internet = FALSE, | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Maybe |
||
... | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Let's consider making these arguments formal. For other archetypes, I found this useful so I could change the defaults in a way that users notice. |
||
) { | ||
if(!requireNamespace("curl", quietly = TRUE)) { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Would you use |
||
stop("tar_download() requires the the package 'curl' to be installed") | ||
} | ||
handle <- handle %||% curl::new_handle() | ||
tar_change( | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. As you pointed out, we should use |
||
name, | ||
command = substitute({ | ||
tar_download_file(url, destfile, destdir, stop_on_no_internet, handle) | ||
}, env = list(url = url, destfile = destfile, destdir = destdir, stop_on_no_internet = stop_on_no_internet, handle = handle)), | ||
change = substitute({ | ||
tar_url_hash(url, stop_on_no_internet, handle) | ||
}, env = list(url = url, stop_on_no_internet = stop_on_no_internet, handle = handle)), | ||
format = "file" | ||
... | ||
) | ||
} | ||
|
||
#' @rdname tar_download | ||
#' @export | ||
tar_download_file <- function(url, destfile, destdir, stop_on_no_internet = FALSE, handle = curl::new_handle()) | ||
{ | ||
if (!curl::has_internet()) { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Other archetypes might require an internet connection, so it may be nice to put this in a new There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Your assertion patterns don't currently allow for "warning only" options, which I want here. (The use-case being that in the absence of a connection the workflow should be able to continue with current versions of files.) Do you want to incorporate that into your assertion setup or should I just do something conditional here? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I see. How about a validate_internet <- function(assert_internet = FALSE) {
trn(assert_internet, try_cancel_internet(), assert_internet())
}
try_cancel_internet <- function(msg = NULL) {
if (!curl::has_internet()) {
warn_validate("no internet")
tar_cancel()
}
}
warn_validate <- function(...) {
warning(warning_validate(...))
}
warning_validate <- function(...) {
structure(
list(message = paste0(..., collapse = ""), call = NULL),
class = c(
"condition_validate",
"condition_tarchetypes",
"warning",
"condition"
)
)
} There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I still need to test for internet to determine whether to cancel the target. Should this return a logical value? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Honestly for me atm this seems like a way to turn 5 lines of readable code into 20 harder-to-understand lines. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I just updated the code above to include a call to There are more functions, but they help us avoid nested if/else logic, and I think each function might be reusable for other archetypes. And |
||
if (stop_on_no_internet) | ||
stop("No internet. Cannot downoad url: ", url) | ||
else | ||
warning("No internet. Cannot download url: ", url) | ||
tar_cancel() | ||
} | ||
outpath <- curl::curl_download(url, destfile = fs::path(destdir, destfile), handle = curl::new_handle()) | ||
outpath | ||
} | ||
|
||
#' @rdname tar_download | ||
#' @export | ||
tar_url_hash <- function(url, stop_on_no_internet = FALSE, handle = NULL) { | ||
if (!curl::has_internet()) { | ||
if (stop_on_no_internet) | ||
stop("No internet. Cannot check url: ", url) | ||
else | ||
warning("No internet. Cannot check url: ", url) | ||
tar_cancel() | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Could we namespace calls to |
||
} | ||
handle <- curl::handle_setopt(handle, nobody = TRUE) | ||
req <- curl::curl_fetch_memory(url, handle = handle) | ||
stopifnot(length(req$content) < 1L) | ||
headers <- curl::parse_headers_list(req$headers) | ||
assert_status_code(req, url) | ||
etag <- paste(headers[["etag"]], collapse = "") | ||
mtime <- paste(headers[["last-modified"]], collapse = "") | ||
return(paste(etag, mtime)) | ||
} | ||
|
||
assert_status_code <- function(req, url) { | ||
if (req$status_code != 200L) { | ||
stop("could not access url: ", url) | ||
} | ||
} |
Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you add a minimum version for
curl
?