
Initial try on tar_download #9

Closed
wants to merge 1 commit
3 changes: 2 additions & 1 deletion DESCRIPTION
@@ -32,7 +32,8 @@ Suggests:
digest (>= 0.6.25),
knitr (>= 1.28),
rmarkdown (>= 2.1),
testthat (>= 2.3.2)
testthat (>= 2.3.2),
curl
Member:
Would you add a minimum version for curl?

Remotes:
wlandau/targets
Encoding: UTF-8
3 changes: 3 additions & 0 deletions NAMESPACE
@@ -1,6 +1,8 @@
# Generated by roxygen2: do not edit by hand

export(tar_change)
export(tar_download)
export(tar_download_file)
export(tar_files)
export(tar_force)
export(tar_force_change)
@@ -9,6 +11,7 @@ export(tar_knit_run)
export(tar_render)
export(tar_render_run)
export(tar_skip)
export(tar_url_hash)
importFrom(fs,path_rel)
importFrom(rlang,sym)
importFrom(rlang,syms)
104 changes: 104 additions & 0 deletions R/tar_download.R
@@ -0,0 +1,104 @@
#' @title Download a file from a remote source, checking for changes
#'   first
#' @description Create a target that downloads a file if it has changed since
Member:
Just flagging this for proofreading when it comes time to polish things up.

#' the last attempt, based on its header information. If the header
#' information changes, the target reruns.
#' @details `tar_download()` creates a pair of targets, one upstream
#' and one downstream, using `tar_change()`. The upstream target always runs
#' and returns header information (eTag and last-modified time, if available)
#' to check whether the remote file has changed. This header gets referenced
#' in the downstream target, which causes the downstream target to download
#' the file if the header information changes. The downstream target has
#' "file" format and stores the location of the downloaded file. This behavior
#' is cancelled if `cue` is `tar_cue(depend = FALSE)` or `tar_cue(mode = "never")`.
#' @export
#' @return A list of two targets, one upstream and one downstream.
#' The upstream one checks the URL headers, and the downstream one
#' downloads the file. See the examples for details.
#' @param name Symbol, name of the pair of targets.
#' @param url The URL to fetch the file from.
#' @param destfile The name of the downloaded file. Defaults to `basename(url)`.
#' @param destdir The directory to download the file to.
#' @param stop_on_no_internet Whether to fail in the absence of an internet
#'   connection. Default is `FALSE`, which only produces a warning.
#' @param handle A handle created by `curl::new_handle()` to use when fetching
#'   the file or checking headers. Useful for authentication.
#' @param ... Other arguments passed to `tar_change()`.
#' @examples
#' \dontrun{
#' targets::tar_dir({
#'   targets::tar_script({
#'     library(tarchetypes)
#'     tar_pipeline(
#'       tar_download(
#'         gitfile,
#'         url = "https://github.com/wlandau/tarchetypes/archive/master.zip"
#'       )
#'     )
#'   })
#'   targets::tar_make(callr_function = NULL)
#' })
#' }
tar_download <- function(
name,
url,
destfile = basename(url),
Member:
Do we need both destdir and destfile arguments? What about a single path argument that defaults to basename(url)?

Contributor Author:
The reason I like this split is that, for me, it's a common pattern to download a lot of files from different sources to a data/ or downloads/ (or data/downloads) directory, so this makes an easy pattern to iterate over, and conceptually I separate where I put something from its name. I don't feel strongly about it, but a lot of functions split these concepts (e.g., rmarkdown::render).

destdir = ".",
handle = NULL,
stop_on_no_internet = FALSE,
Member:
Maybe assert_internet instead of stop_on_no_internet?

...
Member:
Let's consider making these arguments formal. For other archetypes, I found this useful so I could change the defaults in a way that users notice.

) {
if (!requireNamespace("curl", quietly = TRUE)) {
Member:
Would you use tarchetypes::assert_package("curl", "tar_download() requires the package 'curl' to be installed")?

stop("tar_download() requires the package 'curl' to be installed")
}
handle <- handle %||% curl::new_handle()
tar_change(
Member:
As you pointed out, we should use tar_change_pair().

name,
command = substitute({
  tar_download_file(url, destfile, destdir, stop_on_no_internet, handle)
}, env = list(
  url = url,
  destfile = destfile,
  destdir = destdir,
  stop_on_no_internet = stop_on_no_internet,
  handle = handle
)),
change = substitute({
  tar_url_hash(url, stop_on_no_internet, handle)
}, env = list(url = url, stop_on_no_internet = stop_on_no_internet, handle = handle)),
format = "file",
...
)
}
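As a side note for reviewers: the `substitute()` calls above inject the literal argument values into an unevaluated expression before `tar_change()` captures it. A minimal standalone sketch of that mechanism (the URL and file names here are hypothetical, not from the PR):

```r
# Build an unevaluated call with literal values substituted in,
# mirroring how tar_download() constructs the command for tar_change().
expr <- substitute(
  tar_download_file(url, destfile, destdir),
  env = list(
    url = "https://example.com/data.zip", # hypothetical URL
    destfile = "data.zip",
    destdir = "downloads"
  )
)
# expr is a call object, not a result: evaluation is deferred
# until the pipeline actually runs the target.
is.call(expr)
expr[[2]] # first argument, now the literal URL string
```

This is why the downstream command reruns only when the injected values (or the upstream header hash it depends on) change.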

#' @rdname tar_download
#' @export
tar_download_file <- function(url, destfile, destdir, stop_on_no_internet = FALSE,
                              handle = curl::new_handle()) {
if (!curl::has_internet()) {
Member:
Other archetypes might require an internet connection, so it may be nice to put this in a new assert_internet() in R/utils_assert.R.

Contributor Author:
Your assertion patterns don't currently allow for "warning only" options, which I want here. (The use-case being that in the absence of a connection the workflow should be able to continue with current versions of files.) Do you want to incorporate that into your assertion setup or should I just do something conditional here?

Member (@wlandau, Aug 12, 2020):
I see. How about a validate_internet() utility?

validate_internet <- function(assert_internet = FALSE) {
  trn(assert_internet, try_cancel_internet(), assert_internet())
}

try_cancel_internet <- function(msg = NULL) {
  if (!curl::has_internet()) {
    warn_validate("no internet")
    tar_cancel()
  }
}

warn_validate <- function(...) {
  warning(warning_validate(...))
}

warning_validate <- function(...) {
  structure(
    list(message = paste0(..., collapse = ""), call = NULL),
    class = c(
      "condition_validate",
      "condition_tarchetypes",
      "warning",
      "condition"
    )
  )
}

Contributor Author:
I still need to test for internet to determine whether to cancel the target. Should this return a logical value?

Contributor Author:
Honestly for me atm this seems like a way to turn 5 lines of readable code into 20 harder-to-understand lines.

Member:
I just updated the code above to include a call to tar_cancel().

There are more functions, but they help us avoid nested if/else logic, and I think each function might be reusable for other archetypes. And warn_validate() and warning_validate() support custom conditions for warnings, which make exception handling and testing nicer.

if (stop_on_no_internet) {
  stop("No internet. Cannot download url: ", url)
} else {
  warning("No internet. Cannot download url: ", url)
}
tar_cancel()
}
outpath <- curl::curl_download(
  url,
  destfile = fs::path(destdir, destfile),
  handle = handle
)
outpath
}

#' @rdname tar_download
#' @export
tar_url_hash <- function(url, stop_on_no_internet = FALSE, handle = NULL) {
if (!curl::has_internet()) {
if (stop_on_no_internet) {
  stop("No internet. Cannot check url: ", url)
} else {
  warning("No internet. Cannot check url: ", url)
}
tar_cancel()
Member:
Could we namespace calls to targets functions?

}
handle <- curl::handle_setopt(handle, nobody = TRUE)
req <- curl::curl_fetch_memory(url, handle = handle)
stopifnot(length(req$content) < 1L)
headers <- curl::parse_headers_list(req$headers)
assert_status_code(req, url)
etag <- paste(headers[["etag"]], collapse = "")
mtime <- paste(headers[["last-modified"]], collapse = "")
return(paste(etag, mtime))
}

assert_status_code <- function(req, url) {
if (req$status_code != 200L) {
stop("could not access url: ", url)
}
}
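To illustrate the change-detection idea behind `tar_url_hash()`: the eTag and last-modified header values get pasted into a single fingerprint string, so a change in either one alters the fingerprint and invalidates the downstream target. A minimal offline sketch, with hypothetical header values standing in for a real `curl` response:

```r
# Combine eTag and last-modified into one fingerprint string,
# the same way tar_url_hash() does after parsing response headers.
url_fingerprint <- function(headers) {
  etag <- paste(headers[["etag"]], collapse = "")
  mtime <- paste(headers[["last-modified"]], collapse = "")
  paste(etag, mtime)
}

# Hypothetical headers before and after the remote file changes:
before <- list(etag = "\"abc123\"", `last-modified` = "Tue, 11 Aug 2020 10:00:00 GMT")
after  <- list(etag = "\"def456\"", `last-modified` = "Wed, 12 Aug 2020 09:00:00 GMT")

url_fingerprint(before) != url_fingerprint(after) # differs, so the download target reruns
```

Because `paste(NULL, collapse = "")` returns an empty string, a server that omits one header still yields a usable (if weaker) fingerprint from the other.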
79 changes: 79 additions & 0 deletions man/tar_download.Rd

Some generated files are not rendered by default.