
Handling large data sets in a pipeline #157

Closed · 3 tasks done

mattwarkentin opened this issue Sep 16, 2020 · 26 comments

@mattwarkentin (Contributor) commented Sep 16, 2020

Prework

  • I understand and agree to the code of conduct.
  • I understand and agree to the contributing guidelines.
  • Be considerate of the maintainer's time and make it as easy as possible to troubleshoot any problems you identify. Learn about minimal reproducible examples, and format your code according to the tidyverse style guide to make it easier for others to read.

Description

Hi @wlandau,

I just had a few questions about dealing with large data sets and how best to handle them. I have a dataset of about 500K observations by 2,000 variables, and building this target in my pipeline has been unexpectedly slow.

The pipeline has been stuck on this target for about 12 hours (when it only takes minutes to read the data into R and save it to disk as a serialized object). On disk, the file is a ~10GB TSV. The file already appears in _targets/objects/ as an fst_tbl (~6GB), but the pipeline is still building that target. My guess is that, since the object is already saved to disk, the target is spending this huge amount of time hashing the file. Is that likely the issue?

Any suggestions for how I should handle data sets this large? Hashing seems to be the expensive step in building a target of this size.
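
One rough way to check that hunch is to time a standalone hash of the file outside the pipeline. This sketch uses the digest package as a stand-in; the hash targets computes internally may differ, so treat the timing as a rough proxy (the file path is illustrative):

# Time a standalone hash of the big file to see whether hashing
# alone accounts for the delay.
system.time(
  digest::digest("big_file_path.tsv", algo = "xxhash64", file = TRUE)
)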

@mattwarkentin (Contributor Author)

While I'm thinking out loud: does it make sense to skip hashing a file when tar_cue(mode = "never") is set? The first pass through the pipeline would build the target, and it would never run again, so a hash isn't strictly necessary.
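
For reference, the cue in question would look something like this (read_big_file() is a placeholder for whatever actually builds the target):

tar_target(
  big_data,
  read_big_file("big_file_path.tsv"),
  format = "fst_tbl",
  cue = tar_cue(mode = "never") # build once, then never rerun
)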

@wlandau (Member) commented Sep 16, 2020

Would you share a simplified version of the pipeline code and a smaller dataset that reproduces the slowness? We need an empirical approach to identify bottlenecks like these. A profiler would be ideal, e.g. proffer::pprof(tar_make(callr_function = NULL)). You can check out proffer at https://r-prof.github.io/proffer/.
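
A minimal sketch of that profiling call (pprof() runs the expression under Rprof() and serves an interactive flame graph in the browser):

library(proffer)
library(targets)
# Run the pipeline in the main process so the profiler can see it.
px <- pprof(tar_make(callr_function = NULL))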

@wlandau (Member) commented Sep 16, 2020

It could also be a silent crash from running out of memory on your machine.

@mattwarkentin (Contributor Author)

Would a silent crash not error out the pipeline?

@mattwarkentin (Contributor Author) commented Sep 16, 2020

When you have a target with format = "file", {targets} hashes that file on disk, right? Or does it just use file meta-data like modified date to check for changes?

@wlandau (Member) commented Sep 16, 2020

At least on clusters, memory issues can cause workers to just hang indefinitely. Not sure about exclusively local pipelines.

Unless you do something fancy with cues, targets hashes the output data file regardless of the format.

To avoid hashing that large tsv file, you could pursue a manual workaround with tarchetypes::tar_change().

tar_change(
  previously_slow_target,
  command_that_uses_big_file("big_file_path.tsv"),
  change = file.mtime("big_file_path.tsv"), # rerun when the modification time changes
  format = "rds" # any format except "file"; can even be left unset
)
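
For contrast, the usual fully-tracked approach below is what triggers the expensive hash, since targets re-hashes the whole file on every tar_make() to detect changes:

tar_target(
  big_file,
  "big_file_path.tsv",
  format = "file" # the returned path is hashed in full
)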

@wlandau (Member) commented Sep 16, 2020

Just realized the command for that tar_change() call should be a command that actually uses the file, not the file path itself. Please see the edit above.

@mattwarkentin (Contributor Author)

I think I may have found the root cause of the original issue, and it wasn't {targets} related. So I will close this while I continue to work through some things. Thanks for your help!

@mattwarkentin (Contributor Author)

Hi @wlandau,

I keep getting this gigantic error. I tried to reproduce it with a simpler example, but it doesn't reproduce, so I am kinda at a loss. The error always occurs at the same spot in the pipeline. For what it's worth, when I load the targets needed to build this specific target, everything works fine interactively, so I'm not sure why it fails when the pipeline is built. I can't make any sense of the traceback; hopefully you can...

 *** caught segfault ***
address 0x0, cause 'memory not mapped'
Traceback:
 1: c_qsave(x, file, preset, algorithm, compress_level, shuffle_control,     check_hash, nthreads)
 2: qs::qsave(x = object, file = path)
 3: store_write_path.tar_qs(store, store_coerce_object(store, object),     tmp)
 4: store_write_path(store, store_coerce_object(store, object), tmp)
 5: store_write_object.default(target$store, target$value$object)
 6: store_write_object(target$store, target$value$object)
 7: builder_update_object(target)
 8: builder_ensure_object(target, "local")
 9: builder_conclude(target, pipeline, scheduler, meta)
10: target_conclude.tar_builder(target, self$pipeline, self$scheduler,     self$meta)
11: target_conclude(target, self$pipeline, self$scheduler, self$meta)
12: self$run_target(name)
13: trn(target_should_run(target, self$meta), self$run_target(name),     target_skip(target, self$pipeline, self$scheduler, self$meta))
14: self$process_target(self$scheduler$queue$dequeue())
15: self$process_next()
16: local_init(pipeline = pipeline, names = names, queue = "sequential",     reporter = reporter, garbage_collection = garbage_collection)$run()
17: (function (pipeline, names_quosure, reporter, garbage_collection) {    pipeline_validate_lite(pipeline)    names <- tar_tidyselect(names_quosure, pipeline_get_names(pipeline))    local_init(pipeline = pipeline, names = names, queue = "sequential",         reporter = reporter, garbage_collection = garbage_collection)$run()    invisible()})(names_quosure = ~c("aucs", "predictions"), reporter = "verbose",     garbage_collection = FALSE, pipeline = <environment>)
18: do.call(targets_function, targets_arguments)
19: (function (targets_script, targets_function, targets_arguments) {    targets_arguments$pipeline <- source(targets_script)$value    do.call(targets_function, targets_arguments)})(targets_script = "_targets.R", targets_function = function (pipeline,     names_quosure, reporter, garbage_collection) {    pipeline_validate_lite(pipeline)    names <- tar_tidyseline_get_names(pipeline))    local_init(pipeline = pipeline, names = names, queue = "sequential",         reporter = reporter, garbage_collection = garbage_collection)$run()    invisible()}, targets_arguments = list(names_quosure = ~c("aucs", "predictions"),     reporter = "verbose", garbage_collection = FALSE))
20: (function (what, args, quote = FALSE, envir = parent.frame()) {    if (!is.list(args))         stop("second argument must be a list")    if (quote)         args <- lapply(args, enquote)    .Internal(do.call(what, args, envir))})(function (targets_script, targets_function, targets_arguments) {    targets_arguments$pipeline <- source(targets_script)$value    do.call(targets_function, targets_arguments)}, list(targets_script = "_targets.R", targets_function = function (pipeline,     names_quosure, reporter, garbage_collection) {    pipeline_validate_lite(pipeline)    names <- tar_tidyselect(names_quosure, pipeline_get_names(pipeline))    local_init(pipeline = pipeline, names = names, queue = "sequential",         reporter = reporter, garbage_collection = garbage_collection)$run()    invisible()}, targets_arguments = list(names_quosure = ~c("aucs", "predictions"),     reporter = "verbose", garbage_collection = FALSE)), envir = <environment>)
21: do.call(do.call, c(readRDS("/var/folders/1q/l_382_ss2973kpyqn0sk_0lw0000gn/T//RtmprJ3rX0/callr-fun-15d235e1dde41"),     list(envir = .GlobalEnv)), envir = .GlobalEnv)
22: saveRDS(do.call(do.call, c(readRDS("/var/folders/1q/l_382_ss2973kpyqn0sk_0lw0000gn/T//RtmprJ3rX0/callr-fun-15d235e1dde41"),     list(envir = .GlobalEnv)), envir = .GlobalEnv), file = "/var/folders/1q/l_382_ss2973kpyqn0sk_0lw0000gn/T//RtmprJ3rX0/callr-res-15d237aeaadac")
23: withCallingHandlers({    NULL    saveRDS(do.call(do.call, c(readRDS("/var/folders/1q/l_382_ss2973kpyqn0sk_0lw0000gn/T//RtmprJ3rX0/callr-fun-15d235e1dde41"),         list(envir = .GlobalEnv)), envir = .GlobalEnv), file = "/var/folders/1q    invisible()}, error = function(e) {    {        callr_data <- as.environment("tools:callr")$`__callr_data__`        err <- callr_data$err        capture.output(assign(".Traceback", traceback(9), envir = baseenv()))        dump.frames("__callr_dump__")        assign(".Last.dump", .GlobalEnv$`__callr_dump__`, envir = callr_data)        rm("__callr_dump__", envir = .GlobalEnv)        e2 <- err$new_error(conditionMessage(e), call. = conditionCall(e))        class(e2) <- c("callr_remote_error", class(e2))        e2$error <- e        calls <- sys.calls()        dcframe <- which(vapply(calls, function(x) length(x) >=             1 && identical(x[[1]], quote(do.call)), logical(1)))[1]        if (!is.na(dcframe))             e2$`_ignore` <- list(c(1, dcframe + 1L))        e2$`_pid` <- Sys.getpid()        e2$`_timestamp` <- Sys.time()        if (inherits(e, "rlib_error"))             e2$parent <- e$parent        e2 <- err$add_trace_back(e2)        saveRDS(list("error", e2), file = paste0("/var/folders/1q/l_382_ss2973kpyqn0sk_0lw0000gn/T//RtmprJ3rX0/callr-res-15d237aeaadac",             ".error"))    }}, interrupt = function(e) {    {        callr_data <- as.environment("tools:callr")$`__callr_data__`        err <- callr_data$err        capture.output(assign(".Traceback", traceback(9), envir = baseenv()))        dump.frames("__callr_dump__")        assign(".Last.dump", .GlobalEnv$`__callr_dump__`, envir = callr_data)        rm("__callr_dump__", envir = .GlobalEnv)        e2 <- err$new_error(conditionMessage(e), call. = conditionCall(e))        class(e2) <- c("callr_remote_error", class(e2))        e2$error <- e        calls <- sys.calls()        dcframe <- which(vapply(calls, function(x) length(x) >=             1 && identical(x[[1]], quote(do.call)), logical(1)))[1]        if (!is.na(dcframe))             e2$`_ignore` <- list(c(1, dcframe + 1L))        e2$`_pid` <- Sys.getpid()        e2$`_timt <- e$parent        e2 <- err$add_trace_back(e2)        saveRDS(list("error", e2), file = paste0("/var/folders/1q/l_382_ss2973kpyqn0sk_0lw0000gn/T//RtmprJ3rX0/callr-res-15d237aeaadac",             ".error"))    }}, callr_message = function(e) {    try(signalCondition(e))})
24: doTryCatch(return(expr), name, parentenv, handler)
25: tryCatchOne(expr, names, parentenv, handlers[[1L]])
26: tryCatchList(expr, names[-nh], parentenv, handlers[-nh])
27: doTryCatch(return(expr), name, parentenv, handler)
28: tryCatchOne(tryCatchList(expr, names[-nh], parentenv, handlers[-nh]),     names[nh], parentenv, handlers[[nh]])
29: tryCatchList(expr, classes, parentenv, handlers)
30: tryCatch(withCallingHandlers({    NULL    saveRDS(do.call(do.call, c(readRDS("/var/folders/1q/l_382_ss2973kpyqn0sk_0lw0000gn/T//RtmprJ3rX0/callr-fun-15d235e1dde41"),         list(envir = .GlobalEnv)), envir = .GlobalEnv), file = "/var/folders/1q/l_382_ss2973kpyqn0sk_0lw0000gn/T//RtmprJ3rX0/callr-res-15d237aeaadac")    flush(stdout())    flush(stderr())    NULL    invisible()}, error = function(e) {    {        callr_data <- as.environment("tools:callr")$`__callr_data__`        err <- callr_data$err        capture.output(assign(".Traceback", traceback(9), envir = baseenv()))        dump.frames("__callr_dump__")        assign(".Last.dump", .GlobalEnv$`__callr_dump__`, envir = callr_data)        rm("__callr_dump__", envir = .GlobalEnv)        e2 <- err$new_error(conditionMessage(e), call. = conditionCall(e))        class(e2) <- c("callr_remote_error", class(e2))        e2$error <- e        calls <- sys.calls()        dcframe <- which(vapply(calls, function(x) length(x) >=             1 && identical(x[[1]], quote(do.call)), logical(1)))[1]        if (!is.na(dcframe))             e2$`_ignore` <- list(c(1, dcframe + 1L))        e2$`_pid` <- Sys.getpid()        e2$`_timestamp` <- Sys.time()        if (inherits(e, "rlib_error"))             e2$parent <- e$parent        e2 <- err$add_trace_back(e2)      (list("error", e2), file = paste0("/var/folders/1q/l_382_ss2973kpyqn0sk_0lw0000gn/T//RtmprJ3rX0/callr-res-15d237aeaadac",             ".error"))    }}, interrupt = function(e) {    {        callr_data <- as.environment("tools:callr")$`__callr_data__`        err <- callr_data$err        capture.output(assign(".Traceback", traceback(9), envir = baseenv()))        dump.frames("__callr_dump__")        assign(".Last.dump", .GlobalEnv$`__callr_dump__`, envir = callr_data)        rm("__callr_dump__", envir = .GlobalEnv)        e2 <- err$new_error(conditionMessage(e), call. = conditionCall(e))        class(e2) <- c("callr_remote_error", class(e2))        e2$error <- e        calls <- sys.calls()        dcframe <- which(vapply(calls, function(x) length(x) >=             1 && identical(x[[1]], quote(do.call)), logical(1)))[1]        if (!is.na(dcframe))             e2$`_ignore` <- list(c(1, dcframe + 1L))        e2$`_pid` <- Sys.getpid()        e2$`_timestamp` <- Sys.time()        if (inherits(e, "rlib_error"))             e2$parent <- e$parent        e2 <- err$add_trace_back(e2)        saveRDS(list("error", e2), file = paste0("/var/folders/1q/l_382_ss2973kpyqn0sk_0lw0000gn/T//RtmprJ3rX0/callr-res-15d237aeaadac",             ".error"))    }}, callr_message = function(e) {    try(signalCondition(e))}), error = function(e) {    NULL    try(stop(e))}, interrupt = function(e) {    NULL    e})
An irrecoverable exception occurred. R is aborting now ...
Error in get_result(output = out, options) : 
  callr subprocess failed: could not start R, exited with non-zero status, has crashed or was killed
Type .Last.error.trace to see where the error occured
 Stack trace:

 1. targets:::tar_make(c("aucs", "predictions"))
 2. targets:::callr_outer(targets_function = tar_make_inner, targets_arguments  ...
 3. targets:::trn(is.null(callr_function), callr_inner(target_script_path(),  ...
 4. base:::do.call(callr_function, prepare_callr_arguments(callr_function,  ...
 5. (function (func, args = list(), libpath = .libPaths(), repos = default_repo ...
 6. callr:::get_result(output = out, options)
 7. throw(new_callr_error(output, killmsg))

 x callr subprocess failed: could not start R, exited with non-zero status, has crashed or was killed 

@wlandau (Member) commented Sep 18, 2020

The traceback helps. This could be an instance of #147, which was due to qsbase/qs#42 and fixed in traversc/stringfish@59eff54. The latest commit of targets should avoid the problem by disabling ALTREP in qsave(), but it would also be good to update to the development versions of qs and stringfish. Let me know how it goes.

@wlandau (Member) commented Sep 18, 2020

If it still happens and remains elusive, let's keep an eye on it and try to reproduce it with just qs and without targets (and peel back other layers as needed).

@mattwarkentin (Contributor Author)

Okay, so...the issue has been fixed. My series of updates:

remotes::install_github("traversc/stringfish")
remotes::install_github("traversc/qs")
remotes::install_github("wlandau/tarchetypes")
remotes::install_github("wlandau/targets")

Then I rebuilt the pipeline and everything worked!

@mattwarkentin (Contributor Author)

As always, thanks again for saving me from myself.

@mattwarkentin (Contributor Author) commented Sep 18, 2020

Possibly naive question: now that I have everything working, and it's very important that it keeps working while I finish up this project, I want to use {renv}. As far as I understand, renv detects dependencies by searching files for ::-namespaced calls, library() calls, and import() calls.

Since I define the packages for my targets pipeline globally with tar_option_set(packages = pkgs), where pkgs is a character vector of package names, I don't have any library() calls (except for targets and tarchetypes in my _targets.R file). So when I run renv::init(), only targets and tarchetypes are detected and added to the project-specific library.

I thought attaching all of these packages before initializing {renv}, via purrr::walk(pkgs, require, character.only = TRUE), would work, but it didn't. Any suggestions for how to follow the {targets} API for defining packages while still letting {renv} create a fully self-contained project?

@mattwarkentin (Contributor Author) commented Sep 18, 2020

For context:

?renv::dependencies
dependencies() will crawl files within your project, looking for R files and the packages used within those R files. This is done primarily by parsing the code and looking for calls of the form:

library(package)

require(package)

requireNamespace("package")

package::method()

For R package projects, dependencies expressed in the DESCRIPTION file will also be discovered. Note that the rmarkdown package is required in order to crawl dependencies in R Markdown files.

@wlandau (Member) commented Sep 19, 2020

It's a good question; I'm running into that problem too. I'm not sure whether I'll build something directly into targets, but the alternative workarounds are straightforward enough:

Manual script that doesn't run

Keep using tar_option_set(packages = ...) but define a separate packages.R file (or any R script renv will find) that calls library() on the same packages. packages.R never needs to run; renv will crawl it anyway. Yes, this means maintaining the same package list in two places, but you usually don't spend much time adding new packages to an established project, so it doesn't seem too burdensome.

Automatically written script that doesn't run

This one is like the previous workaround, except you automatically generate packages.R using tar_option_get("packages"):

# _targets.R
library(targets)
tar_option_set(packages = c("tidyverse", "qs"))
lines <- paste0("library(", tar_option_get("packages"), ")")
writeLines(lines, "packages.R")
tar_pipeline(...)

Just call library(pkg) instead of tar_option_set() for packages

If you don't call tar_option_set(packages = ...), targets just keeps track of the packages currently loaded. So the following will also work, even for distributed computing workflows on large clusters (e.g. tar_make_clustermq()).

# _targets.R
library(targets)
library(tidyverse)
library(qs)
tar_pipeline(...)

The advantage here is that renv will detect all your packages, and you only have to write the package list once. The disadvantage is that all your packages load all over again whenever you call tar_visnetwork(), tar_outdated(), or even tar_make(), even if all your targets are already up to date.

I think all of these are doable and there's no single right answer; it just depends on your preference.

@mattwarkentin (Contributor Author)

Thanks for the response. Hmm, those are all reasonable solutions. I think the second one is the most automated and avoids loading the packages too often.

Would it be at all reasonable to write a function like tar_env(), which takes a character vector of packages as its only argument, writes a file _library.R to _targets/meta, and returns the vector of packages to pass to tar_option_set()? The API could look something like:

tar_env <- function(pkgs) {
  stopifnot(is.character(pkgs))
  con <- "_targets/meta/_library.R"
  txt <- c(
    "# Generated by targets::tar_env: do not edit by hand",
    paste0("library(", pkgs, ")")
  )
  writeLines(txt, con)
  pkgs
}

pkgs <- c("tidyverse", "qs")

tar_option_set(
  packages = tar_env(pkgs)
)

Which would create the following file...

# Generated by targets::tar_env: do not edit by hand
library(tidyverse)
library(qs)

I tested this out and it works as expected, {renv} detects these dependencies and caches them. The _library.R file doesn't affect the pipeline in any other way, but it is a valid R file so it has the nice side-effect that the user can source the file to quickly load all of the packages used in their project.
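
For example, attaching every project package in one step:

# Load all pipeline packages interactively (path per the proposal above).
source("_targets/meta/_library.R")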

I kinda like this solution because it's a very lightweight addition that makes the automation both simple and optional. Users who don't use renv won't notice the change, and it makes life easier for those of us who do. It also means {targets} would offer native support for {renv}, which I think is in scope given how {targets} and {renv} work together to produce self-contained, reproducible data science pipelines.

What are your thoughts? I would be more than happy to spin this up into a PR. I am keen to contribute something more to this package than just nagging questions.

@wlandau (Member) commented Sep 20, 2020

I think that's an excellent idea. Sounds like you are eager to submit a PR, so I will plan to review and approve it rather than implementing this feature myself.

A few preferences:

  • Name the function tar_renv() to make the intent clearer.
  • Name the first argument packages so the arg names agree with tar_option_set() and tar_target().
  • Set the default value of packages to targets::tar_option_get("packages").
  • Support an ask argument and follow the same file overwriting procedure as tar_script():

https://github.com/wlandau/targets/blob/2c53ff46060e3389f4cd8fdf192c3e3124a06580/R/tar_script.R#L58-L61

  • Name the file _packages.R and write it to the project root. I would prefer to leave the _targets/ directory alone for this.
  • Invisibly return NULL. A good function should either have side effects or return a value, but not both. So usage would look more like this:
# _targets.R
tar_option_set(packages = c("qs", "tidyverse"))
tar_renv()
  • The package's internal code tries to use complete words for variable names, even ones the user never sees. To stay consistent, let's change txt to lines.
  • In keeping with the tidyverse style guide and the code style of the package, for long function calls, let's give each argument its own line and indent 2 spaces. So instead of
lines <- c("# Generated by targets::tar_renv(). Do not edit by hand.", 
           paste0("library(", packages, ")"))

let's write

lines <- c(
  "# Generated by targets::tar_renv(). Do not edit by hand.", 
  paste0("library(", packages, ")")
)

@wlandau (Member) commented Sep 20, 2020

Also, for the pkgdown site, let's list tar_renv() in the Setup section between tar_edit() and tar_script():

https://github.com/wlandau/targets/blob/04aab65cf44e2fde9f59d8478ff0270b47c28e6c/_pkgdown.yml#L32-L35

@mattwarkentin (Contributor Author)

Everything you said sounds good - thanks for the advice. Also, thank you for letting me take a shot at this. I know you could implement it yourself much faster, but I appreciate the chance to get more experience contributing to open-source projects.

@mattwarkentin (Contributor Author) commented Sep 22, 2020

Hi @wlandau,

I was just about to submit the PR, but it occurs to me that if the expected API is as follows (i.e. calling tar_renv() after tar_option_set()):

library(targets)

tar_script({
  tar_option_set(packages = "glue")
  tar_renv()
  tar_pipeline(
    tar_target(foo, head(mtcars), packages = "fs")
  )
})
tar_make()
#> ● run target foo
readLines("_packages.R")
#> [1] "# Generated by targets::tar_env: do not edit by hand"
#> [2] "library(glue)"

then we actually miss the target-specific package dependencies. Thoughts? tar_renv() can't go after tar_pipeline(), so I'm not sure of the best way to solve this yet.

@mattwarkentin (Contributor Author) commented Sep 22, 2020

Since {rstudioapi} is already a {targets} dependency, I could make tar_renv() more sentient and use a regex to detect the packages defined in the script in which tar_renv() is called (such as with rstudioapi::getSourceEditorContext()). Thoughts?

Actually maybe that won't work because rstudioapi doesn't work non-interactively...

@mattwarkentin (Contributor Author) commented Sep 22, 2020

This gets us the calling file contents...

library(targets)

tar_script({
  tar_option_set(packages = "glue")
  x <- readLines(parent.frame(2)$ofile)
  tar_pipeline(
    tar_target(foo, x, packages = "fs")
  )
})
tar_make()
#> ● run target foo
tar_read(foo)
#> [1] "library(targets)"                                   
#> [2] "tar_option_set(packages = \"glue\")"                
#> [3] "x <- readLines(parent.frame(2)$ofile)"              
#> [4] "tar_pipeline(tar_target(foo, x, packages = \"fs\"))"

Created on 2020-09-22 by the reprex package (v0.3.0)

@wlandau (Member) commented Sep 22, 2020

then we actually miss the target-specific package dependencies.

Seems like manual control over the packages argument could compensate for this.

tar_renv(packages = c(tar_option_get("packages"), "otherPackage"))

I think this is reasonable because it supports the majority of use cases. The other option I see is to crawl through the pipeline object.
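
Spelled out with the interface proposed in this thread (package names illustrative; "fs" stands in for a target-specific dependency):

# _targets.R
library(targets)
tar_option_set(packages = c("tidyverse", "qs"))
# List target-specific packages by hand so renv picks them up too.
tar_renv(packages = c(tar_option_get("packages"), "fs"))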

Since {rstudioapi} is already a {targets} dependency, I could make tar_renv() more sentient and use a regex to detect the packages defined in the script in which tar_renv() is called (such as with rstudioapi::getSourceEditorContext()). Thoughts?

I would prefer not to use rstudioapi for this. rstudioapi is a Suggests: for addins only. Plus, not everyone uses the RStudio IDE.

More broadly, I would prefer to avoid over-engineering a feature like this. It should stay simple and cover the common use cases. Special circumstances can call for special workarounds, and that's fine. A feature is well calibrated when the simple stuff stays simple and the hard stuff is not necessarily fully automated.

@mattwarkentin (Contributor Author)

I would prefer to avoid over-engineering a feature like this

I strongly agree. So I think the function can be left as-is, and users can manually manage the packages argument for more complex use cases.

I can add a line to the documentation about tar_renv()'s inability to capture dependencies that are defined after tar_renv() is called in the script, noting that handling those is the user's responsibility. Agreed?

@wlandau (Member) commented Sep 22, 2020

Yes, sounds great.
