Handling large data sets in a pipeline #157

mattwarkentin · 2020-09-16T13:23:17Z

Prework

I understand and agree to the code of conduct.
I understand and agree to the contributing guidelines.
Be considerate of the maintainer's time and make it as easy as possible to troubleshoot any problems you identify. Read here and here to learn about minimal reproducible examples. Format your code according to the tidyverse style guide to make it easier for others to read.

Description

I just had a few questions about dealing with large data and how best to handle them. I have a dataset that is about 500K observations by 2000 variables, and it has seemed unexpectedly slow to build this target in my pipeline.

The pipeline has been stuck on this target for about 12 hours or so (when it only takes minutes to read the data into R and save to disk as a serialized object). On disk, the file is about a 10GB tsv file. The file already seems to be stored in _targets/objects/ as a fst_tbl (~6GB) but the pipeline is still building that target. My guess is that the since the object is already saved to disk, the target is spending this huge amount of time hashing the file, is that likely the issue?

Any suggestions for how I should be handling these large data sets? It seems like hashing is the expensive step in building a target of this size.

The text was updated successfully, but these errors were encountered:

mattwarkentin · 2020-09-16T13:28:31Z

While I'm thinking out loud...does it possibly make sense to avoid hashing a file when tar_cue(mode = "never")? The first time through the pipeline would build the target, and then it would never run again so a hash isn't strictly necessary.

wlandau · 2020-09-16T19:09:11Z

Would you share a simplified version of the pipeline code and a smaller dataset that reproduces the slowness? We need an empirical approach to identify bottlenecks like these. A profiler would be ideal, e.g. proffer::pprof(tar_make(callr_function = NULL)). You can check out proffer at https://r-prof.github.io/proffer/.

wlandau · 2020-09-16T19:45:31Z

It could also be a silent crash from running out of memory on your machine.

mattwarkentin · 2020-09-16T19:56:11Z

Would a silent crash not error out the pipeline?

mattwarkentin · 2020-09-16T20:23:03Z

When you have a target with format = "file", {targets} hashes that file on disk, right? Or does it just use file meta-data like modified date to check for changes?

wlandau · 2020-09-16T20:40:39Z

At least on clusters, memory issues can cause workers to just hang indefinitely. Not sure about exclusively local pipelines.

Unless you do something fancy with cues, targets hashes the output data file regardless of the format.

To avoid hashing that large tsv file, you could pursue a manual workaround with tarchetypes::tar_change().

tar_change(
  previously_slow_target,
  command_that_uses_big_file("big_file_path.tsv"),
  change = file.mtime("big_file_path.tsv"), # Rerun when the modification time changes.
  format = "rds" # anything except "file", doesn't even need to be user-supplied
)

wlandau · 2020-09-16T20:43:10Z

Just realized the command for that tar_change() call should be a command that actually uses the file, not the file path itself. Please see the edit above.

mattwarkentin · 2020-09-16T20:59:12Z

I think I may have found the root cause of the original issue, and it wasn't {targets} related. So I will close this while I continue to work through some things. Thanks for your help!

mattwarkentin · 2020-09-18T20:16:11Z

Hi @wlandau,

I keep getting this gigantic error. I tried to reproduce it with a simpler example, but it doesn't reproduce, so I am kinda at a loss. This error always occurs at the same spot in the pipeline. For what it's worth, when I load the targets needed to build this specific target, it works fine interactively. So I'm not sure why it fails when the pipeline is built. I can't make any sense of the traceback, hopefully you can...

 *** caught segfault ***
address 0x0, cause 'memory not mapped'
-
Traceback:
 1: c_qsave(x, file, preset, algorithm, compress_level, shuffle_control,     check_hash, nthreads)
 2: qs::qsave(x = object, file = path)
 3: store_write_path.tar_qs(store, store_coerce_object(store, object),     tmp)
 4: store_write_path(store, store_coerce_object(store, object), tmp)
 5: store_write_object.default(target$store, target$value$object)
 6: store_write_object(target$store, target$value$object)
 7: builder_update_object(target)
 8: builder_ensure_object(target, "local")
 9: builder_conclude(target, pipeline, scheduler, meta)
10: target_conclude.tar_builder(target, self$pipeline, self$scheduler,     self$meta)
11: target_conclude(target, self$pipeline, self$scheduler, self$meta)
12: self$run_target(name)
13: trn(target_should_run(target, self$meta), self$run_target(name),     target_skip(target, self$pipeline, self$scheduler, self$meta))
14: self$process_target(self$scheduler$queue$dequeue())
15: self$process_next()
16: local_init(pipeline = pipeline, names = names, queue = "sequential",     reporter = reporter, garbage_collection = garbage_collection)$run()
17: (function (pipeline, names_quosure, reporter, garbage_collection) {    pipeline_validate_lite(pipeline)    names <- tar_tidyselect(names_quosure, pipeline_get_names(pipeline))    local_init(pipeline = pipeline, names = names, queue = "sequential",         reporter = reporter, garbage_collection = garbage_collection)$run()    invisible()})(names_quosure = ~c("aucs", "predictions"), reporter = "verbose",     garbage_collection = FALSE, pipeline = <environment>)
18: do.call(targets_function, targets_arguments)
19: (function (targets_script, targets_function, targets_arguments) {    targets_arguments$pipeline <- source(targets_script)$value    do.call(targets_function, targets_arguments)})(targets_script = "_targets.R", targets_function = function (pipeline,     names_quosure, reporter, garbage_collection) {    pipeline_validate_lite(pipeline)    names <- tar_tidyseline_get_names(pipeline))    local_init(pipeline = pipeline, names = names, queue = "sequential",         reporter = reporter, garbage_collection = garbage_collection)$run()    invisible()}, targets_arguments = list(names_quosure = ~c("aucs", "predictions"),     reporter = "verbose", garbage_collection = FALSE))
20: (function (what, args, quote = FALSE, envir = parent.frame()) {    if (!is.list(args))         stop("second argument must be a list")    if (quote)         args <- lapply(args, enquote)    .Internal(do.call(what, args, envir))})(function (targets_script, targets_function, targets_arguments) {    targets_arguments$pipeline <- source(targets_script)$value    do.call(targets_function, targets_arguments)}, list(targets_script = "_targets.R", targets_function = function (pipeline,     names_quosure, reporter, garbage_collection) {    pipeline_validate_lite(pipeline)    names <- tar_tidyselect(names_quosure, pipeline_get_names(pipeline))    local_init(pipeline = pipeline, names = names, queue = "sequential",         reporter = reporter, garbage_collection = garbage_collection)$run()    invisible()}, targets_arguments = list(names_quosure = ~c("aucs", "predictions"),     reporter = "verbose", garbage_collection = FALSE)), envir = <environment>)
21: do.call(do.call, c(readRDS("/var/folders/1q/l_382_ss2973kpyqn0sk_0lw0000gn/T//RtmprJ3rX0/callr-fun-15d235e1dde41"),     list(envir = .GlobalEnv)), envir = .GlobalEnv)
22: saveRDS(do.call(do.call, c(readRDS("/var/folders/1q/l_382_ss2973kpyqn0sk_0lw0000gn/T//RtmprJ3rX0/callr-fun-15d235e1dde41"),     list(envir = .GlobalEnv)), envir = .GlobalEnv), file = "/var/folders/1q/l_382_ss2973kpyqn0sk_0lw0000gn/T//RtmprJ3rX0/callr-res-15d237aeaadac")
23: withCallingHandlers({    NULL    saveRDS(do.call(do.call, c(readRDS("/var/folders/1q/l_382_ss2973kpyqn0sk_0lw0000gn/T//RtmprJ3rX0/callr-fun-15d235e1dde41"),         list(envir = .GlobalEnv)), envir = .GlobalEnv), file = "/var/folders/1q    invisible()}, error = function(e) {    {        callr_data <- as.environment("tools:callr")$`__callr_data__`        err <- callr_data$err        capture.output(assign(".Traceback", traceback(9), envir = baseenv()))        dump.frames("__callr_dump__")        assign(".Last.dump", .GlobalEnv$`__callr_dump__`, envir = callr_data)        rm("__callr_dump__", envir = .GlobalEnv)        e2 <- err$new_error(conditionMessage(e), call. = conditionCall(e))        class(e2) <- c("callr_remote_error", class(e2))        e2$error <- e        calls <- sys.calls()        dcframe <- which(vapply(calls, function(x) length(x) >=             1 && identical(x[[1]], quote(do.call)), logical(1)))[1]        if (!is.na(dcframe))             e2$`_ignore` <- list(c(1, dcframe + 1L))        e2$`_pid` <- Sys.getpid()        e2$`_timestamp` <- Sys.time()        if (inherits(e, "rlib_error"))             e2$parent <- e$parent        e2 <- err$add_trace_back(e2)        saveRDS(list("error", e2), file = paste0("/var/folders/1q/l_382_ss2973kpyqn0sk_0lw0000gn/T//RtmprJ3rX0/callr-res-15d237aeaadac",             ".error"))    }}, interrupt = function(e) {    {        callr_data <- as.environment("tools:callr")$`__callr_data__`        err <- callr_data$err        capture.output(assign(".Traceback", traceback(9), envir = baseenv()))        dump.frames("__callr_dump__")        assign(".Last.dump", .GlobalEnv$`__callr_dump__`, envir = callr_data)        rm("__callr_dump__", envir = .GlobalEnv)        e2 <- err$new_error(conditionMessage(e), call. = conditionCall(e))        class(e2) <- c("callr_remote_error", class(e2))        e2$error <- e        calls <- sys.calls()        dcframe <- which(vapply(calls, function(x) length(x) >=             1 && identical(x[[1]], quote(do.call)), logical(1)))[1]        if (!is.na(dcframe))             e2$`_ignore` <- list(c(1, dcframe + 1L))        e2$`_pid` <- Sys.getpid()        e2$`_timt <- e$parent        e2 <- err$add_trace_back(e2)        saveRDS(list("error", e2), file = paste0("/var/folders/1q/l_382_ss2973kpyqn0sk_0lw0000gn/T//RtmprJ3rX0/callr-res-15d237aeaadac",             ".error"))    }}, callr_message = function(e) {    try(signalCondition(e))})
24: doTryCatch(return(expr), name, parentenv, handler)
25: tryCatchOne(expr, names, parentenv, handlers[[1L]])
26: tryCatchList(expr, names[-nh], parentenv, handlers[-nh])
27: doTryCatch(return(expr), name, parentenv, handler)
28: tryCatchOne(tryCatchList(expr, names[-nh], parentenv, handlers[-nh]),     names[nh], parentenv, handlers[[nh]])
29: tryCatchList(expr, classes, parentenv, handlers)
30: tryCatch(withCallingHandlers({    NULL    saveRDS(do.call(do.call, c(readRDS("/var/folders/1q/l_382_ss2973kpyqn0sk_0lw0000gn/T//RtmprJ3rX0/callr-fun-15d235e1dde41"),         list(envir = .GlobalEnv)), envir = .GlobalEnv), file = "/var/folders/1q/l_382_ss2973kpyqn0sk_0lw0000gn/T//RtmprJ3rX0/callr-res-15d237aeaadac")    flush(stdout())    flush(stderr())    NULL    invisible()}, error = function(e) {    {        callr_data <- as.environment("tools:callr")$`__callr_data__`        err <- callr_data$err        capture.output(assign(".Traceback", traceback(9), envir = baseenv()))        dump.frames("__callr_dump__")        assign(".Last.dump", .GlobalEnv$`__callr_dump__`, envir = callr_data)        rm("__callr_dump__", envir = .GlobalEnv)        e2 <- err$new_error(conditionMessage(e), call. = conditionCall(e))        class(e2) <- c("callr_remote_error", class(e2))        e2$error <- e        calls <- sys.calls()        dcframe <- which(vapply(calls, function(x) length(x) >=             1 && identical(x[[1]], quote(do.call)), logical(1)))[1]        if (!is.na(dcframe))             e2$`_ignore` <- list(c(1, dcframe + 1L))        e2$`_pid` <- Sys.getpid()        e2$`_timestamp` <- Sys.time()        if (inherits(e, "rlib_error"))             e2$parent <- e$parent        e2 <- err$add_trace_back(e2)      (list("error", e2), file = paste0("/var/folders/1q/l_382_ss2973kpyqn0sk_0lw0000gn/T//RtmprJ3rX0/callr-res-15d237aeaadac",             ".error"))    }}, interrupt = function(e) {    {        callr_data <- as.environment("tools:callr")$`__callr_data__`        err <- callr_data$err        capture.output(assign(".Traceback", traceback(9), envir = baseenv()))        dump.frames("__callr_dump__")        assign(".Last.dump", .GlobalEnv$`__callr_dump__`, envir = callr_data)        rm("__callr_dump__", envir = .GlobalEnv)        e2 <- err$new_error(conditionMessage(e), call. = conditionCall(e))        class(e2) <- c("callr_remote_error", class(e2))        e2$error <- e        calls <- sys.calls()        dcframe <- which(vapply(calls, function(x) length(x) >=             1 && identical(x[[1]], quote(do.call)), logical(1)))[1]        if (!is.na(dcframe))             e2$`_ignore` <- list(c(1, dcframe + 1L))        e2$`_pid` <- Sys.getpid()        e2$`_timestamp` <- Sys.time()        if (inherits(e, "rlib_error"))             e2$parent <- e$parent        e2 <- err$add_trace_back(e2)        saveRDS(list("error", e2), file = paste0("/var/folders/1q/l_382_ss2973kpyqn0sk_0lw0000gn/T//RtmprJ3rX0/callr-res-15d237aeaadac",             ".error"))    }}, callr_message = function(e) {    try(signalCondition(e))}), error = function(e) {    NULL    try(stop(e))}, interrupt = function(e) {    NULL    e})
An irrecoverable exception occurred. R is aborting now ...
Error in get_result(output = out, options) : 
  callr subprocess failed: could not start R, exited with non-zero status, has crashed or was killed
Type .Last.error.trace to see where the error occured

 Stack trace:

 1. targets:::tar_make(c("aucs", "predictions"))
 2. targets:::callr_outer(targets_function = tar_make_inner, targets_arguments  ...
 3. targets:::trn(is.null(callr_function), callr_inner(target_script_path(),  ...
 4. base:::do.call(callr_function, prepare_callr_arguments(callr_function,  ...
 5. (function (func, args = list(), libpath = .libPaths(), repos = default_repo ...
 6. callr:::get_result(output = out, options)
 7. throw(new_callr_error(output, killmsg))

 x callr subprocess failed: could not start R, exited with non-zero status, has crashed or was killed

wlandau · 2020-09-18T20:26:26Z

The traceback helps. Could be an instance of #147, which was due to qsbase/qs#42 and fixed in traversc/stringfish@59eff54. The latest commit of targets should avoid this problem by disabling ALTREP in qsave(), but it might also be good to update to dev qs and stringfish too. Let me know how it goes.

wlandau · 2020-09-18T20:29:30Z

If it still happens and remains elusive, let's keep an eye on it and try to reproduce it with just qs and without targets (and peel back other layers as needed).

mattwarkentin · 2020-09-18T20:43:07Z

Okay, so...the issue has been fixed. My series of updates:

remotes::install_github("traversc/stringfish")
remotes::install_github("traversc/qs")
remotes::install_github("wlandau/tarchetypes")
remotes::install_github("wlandau/targets")

Then I rebuilt the pipeline and everything worked!

mattwarkentin · 2020-09-18T20:44:29Z

As always, thanks again for saving me from myself.

mattwarkentin · 2020-09-18T21:02:04Z

Possibly naive question: So now that I have everything working, and it's very important that this keeps working while I finish up this project, I want to use {renv}. As far as I understand, renv detects dependencies by searching files for :: namespacing, library() calls, and import() calls.

Since I define the packages used for my targets pipeline globally by using tar_option_set(packages = pkgs), where pkgs is a character vector of packages, I don't have any library calls (except for targets and tarchetypes in my _targets.R file). So when I run renv::init(), only the packages related to targets and tarchetypes are detected and added to the project-specific library.

I thought if I attached all these packages before initializing {renv} it would work: purrr::walk(pkgs, require, character.only=TRUE), but it didn't. Any suggestions for how to follow the {targets} API for defining packages, while also being able to use {renv} to create a fully self-contained project?

mattwarkentin · 2020-09-18T21:13:39Z

For context:

?renv::dependencies

dependencies() will crawl files within your project, looking for R files and the packages used within those R files. This is done primarily by parsing the code and looking for calls of the form:

library(package)

require(package)

requireNamespace("package")

package::method()

For R package projects, dependencies expressed in the DESCRIPTION file will also be discovered. Note that the rmarkdown package is required in order to crawl dependencies in R Markdown files.

wlandau · 2020-09-19T00:01:44Z

It's a good question, I'm running into that problem too. Not sure if I'll build something directly into targets, but the alternative workarounds are straightforward enough:

Manual script that doesn't run

Keep using tar_option_set(packages = ...) but define a separate packages.R file (or any R script renv will find) somewhere that calls library() on these same packages. packages.R need not run, but renv will crawl through it anyway. Yes, this involves maintaining the same package list in two different places, but you usually don't spend a whole lot of time adding new packages to an already established project, so it doesn't seem too burdensome.

Automatically written script that doesn't run

This one is like the previous workaround, except you automatically generate packages.R using tar_option_get("packages"):

# _targets.R
library(targets)
tar_option_set(packages = c("tidyverse", "qs"))
lines <- paste0("library(", tar_option_get("packages"), ")")
writeLines(lines, "packages.R")
tar_pipeline(...)

Just call library(pkg) instead of tar_option_set() for packages

If you don't call tar_option_set(packages = ...), targets just keeps track of the packages currently loaded. So following will also work, even for distributed computing workflows on large clusters (e.g. tar_make_clustermq()).

# _targets.R
library(targets)
library(tidyverse)
library(qs)
tar_pipeline(...)

The advantage here is renv will detect all your packages, and you only have to write these packages once. The disadvantage is that all your packages will reload all over again when you call tar_visnetwork(), tar_outdated(), and even tar_make() even if all your targets are already up to date.

I think all of these are doable and there's no right answer. Just depends on your preference.

mattwarkentin · 2020-09-20T16:00:01Z

Thanks for the response. Hmm, those are all reasonable solutions. I think the second one is the most automated and avoids loading the packages too often.

Would it be at all reasonable to perhaps write a function like tar_env() which takes the character vectors of packages as the only argument, and writes a file _library.R to _targets/meta and returns the vector of packages to tar_option_set(). The API could look something like:

tar_env <- function(pkgs) {
  stopifnot(all(is.character(pkgs)))
  con <- "_targets/meta/_library.R"
  txt <- c("# Generated by targets::tar_env: do not edit by hand", 
           paste0("library(", pkgs, ")"))
  writeLines(txt, con)
  return(pkgs)
}

pkgs <- c("tidyverse", "qs")

tar_option_set(
  packages = tar_env(pkgs)
)

Which would create the following file...

# Generated by targets::tar_env: do not edit by hand
library(tidyverse)
library(qs)

I tested this out and it works as expected, {renv} detects these dependencies and caches them. The _library.R file doesn't affect the pipeline in any other way, but it is a valid R file so it has the nice side-effect that the user can source the file to quickly load all of the packages used in their project.

I kinda like this solution because it's a very lightweight addition but makes the automation both simple and optional. Users who don't use renv won't notice the change, and it makes life easier for those of us who do. It also means {targets} would offer native support for {renv}. which I think is in-scope given how {targets} and {renv} work together to produce self-contained and reproducible data science pipelines.

What are your thoughts? I would be more than happy to spin this up into a PR. I am keen to contribute something more to this package than just nagging questions.

wlandau · 2020-09-20T21:22:08Z

I think that's an excellent idea. Sounds like you are eager to submit a PR, so I will plan review and approve it rather than implementing this feature myself.

A few preferences:

Name the function tar_renv() to make the intent clearer.
Name the first argument packages so the arg names agree with tar_option_set() and tar_target().
Set the default value of packages to targets::tar_option_get("packages").
Support an ask argument and follow the same file overwriting procedure as tar_script():

https://github.com/wlandau/targets/blob/2c53ff46060e3389f4cd8fdf192c3e3124a06580/R/tar_script.R#L58-L61

Name the file _packages.R and write it to the project root. I would prefer to leave the _targets/ directory alone for this.
Invisibly return NULL. A good function should either have side effects or return a value, but not both. So usage would look more like this:

# _targets.R
tar_option_set(packages = c("qs", "tidyverse"))
tar_renv()

The package's internal code tries to use complete words for variable names, even ones the user never sees. To stay consistent, let's change txt to lines.
In keeping with the tidyverse style guide and the code style of the package, for long function calls, let's give each argument its own line and indent 2 spaces. So instead of

lines <- c("# Generated by targets::tar_renv(). Do not edit by hand.", 
           paste0("library(", packages, ")"))

let's write

lines <- c(
  "# Generated by targets::tar_renv(). Do not edit by hand.", 
  paste0("library(", packages, ")")
)

wlandau · 2020-09-20T21:40:13Z

Also, for the pkgdown site, let's list tar_renv() in the Setup section between tar_edit() and tar_script():

https://github.com/wlandau/targets/blob/04aab65cf44e2fde9f59d8478ff0270b47c28e6c/_pkgdown.yml#L32-L35

mattwarkentin · 2020-09-21T01:28:17Z

Everything you said sounds good - thanks for the advice. Also, thank you for letting me take a shot at this, I know you could implement yourself much faster, but I appreciate the chance to get some more experience contributing to open-source projects.

mattwarkentin · 2020-09-22T17:16:32Z

Hi @wlandau,

I was just about to submit the PR but it occurs to me that if the expected API is as follows (i.e. call tar_renv() after tar_option_set()):

library(targets)

tar_script({
  tar_option_set(packages = "glue")
  tar_renv()
  tar_pipeline(
    tar_target(foo, head(mtcars), packages = "fs")
  )
})
tar_make()
#> ● run target foo
readLines("_packages.R")
#> [1] "# Generated by targets::tar_env: do not edit by hand"
#> [2] "library(glue)"

That we actually miss the target-specific package deps. Thoughts? tar_renv() can't go after tar_pipeline() so I'm not sure the best way to solve this just yet.

mattwarkentin · 2020-09-22T17:21:20Z

Since {rstudioapi} is already a {targets} dependency, I could make tar_renv() more sentient and use a regex to detect the packages defined in the script in which tar_renv() is called (such as with rstudioapi::getSourceEditorContext()). Thoughts?

Actually maybe that won't work because rstudioapi doesn't work non-interactively...

mattwarkentin · 2020-09-22T17:34:19Z

This gets us the calling file contents...

library(targets)

tar_script({
  tar_option_set(packages = "glue")
  x <- readLines(parent.frame(2)$ofile)
  tar_pipeline(
    tar_target(foo, x, packages = "fs")
  )
})
tar_make()
#> �[34m●�[39m run target foo
tar_read(foo)
#> [1] "library(targets)"                                   
#> [2] "tar_option_set(packages = \"glue\")"                
#> [3] "x <- readLines(parent.frame(2)$ofile)"              
#> [4] "tar_pipeline(tar_target(foo, x, packages = \"fs\"))"

^{Created on 2020-09-22 by the reprex package (v0.3.0)}

wlandau · 2020-09-22T17:40:34Z

That we actually miss the target-specific package deps.

Seems like manual control over the packages argument could compensate for this.

tar_renv(packages = c(tar_option_set("packages"), "otherPackage"))

I think this is reasonable because it supports the majority of use cases. The other option I see is to crawl through the pipeline object.

Since {rstudioapi} is already a {targets} dependency, I could make tar_renv() more sentient and use a regex to detect the packages defined in the script in which tar_renv() is called (such as with rstudioapi::getSourceEditorContext()). Thoughts?

I would prefer not to use rstudioapi for this. rstudioapi is a Suggests: for addins only. Plus, not everyone uses the RStudio IDE.

I could make tar_renv() more sentient and use a regex to detect the packages defined in the script in which tar_renv() is called (such as with rstudioapi::getSourceEditorContext()). Thoughts?

I would prefer to avoid over-engineering a feature like this. I think it should be simple and cover just common use cases. Special circumstances would require special workarounds, and that's fine. Features are well calibrated when simple stuff is still simple and hard stuff is not necessarily fully automated.

mattwarkentin · 2020-09-22T17:46:08Z

I would prefer to avoid over-engineering a feature like this

I strongly agree. So I think the function can be left as-is, and users can manually manage the packages argument for more complex use cases.

I can add a line to the documentation about tar_renv()s inability to capture any dependencies that are defined after tar_renv() is called in the script, and that this falls on the responsibility of the user. Agreed?

wlandau · 2020-09-22T17:54:38Z

Yes, sounds great.

mattwarkentin added the topic: performance label Sep 16, 2020

mattwarkentin assigned wlandau Sep 16, 2020

wlandau added the depends: reprex label Sep 16, 2020

mattwarkentin closed this as completed Sep 16, 2020

mattwarkentin mentioned this issue Sep 22, 2020

Added tar_renv #170

Merged

3 tasks

mattwarkentin mentioned this issue Oct 5, 2020

tar_make_clustermq() hangs and tar_make() works when building large targets #182

Closed

6 tasks

MarekGierlinski mentioned this issue May 16, 2024

qsave() bad binding access qsbase/qs#94

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Handling large data sets in a pipeline #157

Handling large data sets in a pipeline #157

mattwarkentin commented Sep 16, 2020 •

edited

Loading

mattwarkentin commented Sep 16, 2020

wlandau commented Sep 16, 2020

wlandau commented Sep 16, 2020

mattwarkentin commented Sep 16, 2020

mattwarkentin commented Sep 16, 2020 •

edited

Loading

wlandau commented Sep 16, 2020 •

edited

Loading

wlandau commented Sep 16, 2020

mattwarkentin commented Sep 16, 2020

mattwarkentin commented Sep 18, 2020

wlandau commented Sep 18, 2020

wlandau commented Sep 18, 2020 •

edited

Loading

mattwarkentin commented Sep 18, 2020

mattwarkentin commented Sep 18, 2020

mattwarkentin commented Sep 18, 2020 •

edited

Loading

mattwarkentin commented Sep 18, 2020 •

edited

Loading

wlandau commented Sep 19, 2020 •

edited

Loading

mattwarkentin commented Sep 20, 2020

wlandau commented Sep 20, 2020 •

edited

Loading

wlandau commented Sep 20, 2020

mattwarkentin commented Sep 21, 2020

mattwarkentin commented Sep 22, 2020 •

edited

Loading

mattwarkentin commented Sep 22, 2020 •

edited

Loading

mattwarkentin commented Sep 22, 2020 •

edited

Loading

wlandau commented Sep 22, 2020

mattwarkentin commented Sep 22, 2020

wlandau commented Sep 22, 2020

Handling large data sets in a pipeline #157

Handling large data sets in a pipeline #157

Comments

mattwarkentin commented Sep 16, 2020 • edited Loading

Prework

Description

mattwarkentin commented Sep 16, 2020

wlandau commented Sep 16, 2020

wlandau commented Sep 16, 2020

mattwarkentin commented Sep 16, 2020

mattwarkentin commented Sep 16, 2020 • edited Loading

wlandau commented Sep 16, 2020 • edited Loading

wlandau commented Sep 16, 2020

mattwarkentin commented Sep 16, 2020

mattwarkentin commented Sep 18, 2020

wlandau commented Sep 18, 2020

wlandau commented Sep 18, 2020 • edited Loading

mattwarkentin commented Sep 18, 2020

mattwarkentin commented Sep 18, 2020

mattwarkentin commented Sep 18, 2020 • edited Loading

mattwarkentin commented Sep 18, 2020 • edited Loading

wlandau commented Sep 19, 2020 • edited Loading

Manual script that doesn't run

Automatically written script that doesn't run

Just call library(pkg) instead of tar_option_set() for packages

mattwarkentin commented Sep 20, 2020

wlandau commented Sep 20, 2020 • edited Loading

wlandau commented Sep 20, 2020

mattwarkentin commented Sep 21, 2020

mattwarkentin commented Sep 22, 2020 • edited Loading

mattwarkentin commented Sep 22, 2020 • edited Loading

mattwarkentin commented Sep 22, 2020 • edited Loading

wlandau commented Sep 22, 2020

mattwarkentin commented Sep 22, 2020

wlandau commented Sep 22, 2020

mattwarkentin commented Sep 16, 2020 •

edited

Loading

mattwarkentin commented Sep 16, 2020 •

edited

Loading

wlandau commented Sep 16, 2020 •

edited

Loading

wlandau commented Sep 18, 2020 •

edited

Loading

mattwarkentin commented Sep 18, 2020 •

edited

Loading

mattwarkentin commented Sep 18, 2020 •

edited

Loading

wlandau commented Sep 19, 2020 •

edited

Loading

wlandau commented Sep 20, 2020 •

edited

Loading

mattwarkentin commented Sep 22, 2020 •

edited

Loading

mattwarkentin commented Sep 22, 2020 •

edited

Loading

mattwarkentin commented Sep 22, 2020 •

edited

Loading