tar_make_clustermq() hangs and tar_make() works when building large targets #182
Comments
Such a helpful reprex, thank you. I am almost positive this is because

1c3890a might have fixed it. Testing now.
Fixed.
rows <- 5e5
cols <- 1e3
data <- data.frame(matrix(runif(rows * cols), nrow = rows))
vroom::vroom_write(data, "data.tsv")
library(targets)
tar_script({
  options(clustermq.scheduler = "multicore", crayon.enabled = FALSE)
  tar_option_set(
    memory = "transient",
    storage = "remote",
    retrieval = "remote"
  )
  tar_pipeline(
    tar_target(
      file,
      "data.tsv",
      format = "file"
    ),
    tar_target(
      data,
      readr::read_tsv(file, col_types = readr::cols()),
      format = "fst_tbl"
    )
  )
})
tar_destroy()
system.time(tar_make())
#> ● run target file
#> ● run target data
#> user system elapsed
#> 152.158 41.953 245.836
tar_destroy()
system.time(tar_make_clustermq())
#> ● run target file
#> ● run target data
#> Master: [231.6s 0.0% CPU]; Worker: [avg 74.4% CPU, max 554682585.0 Mb]
#> user system elapsed
#> 150.492 23.516 233.179

Created on 2020-10-05 by the reprex package (v0.3.0)

Amazing! I'm glad I finally decided to take the time to figure out where the roadblock was and produce a reprex. This was by far the greatest source of confusion and friction for me, so I can't wait to install the dev version and test it out.

For posterity, I think #157 was related to this fix.
Prework
Description
This has been a recurring issue for me. There are a few select targets that will hang indefinitely when building with tar_make_clustermq(), but when I use tar_make() it works without issue. tar_make_clustermq() will hang for tens of minutes or hours, while tar_make() takes a few minutes to build the target. I may have even filed an issue about this before, but I am determined to try to make a reprex this time around. Possibly related to #169.

Building this particular target is simply reading in a large tab-separated file with vroom::vroom(). The file is about 500,000 rows and 2,500 columns. The file size is 9.7 GB. I have simulated some data to stand in its place.

Some notable features that may or may not be relevant to this issue:
The data are stored on a mounted network drive, and not directly on my Mac's hard drive
Some tar options that I set...
tar_make_clustermq() arguments:
Reproducible example
Brief Summary: Scenarios 1 and 2 run fine with tar_make() and take around 5 minutes. Scenarios 3 and 4 hang (or maybe just take a huge amount of time and I wasn't willing to wait), and I cancelled them after 15+ minutes of waiting. I posted the literal reprex::reprex code for 3, 4, and 5 because they never completed. Scenario 5 moved the data to a local directory and this did not make any difference; tar_make_clustermq() still hangs.
Simulate Data
Here is my attempt at a reprex. I simulated some data that is comparable in size and structure:
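For readers who want to try this without a multi-gigabyte file, here is a minimal sketch of the same idea at a much smaller scale. The dimensions, the seed, the temporary file path, and the use of base R's write.table() are all assumptions made for illustration, not the code from the original post; the real file is described as roughly 500,000 rows by 2,500 columns.

```r
# Hypothetical, scaled-down stand-in for the simulated data set.
# rows and cols are assumptions chosen so this runs in seconds;
# the issue describes roughly 500,000 rows and 2,500 columns.
set.seed(1)
rows <- 1e3
cols <- 25
sim <- data.frame(matrix(runif(rows * cols), nrow = rows))

# Write a tab-separated file with base R only (no extra packages needed).
path <- file.path(tempdir(), "data.tsv")
write.table(sim, path, sep = "\t", quote = FALSE, row.names = FALSE)

# Read the file back to confirm the round trip preserves the shape.
check <- read.delim(path)
dim(check)
```

Scaling rows and cols back up should give a file closer to the one that triggers the hang, at the cost of memory and write time.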
Benchmark interactive loading
First, benchmarking simple interactive loading of the data set into R using two common packages.
1. Reprex using {readr} and tar_make()
2. Reprex using {vroom} and tar_make()
3. Reprex using {readr} and tar_make_clustermq()
4. Reprex using {vroom} and tar_make_clustermq()
5. Reprex using {readr}, tar_make_clustermq(), and data stored locally