---
execute:
  freeze: auto
---
```{r, message = FALSE, warning = FALSE, echo = FALSE}
knitr::opts_knit$set(root.dir = fs::dir_create(tempfile()))
knitr::opts_chunk$set(collapse = TRUE, comment = "#>", eval = TRUE)
```
```{r, message = FALSE, warning = FALSE, echo = FALSE}
library(targets)
library(tarchetypes)
```
# Debugging pipelines {#debugging}
This chapter explains how to debug `targets` pipelines. The repository at <https://github.com/wlandau/targets-debug> has example R code. To practice the debugging techniques explained below, [download the code](https://github.com/wlandau/targets-debug/archive/refs/heads/main.zip) and step through the interactive R scripts `demo_small.R`, `demo_browser.R`, and `demo_workspace.R`. To ask for help using `targets`, please read the [help chapter](#help).
## Debugging in `targets` is different
R code is easiest to debug when it is interactive. In the R console or [RStudio IDE](https://www.rstudio.com/products/rstudio/), you have full control over the code and the objects in the environment, and you are free to dissect, tinker, and test until you find and fix the issue. However, a pipeline is the opposite of interactive. In `targets`, several layers of encapsulation and automation separate you from the code you want to debug:
* The pipeline runs in an external non-interactive [`callr::r()`](https://github.com/r-lib/callr) process where you cannot use the R console.
* Data management
* Environment management
* [High-performance computing](#hpc)
* Built-in error handling
Although these layers are essential for reproducibility and scale, you will need to cut through them in order to diagnose and solve issues in pipelines. This chapter explains how.
## Debugging example
The following pipeline simulates a repeated measures dataset and analyzes it with generalized least squares.
```{r, eval = FALSE}
# _targets.R
library(targets)
library(tarchetypes)
tar_option_set(
  packages = c("broom", "broom.mixed", "dplyr", "nlme", "tibble", "tidyr")
)
simulate_data <- function(units) {
  tibble(unit = seq_len(units), factor = rnorm(n = units, mean = 3)) %>%
    expand_grid(measurement = seq_len(4)) %>%
    mutate(outcome = sqrt(factor) + rnorm(n()))
}
analyze_data <- function(data) {
  gls(
    model = outcome ~ factor,
    data = data,
    correlation = corSymm(form = ~ measurement | unit),
    weights = varIdent(form = ~ 1 | measurement)
  ) %>%
    tidy(conf.int = TRUE, conf.level = 0.95)
}
list(
  tar_target(name = dataset1, command = simulate_data(100)),
  tar_target(name = model, command = analyze_data(dataset1))
)
```
This pipeline has an error.
```{r, eval = FALSE}
# R console
tar_make()
#> • start target dataset1
#> • built target dataset1 [0.499 seconds]
#> • start target model
#> ✖ error target model
#> • end pipeline [0.631 seconds]
#> Warning messages:
#> 1: NaNs produced
#> 2: 1 targets produced warnings. Run tar_meta(fields = warnings, complete_only = TRUE) for the messages.
#> Error:
#> ! Error running targets::tar_make()
#> Target errors: targets::tar_meta(fields = error, complete_only = TRUE)
#> Tips: https://books.ropensci.org/targets/debugging.html
#> Last error: missing values in object
```
## Finish the pipeline anyway
Even if you hit an error, you can still finish the successful parts of the pipeline. The `error` argument of `tar_option_set()` and `tar_target()` tells each target what to do if it hits an error. For example, `tar_option_set(error = "null")` tells errored targets to return `NULL`. The output as a whole will not be correct or up to date, but the pipeline will finish so you can look at preliminary results. This is especially helpful with [dynamic branching](#dynamic).
```{r, eval = FALSE}
# _targets.R
library(targets)
library(tarchetypes)
tar_option_set(
  packages = c("broom", "broom.mixed", "dplyr", "nlme", "tibble", "tidyr"),
  error = "null" # produce a result even if the target errored out.
)
# Functions etc...
```
```{r, eval = FALSE}
# R console
tar_make()
#> • start target dataset1
#> • built target dataset1 [0.657 seconds]
#> • start target model
#> ✖ error target model
#> ✖ error target model
#> • end pipeline [0.783 seconds]
#> Warning messages:
#> 1: NaNs produced
#> 2: 1 targets produced warnings. Run tar_meta(fields = warnings, complete_only = TRUE) for the messages.
# We do have a result for target {model}.
tar_read(model)
#> NULL
# But it is not up to date.
tar_outdated()
#> [1] "model"
```
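The `error` setting also works for individual targets: `tar_target()` accepts the same `error` argument as `tar_option_set()`. Below is a sketch of the per-target version of the workaround, assuming the same functions as above and a version of `targets` that supports `error = "null"` in `tar_target()`.
```{r, eval = FALSE}
# _targets.R
# packages, options, and functions as before...
list(
  tar_target(name = dataset1, command = simulate_data(100)),
  tar_target(
    name = model,
    command = analyze_data(dataset1),
    error = "null" # Return NULL for this target only if it errors out.
  )
)
```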
## Error messages
Still, it is important to fix known errors. The metadata in `_targets/meta/meta` is a good place to start. It stores the most recent error and warning messages for each target. `tar_meta()` can retrieve these messages.
```{r, eval = FALSE}
# R console
tar_meta(fields = error, complete_only = TRUE)
#> # A tibble: 1 × 2
#> name error
#> <chr> <chr>
#> 1 model missing values in object
```
```{r, eval = FALSE}
# R console
tar_meta(fields = warnings, complete_only = TRUE)
#> # A tibble: 1 × 2
#> name warnings
#> <chr> <chr>
#> 1 dataset1 NaNs produced
```
It looks like missing values in the data are responsible for the error in the `model` target. Maybe this clue alone is enough to repair the code.[^missing] If not, read on.
[^missing]: You can fix the bug by either removing the missing values from the dataset or by setting `na.action = na.omit` in `gls()`.
## Debugging in functions
Most errors come from custom user-defined functions like `simulate_data()` and `analyze_data()`. See if you can reproduce the error in the R console.
```{r, eval = FALSE}
# R console
library(targets)
library(tarchetypes)
# Restart your R session.
rstudioapi::restartSession()
# Loads globals like tar_option_set() packages, simulate_data(), and analyze_data():
tar_load_globals()
# Load the data that the target depends on.
tar_load(dataset1)
# Run the command of the errored target.
analyze_data(dataset1)
#> Error in na.fail.default(list(measurement = c(1L, 2L, 3L, 4L, 1L, 2L, :
#> missing values in object
```
If you see the same error here that you saw in the pipeline, then good! Now that you are in an interactive R session, all the [usual debugging techniques and tools](https://adv-r.hadley.nz/debugging.html) such as `debug()` and `browser()` can help you figure out how to fix your code, and you can exclude `targets` from the rest of the debugging process.
```{r, eval = FALSE}
# R console
debug(analyze_data)
analyze_data(dataset1)
#> debugging in: analyze_data(dataset1)
#> ...
Browse[2]> anyNA(data$outcome) # Do I need to handle missing values?
#> [1] TRUE
```
In some cases, however, you may not see the original error:
```{r, eval = FALSE}
# R console
dataset <- simulate_data(100)
analyze_data(dataset)
#> # A tibble: 2 × 7
#> term estimate std.error statistic p.value conf.low conf.high
#> * <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 (Intercept) 0.595 0.152 3.92 1.05e- 4 0.297 0.892
#> 2 factor 0.362 0.0476 7.61 2.04e-13 0.269 0.455
```
Above, the random number generator seed in your local session is different from the seed assigned to the target in the pipeline. The dataset from the pipeline has missing values, whereas the one in the local session does not.
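If you need the exact data that the pipeline generated, one option is to look up the seed that `targets` assigned to `dataset1` and re-run the command under that seed. The sketch below assumes the default seed behavior described in the workspaces section later in this chapter.
```{r, eval = FALSE}
# R console
# Retrieve the random number generator seed assigned to the dataset1 target.
seed <- tar_meta(names = dataset1, fields = seed)$seed
# Re-run the target's command under that seed to recover the same data.
dataset1_local <- withr::with_seed(seed, simulate_data(100))
```
In practice, `tar_load(dataset1)` (as shown earlier) is usually simpler because it retrieves the stored copy of the data directly.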
If you cannot reproduce the error in an interactive R session, read on.
## System issues
If you see an error in the pipeline but not your local interactive R session, then the bug could be system-related. For example, [this issue](https://github.com/ropensci/targets/discussions/506) was originally reported as a bug in `targets` but actually turned out to be a bug in the [interaction between packages `renv` and `callr`](https://github.com/rstudio/renv/issues/773#issuecomment-973765196). (Remember, `targets` runs the pipeline in a `callr::r()` process.) To check for issues like that one, try running `tar_make()` with `callr_function = NULL` to avoid `callr` entirely:
```{r, eval = FALSE}
# R console
rstudioapi::restartSession() # Remove in-memory detritus.
targets::tar_make(callr_function = NULL)
```
In addition, try running your code with `callr` outside the pipeline:
```{r, eval = FALSE}
# R console
callr::r(
  func = function() {
    library(targets)
    library(tarchetypes)
    tar_load_globals()
    dataset <- simulate_data(100)
    analyze_data(dataset)
  },
  show = TRUE
)
```
Similarly, you can isolate many [high-performance computing](https://books.ropensci.org/targets/hpc.html) problems by directly invoking [`clustermq`](https://github.com/mschubert/clustermq) or [`future`](https://github.com/HenrikBengtsson/future). Examples:
```{r, eval = FALSE}
# R console with clustermq:
# see https://mschubert.github.io/clustermq/articles/userguide.html
options(clustermq.scheduler = "multiprocess")
clustermq::Q(
  function(arg) {
    library(targets)
    library(tarchetypes)
    tar_load_globals()
    dataset <- simulate_data(100)
    analyze_data(dataset)
  },
  arg = 1,
  n_jobs = 1
)
```
```{r, eval = FALSE}
# R console with future:
# see https://future.futureverse.org
future::plan(future::multisession)
f <- future::future(
  expr = {
    library(targets)
    library(tarchetypes)
    tar_load_globals()
    dataset <- simulate_data(100)
    analyze_data(dataset)
  },
  seed = TRUE
)
future::value(f)
```
If you successfully reproduce the bug without using `targets`, then the problem becomes much smaller and much easier to solve. At that point, you can completely exclude `targets` from the rest of the debugging process.
## Pause the pipeline with `browser()`
Sometimes, you may still need to run the pipeline to find the problem. The following trick lets you pause the pipeline and tinker with a running target interactively:
1. Insert [`browser()`](https://adv-r.hadley.nz/debugging.html#browser) into the function that produces the error.
1. Restart your R session to remove detritus from memory.[^detritus]
1. Call `tar_make()` with `callr_function = NULL` to run the pipeline in your interactive session without launching a new `callr::r()` process.
1. Poke around until you find the bug.
[^detritus]: With `callr_function = NULL`, a messy local R environment can accidentally change the functions and objects that a target depends on, which can invalidate those targets and erase hard-earned results that were previously correct. This is why `targets` uses `callr` in the first place, and it is why `callr_function = NULL` is for debugging only. If you do need `callr_function = NULL`, please restart your R session first.
```{r, eval = FALSE}
# _targets.R
# ...
analyze_data <- function(data) {
  browser() # Pause the pipeline here.
  gls(
    model = outcome ~ factor,
    data = data,
    correlation = corSymm(form = ~ measurement | unit),
    weights = varIdent(form = ~ 1 | measurement)
  ) %>%
    tidy(conf.int = TRUE, conf.level = 0.95)
}
# ...
```
```{r, eval = FALSE}
# R console
library(targets)
library(tarchetypes)
# Restart your R session.
rstudioapi::restartSession()
# Run the pipeline in your interactive R session (no callr process)
tar_make(callr_function = NULL)
#> ✔ skip target dataset1
#> • start target model
#> Called from: analyze_data(dataset1)
# Tinker with the R session to see if you can reproduce the error.
Browse[1]> model <- gls(
+ model = outcome ~ factor,
+ data = data,
+ correlation = corSymm(form = ~ measurement | unit),
+ weights = varIdent(form = ~ 1 | measurement)
+ )
#> Error in na.fail.default(list(measurement = c(1L, 2L, 3L, 4L, 1L, 2L, :
#> missing values in object
# Figure out what it would take to fix the error.
Browse[1]> model <- gls(
+ model = outcome ~ factor,
+ data = data,
+ correlation = corSymm(form = ~ measurement | unit),
+ weights = varIdent(form = ~ 1 | measurement),
+ na.action = na.omit
+ )
# Confirm that the bug is fixed.
Browse[1]> tidy(model, conf.int = TRUE, conf.level = 0.95)
#> # A tibble: 2 × 7
#> term estimate std.error statistic p.value conf.low conf.high
#> * <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 (Intercept) 0.795 0.148 5.36 0.000000145 0.504 1.09
#> 2 factor 0.275 0.0466 5.92 0.00000000717 0.184 0.367
```
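Once the interactive session confirms that `na.action = na.omit` resolves the error, the repaired function might look like the sketch below. Remember to remove the `browser()` call before running the pipeline for real.
```{r, eval = FALSE}
# _targets.R
analyze_data <- function(data) {
  gls(
    model = outcome ~ factor,
    data = data,
    correlation = corSymm(form = ~ measurement | unit),
    weights = varIdent(form = ~ 1 | measurement),
    na.action = na.omit # Drop rows with missing values instead of erroring out.
  ) %>%
    tidy(conf.int = TRUE, conf.level = 0.95)
}
```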
## Pause the pipeline with the `debug` option
It may be too tedious to comb through all targets with `browser()`. For example, what if the pipeline has hundreds of simulated datasets? The following pipeline simulates 100 datasets with 58 experimental units each and 100 datasets with 70 experimental units each. Each dataset is analyzed with `gls()`. `tar_map_rep()` from the `tarchetypes` package organizes this simulation structure and batches the replications for computational efficiency.
```{r, eval = FALSE}
# _targets.R
library(targets)
library(tarchetypes)
tar_option_set(
  packages = c("broom", "broom.mixed", "dplyr", "nlme", "tibble", "tidyr")
)
simulate_data <- function(units) {
  tibble(unit = seq_len(units), factor = rnorm(units, mean = 3)) %>%
    expand_grid(measurement = seq_len(4)) %>%
    mutate(outcome = sqrt(factor) + rnorm(n()))
}
analyze_data <- function(data) {
  gls(
    model = outcome ~ factor,
    data = data,
    correlation = corSymm(form = ~ measurement | unit),
    weights = varIdent(form = ~ 1 | measurement)
  ) %>%
    tidy(conf.int = TRUE, conf.level = 0.95)
}
simulate_and_analyze_one_dataset <- function(units) {
  data <- simulate_data(units)
  analyze_data(data)
}
list(
  tar_map_rep( # from the {tarchetypes} package
    name = analysis,
    command = simulate_and_analyze_one_dataset(units),
    values = data.frame(units = c(58, 70)), # 2 data size scenarios.
    names = all_of("units"), # The columns of values to use to name the targets.
    batches = 20, # For each scenario, divide the 100 simulations into 20 dynamic branch targets.
    reps = 5 # Each branch target (batch) runs simulate_and_analyze_one_dataset(units) 5 times.
  )
)
```
```{r, eval = FALSE}
# R console
tar_make()
#> ✔ skip target analysis_batch
#> • start branch analysis_58_550d992c
#> • built branch analysis_58_550d992c [1.2 seconds]
#> • start branch analysis_58_582bca0a
#> • built branch analysis_58_582bca0a [0.895 seconds]
#> • start branch analysis_58_f0ac3217
#> • built branch analysis_58_f0ac3217 [0.848 seconds]
#> • start branch analysis_58_35d814c0
#> ✖ error branch analysis_58_35d814c0
#> • end pipeline [3.56 seconds]
#> Warning messages:
#> 1: NaNs produced
#> 2: 1 targets produced warnings. Run tar_meta(fields = warnings, complete_only = TRUE) for the messages.
#> Error:
#> ! Error running targets::tar_make()
#> Target errors: targets::tar_meta(fields = error, complete_only = TRUE)
#> Tips: https://books.ropensci.org/targets/debugging.html
#> Last error: missing values in object
```
Remember, if you just want to see the results that succeeded, run the pipeline with `error = "null"` in `tar_option_set()`. This temporary workaround is especially helpful with so many simulations.
```{r, eval = FALSE}
# _targets.R
library(targets)
library(tarchetypes)
tar_option_set(
  packages = c("broom", "broom.mixed", "dplyr", "nlme", "tibble", "tidyr"),
  error = "null"
)
# Functions etc...
```
```{r, eval = FALSE}
# R console
tar_make()
#> • start target analysis_batch
#> • built target analysis_batch [0.002 seconds]
#> • start branch analysis_58_550d992c
#> • built branch analysis_58_550d992c [1.252 seconds]
#> • start branch analysis_58_582bca0a
#> • built branch analysis_58_582bca0a [0.864 seconds]
#> • start branch analysis_58_f0ac3217
#> • built branch analysis_58_f0ac3217 [0.838 seconds]
#> • start branch analysis_58_35d814c0
#> ✖ error branch analysis_58_35d814c0
#> ✖ error branch analysis_58_35d814c0
#> • start branch analysis_58_e8c6aeab
#> • built branch analysis_58_e8c6aeab [0.774 seconds]
#> # More targets...
#> # ...
#> • start target analysis
#> • built target analysis [0.008 seconds]
#> • end pipeline [27.823 seconds]
# Read the simulations that succeeded.
tar_read(analysis)
#> # A tibble: 270 × 12
#> term estimate std.error statistic p.value conf.low conf.high units tar_batch tar_rep tar_seed tar_group
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <int> <int> <int> <int>
#> 1 (Intercept) 0.758 0.234 3.24 0.00137 0.300 1.22 58 1 1 -633351515 1
#> 2 factor 0.306 0.0696 4.40 0.0000168 0.170 0.443 58 1 1 -633351515 1
#> 3 (Intercept) 0.524 0.229 2.28 0.0232 0.0745 0.974 58 1 2 -915590912 1
#> 4 factor 0.402 0.0708 5.68 0.0000000402 0.263 0.541 58 1 2 -915590912 1
#> 5 (Intercept) 0.797 0.195 4.08 0.0000624 0.414 1.18 58 1 3 1619222314 1
#> 6 factor 0.295 0.0611 4.82 0.00000257 0.175 0.414 58 1 3 1619222314 1
#> 7 (Intercept) 0.639 0.216 2.96 0.00336 0.216 1.06 58 1 4 825884824 1
#> 8 factor 0.376 0.0723 5.21 0.000000424 0.235 0.518 58 1 4 825884824 1
#> 9 (Intercept) 0.850 0.151 5.63 0.0000000521 0.554 1.15 58 1 5 2080314913 1
#> 10 factor 0.270 0.0507 5.31 0.000000252 0.170 0.369 58 1 5 2080314913 1
#> # … with 260 more rows
#> # ℹ Use `print(n = ...)` to see more rows
```
Now let's seriously debug this pipeline. If each call to `simulate_and_analyze_one_dataset()` takes a long time to run, then the first step is to set one rep per batch in `tar_map_rep()` while keeping the total number of reps the same. In other words, increase `batches` from 20 to 100 and decrease `reps` from 5 to 1.[^seed_batching] Also remove the `units = 70` scenario because we can reproduce the error without it.
[^seed_batching]: In `tarchetypes` version `0.7.1.9000` and above, this re-batching will [not change the random number generator seed assigned to each call to `simulate_and_analyze_one_dataset()`](https://github.com/ropensci/tarchetypes/pull/113).
```{r, eval = FALSE}
# _targets.R
# packages, options, and functions...
list(
  tar_map_rep(
    name = analysis,
    command = simulate_and_analyze_one_dataset(units),
    values = data.frame(units = 58), # Remove the units = 70 scenario.
    names = all_of("units"),
    batches = 100, # 100 batches now
    reps = 1 # 1 rep per batch now
  )
)
```
```{r, eval = FALSE}
# R console
tar_make()
#> • start target analysis_batch
#> • built target analysis_batch [0.002 seconds]
#> • start branch analysis_58_550d992c
#> • built branch analysis_58_550d992c [0.502 seconds]
#> • start branch analysis_58_582bca0a
#> • built branch analysis_58_582bca0a [0.208 seconds]
#> • start branch analysis_58_f0ac3217
#> # More successful targets...
#> # ...
#> • start branch analysis_58_b59aa384
#> ✖ error branch analysis_58_b59aa384
#> • end pipeline [3.828 seconds]
#> Warning messages:
#> 1: NaNs produced
#> 2: 1 targets produced warnings. Run tar_meta(fields = warnings, complete_only = TRUE) for the messages.
#> Error:
#> ! Error running targets::tar_make()
#> Target errors: targets::tar_meta(fields = error, complete_only = TRUE)
#> Tips: https://books.ropensci.org/targets/debugging.html
#> Last error: missing values in object
```
Around 20 targets ran successfully, and target `analysis_58_b59aa384` hit an error. Let's interactively debug `analysis_58_b59aa384` without interfering with any other targets:
1. Set the `debug` option to `"analysis_58_b59aa384"` in `tar_option_set()`.
1. Optional: set `cue = tar_cue(mode = "never")` in `tar_option_set()` to force skip all targets except:
* `analysis_58_b59aa384` and other targets in the `debug` option.
* targets that do not already exist in the [metadata](#data).
* targets that set their own [cues](https://docs.ropensci.org/targets/reference/tar_cue.html).
1. Restart your R session to remove detritus from memory.
1. Run `tar_make(callr_function = NULL)`.[^interactive]
[^interactive]: You can also run `tar_make_clustermq(callr_function = NULL)` or `tar_make_future(callr_function = NULL)`. In either case, the target to debug will not run on a parallel worker even if you set `deployment = "worker"`.
```{r, eval = FALSE}
# _targets.R
library(targets)
library(tarchetypes)
tar_option_set(
  packages = c("broom", "broom.mixed", "dplyr", "nlme", "tibble", "tidyr"),
  debug = "analysis_58_b59aa384", # Set the target you want to debug.
  cue = tar_cue(mode = "never") # Force skip non-debugging outdated targets.
)
# Functions etc...
```
```{r, eval = FALSE}
# R console
library(targets)
# Restart your R session.
rstudioapi::restartSession()
# Run the pipeline in your interactive R session (no callr process)
tar_make(callr_function = NULL)
#>✔ skip target analysis_batch
#>✔ skip branch analysis_58_550d992c
#>✔ skip branch analysis_58_582bca0a
#>✔ skip branch analysis_58_f0ac3217
#>✔ skip branch analysis_58_35d814c0
#>✔ skip branch analysis_58_e8c6aeab
#>✔ skip branch analysis_58_18f79420
#>✔ skip branch analysis_58_1af19a55
#>✔ skip branch analysis_58_5e604f91
#>✔ skip branch analysis_58_5d2c5812
#>✔ skip branch analysis_58_729b7859
#>✔ skip branch analysis_58_d3899b7b
#>✔ skip branch analysis_58_2a182d3f
#>✔ skip branch analysis_58_5be362d3
#>✔ skip branch analysis_58_5d86137a
#>✔ skip branch analysis_58_a3562efd
#>✔ skip branch analysis_58_a6d57bfd
#>✔ skip branch analysis_58_f15d092c
#>✔ skip branch analysis_58_53efc7f5
#>• start branch analysis_58_b59aa384
#>• pause pipeline
#> debug target analysis_58_b59aa384
#>
#>ℹ You are now running an interactive debugger.
#> You can enter code and print objects as with the normal R console.
#> How to use: https://adv-r.hadley.nz/debugging.html#browser
#>
#>ℹ The debugger is poised to run the command of target analysis_58_b59aa384:
#>
#> tarchetypes::tar_rep_run(command = tarchetypes::tar_append_static_values(object = simulate_and_analyze_one_dataset(58),
#> values = list(units = 58)), batch = analysis_batch, reps = 1,
#> iteration = "vector")
#>
#>ℹ Tip: run debug(tarchetypes::tar_rep_run) and then enter "c"
#> to move the debugger inside function tarchetypes::tar_rep_run().
#> Then debug the function as you would normally (without {targets}).
#>Called from: eval(expr = expr, envir = envir)
Browse[1]>
```
At this point, we are in an interactive debugger again. Only this time, we quickly skipped straight to the target we want to debug. We can follow the advice in the prompt above, or we can tinker in other ways.
```{r, eval = FALSE}
# R console
# Jump to the function we want to debug.
Browse[1]> debug(analyze_data)
Browse[1]> c # Continue to the next breakpoint.
#> debugging in: analyze_data(data)
# Tinker with the R session to see if you can reproduce the error.
Browse[2]> model <- gls(
+ model = outcome ~ factor,
+ data = data,
+ correlation = corSymm(form = ~ measurement | unit),
+ weights = varIdent(form = ~ 1 | measurement)
+ )
#> Error in na.fail.default(list(measurement = c(1L, 2L, 3L, 4L, 1L, 2L, :
#> missing values in object
# Figure out what it would take to fix the error.
Browse[2]> model <- gls(
+ model = outcome ~ factor,
+ data = data,
+ correlation = corSymm(form = ~ measurement | unit),
+ weights = varIdent(form = ~ 1 | measurement),
+ na.action = na.omit
+ )
# Confirm that the bug is fixed.
Browse[2]> tidy(model, conf.int = TRUE, conf.level = 0.95)
#> # A tibble: 2 × 7
#> term estimate std.error statistic p.value conf.low conf.high
#> * <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 (Intercept) 0.925 0.205 4.51 0.0000103 0.523 1.33
#> 2 factor 0.279 0.0646 4.32 0.0000230 0.153 0.406
```
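After the fix is confirmed, you would typically remove the temporary `debug` and `cue` settings from `tar_option_set()` so that ordinary calls to `tar_make()` rebuild the repaired targets. A sketch of the restored options:
```{r, eval = FALSE}
# _targets.R
library(targets)
library(tarchetypes)
tar_option_set(
  packages = c("broom", "broom.mixed", "dplyr", "nlme", "tibble", "tidyr")
)
# Functions etc...
```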
## Workspaces
A workspace is a special file that helps locally reconstruct the environment of a target outside the pipeline. To demonstrate, consider a different version of the above example which saves the datasets and the models in different targets. We set `workspace_on_error = TRUE` in `tar_option_set()` so that each errored target proactively saves a workspace file.[^workspace_save]
[^workspace_save]: `tar_option_set()` also has a `workspaces` argument to let you choose which targets save workspace files, regardless of whether they hit errors.
```{r, eval = FALSE}
# _targets.R
library(targets)
library(tarchetypes)
tar_option_set(
  packages = c("broom", "broom.mixed", "dplyr", "nlme", "tibble", "tidyr"),
  workspace_on_error = TRUE # Save a workspace file for a target that errors out.
)
simulate_data <- function(units) {
  tibble(unit = seq_len(units), factor = rnorm(units, mean = 3)) %>%
    expand_grid(measurement = seq_len(4)) %>%
    mutate(outcome = sqrt(factor) + rnorm(n()))
}
analyze_data <- function(data) {
  gls(
    model = outcome ~ factor,
    data = data,
    correlation = corSymm(form = ~ measurement | unit),
    weights = varIdent(form = ~ 1 | measurement)
  ) %>%
    tidy(conf.int = TRUE, conf.level = 0.95)
}
list(
  tar_target(rep, seq_len(100)),
  tar_target(data, simulate_data(100), pattern = map(rep)),
  tar_target(analysis, analyze_data(data), pattern = map(data))
)
```
```{r, eval = FALSE}
# R console
tar_make()
#> • start target rep
#> • built target rep [0.6 seconds]
#> • start branch data_c9beb7ca
#> • built branch data_c9beb7ca [0.015 seconds]
#> • start branch data_3658ffe6
#> • built branch data_3658ffe6 [0.009 seconds]
#> • start branch data_b786c6ae
#> # More data targets...
#> # ...
#> • built pattern data
#> • start branch analysis_3ccf0b08
#> • built branch analysis_3ccf0b08 [0.214 seconds]
#> • start branch analysis_0e85e530
#> • built branch analysis_0e85e530 [0.227 seconds]
#> • start branch analysis_03816a9f
#> # More analysis targets...
#> # ...
#> • start branch analysis_02de2921
#> • record workspace analysis_02de2921
#> ✖ error branch analysis_02de2921
#> • end pipeline [5.619 seconds]
#> There were 15 warnings (use warnings() to see them)
#> Error:
#> ! Error running targets::tar_make()
#> Target errors: targets::tar_meta(fields = error, complete_only = TRUE)
#> Tips: https://books.ropensci.org/targets/debugging.html
#> Last error: missing values in object
```
What went wrong with target `analysis_02de2921`? To find out, we load the workspace in an interactive session.
```{r, eval = FALSE}
# R console
# List the available workspaces.
tar_workspaces()
#> [1] "analysis_02de2921"
# Load the workspace.
tar_workspace(analysis_02de2921)
```
At this point, the global objects, functions, and upstream dependencies of target `analysis_02de2921` are in memory. In addition, the target's original random number generator seed is set.[^seed]
[^seed]: You can retrieve this seed with `tar_meta(names = analysis_02de2921, fields = seed)`. In the pipeline, `targets` sets this seed with `withr::with_seed()` just before running the target. However, other functions or target factories may set their own seeds. For example, `tarchetypes::tar_map_rep()` sets its own target seeds so they are [resilient to re-batching](https://github.com/ropensci/tarchetypes/pull/113). For more details on seeds, see the documentation of the `seed` argument of [`tar_option_set()`](https://docs.ropensci.org/targets/reference/tar_option_set.html#arguments).
```{r, eval = FALSE}
# R console
ls()
#> [1] "analyze_data" "data" "simulate_data"
```
With the data and functions in hand, you can reproduce the error locally.
```{r, eval = FALSE}
# R console
analyze_data(data)
#> Error in na.fail.default(list(measurement = c(1L, 2L, 3L, 4L, 1L, 2L, :
#> missing values in object
```
For more assistance, you can load the traceback from the workspace file.
```{r, eval = FALSE}
tar_traceback(analysis_02de2921)
# [1] "eval(expr = expr, envir = envir)"
# [2] "eval(expr = expr, envir = envir)"
# [3] "analyze_data(data)"
# [4] "gls(model = outcome ~ factor, data = data, correlation = corSymm(form = ~measurement | unit), weights = varIden"
# [5] "tidy(., conf.int = TRUE, conf.level = 0.95)"
# [6] "gls(model = outcome ~ factor, data = data, correlation = corSymm(form = ~measurement | unit), weights = varIden"
# [7] "do.call(model.frame, mfArgs)"
# [8] "(function (formula, ...) \nUseMethod(\"model.frame\"))(formula = ~measurement + unit + outcome + factor, data = li"
# [9] "model.frame.default(formula = ~measurement + unit + outcome + factor, data = list(c(1, 1, 1, 1, 2, 2, 2, 2, 3, "
# [10] "(function (object, ...) \nUseMethod(\"na.fail\"))(list(c(1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, "
# [11] "na.fail.default(list(c(1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2"
# [12] "stop(\"missing values in object\")"
# [13] ".handleSimpleError(function (condition) \n{\n state$error <- build_message(condition)\n state$traceback <- b"
# [14] "h(simpleError(msg, call))"
```