Parallel-safe RNG #331
Comments
PR #332 solves this;
library(doFuture)
options(future.rng.onMisuse = "error") ## be strict; escalate RNG warnings to errors
registerDoFuture()
plan(multicore) ## same as doMC()
example("tune_grid", package="tune")
FWIW, even if you don't use it, you need to keep importing
The existing, albeit low-tech, solution is to set a collection of reproducible seeds inside the functions called by the workers. Our tests show that this gives reproducible results across a bunch of backends (and sequentially). Are there cases where this would fail? I don't doubt that this is a more appropriate method for controlling randomness in the workers. The issue for
I didn't see that; it's good that you've got the reproducibility covered.
Yeah, the latter would be my concern. I'm not claiming to be an RNG expert, but standing on the shoulders of the giants before us, I think using the built-in L'Ecuyer-CMRG RNG would be a better option. Given that you're already doing:
if (rng) {
  seeds <- sample.int(10^5, nrow(resamples))
} else {
  seeds <- NULL
}
...
if (!is.null(seeds)) {
  set.seed(seeds[[iteration]])
}
you're almost there already. You could update to something like:
if (rng) {
  ## switch to L'Ecuyer-CMRG and restore the previous RNG kind on exit
  okind <- RNGkind("L'Ecuyer-CMRG")
  on.exit(RNGkind(okind[1]), add = TRUE)
  ## one independent RNG stream per resample
  seeds <- list(.Random.seed)
  for (i in seq_len(nrow(resamples)-1)) {
    seeds[[i+1]] <- parallel::nextRNGStream(seeds[[i]])
  }
} else {
  seeds <- NULL
}
...
if (!is.null(seeds)) {
  ## install this iteration's stream as the worker's RNG state
  assign(".Random.seed", seeds[[iteration]], envir = globalenv())
}
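For anyone who wants to try the stream-based idea outside of tune, here is a minimal, self-contained sketch in base R. The helper names `make_rng_streams()` and `run_task()` are hypothetical and not part of tune or any PR; the sketch only assumes `parallel::nextRNGStream()` and the usual `.Random.seed` mechanics, and it runs the "workers" sequentially with `lapply()` just to show that each task's draws depend on its stream rather than on execution order.

```r
## Hypothetical helper: build one L'Ecuyer-CMRG stream seed per task.
make_rng_streams <- function(n, seed = 1L) {
  stopifnot(n >= 1L)
  ## Remember the caller's RNG kind and state so they can be restored on exit.
  old_kind <- RNGkind()
  old_seed <- get0(".Random.seed", envir = globalenv(), inherits = FALSE)
  on.exit({
    if (is.null(old_seed)) {
      RNGkind(old_kind[1])
    } else {
      assign(".Random.seed", old_seed, envir = globalenv())
    }
  }, add = TRUE)

  RNGkind("L'Ecuyer-CMRG")
  set.seed(seed)
  streams <- vector("list", n)
  streams[[1]] <- get(".Random.seed", envir = globalenv())
  for (i in seq_len(n - 1L)) {
    ## Each call jumps ahead to the start of the next independent stream.
    streams[[i + 1L]] <- parallel::nextRNGStream(streams[[i]])
  }
  streams
}

## Hypothetical worker: install the task's stream, then draw random numbers.
run_task <- function(stream_seed) {
  ## Put back the previous RNG state afterwards, if there was one.
  old_seed <- get0(".Random.seed", envir = globalenv(), inherits = FALSE)
  on.exit(
    if (!is.null(old_seed)) assign(".Random.seed", old_seed, envir = globalenv()),
    add = TRUE
  )
  assign(".Random.seed", stream_seed, envir = globalenv())
  rnorm(3)
}

streams <- make_rng_streams(4, seed = 2020L)
res1 <- lapply(streams, run_task)       ## tasks in order 1, 2, 3, 4
res2 <- lapply(rev(streams), run_task)  ## tasks in order 4, 3, 2, 1
identical(res1, rev(res2))              ## same draws per task, whatever the order
```

Because each task carries its own stream seed, the same list of seeds could be handed to any backend, which is the property the integer-seed approach also aims for.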
Oh ... didn't think of that. There's probably room for a discussion around that; it probably helps clean things up in several places, but there are definitely use cases where it's just a blocker that introduces ad-hoc workarounds, e.g. Suggests instead of Imports, and unneeded re-exports. Not good and not productive. A workaround for depending on 'doRNG' could be to list that under
I'll keep this limitation in mind when I develop the 'future' framework. But I agree, it's a silly limit and ideally it shouldn't affect the design.
This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.
Hi, I think you want to use doRNG::%dorng% instead of just %dopar% to make sure you use the L'Ecuyer RNG in order to get statistically sound random numbers for the parallel processing.
You can detect this by running:
which warns about this:
When using %dorng%, the doRNG package will set up the L'Ecuyer RNG for you and these warnings go away.
PS. I'm working on making those RNG warnings more informative when using foreach, e.g. to have them refer to doRNG::%dorng% rather than the non-existing 'seed' argument.
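As a rough illustration of the %dorng% suggestion, here is a small sketch that assumes the foreach and doRNG packages are installed. It uses the sequential backend (`registerDoSEQ()`) only to stay self-contained; that choice is an assumption of this sketch and not something from the issue, and any registered foreach backend should be usable instead. It is not the code that was elided from the report above.

```r
library(foreach)
library(doRNG)    ## provides %dorng%

## Sequential backend, chosen here only so the sketch runs without extra setup.
registerDoSEQ()

## %dorng% derives one L'Ecuyer-CMRG stream per iteration from the current seed.
set.seed(42)
r1 <- foreach(i = 1:4) %dorng% {
  runif(1)
}

## Resetting the seed and repeating the loop should reproduce the same draws.
set.seed(42)
r2 <- foreach(i = 1:4) %dorng% {
  runif(1)
}

identical(unlist(r1), unlist(r2))
```

The loop body is unchanged from what one would write with %dopar%; only the operator differs, which is why the swap is cheap to try.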