Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parallel-safe RNG #331

Closed
HenrikBengtsson opened this issue Nov 24, 2020 · 4 comments · Fixed by #349
Closed

Parallel-safe RNG #331

HenrikBengtsson opened this issue Nov 24, 2020 · 4 comments · Fixed by #349
Labels
feature a feature request or enhancement

Comments

@HenrikBengtsson
Copy link

Hi, I think you want to use doRNG::%dorng% instead of just %dopar% to make sure you use the L'Ecuyer RNG in order to get statistically sound random numbers for the parallel processing.

You can detect this by running:

library(doFuture)
registerDoFuture()
plan(multicore)  ## same as doMC()
example("tune_grid", package="tune")

which warns about this:

> warnings()
Warning messages:
1: UNRELIABLE VALUE: Future ('doFuture-1') unexpectedly generated random numbers without specifying argument
'[future.]seed'. There is a risk that those random numbers are not statistically sound and the overall results
might be invalid. To fix this, specify argument '[future.]seed', e.g. 'seed=TRUE'. This ensures that proper, 
parallel-safe random numbers are produced via the L'Ecuyer-CMRG method. To disable this check,
use [future].seed=NULL, or set option 'future.rng.onMisuse' to "ignore".
...

When using %dorng%, the doRNG package will set up the L'Ecuyer RNG for you and these warnings go away.

PS. I'm working on making those RNG warnings more informative when using foreach, e.g. to have them refer to doRNG::%dorng% rather than the non-existing 'seed' argument

HenrikBengtsson added a commit to HenrikBengtsson/tune that referenced this issue Nov 24, 2020
@HenrikBengtsson
Copy link
Author

HenrikBengtsson commented Nov 24, 2020

PR #332 solves this;

library(doFuture)
options(future.rng.onMisuse = "error") ## be strict; escalate RNG warnings to errors
registerDoFuture()
plan(multicore)  ## same as doMC()
example("tune_grid", package="tune")

FWIW, even if you don't use it, you need to keep importing %dopar% due to the renozao/doRNG#18 bug.

@topepo
Copy link
Member

topepo commented Dec 1, 2020

The existing, albeit low-tech solution, is to set a collection of reproduce seeds inside the functions called by the workers. Our tests show that this gives reproducible results across a bunch of backends (and sequentially).

Are there cases where this would fail?

I don't doubt that this is a more appropriate method for controlling randomness in the workers.

The issue for tune is that it has a fair amount of imports and we've been working to decrease that due to CRAN's (silly) limit on the number of imports. If we add doRNG, we are again at the limit. We plan on adding more direct use of future (in addition to the existing foreach code) and doRNG would take up the slot that we've been saving for future.

@HenrikBengtsson
Copy link
Author

HenrikBengtsson commented Dec 1, 2020

The existing, albeit low-tech solution, is to set a collection of reproduce seeds inside the functions called by the workers. Our tests show that this gives reproducible results across a bunch of backends (and sequentially).

I didn't see that; it's good that you've got the reproducibility covered.

Are there cases where this would fail?

I don't doubt that this is a more appropriate method for controlling randomness in the workers.

Yeah, the latter would be my concern. I'm claiming to be an RNG expert but standing on the giants before us, I think using the built-in L'Ecuyer-CMRG RNG would be a better option. Given that you're already doing:

  if (rng) {
    seeds <- sample.int(10^5, nrow(resamples))
  } else {
    seeds <- NULL
  }

  ...

  if (!is.null(seeds)) {
    set.seed(seeds[[iteration]])
  }

you're almost there already. You could update to something like:

  if (rng) {
    okind <- RNGkind("L'Ecuyer-CMRG")
    on.exit(RNGkind(okind[1]), add = TRUE)
    seeds <- list(.Random.seed)
    for (i in seq_len(nrow(resamples)-1)) {
      seeds[[i+1]] <- parallel::nextRNGStream(seeds[[i]])
    }
  } else {
    seeds <- NULL
  }

  ...

  if (!is.null(seeds)) {
    assign(".Random.seed", seeds[[iteration]], envir = globalenv())
  }

The issue for tune is that it has a fair amount of imports and we've been working to decrease that due to CRAN's (silly) limit on the number of imports. If we add doRNG, we are again at the limit. We plan on adding more direct use of future (in addition to the existing foreach code) and doRNG would take up the slot that we've been saving for future.

Oh ... didn't think of that. There's probably room for a discussion around that; it probably helps cleaning up things around several houses but there are definitely use cases where it's just a blocker that introduces ad-hoc workarounds, e.g. Suggests instead of Imports, and unneeded re-exports. Not good and not productive.

A workaround for depending on 'doRNG' could be to list that under Imports: and move 'foreach' to Suggests:. Since 'doRNG' depends on 'foreach' you'd know that the 'Suggest:'ed 'foreach' will always be available, i.e. foreach::... will always work. I just realized that you might have to do a similar trick for parallel::nextRNGStream(); add Suggests: parallel, which is a package that comes with all R installations so it'll always be available. I see you have Imports: utils - I guess you can free up another slot there using the same trick.

I'll keep this limitation in mind when I develop the 'future' framework. But I agree, it's a silly limit and ideally it should affect the design.

@github-actions
Copy link

github-actions bot commented Mar 6, 2021

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.

@github-actions github-actions bot locked and limited conversation to collaborators Mar 6, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
feature a feature request or enhancement
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants