How do you generate the data for LBA? #32
Hey Rob. There is a function called simulateLBA in Models/LBA/LBA_Models.jl. It has default parameter values that can be changed in the options NamedTuple of Examples/LBA/LBA_Example.jl. Please feel free to use any of the code, including the LBA, for your other projects. No problem at all! Currently, we use a unique data set for each repetition, as the function starting at line 67 of MCMCBenchmarks.jl shows.
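For readers unfamiliar with the model, here is a minimal sketch of how LBA data generation typically works. This is an illustrative stand-in, not the actual simulateLBA from Models/LBA/LBA_Models.jl; the names `simulate_lba` and its keyword parameters are assumptions chosen to mirror a standard LBA parameterization (drift means ν, start-point range A, threshold gap k, drift SD s, non-decision time τ).

```julia
using Random

# Hypothetical sketch of LBA data generation; the real simulateLBA in
# Models/LBA/LBA_Models.jl may differ in name, signature, and defaults.
function simulate_lba(; N = 100, ν = [1.0, 0.5], A = 0.8, k = 0.2,
                      s = 1.0, τ = 0.3, rng = Random.default_rng())
    b = A + k                        # response threshold
    choice = Vector{Int}(undef, N)
    rt = Vector{Float64}(undef, N)
    for i in 1:N
        # Each accumulator races: start point ~ Uniform(0, A),
        # drift ~ Normal(ν[j], s), resampled until positive so the
        # trial is guaranteed to finish.
        t = map(ν) do v
            d = -1.0
            while d <= 0.0
                d = v + s * randn(rng)
            end
            (b - rand(rng) * A) / d  # time for this accumulator to hit b
        end
        choice[i] = argmin(t)        # winning accumulator = response
        rt[i] = minimum(t) + τ       # winner's time plus non-decision time
    end
    (choice = choice, rt = rt)
end
```

Generating a fresh dataset like this on every repetition is what the benchmark loop does with its own simulator.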
I'm open to using the same dataset for each repetition (e.g. placing
Thank you. I should have recognized the simulateLBA function! Stijn is probably right that in ‘real data’ cases you probably have a single set of observations, but in our case I like the cross_sampler approach much better.
Speaking of cross sampler rhat, I was thinking about modifying the rhat procedure in the future once Turing becomes more performant (and we get performant reverse mode autodiff in Julia). In the past, we couldn't run multiple chains in parallel (e.g. multiple chains for the same sampler and dataset) without sacrificing our measures for memory allocation and garbage collection. We opted for cross sampler rhat as a compromise. Perhaps at some point we could run multiple chains serially (e.g. with
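The serial-chains idea can be sketched as below. `sample_chain` is a hypothetical placeholder (a real benchmark would call a Turing, Stan, or DynamicHMC sampler); the point is that running chains serially with `map` keeps each chain's timing and allocation measurements isolated, which a parallel run would contaminate.

```julia
using Random

# Hypothetical stand-in for a sampler call; in MCMCBenchmarks this
# would be a real MCMC run rather than raw draws.
sample_chain(seed) = (Random.seed!(seed); randn(2_000))

# Run the chains serially so that @timed's per-call time and byte
# counts remain attributable to a single chain.
results = map(1:4) do seed
    stats = @timed sample_chain(seed)
    (chain = stats.value, time = stats.time, bytes = stats.bytes)
end
```

The same pattern generalizes to multiple chains per sampler per dataset once sampling is fast enough to afford it.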
Yes, as earlier in this discussion, using an ensemble of observations (from an identical process) might tell us, or at least warn us, about the sensitivity. With multi sampler rhat it might warn us about an issue in a particular sampler. That would be another step forward for users of mcmc. It doesn't cover all cases of course. For several months now the ‘famous’ MLM m-10-04 gives a different answer in DynamicHMC than either Stan or Turing. And indeed multi sampler rhat shows a problem. Tamas has labeled this a bug in DynamicHMC, but I sometimes wonder if that model is somehow weirdly multimodal or unstable? Triggered by your simulateLBA answer, and since this is the only model where I have seen this discrepancy, maybe I should construct an input data simulator for that model. Interesting stuff!
Above, when you say “a better rhat estimate”, do you mean per sampler?
Interesting indeed. I wonder what a surface plot of the joint posterior looks like for MLM m-10-04? It might reveal some pathological behavior, such as a flat or highly correlated region of the posterior that causes numerical problems. I have encountered some unusual behavior in Turing with a model like the LBA. I need to check and report if I can reproduce it with the LBA. Here is the problem: the LBA is essentially the minimum of n distributions. If the data do not contain at least one observation per n distributions, and the priors are uniform, Turing produces a lot of gradient errors. I don't think Stan does, but I need to test this more methodically. In any case, I think the multi-sampler rhat is unconventional, but as you noted, it is useful for our purposes. I plan to keep it even when map is added to run multiple chains per sampler. If I were more savvy with statistics, I might be able to derive an rhat that partitions out between-sampler variation when there are multiple chains per sampler. Regarding your second post, yes. I think running multiple chains per sampler via map would yield a better rhat estimate for that sampler.
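For concreteness, here is a minimal sketch of the classic (non-split) Gelman–Rubin rhat. Nothing in the computation requires the chains to come from the same sampler, which is what makes the cross-sampler variant mechanically possible; the function name `basic_rhat` is an assumption, not an MCMCBenchmarks API.

```julia
using Statistics

# Classic (non-split) Gelman–Rubin rhat over m chains of n draws each.
# Chains from different samplers can be passed just as well as chains
# from one sampler — the cross-sampler idea discussed above.
function basic_rhat(chains::Vector{Vector{Float64}})
    n = length(first(chains))        # draws per chain
    B = n * var(mean.(chains))       # between-chain variance
    W = mean(var.(chains))           # mean within-chain variance
    # pooled posterior-variance estimate over W
    sqrt(((n - 1) / n * W + B / n) / W)
end
```

Chains targeting the same posterior give rhat near 1; chains that disagree (as with the m-10-04 discrepancy) push it well above 1, regardless of whether the disagreement comes from non-convergence or from a buggy sampler.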
Hi Chris, I tried to figure this out but can't find it in Examples or Models.
Recently I had a short discussion with Stijn de Waele about how useful it is to run, say, 4 chains with 4 different observation data series. Have you thought about that? My answer was that it might show some of the sensitivity w.r.t. the input data, but I'm not sure that is correct. Stijn strongly suggested updating all examples to use a single set of observations by default.
Would you be ok if I add the LBA example as a test to the new StanJulia/StanSample.jl package I'm working on? In fact, I would like to include full, multiple-chain versions of all MCMCBenchmarks examples.
In this case I am trying Michael Betancourt's request on the LBA, but step sizes do not seem to be affected that much.