How do you generate the data for LBA? #32
Hey Rob. There is a function called simulateLBA in Models/LBA/LBA_Models.jl. It has default parameter values that can be changed in the options NamedTuple of Examples/LBA/LBA_Example.jl. Please feel free to use any of the code, including the LBA, for your other projects. No problem at all! Currently, we use a unique data set for each repetition, as the function starting at line 67 of MCMCBenchmarks.jl shows.
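For readers unfamiliar with the model, here is a minimal sketch of how LBA data generation typically works. This is an illustrative stand-in, not the actual simulateLBA from Models/LBA/LBA_Models.jl; the names `simulate_lba` and its keyword parameters are assumptions chosen to mirror a standard LBA parameterization (drift means ν, start-point range A, threshold gap k, drift SD s, non-decision time τ).

```julia
using Random

# Hypothetical sketch of LBA data generation; the real simulateLBA in
# Models/LBA/LBA_Models.jl may differ in name, signature, and defaults.
function simulate_lba(; N = 100, ν = [1.0, 0.5], A = 0.8, k = 0.2,
                      s = 1.0, τ = 0.3, rng = Random.default_rng())
    b = A + k                        # response threshold
    choice = Vector{Int}(undef, N)
    rt = Vector{Float64}(undef, N)
    for i in 1:N
        # Each accumulator races: start point ~ Uniform(0, A),
        # drift ~ Normal(ν[j], s), resampled until positive so the
        # trial is guaranteed to finish.
        t = map(ν) do v
            d = -1.0
            while d <= 0.0
                d = v + s * randn(rng)
            end
            (b - rand(rng) * A) / d  # time for this accumulator to hit b
        end
        choice[i] = argmin(t)        # winning accumulator = response
        rt[i] = minimum(t) + τ       # winner's time plus non-decision time
    end
    (choice = choice, rt = rt)
end
```

Generating a fresh dataset like this on every repetition is what the benchmark loop does with its own simulator.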
I'm open to using the same dataset for each repetition (e.g. placing
Thank you. I should have recognized the simulateLBA function! Stijn is probably right that in ‘real data’ cases you probably have a single set of observations, but in our case I like the cross_sampler approach much better.
Speaking of cross sampler rhat, I was thinking about modifying the rhat procedure in the future once Turing becomes more performant (and we get performant reverse mode autodiff in Julia). In the past, we couldn't run multiple chains in parallel (e.g. multiple chains for the same sampler and dataset) without sacrificing our measures for memory allocation and garbage collection. We opted for cross sampler rhat as a compromise. Perhaps at some point we could run multiple chains serially (e.g. with
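The serial-chains idea can be sketched as below. `sample_chain` is a hypothetical placeholder (a real benchmark would call a Turing, Stan, or DynamicHMC sampler); the point is that running chains serially with `map` keeps each chain's timing and allocation measurements isolated, which a parallel run would contaminate.

```julia
using Random

# Hypothetical stand-in for a sampler call; in MCMCBenchmarks this
# would be a real MCMC run rather than raw draws.
sample_chain(seed) = (Random.seed!(seed); randn(2_000))

# Run the chains serially so that @timed's per-call time and byte
# counts remain attributable to a single chain.
results = map(1:4) do seed
    stats = @timed sample_chain(seed)
    (chain = stats.value, time = stats.time, bytes = stats.bytes)
end
```

The same pattern generalizes to multiple chains per sampler per dataset once sampling is fast enough to afford it.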
Yes, as earlier in this discussion, using an ensemble of observations (from an identical process) might tell us, or at least warn us, about the sensitivity. With multi sampler rhat it might warn us about an issue in a particular sampler. That would be another step forward for users of mcmc. It doesn't cover all cases of course. For several months now the ‘famous’ MLM m-10-04 gives a different answer in DynamicHMC than either Stan or Turing. And indeed multi sampler rhat shows a problem. Tamas has labeled this a bug in DynamicHMC, but I sometimes wonder if that model is somehow weirdly multimodal or unstable? Triggered by your simulateLBA answer, and since this is the only model where I have seen this discrepancy, maybe I should construct an input data simulator for that model. Interesting stuff!
Above, when you say “a better rhat estimate”, do you mean per sampler?
Interesting indeed. I wonder what a surface plot of the joint posterior looks like for MLM m-10-04? It might reveal some pathological behavior, such as a flat or highly correlated region of the posterior that causes numerical problems. I have encountered some unusual behavior in Turing with a model like the LBA. I need to check and report if I can reproduce it with the LBA. Here is the problem: the LBA is essentially the minimum of n distributions. If the data do not contain at least one observation per n distributions, and the priors are uniform, Turing produces a lot of gradient errors. I don't think Stan does, but I need to test this more methodically. In any case, I think the multi-sampler rhat is unconventional, but as you noted, it is useful for our purposes. I plan to keep it even when map is added to run multiple chains per sampler. If I were more savvy with statistics, I might be able to derive an rhat that partitions out between-sampler variation when there are multiple chains per sampler. Regarding your second post, yes. I think running multiple chains per sampler via map would yield a better rhat estimate for that sampler.
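For concreteness, here is a minimal sketch of the classic (non-split) Gelman–Rubin rhat. Nothing in the computation requires the chains to come from the same sampler, which is what makes the cross-sampler variant mechanically possible; the function name `basic_rhat` is an assumption, not an MCMCBenchmarks API.

```julia
using Statistics

# Classic (non-split) Gelman–Rubin rhat over m chains of n draws each.
# Chains from different samplers can be passed just as well as chains
# from one sampler — the cross-sampler idea discussed above.
function basic_rhat(chains::Vector{Vector{Float64}})
    n = length(first(chains))        # draws per chain
    B = n * var(mean.(chains))       # between-chain variance
    W = mean(var.(chains))           # mean within-chain variance
    # pooled posterior-variance estimate over W
    sqrt(((n - 1) / n * W + B / n) / W)
end
```

Chains targeting the same posterior give rhat near 1; chains that disagree (as with the m-10-04 discrepancy) push it well above 1, regardless of whether the disagreement comes from non-convergence or from a buggy sampler.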
Hi Chris, I tried to figure this out but can't find it in Examples or Models.
Recently I had a short discussion with Stijn de Waele about how useful it is to run, say, 4 chains with 4 different observation data series. Have you thought about that? My answer was that it might show some of the sensitivity w.r.t. the input data, but I'm not sure that is correct. Stijn strongly suggested updating all examples to use a single set of observations by default.
Would you be ok if I add the LBA example as a test to the new StanJulia/StanSample.jl package I'm working on? In fact, I would like to include full, multiple-chain versions of all MCMCBenchmarks examples.
In this case I am trying Michael Betancourt's request on the LBA, but step sizes do not seem to be affected that much.