Roadmap to LKJCholesky #1870
@PavanChaggar Great to see more steps in this direction! I put together an LKJCholesky MWE just the other day and posted it to Discourse because I was running into some issues during sampling. @sethaxen answered my questions, and it may be a good example for dealing with divergences and thus zero values in the diagonal of the Cholesky factor.
@PavanChaggar Any updates on this?
If we switch to using
Hello! Just wanted to ask if there has been any progress on this?
Once TuringLang/Bijectors.jl#246 and TuringLang/DynamicPPL.jl#462 are merged, this should be supported. cc @torfjelde
Hey @yebai and @torfjelde, just checking in to see if this is implemented in Turing? I see those two PRs are merged but wasn't sure if that meant it's complete or not. If so, are there any docs or examples to play with related to this?
This is the PR to follow: #2018. Once that has gone through, we're there :) And we're getting closer!
This should be working now.
Thanks so much @yebai, to you and the Turing team, for the effort! I see adding support for this has not been a trivial task. Unfortunately, I'm personally still having problems using the LKJCholesky distribution in Turing. I posted on Julia Discourse about it, but I'm unsure what the best place is to discuss it; I'm assuming it isn't GitHub, though. Could just be user error on my part.
@torfjelde fixed a few minor leftover issues, notably TuringLang/DynamicPPL.jl#521 and TuringLang/DynamicPPL.jl#531. @joshualeond can you try again using
Hey @yebai and @torfjelde, yes it looks good on my end:

```julia
using LinearAlgebra, PDMats, Random, Turing

Random.seed!(121)

# generate data
sigma = [1, 2, 3]
Omega = [1.0 0.3 0.2;
         0.3 1.0 0.1;
         0.2 0.1 1.0]
Sigma = Diagonal(sigma) * Omega * Diagonal(sigma)
N = 100
J = 3
y = rand(MvNormal(Sigma), N)'

@model function correlation_chol(J, N, y)
    sigma ~ filldist(truncated(Cauchy(0.0, 5.0); lower = 0), J)
    F ~ LKJCholesky(J, 1.0)
    Σ_L = Diagonal(sigma) * F.L
    # the eps() jitter keeps the diagonal of the factor strictly positive
    Sigma = PDMat(Cholesky(Σ_L + eps() * I))
    for i in 1:N
        y[i, :] ~ MvNormal(Sigma)
    end
    return (; Omega = Matrix(F))
end

chain1 = sample(correlation_chol(J, N, y), NUTS(), 1000);
```

This results in the output below. I'm attempting to get that `Omega` returned from the model. Thanks again for all the hard work on this! Giving much of your personal time to bring new features to Julia is appreciated, so thanks.

```
julia> chain1
Chains MCMC chain (1000×21×1 Array{Float64, 3}):
Iterations = 501:1:1500
Number of chains = 1
Samples per chain = 1000
Wall duration = 2.01 seconds
Compute duration = 2.01 seconds
parameters = sigma[1], sigma[2], sigma[3], F.L[1,1], F.L[2,1], F.L[3,1], F.L[2,2], F.L[3,2], F.L[3,3]
internals = lp, n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size
Summary Statistics
parameters mean std mcse ess_bulk ess_tail rhat ess_per_sec
Symbol Float64 Float64 Float64 Float64 Float64 Float64 Float64
sigma[1] 1.0154 0.0742 0.0026 840.1914 756.3558 0.9993 417.3827
sigma[2] 2.2811 0.1633 0.0058 787.8339 809.2130 1.0005 391.3730
sigma[3] 3.0124 0.2217 0.0070 1031.3962 633.4660 1.0016 512.3677
F.L[1,1] 1.0000 0.0000 NaN NaN NaN NaN NaN
F.L[2,1] 0.5011 0.0781 0.0029 704.6972 605.2379 0.9994 350.0731
F.L[3,1] 0.2197 0.0884 0.0030 879.5019 843.2654 0.9996 436.9110
F.L[2,2] 0.8607 0.0454 0.0017 704.6972 605.2379 0.9993 350.0731
F.L[3,2] -0.0145 0.0961 0.0027 1211.5411 721.9466 0.9996 601.8585
F.L[3,3] 0.9664 0.0221 0.0008 852.8859 809.2761 0.9993 423.6890
```
Glad it's working!
For this you can use
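A minimal sketch for later readers, assuming the function meant above is `generated_quantities` (exported by Turing from DynamicPPL), which re-evaluates the model for each sample in the chain and collects the model's `return` values:

```julia
using Turing

# `correlation_chol`, `J`, `N`, `y`, and `chain1` are from the example above.
model = correlation_chol(J, N, y)
quantities = generated_quantities(model, chain1)  # one entry per sample
quantities[1].Omega                               # the Omega matrix for the first draw
```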
@torfjelde will this work as expected, given that we renamed variables from
Ah true, no you're right, this won't work as intended. Hmm, will have to think a bit about that.
After #2078,
Nice! Just checked it out and it looks good on my end as well. Thanks again for the huge effort!
@joshualeond could you say which version of Turing you are using? When I try your example I get an error.
Ah sorry!! I see now, I just needed to update to
Hi, I am on Turing.jl
I also get the above ^ but it seems to happen at random.
This is a different error from the one @YSanchezAraujo first mentioned, though; this might just be a numerical issue 😕 Need to look into this a bit more later.
Sadly I have not been able to replicate the above example after several tries :( not even at random. The one time that I did not get the

Really appreciate the work that is being put into this. It's a much-needed implementation. Hope it can be fixed soon.
This seems relevant: #2081. In particular, this comment: #2081 (comment).
It seems like the basic advice is to add some small value along the diagonal of Sigma, but how big that should be will differ from case to case. It makes me wonder whether it can be shown empirically that the worse the model assumptions are for the data, the more likely you are to run into this problem. Basically, if you need to add too large a value along the diagonal, just drop the attempt to model that correlation matrix altogether.
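A minimal sketch of that jitter workaround, following the same pattern as the model above; the jitter size used here (`1e-8`) is an assumption and has to be tuned case by case:

```julia
using Distributions, LinearAlgebra, PDMats

sigma = [1.0, 2.0, 3.0]        # example scales
F = rand(LKJCholesky(3, 1.0))  # example correlation Cholesky factor

jitter = 1e-8                  # assumed size; how large it must be is case-dependent
L = Diagonal(sigma) * F.L      # lower-triangular factor of the covariance
Sigma = PDMat(Cholesky(L + jitter * I))  # keeps the diagonal strictly positive
```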
Making an issue to document the current issues hindering efficient `LKJCholesky` sampling and to try to form a concrete path toward an implementation.

@storopoli has a nice example here with a workaround and an ideal Turing implementation. I'll work on adapting this into a MWE that can be used for testing.

The `LKJCholesky` distribution has now been implemented in Distributions.jl (here); we just need to clean up how it interacts with Turing. There are two main issues:

1. The `LKJCholesky` distribution returns a `LinearAlgebra.Cholesky` type, not just an upper/lower triangular `Array` (a small illustration follows below). From previous discussions, it sounds like `Bijectors.jl` needs the same input/output type for a Bijector. At the moment, the current `CorrBijector` maps to a correlation matrix, but it would be easier to map to a `Cholesky` matrix.
2. Turing does not yet support the `Cholesky` type everywhere. One known example is packaging samples into MCMCChains. @torfjelde @yebai @devmotion, are there any other particular methods that would need to be implemented on a distribution type for Turing?

cc'ing @sethaxen, with whom I've briefly chatted about this.
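For concreteness, a small illustration of the first point; this assumes only the standard library and a Distributions.jl version that includes `LKJCholesky`:

```julia
using Distributions, LinearAlgebra

F = rand(LKJCholesky(3, 1.0))

F isa Cholesky  # true: a LinearAlgebra.Cholesky factorization, not a plain Array
F.L             # the lower-triangular factor
Matrix(F)       # reconstructs the full correlation matrix (L * L')
```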
Immediate next steps will be to work on a MWE to centre a PR around. However, if someone already has one, please post!