
Roadmap to LKJCholesky #1870

Closed
PavanChaggar opened this issue Aug 5, 2022 · 23 comments

Comments

@PavanChaggar

PavanChaggar commented Aug 5, 2022

Opening this issue to document the current obstacles to efficient LKJCholesky sampling and to form a concrete path toward an implementation.

@storopoli has a nice example here with a workaround and an ideal Turing implementation. I'll work on adapting this into an MWE that can be used for testing.

The LKJCholesky distribution has now been implemented in Distributions.jl (here); we just need to clean up how it interacts with Turing. There are two main issues:

  • Random samples from an LKJCholesky distribution return a LinearAlgebra.Cholesky object, not just an upper/lower-triangular Array (see the snippet after this list). From previous discussions, it sounds like Bijectors.jl needs a Bijector's input and output types to match. At the moment, the current CorrBijector maps to a correlation matrix, but it would be easier to map to a Cholesky factor.
  • There are some internals of Turing that will need to be made compatible with the Cholesky type. One known example is packaging samples into MCMCChains. @torfjelde @yebai @devmotion are there any other particular methods that would need to be implemented on a distribution type for Turing?
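
For reference, a minimal snippet illustrating the type mismatch in the first point; it assumes only Distributions.jl and the standard library:

using Distributions, LinearAlgebra

F = rand(LKJCholesky(3, 1.0))
typeof(F)  # Cholesky{Float64, Matrix{Float64}}, a factorization object rather than an Array
F.L        # the lower-triangular factor, a LowerTriangular view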

cc'ing @sethaxen, with whom I've briefly chatted about this.

Immediate next steps will be to work on an MWE to centre a PR around. However, if someone already has one, please post!

@joshualeond

joshualeond commented Aug 8, 2022

@PavanChaggar Great to see more steps in this direction! I put together an LKJCholesky MWE just the other day and posted it to Discourse because I was running into some issues during sampling. @sethaxen answered my questions, and it may be a good example for dealing with divergences and, consequently, zero values on the diagonal of the Cholesky factor.

@yebai

yebai commented Nov 12, 2022

@PavanChaggar Any updates on this?

@ParadaCarleton

There are some internals of Turing that will need to be made compatible with the Cholesky type. One known example is packaging samples into MCMCChains.

If we switch to using InferenceData and InferenceObjects.jl, I believe this should get a lot easier. @PavanChaggar how long do you think it would take to get everything besides that working?

@SebastianCallh

Hello! Just wanted to ask if there has been any progress on this?

@yebai

yebai commented Mar 15, 2023

Once TuringLang/Bijectors.jl#246 and TuringLang/DynamicPPL.jl#462 are merged, this should be supported.

cc @torfjelde

@joshualeond

Hey @yebai and @torfjelde, just checking in to see whether this is implemented in Turing. I see those two PRs are merged but wasn't sure whether that means it's complete. If so, are there any docs or examples to play with related to this?

@torfjelde

This is the PR to follow: #2018

Once that has gone through, we're there :) And we're getting closer!

@yebai

yebai commented Aug 16, 2023

This should be working now.

@yebai yebai closed this as completed Aug 16, 2023
@joshualeond

Thanks so much, @yebai, to you and the Turing team for your efforts! I see that adding support for this has not been a trivial task.

Unfortunately, I'm personally still having problems using the LKJCholesky distribution in Turing. I posted on Julia Discourse about it, but I'm unsure of the best place to discuss it. I'm assuming it isn't GitHub, though. It could just be user error on my part.

@yebai

yebai commented Sep 4, 2023

@torfjelde fixed a few minor leftover issues, notably TuringLang/DynamicPPL.jl#521 and TuringLang/DynamicPPL.jl#531. @joshualeond can you try again using [email protected]?

@joshualeond

Hey @yebai and @torfjelde, yes it looks good on my end:

using LinearAlgebra, PDMats, Random, Turing

Random.seed!(121)

# generate data
sigma = [1,2,3]
Omega = [1 0.3 0.2;
         0.3 1 0.1;
         0.2 0.1 1]

Sigma = Diagonal(sigma) * Omega * Diagonal(sigma)
N = 100
J = 3

y = rand(MvNormal(zeros(J), Sigma), N)'  # N draws from a zero-mean MvNormal, transposed to N×J

@model function correlation_chol(J, N, y)
    sigma ~ filldist(truncated(Cauchy(0., 5.); lower = 0), J)  # marginal standard deviations
    F ~ LKJCholesky(J, 1.0)  # Cholesky factor of the correlation matrix
    # Scale the correlation factor by the standard deviations to obtain the
    # Cholesky factor of the covariance matrix.
    Σ_L = Diagonal(sigma) * F.L
    # eps() * I adds a tiny diagonal jitter to guard against numerical
    # positive-definiteness failures.
    Sigma = PDMat(Cholesky(Σ_L + eps() * I))
    for i in 1:N
        y[i,:] ~ MvNormal(zeros(J), Sigma)
    end
    return (; Omega = Matrix(F))  # reconstructed correlation matrix
end


chain1 = sample(correlation_chol(J, N, y), NUTS(), 1000);

This results in the output below. I'm attempting to reconstruct that Omega matrix from the posterior to compare it with the original Stan post I was replicating, but I just need to get more familiar with Turing.

Thanks again for all the hard work on this! Giving so much of your personal time to bring new features to Julia is appreciated, so thanks.

julia> chain1
Chains MCMC chain (1000×21×1 Array{Float64, 3}):

Iterations        = 501:1:1500
Number of chains  = 1
Samples per chain = 1000
Wall duration     = 2.01 seconds
Compute duration  = 2.01 seconds
parameters        = sigma[1], sigma[2], sigma[3], F.L[1,1], F.L[2,1], F.L[3,1], F.L[2,2], F.L[3,2], F.L[3,3]
internals         = lp, n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size

Summary Statistics
  parameters      mean       std      mcse    ess_bulk   ess_tail      rhat   ess_per_sec 
      Symbol   Float64   Float64   Float64     Float64    Float64   Float64       Float64 

    sigma[1]    1.0154    0.0742    0.0026    840.1914   756.3558    0.9993      417.3827
    sigma[2]    2.2811    0.1633    0.0058    787.8339   809.2130    1.0005      391.3730
    sigma[3]    3.0124    0.2217    0.0070   1031.3962   633.4660    1.0016      512.3677
    F.L[1,1]    1.0000    0.0000       NaN         NaN        NaN       NaN           NaN
    F.L[2,1]    0.5011    0.0781    0.0029    704.6972   605.2379    0.9994      350.0731
    F.L[3,1]    0.2197    0.0884    0.0030    879.5019   843.2654    0.9996      436.9110
    F.L[2,2]    0.8607    0.0454    0.0017    704.6972   605.2379    0.9993      350.0731
    F.L[3,2]   -0.0145    0.0961    0.0027   1211.5411   721.9466    0.9996      601.8585
    F.L[3,3]    0.9664    0.0221    0.0008    852.8859   809.2761    0.9993      423.6890

@torfjelde

torfjelde commented Sep 5, 2023

Glad it's working!

I'm attempting to get that Omega matrix reconstructed with the posterior to compare it to the original Stan post I was attempting to replicate but I just need to get more familiar with Turing.

For this you can use generated_quantities(correlation_chol(J, N, y), chain1) :)

@yebai

yebai commented Sep 5, 2023

For this you can use generated_quantities(correlation_chol(J, N, y), chain1) :)

@torfjelde will this work as expected, given that we renamed variables from F to F.L for clarity in the returned chains? I saw a lot of warnings while running this example.

@torfjelde

Ah true, no you're right, this won't work as intended. Hmm, will have to think a bit about that.

@yebai

yebai commented Sep 12, 2023

Ah true, no you're right, this won't work as intended. Hmm, will have to think a bit about that.

After #2078, generated_quantities(correlation_chol(J, N, y), chain1) is now working properly on the master branch.
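
For illustration, a hedged sketch of how the reconstructed Omega might then be summarised from the chain (the posterior-mean reduction and the variable names here are illustrative, not from this thread):

using Statistics  # for mean

# Re-evaluates the model's return value, (; Omega = Matrix(F)),
# once per posterior sample in chain1.
gq = generated_quantities(correlation_chol(J, N, y), chain1)

# Posterior mean of the reconstructed correlation matrix.
Omega_mean = mean(g.Omega for g in gq)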

@joshualeond

Nice! Just checked it out and looks good on my end as well. Thanks again for the huge effort!

@YSanchezAraujo

YSanchezAraujo commented Sep 14, 2023

@joshualeond could you say what version of Turing you are using? When I try your example I get an error:

ERROR: MethodError: no method matching vectorize(::LKJCholesky{Float64}, ::Cholesky{Float64, Matrix{Float64}})

Ah, sorry! I see now, I just needed to update to 0.29. Please ignore me.

@datadreamscorp

Nice! Just checked it out and looks good on my end as well. Thanks again for the huge effort!

Hi, I am on Turing.jl 0.29.1 and the above code does not run for me. It throws the following error:
ERROR: PosDefException: matrix is not positive definite; Cholesky factorization failed.
Any help would be appreciated.

@YSanchezAraujo

I also get the above ^ but it seems to happen at random.

@torfjelde

ERROR: PosDefException: matrix is not positive definite; Cholesky factorization failed.

This is a different error from the one @YSanchezAraujo first mentioned, though; this might just be a numerical issue 😕

I need to look into this a bit more later.

@datadreamscorp

Sadly, I have not been able to replicate the above example after several tries :( not even at random. The one time I did not get the PosDefException, the model sampled and then threw the following error:
ERROR: MethodError: no method matching length(::Cholesky{Float64, Matrix{Float64}})

Really appreciate the work that is being put into this. It's a much-needed implementation. Hope it can be fixed soon.

@torfjelde

This seems relevant: #2081

In particular, this comment: #2081 (comment)

@YSanchezAraujo

It seems like the basic advice is to add some small value along the diagonal of Sigma, but how big that value should be will differ from case to case. It makes me wonder whether it can be shown empirically that the worse the model's assumptions are for the data, the more likely you are to run into this problem. Basically, if you need to add too large a value along the diagonal, just drop the attempt to model that correlation matrix altogether.
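
To make that concrete, here is a hedged sketch of the diagonal-jitter workaround as it would apply to the model earlier in the thread (the jitter magnitude and the jitter variable name are illustrative assumptions, not recommendations from this discussion):

# Inside the model body: add a small constant to the diagonal of the
# scaled Cholesky factor, which in turn inflates the diagonal of Sigma
# and guards against PosDefException during sampling.
jitter = 1e-6  # hypothetical value; tune per problem
Σ_L = Diagonal(sigma) * F.L
Sigma = PDMat(Cholesky(Σ_L + jitter * I))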
