
Error handling for DynamicNUTS #3

Closed
itsdfish opened this issue May 13, 2019 · 28 comments

@itsdfish
Collaborator

Currently, Turing crashes when DynamicNUTS produces a numerical error. If memory serves, this happens in about 15 to 18% of cases. In our preliminary code, this was handled by adding a try/catch statement to the inner loop of the benchmark code, which applied to all of the samplers. As we discussed, this is suboptimal: it is inefficient and could also introduce a selection bias into the benchmark process, thereby invalidating the results. It would be good to either develop a better solution or request this feature in Turing.
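For reference, a minimal sketch of the kind of try/catch retry loop described above (names like `run_sampler` and `max_retries` are illustrative, not MCMCBenchmarks API):

```julia
# Hypothetical sketch of the benchmark inner-loop workaround: retry the
# sampler whenever it throws a numerical error. This is the inefficiency
# (and potential selection bias) discussed above, since failed runs are
# silently redrawn.
function sample_with_retry(run_sampler; max_retries = 10)
    for attempt in 1:max_retries
        try
            return run_sampler()
        catch err
            @warn "sampler failed on attempt $attempt" exception = err
        end
    end
    error("sampler failed $max_retries times")
end
```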

@goedman
Member

goedman commented May 13, 2019

Internally dHMC does the same in LogDensityRejectErrors(). My assumption is that this could cause the different slopes we observe in the time plot.
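The rejection behavior referred to here can be pictured with a small sketch (hypothetical names, not the actual LogDensityProblems.jl implementation): any error raised during the log-density evaluation is mapped to -Inf, so the sampler rejects the proposal instead of crashing.

```julia
# Illustrative sketch only, not the LogDensityRejectErrors implementation:
# wrap a log-density function so any evaluation error becomes -Inf,
# turning a fatal crash into a rejected proposal.
function reject_errors(logdensity)
    θ -> try
        logdensity(θ)
    catch
        -Inf
    end
end
```

For example, `reject_errors(θ -> log(θ[1]))` returns `-Inf` for a negative input instead of throwing a `DomainError`.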

We might need to compare the benchmarks for bias and add a graph for that. One model in StatisticalRethinking consistently shows a bias (m10.04d). Tamas labeled this case a bug. It is a multilevel model, but useful as a test case for detecting bias.

I have actually often thought about creating chains using different MCMC tools. That should show us bias?

@itsdfish
Collaborator Author

Interesting. Are you saying there is a bug that causes the errors, which then causes bias? Or is the problem poor error handling compared to Stan?

@goedman
Member

goedman commented May 13, 2019

My understanding/assumption is that this is an issue in computing gradients. I think Stan, or the libraries Stan is using, might have more options/tools to deal with such errors.

If such an error always occurred in the same region, maybe it could lead to bias? What I have noticed in m10.04d is that the signs and magnitudes often show a reasonable pattern, but are off.

@goedman
Member

goedman commented May 13, 2019

I'll actually spend a few cycles on building a Chains object initially consisting of dHMC draws and CmdStan draws for m10.04 and see what that tells us.

@itsdfish
Collaborator Author

I'm going to close this because DynamicNUTS will lose support in the future:

As a side note, Turing.jl is unlikely to keep maintaining the support of DynamicHMC.jl backend (DynamicNUTS) in the future as AdvancedHMC.jl becomes more and more mature.

Originally posted by @xukai92 in #25 (comment)

@goedman
Member

goedman commented Jul 22, 2019

Fully agree with that. I think we should focus on Turing's AdvancedHMC, DynamicHMC and Stan, all on the NUTS variant of HMC.

I've been pretty busy doing major work on future Stan related packages and StatisticalRethinking (2nd edition).

As I am considering making DynamicHMC the primary platform for StatisticalRethinking, lots of exploring ...

@itsdfish
Collaborator Author

That sounds interesting. I would like to learn about what you decide to do and why. I would like to fully switch to Turing or DynamicHMC, but the performance does not scale currently. On hierarchical models, even small ones, they hit a wall pretty hard. I'm hoping that we get a flexible reverse-mode AD that is competitive with Stan.

@goedman
Member

goedman commented Jul 23, 2019

Maybe I have misinterpreted some of the results in MCMCBenchmarks. All part of exploring ...

I haven't seen results for Hierarchical_Poisson. Which results are demonstrating your concern? (As I said before, you're miles ahead of me in that area!).

@itsdfish
Collaborator Author

itsdfish commented Jul 23, 2019

I observed these drastic differences only informally as I was developing and testing the hierarchical Poisson models. I just ran a benchmark to get a better idea of the performance differences. I scaled it down to five repetitions because it was so slow. Here are some of the results:

summary_time.pdf

DynamicHMC is 23 times slower than Stan with 100 observations, while Turing is 127 times slower.

In this plot, you can see that run time is related to allocations. Note that for CmdStan, only the memory allocated in Julia is counted.

scatter_allocations_time.pdf

Memory allocations are part of the problem and so is the use of forward AD. I'm not sure to what extent these problems are interrelated.

@itsdfish
Collaborator Author

This weekend I would like to try running DynamicHMC with Zygote if possible. Would you like me to ping Taap? It might be useful for us to know whether we can scale up DynamicHMC right now.

@goedman
Member

goedman commented Jul 23, 2019

Thanks Chris, the summary_time.pdf is very helpful, as is the allocation plot. We can't compare allocations with Stan, but it might be telling for Turing (higher intercept?).

The slope below Ns=20 might be affected by startup issues. But if, beyond that point, AD is the determining factor, it makes sense that DynamicHMC and AdvancedHMC show the same slope (both use ForwardDiff). Clearly Stan's AD performance is still ahead. Is that how you interpret the summary plot?

@itsdfish
Collaborator Author

Yeah. I think you are right. The intercepts do give us clues about the startup overhead. I wanted to measure that more directly by setting the data size and number of NUTS samples low, but encountered some problems. So I will open an issue and fix that in the next couple of days.

@itsdfish
Collaborator Author

P.S. Tamas is going to push some bug fixes so we can try Zygote with DynamicHMC. I'll report on the Poisson model if Zygote runs.

@tpapp

tpapp commented Jul 24, 2019

(continuing the discussion from tpapp/LogDensityProblems.jl#43 here)

In general, direct comparisons between Stan and other NUTS implementations are tricky, because the speed of density + gradient evaluations and the tuning can both differ. Stan is usually better tuned. Improving this is on the agenda for DynamicHMC, which I suspect has some bugs that surface with high-curvature models. I hope to complete the rewrite this summer. This project provided a lot of test models, for which I cannot thank you enough.

Regarding the choice of AD engine within Julia: Zygote is amazing when it works, but can be tricky when it does not as the issues you noticed show. Both LogDensityProblems and TransformVariables are being gradually rewritten to be more friendly to Zygote, but this is a moving target.

If you provide a link to the concrete model, I can run a few tests (but I may need a few days for this).

That said, Stan is highly optimized (2.20.0 is yet another improvement on some fronts), and is not trivial to match at the moment in Julia.

@itsdfish
Collaborator Author

Sorry. It was not clear to me where I should ask this question. Thanks for sharing this information and for your willingness to run a few tests. If you would like, I can open a new issue for the hierarchical Poisson regression model in question. The code is here, but I can strip it down to the necessary components. In addition, I have noticed that this model occasionally produces NaN errors, which will cause problems when you deprecate the LogDensityRejectErrors function.

@tpapp

tpapp commented Jul 24, 2019

No worries. I will look into this.

Regarding LogDensityRejectErrors and tpapp/LogDensityProblems.jl#45: I will think about how to keep it.

@itsdfish
Collaborator Author

P.S. I have experienced the most NaN errors with 100 data points:

data = simulatePoisson(; Ns = 100)
chain = sampleDHMC(data..., 2000)

@goedman
Member

goedman commented Jul 24, 2019

Just for completeness, in testing Zygote in DynamicHMCModels.jl I usually make the following update:

P = TransformedLogDensity(problem_transformation(p), p)
#∇P = LogDensityRejectErrors(ADgradient(:ForwardDiff, P));
import Zygote
#∇P = LogDensityRejectErrors(ADgradient(:Zygote, P));
∇P = ADgradient(:Zygote, P);

Both versions using Zygote run into:

julia> include("/Users/rob/.julia/dev/DynamicHMCModels/scripts/10/m10.04d.jl")
ERROR: LoadError: TypeError: in ccall: first argument not a pointer or valid constant expression, expected Ptr, got Tuple{String,String}
Stacktrace:
 [1] binomlogpdf at /Users/rob/.julia/packages/StatsFuns/2QE7p/src/rmath.jl:73 [inlined]
 [2] logpdf at /Users/rob/.julia/packages/Distributions/iHWHD/src/univariates.jl:535 [inlined]
 [3] #15 at /Users/rob/.julia/packages/Distributions/iHWHD/src/univariates.jl:510 [inlined]
 [4] #1206 at /Users/rob/.julia/packages/Zygote/jXbu4/src/lib/broadcast.jl:109 [inlined]
 [5] _broadcast_getindex_evalf at ./broadcast.jl:625 [inlined]

... <snipped>

 [32] m_10_04d_model at /Users/rob/.julia/dev/DynamicHMCModels/scripts/10/m10.04d.jl:43 [inlined]
 [33] _forward(::Zygote.Context, ::m_10_04d_model{Array{Int64,1},Array{Int64,2},Array{Int64,1}}, ::NamedTuple{(:β, :α),Tuple{Array{Float64,1},Array{Float64,1}}}) at /Users/rob/.julia/packages/Zygote/jXbu4/src/compiler/interface2.jl:0
 [34] transform_logdensity at /Users/rob/.julia/dev/TransformVariables/src/generic.jl:137 [inlined]

I've been playing around with both LogDensityProblems and TransformVariables to try and understand this error but haven't made much progress. Looking at the code has been very rewarding though!

@tpapp

tpapp commented Jul 24, 2019

I am happy to look into issues like this, but I ask that you open an issue at LogDensityProblems.jl complete with all the relevant details (MWE, pkg> st --manifest, etc).

(Sorry, I just realized that I am continuing discussion in a closed issue. Feel free to move it elsewhere).

@goedman
Member

goedman commented Jul 24, 2019

Thanks Tamas. I will certainly file an issue once I'm convinced the issue is not on our side. In this case I want to understand when I see the issue Chris reported (which I have also seen) and when I see this issue.

Chris for now I will reopen this issue.

@goedman goedman reopened this Jul 24, 2019
@goedman
Member

goedman commented Jul 24, 2019

Chris,

For Gaussian I get the following error when using Zygote:

julia> include("/Users/rob/.julia/dev/DynamicHMCModels/benchmarks/Gaussian/gaussian.jl")
ERROR: LoadError: MethodError: Cannot `convert` an object of type Array{Float64,1} to an object of type Float64
Closest candidates are:
  convert(::Type{T}, ::Ratios.SimpleRatio{S}) where {T<:AbstractFloat, S} at /Users/rob/.julia/packages/Ratios/iJ67w/src/Ratios.jl:16
  convert(::Type{T}, ::T) where T<:Number at number.jl:6
  convert(::Type{T}, ::Number) where T<:Number at number.jl:7
  ...

For Hierarchical_Poisson, e.g., I get a different error:

julia> include("/Users/rob/.julia/dev/DynamicHMCModels/benchmarks/Hierarchical_Poisson/hierachical_poisson.jl")
ERROR: LoadError: TypeError: in ccall: first argument not a pointer or valid constant expression, expected Ptr, got Tuple{String,String}
Stacktrace:
 [1] poislogpdf at /Users/rob/.julia/packages/StatsFuns/2QE7p/src/rmath.jl:73 [inlined]
 [2] logpdf at /Users/rob/.julia/packages/Distributions/iHWHD/src/univariates.jl:535 [inlined]
 [3] _forward(::Zygote.Context, ::typeof(logpdf), ::Poisson{Float64}, ::Int64) at /Users/rob/.julia/packages/Zygote/jXbu4/src/compiler/interface2.jl:0

Is that what you are seeing as well?

I've moved the MCMCBenchmarks.jl scripts to DynamicHMCModels.jl (in the folder benchmarks), something I wanted to do anyway. I'm about halfway doing the same to StanSample.jl (in the folder examples).

@itsdfish
Collaborator Author

Yeah. I am experiencing the same errors on my machine. To the best of my knowledge, the TypeError for the Poisson regression model has not been reported. It looks like it might be coming from LogDensityProblems or Transformations.

@goedman
Member

goedman commented Jul 25, 2019

Agreed. The updated LogDensityProblems allowed about half of the DynamicHMCModels to work without the LogDensityRejectErrors() safety net. ML models remain difficult. I’ll put together a MWE for the ccall issue.

@tpapp

tpapp commented Jul 29, 2019

Thinking about this, I am not super-surprised that AD can't work through external C calls into the Rmath library.

@itsdfish
Collaborator Author

Interesting. Do you know why ForwardDiff works with Distributions, but Zygote does not? I wonder if there are any workarounds.

@tpapp

tpapp commented Jul 29, 2019

Either Zygote has to be told about the derivative manually (see its manual), or poislogpdf has to be written in native Julia (this would be my preferred solution; I am not sure why they used an external library for the log pdf, as Julia now has all the ingredients). An MWE is

using StatsFuns
import ForwardDiff, Zygote

f(x) = poislogpdf(x[1], x[2])
x = ones(2)
f(x)
ForwardDiff.gradient(f, x)      # works
Zygote.gradient(f, x)           # fails

It would be great if you could open an issue and pursue this. I would first check the issues for Distributions and perhaps open one, then take it to Zygote from there if they are not open to a native version.

@itsdfish
Collaborator Author

Yeah. Good idea. I'll look into this later today and open an issue. If it is something straightforward I might be able to implement a fix, but this is starting to venture into dangerous territory for me =)

@tpapp

tpapp commented Jul 29, 2019

As I said, just defining a function in native Julia would be the best solution. It's mostly a one-liner.
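To illustrate the kind of one-liner Tamas means, here is a hedged sketch of a native-Julia Poisson log-pdf (the name `poislogpdf_native` is hypothetical, not the StatsFuns function). Because it uses only Base Julia, there is no ccall into Rmath for Zygote to choke on:

```julia
# Hypothetical native-Julia Poisson log-pdf for integer x ≥ 0:
# log P(X = x) = x*log(λ) - λ - log(x!), with log(x!) computed as a sum
# of logs (pure Base Julia, so reverse-mode AD can differentiate w.r.t. λ).
poislogpdf_native(λ, x::Integer) = x * log(λ) - λ - sum(log, 1:x; init = 0.0)
```

For example, `poislogpdf_native(1.0, 1)` gives `-1.0`, matching `log(exp(-1) * 1^1 / 1!)`.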
