Error handling for DynamicNUTS #3
Internally dHMC does the same in LogDensityRejectErrors(). My assumption is that this could cause the different slopes we observe in the time plot. We might need to compare for bias in the benchmarks and add a graph for that. One model in StatisticalRethinking consistently shows a bias (m10.04d). This case was labeled a bug by Tamas. It is a multilevel model, but useful as a test case for detecting bias. I have often thought about creating chains using different MCMC tools. That should show us the bias?
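For reference, this is roughly how that wrapper is used, as a minimal sketch with a toy log-density, assuming the DynamicHMC 1.x / LogDensityProblems API current at the time (not code from the benchmarks):

```julia
using DynamicHMC, LogDensityProblems, TransformVariables
import ForwardDiff

# Toy log-density (standard normal in one parameter), for illustration only.
P = TransformedLogDensity(as((μ = asℝ,)), θ -> -0.5 * θ.μ^2)

# LogDensityRejectErrors turns numerical errors (NaN, DomainError, ...) in the
# log-density or gradient into rejections (-Inf) instead of crashing the sampler.
∇P = LogDensityRejectErrors(ADgradient(:ForwardDiff, P))

chain, tuned = NUTS_init_tune_mcmc(∇P, 1000)
```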
Interesting. Are you saying there is a bug that causes the errors, which then causes bias? Or is the problem poor error handling compared to Stan?
My understanding/assumption is that this is an issue in computing gradients. I think Stan, or the libraries Stan is using, might have more options/tools to deal with such errors. If such an error always occurred in the same region, maybe it could lead to bias? What I have noticed in m10.04d is that signs and magnitudes often show a reasonable pattern, but are off.
I'll actually spend a few cycles on building a Chains object initially consisting of dHMC draws and CmdStan draws for m10.04 and see what that tells us.
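A sketch of what such a combined Chains object could look like with MCMCChains; the draw matrices and parameter names below are placeholders, not actual m10.04 output:

```julia
using MCMCChains

# Placeholders for iterations × parameters draw matrices from each backend.
dhmc_draws = randn(1000, 2)   # stand-in for DynamicHMC posterior draws
stan_draws = randn(1000, 2)   # stand-in for CmdStan posterior draws

# Stack the two samplers as separate "chains" (iterations × params × chains)
# so their marginals can be compared side by side.
vals = cat(dhmc_draws, stan_draws; dims = 3)
chn = Chains(vals, [:a, :bp])

describe(chn)  # compare means and quantiles across the two pseudo-chains
```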
I'm going to close this because DynamicNUTS will lose support in the future:

> As a side note, Turing.jl is unlikely to keep maintaining the support of the DynamicHMC.jl backend.

Originally posted by @xukai92 in #25 (comment)
Fully agree with that. I think we should focus on Turing's AdvancedHMC, DynamicHMC, and Stan, all on the NUTS variant of HMC. I've been pretty busy doing major work on future Stan-related packages and StatisticalRethinking (2nd edition). As I am considering making DynamicHMC the primary platform for StatisticalRethinking, lots of exploring ...
That sounds interesting. I would like to learn about what you decide to do and why. I would like to fully switch to Turing or DynamicHMC, but the performance does not scale currently. On hierarchical models, even small ones, they hit a wall pretty hard. I'm hoping that we get a flexible reverse-mode AD that is competitive with Stan.
Maybe I have misinterpreted some of the results in MCMCBenchmarks. All part of exploring ... I haven't seen results for Hierarchical_Poisson. Which results demonstrate your concern? (As I said before, you're miles ahead of me in that area!)
I observed these drastic differences only informally as I was developing and testing the hierarchical Poisson models. I just ran a benchmark to get a better idea of the performance differences. I scaled it down to five repetitions because it was so slow. Here are some of the results: DynamicHMC is 23 times slower than Stan with 100 observations, while Turing is 127 times slower. In this plot, you can see that run time is related to allocations. Note that the CmdStan numbers count only memory allocated in Julia, since the sampling itself runs in an external process. Memory allocations are part of the problem, and so is the use of forward-mode AD. I'm not sure to what extent these problems are interrelated.
This weekend I would like to try running DynamicHMC with Zygote if possible. Would you like me to ping Tamas? It might be useful for us to know whether we can scale up DynamicHMC right now.
Thanks Chris. The slope below Ns=20 might be affected by startup issues, but if beyond that point AD is the determining factor, it makes sense that DynamicHMC and AdvancedHMC show the same slope (both use ForwardDiff). Clearly Stan's AD performance is still ahead. Is that how you interpret the summary plot?
Yeah, I think you are right. The intercepts do give us clues about the startup overhead. I wanted to measure that more directly by setting the data size and the number of NUTS samples low, but encountered some problems, so I will open an issue and fix that in the next couple of days.
P.S. Tamas is going to push some bug fixes so we can try Zygote with DynamicHMC. I'll report on the Poisson model if Zygote runs.
(Continuing the discussion from tpapp/LogDensityProblems.jl#43 here.)

In general, direct comparisons between Stan and other NUTS implementations are tricky, because the speed of density + gradient evaluations and the tuning can both differ. Stan is usually better tuned. Improving this is on the agenda for DynamicHMC, which I suspect has some bugs that surface with high-curvature models. I hope to complete the rewrite this summer. This project provided a lot of test models, for which I cannot thank you enough.

Regarding the choice of AD engine within Julia: Zygote is amazing when it works, but can be tricky when it does not, as the issues you noticed show. Both LogDensityProblems and TransformVariables are being gradually rewritten to be more friendly to Zygote, but this is a moving target. If you provide a link to the concrete model, I can run a few tests (but I may need a few days for this).

That said, Stan is highly optimized (2.20.0 is yet another improvement on some fronts) and is not trivial to match at the moment in Julia.
Sorry, it was not clear to me where I should ask this question. Thanks for sharing this information and for your willingness to run a few tests. If you would like, I can open a new issue for the hierarchical Poisson regression model in question. The code is here, but I can strip it down to the necessary components. In addition, I have noticed that this model creates occasional NaN errors, which will cause problems when you deprecate the LogDensityRejectErrors function.
No worries. I will look into this. Regarding …
P.S. I have experienced the most NaN errors with 100 data points:
Just for completeness, in testing Zygote in DynamicHMCModels.jl I usually make the following update:
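Presumably the update swaps the AD backend passed to `ADgradient`, along these lines (a sketch; `P` stands for the model's `TransformedLogDensity`, as in the earlier example):

```julia
import Zygote
using LogDensityProblems: ADgradient

# ForwardDiff backend (the default used elsewhere in the benchmarks):
# ∇P = ADgradient(:ForwardDiff, P)

# Zygote backend — same wrapper, different AD engine:
∇P = ADgradient(:Zygote, P)
```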
With both versions, using Zygote I run into:
I've been playing around with both LogDensityProblems and TransformVariables to try to understand this error, but haven't made much progress. Looking at the code has been very rewarding though!
I am happy to look into issues like this, but I ask that you open an issue at LogDensityProblems.jl complete with all the relevant details (MWE, …). (Sorry, I just realized that I am continuing the discussion in a closed issue. Feel free to move it elsewhere.)
Thanks Tamas. I will certainly file an issue once I'm convinced the issue is not on our side. In this case I want to understand when I see the issue Chris reported (which I have also seen) and when I see this one. Chris, for now I will reopen this issue.
Chris, for the Gaussian model I get the following error when using Zygote:
For e.g. Hierarchical_Poisson I get a different error:
Is that what you are seeing as well? I've moved the MCMCBenchmarks.jl scripts to DynamicHMCModels.jl (in the folder …).
Yeah, I am experiencing the same errors on my machine. To the best of my knowledge, the TypeError for the Poisson regression model has not been reported. It looks like it might be coming from LogDensityProblems or TransformVariables.
Agreed. The updated LogDensityProblems allowed about half of the DynamicHMCModels to work without the LogDensityRejectErrors() safety net. Multilevel models remain difficult. I'll put together an MWE for the ccall issue.
Thinking about this, I am not super-surprised that AD can't work through external calls to C in the …
Interesting. Do you know why ForwardDiff works with Distributions, but not Zygote? I wonder if there are any workarounds.
Either Zygote has to be told about the derivative manually (see its manual), or the function has to be defined in native Julia. For example:

```julia
using StatsFuns
import ForwardDiff, Zygote

f(x) = poislogpdf(x[1], x[2])
x = ones(2)
f(x)

ForwardDiff.gradient(f, x) # works
Zygote.gradient(f, x)      # fails
```

It would be great if you could open an issue and pursue this. First I would check issues for Distributions and then perhaps open one, and take it to Zygote from there if they are not open to a native version.
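For the first option, a sketch of a manual adjoint via `Zygote.@adjoint`; the derivative formulas are standard, but treat the exact rule as an assumption rather than a vetted patch:

```julia
using StatsFuns
import Zygote
using SpecialFunctions: digamma

# Manual adjoint for poislogpdf(λ, x) = x*log(λ) - λ - loggamma(x + 1):
#   ∂/∂λ = x/λ - 1,   ∂/∂x = log(λ) - digamma(x + 1)
Zygote.@adjoint StatsFuns.poislogpdf(λ, x) =
    StatsFuns.poislogpdf(λ, x),
    Δ -> (Δ * (x / λ - 1), Δ * (log(λ) - digamma(x + 1)))

Zygote.gradient(StatsFuns.poislogpdf, 2.0, 3.0)  # should now succeed
```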
Yeah, good idea. I'll look into this later today and open an issue. If it is something straightforward, I might be able to implement a fix, but this is starting to venture into dangerous territory for me =)
As I said, just defining a function in native Julia would be the best solution. It's mostly a one-liner.
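For example (a sketch; `poislogpdf_native` is a made-up name to avoid clashing with StatsFuns):

```julia
using SpecialFunctions: loggamma
import Zygote

# Pure-Julia Poisson log-pdf: log p(x; λ) = x*log(λ) - λ - log(x!).
# Every operation here has a native AD rule, so no ccall gets in Zygote's way.
poislogpdf_native(λ, x) = x * log(λ) - λ - loggamma(x + 1)

Zygote.gradient(λ -> poislogpdf_native(λ, 3.0), 2.0)  # works
```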
Currently, Turing crashes when DynamicNUTS produces a numerical error. If my memory serves me correctly, this happens in about 15 to 18% of cases. Looking through our preliminary code, I see this was solved by adding a try/catch statement to the inner loop of the benchmark code, which applied to all of the samplers. As we discussed, this is suboptimal because it is very inefficient and could also introduce a selection bias into the benchmark process, thereby invalidating the results. It would be good to either develop a better solution or request this feature from Turing.
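For reference, a minimal sketch of that try/catch workaround; `run_sampler` is a hypothetical stand-in for one benchmark repetition:

```julia
# Hypothetical stand-in for one benchmark repetition of a sampler; fails at
# roughly the observed rate to mimic DynamicNUTS numerical errors.
run_sampler() = rand() < 0.15 ? error("numerical error") : randn(1000)

results = Vector{Vector{Float64}}()
for rep in 1:50
    try
        push!(results, run_sampler())
    catch e
        # Skipping failed repetitions keeps the loop alive, but may bias the
        # benchmark toward runs that happen to avoid problematic regions.
        @warn "sampler failed; skipping repetition" rep exception = e
    end
end
```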