Error handling for DynamicNUTS #3
Internally dHMC does the same in LogDensityRejectErrors(). My assumption is that this could cause the different slopes we observe in the time plot. We might need to compare for bias in the benchmarks and add a graph for that. One model in StatisticalRethinking consistently shows a bias (m10.04d). This case was labeled a bug by Tamas. It is a multilevel model, but useful as a test case for detecting bias. I have often thought about creating chains using different MCMC tools. That should show us the bias?
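For reference, this is roughly how that wrapper is used, as a minimal sketch with a toy log-density, assuming the DynamicHMC 1.x / LogDensityProblems API current at the time (not code from the benchmarks):

```julia
using DynamicHMC, LogDensityProblems, TransformVariables
import ForwardDiff

# Toy log-density (standard normal in one parameter), for illustration only.
P = TransformedLogDensity(as((μ = asℝ,)), θ -> -0.5 * θ.μ^2)

# LogDensityRejectErrors turns numerical errors (NaN, DomainError, ...) in the
# log-density or gradient into rejections (-Inf) instead of crashing the sampler.
∇P = LogDensityRejectErrors(ADgradient(:ForwardDiff, P))

chain, tuned = NUTS_init_tune_mcmc(∇P, 1000)
```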
Interesting. Are you saying there is a bug that causes the errors, which then causes bias? Or is the problem poor error handling compared to Stan?
My understanding/assumption is that this is an issue in computing gradients. I think Stan, or the libraries Stan is using, might have more options/tools to deal with such errors. If such an error always occurred in the same region, maybe it could lead to bias? What I have noticed in m10.04d is that signs and magnitudes often show a reasonable pattern, but are off.
I'll actually spend a few cycles on building a Chains object initially consisting of dHMC draws and CmdStan draws for m10.04 and see what that tells us.
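A sketch of what such a combined Chains object could look like with MCMCChains; the draw matrices and parameter names below are placeholders, not actual m10.04 output:

```julia
using MCMCChains

# Placeholders for iterations × parameters draw matrices from each backend.
dhmc_draws = randn(1000, 2)   # stand-in for DynamicHMC posterior draws
stan_draws = randn(1000, 2)   # stand-in for CmdStan posterior draws

# Stack the two samplers as separate "chains" (iterations × params × chains)
# so their marginals can be compared side by side.
vals = cat(dhmc_draws, stan_draws; dims = 3)
chn = Chains(vals, [:a, :bp])

describe(chn)  # compare means and quantiles across the two pseudo-chains
```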
I'm going to close this because DynamicNUTS will lose support in the future:

> As a side note, Turing.jl is unlikely to keep maintaining the support of the DynamicHMC.jl backend.

Originally posted by @xukai92 in #25 (comment)
Fully agree with that. I think we should focus on Turing's AdvancedHMC, DynamicHMC, and Stan, all on the NUTS variant of HMC. I've been pretty busy doing major work on future Stan-related packages and StatisticalRethinking (2nd edition). As I am considering making DynamicHMC the primary platform for StatisticalRethinking, lots of exploring ...
That sounds interesting. I would like to learn about what you decide to do and why. I would like to fully switch to Turing or DynamicHMC, but the performance does not scale currently. On hierarchical models, even small ones, they hit a wall pretty hard. I'm hoping that we get a flexible reverse-mode AD that is competitive with Stan.
Maybe I have misinterpreted some of the results in MCMCBenchmarks. All part of exploring ... I haven't seen results for Hierarchical_Poisson. Which results demonstrate your concern? (As I said before, you're miles ahead of me in that area!)
I observed these drastic differences only informally as I was developing and testing the hierarchical Poisson models. I just ran a benchmark to get a better idea of the performance differences. I scaled it down to five repetitions because it was so slow. Here are some of the results: DynamicHMC is 23 times slower than Stan with 100 observations, while Turing is 127 times slower. In this plot, you can see that run time is related to allocations. Note that the CmdStan numbers count only memory allocated in Julia, since the sampling itself runs in an external process. Memory allocations are part of the problem, and so is the use of forward-mode AD. I'm not sure to what extent these problems are interrelated.
This weekend I would like to try running DynamicHMC with Zygote if possible. Would you like me to ping Tamas? It might be useful for us to know whether we can scale up DynamicHMC right now.
Thanks Chris. The slope below Ns=20 might be affected by startup issues, but if beyond that point AD is the determining factor, it makes sense that DynamicHMC and AdvancedHMC show the same slope (both use ForwardDiff). Clearly Stan's AD performance is still ahead. Is that how you interpret the summary plot?
Yeah, I think you are right. The intercepts do give us clues about the startup overhead. I wanted to measure that more directly by setting the data size and the number of NUTS samples low, but encountered some problems, so I will open an issue and fix that in the next couple of days.
P.S. Tamas is going to push some bug fixes so we can try Zygote with DynamicHMC. I'll report on the Poisson model if Zygote runs.
(Continuing the discussion from tpapp/LogDensityProblems.jl#43 here.)

In general, direct comparisons between Stan and other NUTS implementations are tricky, because the speed of density + gradient evaluations and the tuning can both differ. Stan is usually better tuned. Improving this is on the agenda for DynamicHMC, which I suspect has some bugs that surface with high-curvature models. I hope to complete the rewrite this summer. This project provided a lot of test models, for which I cannot thank you enough.

Regarding the choice of AD engine within Julia: Zygote is amazing when it works, but can be tricky when it does not, as the issues you noticed show. Both LogDensityProblems and TransformVariables are being gradually rewritten to be more friendly to Zygote, but this is a moving target. If you provide a link to the concrete model, I can run a few tests (but I may need a few days for this).

That said, Stan is highly optimized (2.20.0 is yet another improvement on some fronts) and is not trivial to match at the moment in Julia.
Sorry, it was not clear to me where I should ask this question. Thanks for sharing this information and for your willingness to run a few tests. If you would like, I can open a new issue for the hierarchical Poisson regression model in question. The code is here, but I can strip it down to the necessary components. In addition, I have noticed that this model creates occasional NaN errors, which will cause problems when you deprecate the LogDensityRejectErrors function.
No worries. I will look into this. Regarding …
P.S. I have experienced the most NaN errors with 100 data points:
Just for completeness, in testing Zygote in DynamicHMCModels.jl I usually make the following update:
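Presumably the update swaps the AD backend passed to `ADgradient`, along these lines (a sketch; `P` stands for the model's `TransformedLogDensity`, as in the earlier example):

```julia
import Zygote
using LogDensityProblems: ADgradient

# ForwardDiff backend (the default used elsewhere in the benchmarks):
# ∇P = ADgradient(:ForwardDiff, P)

# Zygote backend — same wrapper, different AD engine:
∇P = ADgradient(:Zygote, P)
```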
With both versions, using Zygote I run into:
I've been playing around with both LogDensityProblems and TransformVariables to try to understand this error, but haven't made much progress. Looking at the code has been very rewarding though!
I am happy to look into issues like this, but I ask that you open an issue at LogDensityProblems.jl complete with all the relevant details (MWE, …). (Sorry, I just realized that I am continuing the discussion in a closed issue. Feel free to move it elsewhere.)
Thanks Tamas. I will certainly file an issue once I'm convinced the issue is not on our side. In this case I want to understand when I see the issue Chris reported (which I have also seen) and when I see this one. Chris, for now I will reopen this issue.
Chris, for the Gaussian model I get the following error when using Zygote:
For e.g. Hierarchical_Poisson I get a different error:
Is that what you are seeing as well? I've moved the MCMCBenchmarks.jl scripts to DynamicHMCModels.jl (in the folder …).
Yeah, I am experiencing the same errors on my machine. To the best of my knowledge, the TypeError for the Poisson regression model has not been reported. It looks like it might be coming from LogDensityProblems or TransformVariables.
Agreed. The updated LogDensityProblems allowed about half of the DynamicHMCModels to work without the LogDensityRejectErrors() safety net. Multilevel models remain difficult. I'll put together an MWE for the ccall issue.
Thinking about this, I am not super-surprised that AD can't work through external calls to C in the …
Interesting. Do you know why ForwardDiff works with Distributions, but not Zygote? I wonder if there are any workarounds.
Either Zygote has to be told about the derivative manually (see its manual), or the function has to be defined in native Julia. For example:

```julia
using StatsFuns
import ForwardDiff, Zygote

f(x) = poislogpdf(x[1], x[2])
x = ones(2)
f(x)

ForwardDiff.gradient(f, x) # works
Zygote.gradient(f, x)      # fails
```

It would be great if you could open an issue and pursue this. First I would check issues for Distributions and then perhaps open one, and take it to Zygote from there if they are not open to a native version.
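For the first option, a sketch of a manual adjoint via `Zygote.@adjoint`; the derivative formulas are standard, but treat the exact rule as an assumption rather than a vetted patch:

```julia
using StatsFuns
import Zygote
using SpecialFunctions: digamma

# Manual adjoint for poislogpdf(λ, x) = x*log(λ) - λ - loggamma(x + 1):
#   ∂/∂λ = x/λ - 1,   ∂/∂x = log(λ) - digamma(x + 1)
Zygote.@adjoint StatsFuns.poislogpdf(λ, x) =
    StatsFuns.poislogpdf(λ, x),
    Δ -> (Δ * (x / λ - 1), Δ * (log(λ) - digamma(x + 1)))

Zygote.gradient(StatsFuns.poislogpdf, 2.0, 3.0)  # should now succeed
```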
Yeah, good idea. I'll look into this later today and open an issue. If it is something straightforward, I might be able to implement a fix, but this is starting to venture into dangerous territory for me =)
As I said, just defining a function in native Julia would be the best solution. It's mostly a one-liner.
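For example (a sketch; `poislogpdf_native` is a made-up name to avoid clashing with StatsFuns):

```julia
using SpecialFunctions: loggamma
import Zygote

# Pure-Julia Poisson log-pdf: log p(x; λ) = x*log(λ) - λ - log(x!).
# Every operation here has a native AD rule, so no ccall gets in Zygote's way.
poislogpdf_native(λ, x) = x * log(λ) - λ - loggamma(x + 1)

Zygote.gradient(λ -> poislogpdf_native(λ, 3.0), 2.0)  # works
```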
Currently, Turing crashes when DynamicNUTS produces a numerical error. If my memory serves me correctly, this happens in about 15 to 18% of cases. Looking through our preliminary code, I see this was solved by adding a try/catch statement to the inner loop of the benchmark code, which applied to all of the samplers. As we discussed, this is suboptimal because it is very inefficient and could also introduce a selection bias into the benchmark process, thereby invalidating the results. It would be good to either develop a better solution or request this feature from Turing.
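For reference, a minimal sketch of that try/catch workaround; `run_sampler` is a hypothetical stand-in for one benchmark repetition:

```julia
# Hypothetical stand-in for one benchmark repetition of a sampler; fails at
# roughly the observed rate to mimic DynamicNUTS numerical errors.
run_sampler() = rand() < 0.15 ? error("numerical error") : randn(1000)

results = Vector{Vector{Float64}}()
for rep in 1:50
    try
        push!(results, run_sampler())
    catch e
        # Skipping failed repetitions keeps the loop alive, but may bias the
        # benchmark toward runs that happen to avoid problematic regions.
        @warn "sampler failed; skipping repetition" rep exception = e
    end
end
```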