[HMC] Update random variables independent of the joint likelihood? #372
It makes a lot of sense to do this - actually, this is something that @xukai92 and I thought about when designing the HMC sampler. Basically, we wanted a way to figure out which random variables in the model the likelihood actually depends on. @xukai92, do you have any ideas here?
A random variable could still "be in the model" in the sense that it is sampled but not used afterwards, meaning that the partial derivative of the joint with respect to that variable is just the derivative of its prior. I think we should keep such variables but not update them, since with HMC within Gibbs a variable can be "unused" (the likelihood independent of it) at one iteration but "used" at a later one. I don't know how to detect these random variables efficiently.
I'm not 100% sure I understand the situation in which this is useful. Do you mean that some of the dependencies between the likelihood and the variables may disappear in some iterations because of stochastic conditions?
Yes, for instance with a Dirichlet process mixture, a cluster can be empty at an iteration (no observations assigned to it), and therefore that cluster's weight and location are unused.
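To make that concrete, here is a minimal finite-mixture sketch in the same style as the example further down (illustrative only, not taken from the thread):

```julia
@model mixture(y, K) = begin
  w ~ Dirichlet(K, 1.0)            # cluster weights
  μ = Vector{Real}(undef, K)
  for k in 1:K
    μ[k] ~ Normal(0, 10)           # cluster locations
  end
  z = Vector{Int}(undef, length(y))
  for n in 1:length(y)
    z[n] ~ Categorical(w)          # cluster assignment
    y[n] ~ Normal(μ[z[n]], 1)      # only μ[z[n]] enters the likelihood
  end
end
# If no z[n] == k in the current iteration, cluster k is empty and μ[k] is
# "unused": the joint's gradient w.r.t. μ[k] is just its prior gradient.
```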
Emm, it's interesting. It's detecting stochastic dependencies dynamically instead of using the compiler to check the dependency, which is what we discussed before. Let me write a minimal example for further discussion:

```julia
@model test() = begin
  a ~ Normal(0, 1)
  b ~ Normal(0, 1)
  if a > 0
    1.5 ~ Normal(a + b, 1)
  else
    1.5 ~ Normal(a, 1)
  end
end

sample(test(), Gibbs(..., HMC(..., :b), PG(..., :a)))
```

In this model, if in the current iteration `a <= 0`, the likelihood does not depend on `b`, so the HMC step would move `b` using only its prior gradient. I think your proposal will do the job. I am also thinking whether we can do the same thing by separately tracking the prior and likelihood terms of the log-density. Also, I think we may be able to use …
That's a perfect test! Good idea, in …
Oh yes, we can simply use … One minor concern is that with this approach we do more additions than before. I don't know to what extent it will slow us down, but we can separate them into two fields in `VarInfo`.
Oh yes, I mean …
The reason I would prefer splitting them is … Let's see what @yebai thinks about this design!
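A minimal sketch of the "two fields" idea being discussed, accumulating the prior and likelihood log-densities separately instead of a single `logp` (all names here, `SplitLogDensity`, `acc_prior!`, `acc_loglike!`, are illustrative, not Turing's actual API):

```julia
# Accumulate prior and likelihood log-densities in separate fields, so a
# sampler can inspect the likelihood term on its own.
mutable struct SplitLogDensity
    logprior::Float64   # accumulated log-density of assume (x ~ D) statements
    loglike::Float64    # accumulated log-density of observe (obs ~ D) statements
end

SplitLogDensity() = SplitLogDensity(0.0, 0.0)

acc_prior!(s::SplitLogDensity, lp)   = (s.logprior += lp; s)
acc_loglike!(s::SplitLogDensity, lp) = (s.loglike += lp; s)

# The joint log-density is recovered with one extra addition, which is the
# overhead concern raised above.
logjoint(s::SplitLogDensity) = s.logprior + s.loglike
```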
@xukai92 I'm working on a PR, I like your solution!
Maybe keep it as …
Is there no computational overhead by using this approach?
There is, but I think it's not significant (based on our previous profiling experience).
I have just realised that what we are currently doing is sampling these "unused" random variables from their priors, since HMC samples from the joint (which is proportional to the posterior), and the joint reduces to the prior for these random variables. Therefore I'm not sure whether that's an issue or not.
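In symbols, restating the observation above (assuming, for simplicity, priors that are independent across components):

```latex
% If the likelihood does not depend on component \theta_i, the log-joint splits as
\log p(\theta, y)
  = \log p(\theta_i) + \log p(\theta_{-i}) + \log p(y \mid \theta_{-i}),
% hence
\frac{\partial}{\partial \theta_i} \log p(\theta, y)
  = \frac{\partial}{\partial \theta_i} \log p(\theta_i),
% so the leapfrog dynamics move \theta_i exactly as if HMC were targeting
% the prior p(\theta_i) alone.
```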
PR #401
@xukai92 Actually this solution cannot work, since a random variable …
I'm a bit lost here. Do you mean we shouldn't update …?
In that example …
I see. You are right. We probably still need to do that at model compilation time, where we can somehow extract the variable dependencies. But again, that way we cannot deal with stochastic control flow easily. I don't know if we can use Cassette.jl for run-time stochastic dependency checking once it's mature.
Indeed, accessing the underlying (dependency) graph could be a way. I'm trying to think of another way.
What's the alternative you have in mind?
Maybe to check this independence during each …
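One hypothetical shape for such a per-iteration check is to record which variables are read while observe statements are evaluated (nothing here is an existing Turing mechanism; it assumes variable reads inside the model can be instrumented to call `on_read`):

```julia
# Record which random variables the likelihood touches during one model run.
mutable struct DepRecorder
    in_observe::Bool
    used::Set{Symbol}
end
DepRecorder() = DepRecorder(false, Set{Symbol}())

# Assumed to be called whenever a variable's value is read during execution.
on_read(r::DepRecorder, name::Symbol) =
    (r.in_observe && push!(r.used, name); nothing)

# Wraps the evaluation of an observe statement's log-density.
function on_observe(f, r::DepRecorder)
    r.in_observe = true
    lp = f()            # reads inside f() are assumed to trigger on_read
    r.in_observe = false
    return lp
end

# After one pass through the model, r.used holds the variables the likelihood
# actually depended on in this iteration; HMC could skip updating the rest.
```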
@emilemathieu @xukai92 I'm hoping that we will eventually be able to use …
In `HMC`, during a leapfrog step, all variables are updated (via `p -= ϵ * grad / 2; θ += ϵ * p`). Shouldn't we avoid updating random variables which do not affect the joint probability? Otherwise, we might be moving these random variables towards one of their prior's modes.

We could detect those variables, since their partial derivative (i.e. the associated component in `grad = gradient(vi, model, spl)`) is equal to the derivative of their prior, for instance by comparing gradient components as in the sketch below. This would be useful (but is it correct?) for a BNP mixture model, to avoid updating the unused clusters' weights and locations.
We could otherwise resample these variables from their prior.
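A minimal standalone sketch of that detection idea (hypothetical: `grad_joint` and `grad_prior` are placeholders for the actual gradient computations, and `≈` stands in for whatever comparison would be appropriate):

```julia
# Leapfrog step that skips components whose joint-gradient entry equals their
# prior-gradient entry, i.e. components the likelihood does not depend on.
function leapfrog_masked!(θ, p, ϵ, grad_joint, grad_prior)
    g  = grad_joint(θ)        # ∇θ log p(θ, y), as from gradient(vi, model, spl)
    gp = grad_prior(θ)        # ∇θ log p(θ), the prior-only gradient
    used = .!(g .≈ gp)        # false where the likelihood contributes nothing
    p .-= ϵ .* g ./ 2                 # half momentum step
    θ .+= ϵ .* p .* used              # position step, skipping "unused" components
    p .-= ϵ .* grad_joint(θ) ./ 2     # second half momentum step
    return θ, p
end
```

With `used` all true, this reduces to the standard leapfrog update quoted above.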