ELLA #53 (Open)

murphyk opened this issue May 3, 2023 · 0 comments
murphyk commented May 3, 2023

Z. Deng, F. Zhou, and J. Zhu, “Accelerated Linearized Laplace Approximation for Bayesian Deep Learning,” in Advances in Neural Information Processing Systems, Oct. 2022 [Online]. Available: https://openreview.net/forum?id=jftNpltMgz. [Accessed: Apr. 29, 2023]
https://github.com/thudzj/ELLA

That is, they do the same thing for FCEKF. Additional elements of the Lofi version (see the sketch after this list):
(1) W acts as pseudo-observations, compressing the observations in J.
(2) Upsilon rescales the kernel, modifying the base NTK.
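As a rough sketch of (1) and (2): suppose the Lofi posterior precision is parameterized as diag(Upsilon) + W W' with W of size PxL. That parameterization, and all shapes below, are illustrative assumptions for this issue, not code from either repo.

```python
import jax.numpy as jnp
from jax import random

# Illustrative shapes only.
NC, P, L = 200, 50, 10
key1, key2, key3 = random.split(random.PRNGKey(0), 3)
J = random.normal(key1, (NC, P))            # stacked Jacobians (R / Lambda absorbed)
W = random.normal(key2, (P, L))             # Lofi's low-rank factor ("pseudo-observations")
upsilon = random.uniform(key3, (P,)) + 0.5  # Lofi's diagonal term

# (1) W compresses the observations in J: the data-fit term J' J, built from
#     all NC rows of J, is replaced by the rank-L surrogate W W'.
precision_full = J.T @ J + jnp.diag(upsilon)
precision_lofi = W @ W.T + jnp.diag(upsilon)

# (2) In the GP view, the diagonal term rescales the kernel: the base NTK
#     J J' becomes J diag(1/Upsilon) J'.
ntk_base = J @ J.T
ntk_rescaled = (J / upsilon) @ J.T
```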

They use the GP formulation to motivate a low-rank approximation that is complementary to ours, compressing along the model parameters (P) rather than along the training set (NC). In other words, the true posterior precision equals J' J where J is the NCxP matrix that concatenates Jacobians from all training examples (for simplicity I'm absorbing outcome covariance, our R or their Lambda, into J as I've done elsewhere). Lofi approximates J' J by W W' where W is PxL with small L. Their ELLA starts with the GP formulation, which uses J J', and approximates it as Phi' Phi where Phi is KxNC with small K.
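A shape-level sketch of the two complementary factorizations follows; building both low-rank factors from one truncated SVD of J is purely for illustration, not how either method actually computes them.

```python
import jax.numpy as jnp
from jax import random

NC, P, L, K = 200, 50, 10, 10                    # illustrative sizes only
J = random.normal(random.PRNGKey(0), (NC, P))    # stacked Jacobians (R / Lambda absorbed)

# The two exact objects from the dual views.
param_precision = J.T @ J    # (P, P):   what Lofi approximates by W W'
kernel = J @ J.T             # (NC, NC): the GP/NTK object ELLA approximates by Phi' Phi

# For illustration, build both low-rank factors from the same truncated SVD of J.
U, s, Vt = jnp.linalg.svd(J, full_matrices=False)
W = Vt[:L].T * s[:L]             # (P, L):  J' J  ~= W @ W.T
Phi = s[:K, None] * U[:, :K].T   # (K, NC): J J'  ~= Phi.T @ Phi

print(jnp.linalg.norm(param_precision - W @ W.T))   # truncation error, Lofi-style factor
print(jnp.linalg.norm(kernel - Phi.T @ Phi))        # truncation error, ELLA-style factor
```

The point of the sketch is just the shapes: W W' lives in the PxP parameter space while Phi' Phi lives in the NCxNC function space, yet both are built from the same singular structure of J, which is what makes the two compressions complementary.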

Their rank reduction is also based on SVD, applied to the kernel matrix of a random sample of the training data (of size M, with K ≤ M < NC). Because the SVDs of J J' and J' J are equivalent (i.e., they retain the same information), their method should be close (possibly identical, aside from our SSM assumptions) to a hypothetical version of Lofi based on M random past observations: imagine computing the exact posterior precision from those M observations and then taking its rank-K SVD. This analysis suggests we should outperform them, since instead of a random subset of the data we use all the data, updated online and filtered through our SSM.
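Concretely, the subsample-then-SVD step might look like the following Nystrom-style sketch, written in the notation above; it is an illustrative paraphrase, not code from their repo.

```python
import jax.numpy as jnp
from jax import random

key_j, key_sub = random.split(random.PRNGKey(1))
NC, P, M, K = 200, 50, 40, 10            # K <= M < NC, illustrative sizes only
J = random.normal(key_j, (NC, P))        # stacked Jacobians over all training examples

# Subsample M training points and form their (M, M) kernel block.
idx = random.choice(key_sub, NC, shape=(M,), replace=False)
K_MM = J[idx] @ J[idx].T

# Rank-K eigendecomposition of the subsampled kernel (eigh, since it is PSD).
evals, evecs = jnp.linalg.eigh(K_MM)
evals, evecs = evals[::-1][:K], evecs[:, ::-1][:, :K]   # keep the top-K pairs

# Nystrom-style features for all NC points: Phi is (K, NC) and
# Phi.T @ Phi approximates the full kernel J @ J.T.
K_MN = J[idx] @ J.T                      # (M, NC) cross-kernel block
Phi = (evecs / jnp.sqrt(evals)).T @ K_MN
kernel_approx = Phi.T @ Phi
```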
