That is, they do the same thing for FCEKF. Additional elements of the Lofi version (see the sketch after this list):
(1) W acts as pseudo-observations, compressing the observations in J.
(2) Upsilon rescales the kernel, modifying the base NTK.
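To make (1) and (2) concrete, here is a minimal NumPy sketch of the diagonal-plus-low-rank precision this implies, assuming the posterior precision is represented as Upsilon + W W' with diagonal Upsilon (P x P) and W of shape P x L. The names and shapes follow the notation in this thread, not any particular codebase; the Woodbury step is only there to show why just an L x L solve is ever needed.

```python
# Hypothetical sketch (not LoFi's actual code): a diagonal-plus-low-rank
# precision Upsilon + W W', with W acting as L pseudo-observations.
import numpy as np

rng = np.random.default_rng(0)
P, L = 50, 5                               # parameters, low-rank dimension

upsilon = rng.uniform(0.5, 2.0, size=P)    # diagonal of Upsilon
W = rng.normal(size=(P, L))                # P x L pseudo-observation matrix

precision = np.diag(upsilon) + W @ W.T     # full P x P precision (for checking only)

# Woodbury: (Upsilon + W W')^{-1} = U^-1 - U^-1 W (I_L + W' U^-1 W)^{-1} W' U^-1,
# so the covariance is available with only an L x L inverse.
u_inv = 1.0 / upsilon
core = np.linalg.inv(np.eye(L) + (W.T * u_inv) @ W)
cov = np.diag(u_inv) - (u_inv[:, None] * W) @ core @ (W.T * u_inv)

assert np.allclose(cov, np.linalg.inv(precision))
```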
They use the GP formulation to motivate a low-rank approximation that is complementary to ours, compressing along the model parameters (P) rather than along the training set (NC). In other words, the true posterior precision equals J' J, where J is the NC x P matrix that concatenates the Jacobians from all training examples (for simplicity I'm absorbing the outcome covariance, our R or their Lambda, into J as I've done elsewhere). Lofi approximates J' J by W W', where W is P x L with small L. ELLA instead starts with the GP formulation, which uses J J', and approximates it as Phi' Phi, where Phi is K x NC with small K.
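A quick NumPy illustration of the two complementary compressions (a random J stands in for the stacked Jacobians; the truncated SVD here is only to make the shapes concrete and is not how either method actually builds its factors):

```python
# Illustrative only: contrast Lofi's parameter-side compression (J'J ~ W W')
# with ELLA's data-side compression (J J' ~ Phi' Phi), using a random J.
import numpy as np

rng = np.random.default_rng(1)
NC, P = 40, 60
J = rng.normal(size=(NC, P))            # stacked Jacobians (R / Lambda absorbed)

precision = J.T @ J                     # P x P, Lofi's target
kernel = J @ J.T                        # NC x NC NTK, ELLA's target

U, s, Vt = np.linalg.svd(J, full_matrices=False)

L = 10                                  # Lofi: compress along P
W = Vt[:L].T * s[:L]                    # P x L
K = 10                                  # ELLA: compress along NC
Phi = (U[:, :K] * s[:K]).T              # K x NC

print(np.linalg.norm(precision - W @ W.T) / np.linalg.norm(precision))
print(np.linalg.norm(kernel - Phi.T @ Phi) / np.linalg.norm(kernel))
```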
Their rank reduction is also based on SVD, applied to the kernel matrix of a random sample of the training data (of size M, with K ≤ M < NC). Because the SVDs of J J' and J' J are equivalent (i.e., they retain the same information), their method should be close (possibly identical, aside from our SSM assumptions) to a hypothetical version of Lofi based on M random past observations. That is, imagine we could compute the exact posterior precision from those M observations and then take its rank-K SVD. This analysis suggests we should outperform them, since instead of a random subset of the data we use all the data, updated online and filtered through our SSM.
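A numerical check of the equivalence argument: the nonzero spectra of J J' and J' J coincide, and pushing the top-K eigenvectors of the kernel through J' recovers exactly the rank-K truncation of the precision, which is the sense in which a kernel-side SVD and a parameter-side SVD of the same rows retain the same information. (Random J again; nothing here is specific to either implementation.)

```python
# Check: rank-K information extracted from J J' equals rank-K information in J' J.
import numpy as np

rng = np.random.default_rng(2)
NC, P, K = 30, 80, 8
J = rng.normal(size=(NC, P))

kernel = J @ J.T                         # NC x NC
precision = J.T @ J                      # P x P

# Nonzero eigenvalues agree (the precision just has P - NC extra zeros).
ev_k = np.sort(np.linalg.eigvalsh(kernel))[::-1]
ev_p = np.sort(np.linalg.eigvalsh(precision))[::-1]
assert np.allclose(ev_k, ev_p[:NC])

# Top-K eigenvectors of the kernel, mapped through J', give the top-K
# eigenvectors of the precision, so either side yields the same rank-K object.
w, U = np.linalg.eigh(kernel)
top = np.argsort(w)[::-1][:K]
U_K, w_K = U[:, top], w[top]
V_K = J.T @ U_K / np.sqrt(w_K)           # P x K, orthonormal columns
precision_from_kernel = (V_K * w_K) @ V_K.T

U2, s2, Vt2 = np.linalg.svd(J, full_matrices=False)
precision_direct = (Vt2[:K].T * s2[:K] ** 2) @ Vt2[:K]
print(np.allclose(precision_from_kernel, precision_direct))
```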
Z. Deng, F. Zhou, and J. Zhu, “Accelerated Linearized Laplace Approximation for Bayesian Deep Learning,” in Advances in Neural Information Processing Systems, Oct. 2022 [Online]. Available: https://openreview.net/forum?id=jftNpltMgz. [Accessed: Apr. 29, 2023]
https://github.com/thudzj/ELLA