Add support for higher-order tensors, with more than 2 symbolic dimensions #66
Hi @bionicles, thanks a lot for your interesting question! First of all, all my apologies for this late reply: we have been extremely busy over the last two months and are just starting to catch up on late issues. Don't worry, though: both KeOps and GeomLoss are still under very active development. Since you are working with optimal transport theory, you may be interested in my PhD thesis, which I defended two weeks ago: the slides are available here, and I will try to quickly upload a video on my website.

As for your question: this is very much doable in Python, using broadcasting syntax.

```python
import time
import torch
from pykeops.torch import LazyTensor
torch.set_default_tensor_type(torch.cuda.FloatTensor)
N, M = 500, 1000 # Number of points (= atoms?) per molecule
D, E = 3, 3 # Dimension of the ambient space
F = 10 # Number of features per point
# Cost function - here, a squared Euclidean norm:
d = lambda a, b : ((a - b) ** 2).sum(-1)
# Toy input data:
x, f = torch.randn(N, D), torch.randn(N, F) # = A, fA
y, g = torch.randn(M, E), torch.randn(M, F) # = B, fB
P = torch.rand(N, M) # Transport plan "pi" between A and B
```

Then, with PyTorch, you can compute a fused Gromov-Wasserstein cost with:

```python
P_ij = P.view(N, 1, M, 1)
P_kl = P.view(1, N, 1, M)
x_i = x.view(N, 1, 1, 1, D)
x_k = x.view(1, N, 1, 1, D)
y_j = y.view(1, 1, M, 1, E)
y_l = y.view(1, 1, 1, M, E)
f_i = f.view(N, 1, 1, 1, F)
g_j = g.view(1, 1, M, 1, F)
# Symbolic cost associated to the Gromov-Wasserstein-like metric:
Y_ijkl = d(f_i, g_j) + (d(x_i, x_k) - d(y_j, y_l)) ** 2
C_ijkl = Y_ijkl * P_ij * P_kl # (N, N, M, M) Tensor
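# Note: with N = 500 and M = 1,000, Y_ijkl and C_ijkl each hold
# N*N*M*M = 2.5e11 float32 entries, i.e. roughly 1 TB of memory,
# which is what the KeOps versions below avoid materializing.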
# Actual computation:
C_sum = C_ijkl.sum() # (N, N, M, M) Tensor -> scalar
```

The main problem with this code is its memory footprint. A first work-around is to use the same broadcasting pattern with KeOps LazyTensors, so that the final reduction is performed symbolically without ever materializing the full cost tensor:

```python
# Turn our Tensors into KeOps symbolic variables:
P_ij = LazyTensor( P.view(N, 1, M, 1, 1) )
P_kl = LazyTensor( P.view(1, N, 1, M, 1) )
x_i = LazyTensor( x.view(N, 1, 1, 1, D) )
x_k = LazyTensor( x.view(1, N, 1, 1, D) )
y_j = LazyTensor( y.view(1, 1, M, 1, E) )
y_l = LazyTensor( y.view(1, 1, 1, M, E) )
f_i = LazyTensor( f.view(N, 1, 1, 1, F) )
g_j = LazyTensor( g.view(1, 1, M, 1, F) )
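# Note: in these 5D views, the first two axes act as KeOps batch dimensions
# (they carry the i and k indices), while the last three axes follow the usual
# LazyTensor convention of (index "i", index "j", channels): here they carry
# the j and l indices and the feature/coordinate dimension.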
# Symbolic cost associated to the Gromov-Wasserstein-like metric:
Y_ijkl = d(f_i, g_j) + (d(x_i, x_k) - d(y_j, y_l)) ** 2
C_ijkl = Y_ijkl * P_ij * P_kl # (N, N, M, M) LazyTensor
# Actual computation:
C_ijk = C_ijkl.sum(3) # (N, N, M, M) LazyTensor -> (N, N, M) Tensor
C_sum = C_ijk.sum()
```

This code should be a little bit faster than the vanilla PyTorch one, and much more scalable. Alternatively, you can flatten the (k, l) pairs into a single symbolic axis of size N*M, at the cost of a few explicit copies:

```python
# Turn our Tensors into KeOps symbolic variables:
P_ij = LazyTensor( P.view(N, M, 1, 1) )
P_kl = LazyTensor( P.view(1, 1, N*M, 1) )
x_i = LazyTensor( x.view(N, 1, 1, D) )
x_k = LazyTensor( x.view(N, 1, D).repeat(1, M, 1).view(1, 1, N*M, D) )
y_j = LazyTensor( y.view(1, M, 1, E) )
y_l = LazyTensor( y.view(1, M, E).repeat(N, 1, 1).view(1, 1, N*M, E) )
f_i = LazyTensor( f.view(N, 1, 1, F) )
g_j = LazyTensor( g.view(1, M, 1, F) )
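# Note: here the leading axis is a batch dimension carrying the i index, the
# symbolic "i" axis carries the j index, and the symbolic "j" axis of size N*M
# enumerates the (k, l) pairs in row-major order (flat index k*M + l), which is
# why x_k and y_l need the explicit repeats above.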
# Symbolic cost associated to the Gromov-Wasserstein-like metric:
Y_ijkl = d(f_i, g_j) + (d(x_i, x_k) - d(y_j, y_l)) ** 2
C_ijkl = Y_ijkl * P_ij * P_kl # (N, M, N*M) LazyTensor
# Actual computation:
C_ijk = C_ijkl.sum(2) # (N, M, N*M) LazyTensor -> (N, M) Tensor
C_sum = C_ijk.sum()
```

Of course, the use of explicit copies (the .repeat() calls above) is not fully satisfying. In the long run, we would like to let users declare several symbolic axes at once, with a syntax along the lines of:

```python
P_ij = LazyTensor( P.view(N, 1, M, 1, 1), dim=(0,1,2,3) )
P_kl = LazyTensor( P.view(1, N, 1, M, 1), dim=(0,1,2,3) )
```

This would allow us to prune out unnecessary copies and reach optimal performance with a minimal memory footprint. In your case, if the transport plan […]. This feature could be implemented using simple pointer arithmetic in the reduction scheme, just as we did to support batch dimensions. Including the testing, LazyTensor integration and documentation, this should take me around one week of work. I don't think that this is a very pressing issue (as the work-around above still achieves decent performance), but it is certainly something that we will address in the long term. Beyond the Gromov-Wasserstein problem, which is by itself a strong motivation already, this feature could be of interest to e.g. the developers of the GeomStats library: I will discuss it soon with @ninamiolane, @nguigs and @xpennec.

I hope that this answers your question: what do you think? Best regards,
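(Side note: the "batch dimensions" mentioned above are the leading axes of a LazyTensor view. Here is a minimal sketch of how they are used today; the Gaussian kernel and the sizes are arbitrary choices made purely for illustration.)

```python
import torch
from pykeops.torch import LazyTensor

B, N, M, D = 10, 1000, 2000, 3          # illustrative sizes
x = torch.randn(B, N, D)
y = torch.randn(B, M, D)

x_i = LazyTensor(x.view(B, N, 1, D))    # batch axis B, then the (i, j, channel) axes
y_j = LazyTensor(y.view(B, 1, M, D))
D_ij = ((x_i - y_j) ** 2).sum(-1)       # (B, N, M) symbolic squared distances
K_ij = (- D_ij).exp()                   # (B, N, M) symbolic Gaussian kernel
a_i = K_ij.sum(dim=2)                   # reduce over j: one dense value per (batch, i)
```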
Congrats on the thesis defense, Jean! Let me take some time to read this.
Yes, it looks good; it would be a handy API.
That is a great help, thank you very much!
I'd also use multiple lazy axes, especially if it worked with a JAX backend, although that's perhaps a big ask. @matthieuheitz, you can implement this algorithm in Julia with Tullio.jl pretty easily, and it can write a GPU kernel for you.
Hi, I am trying to reproduce the code snippet given for the flattened KeOps version above (the one with the N*M axis). Specifically, when defining the following variables:

```python
import torch
from pykeops.torch import LazyTensor
N, M = 4, 5
D, E = 3, 3
# Cost function - here, a squared Euclidean norm:
d = lambda a, b : ((a - b) ** 2).sum(-1)
# Toy input data:
x = torch.Tensor([[.1, .5, .3],
[.3, .4, .2],
[.2, .2, .3],
[.4, .5, .7]]).cuda()
y = torch.Tensor([[-.1, .5, -.3],
[.1, .2, .2],
[.3, -.2, .3],
[.2, -.5, .9],
[.2, -.5, .9]]).cuda()
x_i = LazyTensor( x.view(N, 1, 1, D) )
x_k = LazyTensor( x.view(N, 1, D).repeat(1, M, 1).view(1, 1, N*M, D) )
y_j = LazyTensor( y.view(1, M, 1, E) )
y_l = LazyTensor( y.view(1, M, E).repeat(N, 1, 1).view(1, 1, N*M, E) )
# PyTorch equivalents
x_i_t = torch.clone(x.view(N, 1, 1, D))
x_k_t = torch.clone(x.view(N, 1, D).repeat(1, M, 1).view(1, 1, N*M, D))
y_j_t = torch.clone(y.view(1, M, 1, E))
y_l_t = torch.clone(y.view(1, M, E).repeat(N, 1, 1).view(1, 1, N*M, E))
```

Then, running:

```python
print(d(x_i, x_k).sum(dim=1).squeeze()) # 1st time this quantity is computed
print(d(x_i_t, x_k_t).sum(dim=1).squeeze()) # computing the corresponding PyTorch quantity
print(d(x_i, x_k).sum(dim=1).squeeze()) # 2nd time it's computed
```

returns inconsistent values. So I even get a negative value for a sum of squared Euclidean distances, and recomputing the same quantity after computing the PyTorch version returns a different value. I'm probably missing something obvious, but I can't figure out why. Have any of you encountered a similar problem? (Happy to open a different issue if this isn't the place to ask this question.) Note that for the corresponding computations with the y variables, or when reshaping the x variables as done for the y's, everything works fine.
To predict side effects, train protein folders, do molecular docking, etc., we seek a way to quickly compute a distance metric between arbitrary molecules (think point clouds with features at each point). We'd like to implement the Fused Gromov-Wasserstein shape/feature distance with KeOps, which requires higher-dimensional indexing, because the goal is to compare distance matrices. Would it be possible to add Vij for distance matrices? This would allow us to use your work.
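(For context: in KeOps' current generic syntax, Vi and Vj tag variables indexed by the two existing symbolic axes, and the requested Vij would tag doubly-indexed variables such as a distance matrix. A minimal sketch of the existing usage, with an arbitrary Gaussian convolution and arbitrary sizes:)

```python
import torch
from pykeops.torch import Genred

# A plain Gaussian convolution, written with the existing generic syntax:
gaussian_conv = Genred(
    "Exp(-SqDist(x, y)) * b",   # symbolic formula
    ["x = Vi(0, 3)",            # x_i : 0th argument, one 3D point per "i" index
     "y = Vj(1, 3)",            # y_j : 1st argument, one 3D point per "j" index
     "b = Vj(2, 1)"],           # b_j : 2nd argument, one scalar per "j" index
    reduction_op="Sum",
    axis=1,                     # reduce (sum) over the "j" axis
)

x, y, b = torch.randn(1000, 3), torch.randn(2000, 3), torch.randn(2000, 1)
a = gaussian_conv(x, y, b)      # (1000, 1) result: a_i = sum_j exp(-|x_i - y_j|^2) * b_j
```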
Maybe an easy way to do this is to meta-program a function which returns a function to index an arbitrary number of dimensions deep (f(1) = Vi, f(2) = Vij, f(3) = Vijk, f(4) = Vijkl). That might save work in the future, since folks will almost surely be interested in N-dimensional kernels, and the internal methods for accessing these variables could work for any number of dimensions anyway.
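(A rough sketch of that idea, using nothing but PyTorch reshapes: a single helper, hypothetically named view_for_axis, that places a variable on any one of n broadcast axes, generalizing the hand-written .view() calls used earlier in this thread.)

```python
import torch

def view_for_axis(tensor, axis, n_axes):
    """Reshape a (points, features) tensor so that its point index sits on
    `axis` out of `n_axes` broadcastable axes, with the feature axis kept last."""
    n, d = tensor.shape
    shape = [1] * n_axes + [d]
    shape[axis] = n
    return tensor.view(*shape)

x = torch.randn(500, 3)
x_i = view_for_axis(x, 0, 4)   # shape (500, 1, 1, 1, 3), like x.view(N, 1, 1, 1, D)
x_k = view_for_axis(x, 1, 4)   # shape (1, 500, 1, 1, 3), like x.view(1, N, 1, 1, D)
```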
A declarative / Einstein-notation approach would be super cool, because we could declare what result we want at each index of an ndarray.
Unclear if this is possible in Python, though. Have you looked at Julia? Some of that language is beautiful for our shared purposes, in comparison to Python.
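(For what it's worth, a dense Einstein-notation version of the cost discussed above can already be written with torch.einsum, although, unlike KeOps, it materializes the full (N, N, M, M) tensor; the tiny sizes below are chosen accordingly.)

```python
import torch

N, M, D, E, F = 8, 9, 3, 3, 10
d = lambda a, b: ((a - b) ** 2).sum(-1)

x, f = torch.randn(N, D), torch.randn(N, F)
y, g = torch.randn(M, E), torch.randn(M, F)
P = torch.rand(N, M)

# Dense (N, N, M, M) cost tensor, indexed as (i, k, j, l):
Y = (d(f.view(N, 1, 1, 1, F), g.view(1, 1, M, 1, F))
     + (d(x.view(N, 1, 1, 1, D), x.view(1, N, 1, 1, D))
        - d(y.view(1, 1, M, 1, E), y.view(1, 1, 1, M, E))) ** 2)

# "Declare the result we want at each index": C = sum_{ijkl} Y[i,k,j,l] P[i,j] P[k,l]
C = torch.einsum("ikjl,ij,kl->", Y, P, P)
```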