Add support for higher-order tensors, with more than 2 symbolic dimensions #66

Open
bionicles opened this issue May 17, 2020 · 6 comments
Labels: enhancement (New feature or request)

bionicles commented May 17, 2020

To predict side effects, train protein folders, do molecular docking, etc., we need a way to quickly compute a distance metric between arbitrary molecules (think point clouds with features at each point). We'd like to implement the Fused Gromov-Wasserstein shape/feature distance with KeOps, but that requires higher-dimensional indexing, since the goal is to compare distance matrices. Would it be possible to add Vij indexing for distance matrices? This would allow us to use your work.

Maybe an easy way to do this is to meta-program a function that returns a function to index an arbitrary number of dimensions deep (f(1) = Vi, f(2) = Vij, f(3) = Vijk, f(4) = Vijkl). That might save work in the future, since folks will almost surely be interested in N-dimensional kernels, and the internal methods for accessing these variables could work for any number of dimensions anyway.
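
For instance, a purely hypothetical sketch of such a factory (symbolic_axis is a made-up name, nothing like it exists in KeOps today):

import torch
from pykeops.torch import LazyTensor

def symbolic_axis(axis, n_axes):
    # Hypothetical helper (not an existing KeOps API): wrap a (P, D) tensor as a
    # LazyTensor that varies along `axis` out of `n_axes` broadcast axes and is
    # singleton elsewhere, so symbolic_axis(0, 2) plays the role of Vi,
    # symbolic_axis(1, 2) that of Vj, and larger n_axes would give Vijk, Vijkl, ...
    def wrap(tensor):
        P, D = tensor.shape
        shape = [1] * n_axes + [D]
        shape[axis] = P
        return LazyTensor(tensor.view(*shape))
    return wrap

# With four symbolic axes (i, j, k, l):
Vi, Vj, Vk, Vl = (symbolic_axis(a, 4) for a in range(4))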

A declarative / Einstein-notation approach would be super cool, because we could declare what result we want at each index of an ndarray:

A, fA, B, fB = get_molecules()
dA[i, k] = d(A[i], A[k])
dB[j, l] = d(B[j], B[l])
dF[i, j] = d(fA[i], fB[j]) 
Y[i,j,k,l] = dF[i,j] + | d( dA[i,k], dB[j,l]) |

def f(pi):
  Z[i,j,k,l] = Y[i,j,k,l] * pi[i,j] * pi[k,l]
  return sum(Z)

result = optimize(f, pi_0)

Unclear if this is possible in Python, though. Have you looked at Julia? Some of that language is beautiful for our shared purposes, compared to Python.

@jeanfeydy jeanfeydy self-assigned this Jul 14, 2020
@jeanfeydy jeanfeydy changed the title Please add Vij indexing Add support for higher-order tensors, with more than 2 symbolic dimensionss Jul 14, 2020
@jeanfeydy jeanfeydy changed the title Add support for higher-order tensors, with more than 2 symbolic dimensionss Add support for higher-order tensors, with more than 2 symbolic dimensions Jul 14, 2020
Contributor

jeanfeydy commented Jul 14, 2020

Hi @bionicles ,

Thanks a lot for your interesting question!

First of all, my apologies for the late reply: we have been extremely busy over the last two months and are just starting to catch up on pending issues. Don't worry though - both KeOps and GeomLoss are still under very active development. Since you are working with optimal transport theory, you may be interested in my PhD thesis, which I defended two weeks ago: the slides are available here, and I will try to quickly upload a video on my website.

As for your question: this is very much doable in Python, using broadcasting syntax.
Let's say that the input arrays are declared as follows:

import time
import torch
from pykeops.torch import LazyTensor
torch.set_default_tensor_type(torch.cuda.FloatTensor)

N, M = 500, 1000  # Number of points (= atoms?) per molecule
D, E =   3,    3  # Dimension of the ambient space
F    =  10        # Number of features per point

# Cost function - here, a squared Euclidean norm:
d = lambda a, b : ((a - b) ** 2).sum(-1)

# Toy input data:
x, f = torch.randn(N, D), torch.randn(N, F)  # = A, fA
y, g = torch.randn(M, E), torch.randn(M, F)  # = B, fB
P    = torch.rand(N, M)  # Transport plan "pi" between A and B

Then, with PyTorch, you can compute a fused Gromov-Wasserstein cost with:

P_ij = P.view(N, 1, M, 1)
P_kl = P.view(1, N, 1, M)

x_i  = x.view(N, 1, 1, 1, D)
x_k  = x.view(1, N, 1, 1, D)

y_j  = y.view(1, 1, M, 1, E)
y_l  = y.view(1, 1, 1, M, E)

f_i  = f.view(N, 1, 1, 1, F)
g_j  = g.view(1, 1, M, 1, F)


# Symbolic cost associated to the Gromov-Wasserstein-like metric:
Y_ijkl = d(f_i, g_j) + (d(x_i, x_k) - d(y_j, y_l)) ** 2
C_ijkl = Y_ijkl * P_ij * P_kl  # (N, N, M, M) Tensor

# Actual computation:
C_sum = C_ijkl.sum()  # (N, N, M, M) Tensor -> scalar

The main problem with this code is its O(N^2 * M^2) memory footprint.
With KeOps, you can achieve an O(N^2 * M) memory usage by wrapping the input variables as symbolic LazyTensors:

# Turn our Tensors into KeOps symbolic variables:
P_ij = LazyTensor( P.view(N, 1, M, 1, 1) )
P_kl = LazyTensor( P.view(1, N, 1, M, 1) )

x_i  = LazyTensor( x.view(N, 1, 1, 1, D) )
x_k  = LazyTensor( x.view(1, N, 1, 1, D) )

y_j  = LazyTensor( y.view(1, 1, M, 1, E) )
y_l  = LazyTensor( y.view(1, 1, 1, M, E) )

f_i  = LazyTensor( f.view(N, 1, 1, 1, F) )
g_j  = LazyTensor( g.view(1, 1, M, 1, F) )


# Symbolic cost associated to the Gromov-Wasserstein-like metric:
Y_ijkl = d(f_i, g_j) + (d(x_i, x_k) - d(y_j, y_l)) ** 2
C_ijkl = Y_ijkl * P_ij * P_kl  # (N, N, M, M) LazyTensor

# Actual computation:
C_ijk = C_ijkl.sum(3)  # (N, N, M, M) LazyTensor -> (N, N, M) Tensor
C_sum = C_ijk.sum()

This code should be a little bit faster than the vanilla PyTorch one, and much more scalable.
Going further, you can reach an O(N * M) memory footprint with:

# Turn our Tensors into KeOps symbolic variables:
P_ij = LazyTensor( P.view(N, M,   1, 1) )
P_kl = LazyTensor( P.view(1, 1, N*M, 1) )

x_i  = LazyTensor( x.view(N, 1, 1, D) )
x_k  = LazyTensor( x.view(N, 1, D).repeat(1, M, 1).view(1, 1, N*M, D) )

y_j  = LazyTensor( y.view(1, M, 1, E) )
y_l  = LazyTensor( y.view(1, M, E).repeat(N, 1, 1).view(1, 1, N*M, E) )

f_i  = LazyTensor( f.view(N, 1, 1, F) )
g_j  = LazyTensor( g.view(1, M, 1, F) )


# Symbolic cost associated to the Gromov-Wasserstein-like metric:
Y_ijkl = d(f_i, g_j) + (d(x_i, x_k) - d(y_j, y_l)) ** 2
C_ijkl = Y_ijkl * P_ij * P_kl  # (N, M, N*M) LazyTensor

# Actual computation:
C_ijk = C_ijkl.sum(2)  # (N, M, N*M) LazyTensor -> (N, M) Tensor
C_sum = C_ijk.sum()
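
For reference, here is a rough, self-contained sanity check (an untested sketch; c_dense and c_keops are just placeholder names) that the dense PyTorch cost and this flattened KeOps cost should agree on tiny inputs:

import torch
from pykeops.torch import LazyTensor

N, M, D, E, F = 4, 5, 3, 3, 10
d = lambda a, b : ((a - b) ** 2).sum(-1)

x, f = torch.randn(N, D), torch.randn(N, F)
y, g = torch.randn(M, E), torch.randn(M, F)
P    = torch.rand(N, M)

# Dense reference, with an O(N^2 * M^2) memory footprint:
Y = (d(f.view(N, 1, 1, 1, F), g.view(1, 1, M, 1, F))
     + (d(x.view(N, 1, 1, 1, D), x.view(1, N, 1, 1, D))
        - d(y.view(1, 1, M, 1, E), y.view(1, 1, 1, M, E))) ** 2)
c_dense = (Y * P.view(N, 1, M, 1) * P.view(1, N, 1, M)).sum()

# Flattened KeOps version, with an O(N * M) memory footprint:
P_ij = LazyTensor( P.view(N, M, 1, 1) )
P_kl = LazyTensor( P.view(1, 1, N*M, 1) )
x_i  = LazyTensor( x.view(N, 1, 1, D) )
x_k  = LazyTensor( x.view(N, 1, D).repeat(1, M, 1).view(1, 1, N*M, D) )
y_j  = LazyTensor( y.view(1, M, 1, E) )
y_l  = LazyTensor( y.view(1, M, E).repeat(N, 1, 1).view(1, 1, N*M, E) )
f_i  = LazyTensor( f.view(N, 1, 1, F) )
g_j  = LazyTensor( g.view(1, M, 1, F) )
c_keops = ((d(f_i, g_j) + (d(x_i, x_k) - d(y_j, y_l)) ** 2) * P_ij * P_kl).sum(2).sum()

print(torch.allclose(c_dense, c_keops, rtol=1e-4))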

Of course, the use of .repeat(...) here is sub-optimal. Going forward, we'd like to be able to provide simple support for more than two "lazy" axes, using a syntax such as:

P_ij = LazyTensor( P.view(N, 1, M, 1, 1), dim=(0,1,2,3) )
P_kl = LazyTensor( P.view(1, N, 1, M, 1), dim=(0,1,2,3) )

This would allow us to prune out unnecessary copies and reach optimal performance with a minimal memory footprint. In your case, if the transport plan P is itself a symbolic LazyTensor (instead of a full (N, M) matrix), as with e.g. the Sinkhorn algorithm where P is encoded using two dual vectors of size N and M, this could allow you to reach an optimal O(N + M) memory footprint for truly scalable Gromov-Wasserstein computations.
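
To make that last point concrete, here is a rough sketch (the dual potentials u, v and the temperature eps below are placeholders, e.g. coming out of a Sinkhorn loop) of how such a symbolic transport plan could be built without ever storing an (N, M) matrix:

import torch
from pykeops.torch import LazyTensor

N, M, D = 500, 1000, 3
eps  = 0.05                                 # entropic regularization (placeholder value)
x, y = torch.randn(N, D), torch.randn(M, D)
u, v = torch.zeros(N), torch.zeros(M)       # dual potentials of size N and M

x_i = LazyTensor(x.view(N, 1, D))
y_j = LazyTensor(y.view(1, M, D))
u_i = LazyTensor(u.view(N, 1, 1))
v_j = LazyTensor(v.view(1, M, 1))

C_ij = ((x_i - y_j) ** 2).sum(-1)           # symbolic (N, M) cost matrix
P_ij = ((u_i + v_j - C_ij) / eps).exp()     # symbolic (N, M) entropic transport plan

# P_ij is never materialized: only the O(N + M) dual vectors are stored,
# and reductions such as the row marginals stay out-of-core:
marginal_i = P_ij.sum(dim=1)                # (N, 1) Tensor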

This feature could be implemented using simple pointer arithmetic in the reduction scheme, just as we did to support batch dimensions. Including the testing, LazyTensor integration and documentation, this should take me around one week of work.

I don't think that this is a very pressing issue (as the work-around above still achieves decent performance), but this is certainly something that we will address in the long term. Beyond the Gromov-Wasserstein problem, which is by itself a strong motivation, this feature could be of interest to e.g. the developers of the GeomStats library: I will discuss it soon with @ninamiolane, @nguigs and @xpennec.

I hope that this answers your question: what do you think?

Best regards,
Jean

@jeanfeydy jeanfeydy added the enhancement New feature or request label Jul 15, 2020
@bionicles
Author

Congrats on the thesis defense, Jean! Let me take some time to read this.

@bionicles
Author

Yes, it looks good; it would be a handy API.

@matthieuheitz

That is a great help, thank you very much!
I would also be very grateful if you manage to implement this feature :)
I'd like to scale the Gromov-Wasserstein problem up to N = M ≈ 10^7-10^8, so obviously an O(N+M) memory footprint would be an enormous advantage.
Let me know if I could be of any help, although I probably need to learn a lot more about KeOps and its internal machinery first.
Thanks!

@bionicles
Author

I'd also use multiple lazy axes, especially if it worked with a JAX backend, although that's perhaps a big ask. @matthieuheitz, you can implement this algorithm in Julia with Tullio.jl pretty easily, and it can write a GPU kernel for you.

@othmanesebbouh

Hi, I am trying to reproduce the code snippet given for the O(N * M) version of @jeanfeydy's solution, and I'm encountering a strange behavior.

Specifically, when defining

import torch
from pykeops.torch import LazyTensor

N, M = 4, 5
D, E =   3,    3

# Cost function - here, a squared Euclidean norm:
d = lambda a, b : ((a - b) ** 2).sum(-1)

# Toy input data:

x = torch.Tensor([[.1, .5, .3],
                  [.3, .4, .2],
                  [.2, .2, .3],
                  [.4, .5, .7]]).cuda()

y = torch.Tensor([[-.1, .5, -.3],
                  [.1, .2, .2],
                  [.3, -.2, .3],
                  [.2, -.5, .9],
                  [.2, -.5, .9]]).cuda()

x_i  = LazyTensor( x.view(N, 1, 1, D) )
x_k  = LazyTensor( x.view(N, 1, D).repeat(1, M, 1).view(1, 1, N*M, D) )

y_j  = LazyTensor( y.view(1, M, 1, E) )
y_l  = LazyTensor( y.view(1, M, E).repeat(N, 1, 1).view(1, 1, N*M, E) )

# PyTorch equivalents
x_i_t  = torch.clone(x.view(N, 1, 1, D))
x_k_t  = torch.clone(x.view(N, 1, D).repeat(1, M, 1).view(1, 1, N*M, D))

y_j_t  = torch.clone(y.view(1, M, 1, E))
y_l_t  = torch.clone(y.view(1, M, E).repeat(N, 1, 1).view(1, 1, N*M, D))

Then running

print(d(x_i, x_k).sum(dim=1).squeeze())     # 1st time this quantity is computed
print(d(x_i_t, x_k_t).sum(dim=1).squeeze()) # the corresponding PyTorch quantity
print(d(x_i, x_k).sum(dim=1).squeeze())     # 2nd time it's computed

returns

tensor([ 2.0500,  0.5000, -0.3000,  0.1000], device='cuda:0')
tensor([2.0500, 1.9500, 2.2500, 4.0500], device='cuda:0')
tensor([2.0500, 1.9500, 2.2500, 4.0500], device='cuda:0')

So I even get a negative value for a sum of squared Euclidean distances, and recomputing the same quantity after computing the PyTorch version returns a different value. I'm probably missing something obvious, but I can't figure out why. Have any of you encountered a similar problem? (Happy to open a different issue if this isn't the place to ask this question.)

Note that for the corresponding computations with the y variables, or when reshaping the x variables as done for the y's, everything works fine.
