Add support for higher-order tensors, with more than 2 symbolic dimensions #66

Open
bionicles opened this issue May 17, 2020 · 6 comments
Labels: enhancement (New feature or request)

bionicles commented May 17, 2020

To predict side effects, train protein folders, do molecular docking, etc., we need a way to quickly compute a distance metric between arbitrary molecules (think point clouds with features at each point). We'd like to implement the Fused Gromov-Wasserstein shape/feature distance with KeOps, but that requires higher-dimensional indexing, since the goal is to compare distance matrices. Would it be possible to add Vij indexing for distance matrices? This would allow us to use your work.

Maybe an easy way to do this is to meta-program a function that returns a function to index an arbitrary number of dimensions deep (f(1) = Vi, f(2) = Vij, f(3) = Vijk, f(4) = Vijkl). That might save work in the future, since folks will almost surely be interested in N-dimensional kernels, and the internal methods for accessing these variables could work for any number of dimensions anyway.
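
For instance, a purely hypothetical sketch of such a factory (symbolic_axis is a made-up name, nothing like it exists in KeOps today):

import torch
from pykeops.torch import LazyTensor

def symbolic_axis(axis, n_axes):
    # Hypothetical helper (not an existing KeOps API): wrap a (P, D) tensor as a
    # LazyTensor that varies along `axis` out of `n_axes` broadcast axes and is
    # singleton elsewhere, so symbolic_axis(0, 2) plays the role of Vi,
    # symbolic_axis(1, 2) that of Vj, and larger n_axes would give Vijk, Vijkl, ...
    def wrap(tensor):
        P, D = tensor.shape
        shape = [1] * n_axes + [D]
        shape[axis] = P
        return LazyTensor(tensor.view(*shape))
    return wrap

# With four symbolic axes (i, j, k, l):
Vi, Vj, Vk, Vl = (symbolic_axis(a, 4) for a in range(4))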

A declarative / Einstein-notation approach would be super cool, because we could declare what result we want at each index of an ndarray:

A, fA, B, fB = get_molecules()
dA[i, k] = d(A[i], A[k])
dB[j, l] = d(B[j], B[l])
dF[i, j] = d(fA[i], fB[j]) 
Y[i,j,k,l] = dF[i,j] + | d( dA[i,k], dB[j,l]) |

def f(pi):
  Z[i,j,k,l] = Y[i,j,k,l] * pi[i,j] * pi[k,l]
  return sum(Z)

result = optimize(f, pi_0)

Unclear if this is possible in Python, though. Have you looked at Julia? Some of that language is beautiful for our shared purposes, compared to Python.

@jeanfeydy jeanfeydy self-assigned this Jul 14, 2020
@jeanfeydy jeanfeydy changed the title Please add Vij indexing Add support for higher-order tensors, with more than 2 symbolic dimensionss Jul 14, 2020
@jeanfeydy jeanfeydy changed the title Add support for higher-order tensors, with more than 2 symbolic dimensionss Add support for higher-order tensors, with more than 2 symbolic dimensions Jul 14, 2020
Contributor

jeanfeydy commented Jul 14, 2020

Hi @bionicles ,

Thanks a lot for your interesting question!

First of all, my apologies for the late reply: we have been extremely busy over the last two months and are just starting to catch up on pending issues. Don't worry though - both KeOps and GeomLoss are still under very active development. Since you are working with optimal transport theory, you may be interested in my PhD thesis, which I defended two weeks ago: the slides are available here, and I will try to quickly upload a video on my website.

As for your question: this is very much doable in Python, using broadcasting syntax.
Let's say that the input arrays are declared as follows:

import time
import torch
from pykeops.torch import LazyTensor
torch.set_default_tensor_type(torch.cuda.FloatTensor)

N, M = 500, 1000  # Number of points (= atoms?) per molecule
D, E =   3,    3  # Dimension of the ambient space
F    =  10        # Number of features per point

# Cost function - here, a squared Euclidean norm:
d = lambda a, b : ((a - b) ** 2).sum(-1)

# Toy input data:
x, f = torch.randn(N, D), torch.randn(N, F)  # = A, fA
y, g = torch.randn(M, E), torch.randn(M, F)  # = B, fB
P    = torch.rand(N, M)  # Transport plan "pi" between A and B

Then, with PyTorch, you can compute a fused Gromov-Wasserstein cost with:

P_ij = P.view(N, 1, M, 1)
P_kl = P.view(1, N, 1, M)

x_i  = x.view(N, 1, 1, 1, D)
x_k  = x.view(1, N, 1, 1, D)

y_j  = y.view(1, 1, M, 1, E)
y_l  = y.view(1, 1, 1, M, E)

f_i  = f.view(N, 1, 1, 1, F)
g_j  = g.view(1, 1, M, 1, F)


# Symbolic cost associated to the Gromov-Wasserstein-like metric:
Y_ijkl = d(f_i, g_j) + (d(x_i, x_k) - d(y_j, y_l)) ** 2
C_ijkl = Y_ijkl * P_ij * P_kl  # (N, N, M, M) Tensor

# Actual computation:
C_sum = C_ijkl.sum()  # (N, N, M, M) Tensor -> scalar

The main problem with this code is its O(N^2 * M^2) memory footprint.
With KeOps, you can achieve an O(N^2 * M) memory usage by wrapping the input variables as symbolic LazyTensors:

# Turn our Tensors into KeOps symbolic variables:
P_ij = LazyTensor( P.view(N, 1, M, 1, 1) )
P_kl = LazyTensor( P.view(1, N, 1, M, 1) )

x_i  = LazyTensor( x.view(N, 1, 1, 1, D) )
x_k  = LazyTensor( x.view(1, N, 1, 1, D) )

y_j  = LazyTensor( y.view(1, 1, M, 1, E) )
y_l  = LazyTensor( y.view(1, 1, 1, M, E) )

f_i  = LazyTensor( f.view(N, 1, 1, 1, F) )
g_j  = LazyTensor( g.view(1, 1, M, 1, F) )


# Symbolic cost associated to the Gromov-Wasserstein-like metric:
Y_ijkl = d(f_i, g_j) + (d(x_i, x_k) - d(y_j, y_l)) ** 2
C_ijkl = Y_ijkl * P_ij * P_kl  # (N, N, M, M) LazyTensor

# Actual computation:
C_ijk = C_ijkl.sum(3)  # (N, N, M, M) LazyTensor -> (N, N, M) Tensor
C_sum = C_ijk.sum()

This code should be a little bit faster than the vanilla PyTorch one, and much more scalable.
Going further, you can reach an O(N * M) memory footprint with:

# Turn our Tensors into KeOps symbolic variables:
P_ij = LazyTensor( P.view(N, M,   1, 1) )
P_kl = LazyTensor( P.view(1, 1, N*M, 1) )

x_i  = LazyTensor( x.view(N, 1, 1, D) )
x_k  = LazyTensor( x.view(N, 1, D).repeat(1, M, 1).view(1, 1, N*M, D) )

y_j  = LazyTensor( y.view(1, M, 1, E) )
y_l  = LazyTensor( y.view(1, M, E).repeat(N, 1, 1).view(1, 1, N*M, E) )

f_i  = LazyTensor( f.view(N, 1, 1, F) )
g_j  = LazyTensor( g.view(1, M, 1, F) )


# Symbolic cost associated to the Gromov-Wasserstein-like metric:
Y_ijkl = d(f_i, g_j) + (d(x_i, x_k) - d(y_j, y_l)) ** 2
C_ijkl = Y_ijkl * P_ij * P_kl  # (N, M, N*M) LazyTensor

# Actual computation:
C_ijk = C_ijkl.sum(2)  # (N, M, N*M) LazyTensor -> (N, M) Tensor
C_sum = C_ijk.sum()
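
For reference, here is a rough, self-contained sanity check (an untested sketch; c_dense and c_keops are just placeholder names) that the dense PyTorch cost and this flattened KeOps cost should agree on tiny inputs:

import torch
from pykeops.torch import LazyTensor

N, M, D, E, F = 4, 5, 3, 3, 10
d = lambda a, b : ((a - b) ** 2).sum(-1)

x, f = torch.randn(N, D), torch.randn(N, F)
y, g = torch.randn(M, E), torch.randn(M, F)
P    = torch.rand(N, M)

# Dense reference, with an O(N^2 * M^2) memory footprint:
Y = (d(f.view(N, 1, 1, 1, F), g.view(1, 1, M, 1, F))
     + (d(x.view(N, 1, 1, 1, D), x.view(1, N, 1, 1, D))
        - d(y.view(1, 1, M, 1, E), y.view(1, 1, 1, M, E))) ** 2)
c_dense = (Y * P.view(N, 1, M, 1) * P.view(1, N, 1, M)).sum()

# Flattened KeOps version, with an O(N * M) memory footprint:
P_ij = LazyTensor( P.view(N, M, 1, 1) )
P_kl = LazyTensor( P.view(1, 1, N*M, 1) )
x_i  = LazyTensor( x.view(N, 1, 1, D) )
x_k  = LazyTensor( x.view(N, 1, D).repeat(1, M, 1).view(1, 1, N*M, D) )
y_j  = LazyTensor( y.view(1, M, 1, E) )
y_l  = LazyTensor( y.view(1, M, E).repeat(N, 1, 1).view(1, 1, N*M, E) )
f_i  = LazyTensor( f.view(N, 1, 1, F) )
g_j  = LazyTensor( g.view(1, M, 1, F) )
c_keops = ((d(f_i, g_j) + (d(x_i, x_k) - d(y_j, y_l)) ** 2) * P_ij * P_kl).sum(2).sum()

print(torch.allclose(c_dense, c_keops, rtol=1e-4))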

Of course, the use of .repeat(...) here is sub-optimal. Going forward, we'd like to be able to provide simple support for more than two "lazy" axes, using a syntax such as:

P_ij = LazyTensor( P.view(N, 1, M, 1, 1), dim=(0,1,2,3) )
P_kl = LazyTensor( P.view(1, N, 1, M, 1), dim=(0,1,2,3) )

This would allow us to prune out unnecessary copies and reach optimal performance with a minimal memory footprint. In your case, if the transport plan P is itself a symbolic LazyTensor (instead of a full (N, M) matrix), as with e.g. the Sinkhorn algorithm where P is encoded using two dual vectors of size N and M, this could allow you to reach an optimal O(N + M) memory footprint for truly scalable Gromov-Wasserstein computations.
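
To make that last point concrete, here is a rough sketch (the dual potentials u, v and the temperature eps below are placeholders, e.g. coming out of a Sinkhorn loop) of how such a symbolic transport plan could be built without ever storing an (N, M) matrix:

import torch
from pykeops.torch import LazyTensor

N, M, D = 500, 1000, 3
eps  = 0.05                                 # entropic regularization (placeholder value)
x, y = torch.randn(N, D), torch.randn(M, D)
u, v = torch.zeros(N), torch.zeros(M)       # dual potentials of size N and M

x_i = LazyTensor(x.view(N, 1, D))
y_j = LazyTensor(y.view(1, M, D))
u_i = LazyTensor(u.view(N, 1, 1))
v_j = LazyTensor(v.view(1, M, 1))

C_ij = ((x_i - y_j) ** 2).sum(-1)           # symbolic (N, M) cost matrix
P_ij = ((u_i + v_j - C_ij) / eps).exp()     # symbolic (N, M) entropic transport plan

# P_ij is never materialized: only the O(N + M) dual vectors are stored,
# and reductions such as the row marginals stay out-of-core:
marginal_i = P_ij.sum(dim=1)                # (N, 1) Tensor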

This feature could be implemented using simple pointer arithmetic in the reduction scheme, just as we did to support batch dimensions. Including the testing, LazyTensor integration and documentation, this should take me around one week of work.

I don't think that this is a very pressing issue (as the work-around above still achieves decent performance), but this is certainly something that we will address in the long term. Beyond the Gromov-Wasserstein problem, which is by itself a strong motivation, this feature could be of interest to e.g. the developers of the GeomStats library: I will discuss it soon with @ninamiolane, @nguigs and @xpennec.

I hope that this answers your question: what do you think?

Best regards,
Jean

@jeanfeydy jeanfeydy added the enhancement New feature or request label Jul 15, 2020
@bionicles
Author

Congrats on the thesis defense, Jean! Let me take some time to read this.

@bionicles
Author

Yes, it looks good; it would be a handy API.

@matthieuheitz

That is a great help, thank you very much!
I would also be very grateful if you manage to implement this feature :)
I'd like to scale the Gromov-Wasserstein problem up to N = M ≈ 10^7-10^8, so obviously an O(N+M) memory footprint would be an enormous advantage.
Let me know if I could be of any help, although I probably need to learn a lot more about KeOps and its internal machinery first.
Thanks!

@bionicles
Author

I'd also use multiple lazy axes, especially if it worked with a JAX backend, although that's perhaps a big ask. @matthieuheitz, you can implement this algorithm in Julia with Tullio.jl pretty easily, and it can write a GPU kernel for you.

@othmanesebbouh

Hi, I am trying to reproduce the code snippet given for the O(N * M) version of @jeanfeydy's solution, and I'm encountering a strange behavior.

Specifically, when defining

import torch
from pykeops.torch import LazyTensor

N, M = 4, 5
D, E =   3,    3

# Cost function - here, a squared Euclidean norm:
d = lambda a, b : ((a - b) ** 2).sum(-1)

# Toy input data:

x = torch.Tensor([[.1, .5, .3],
                  [.3, .4, .2],
                  [.2, .2, .3],
                  [.4, .5, .7]]).cuda()

y = torch.Tensor([[-.1, .5, -.3],
                  [.1, .2, .2],
                  [.3, -.2, .3],
                  [.2, -.5, .9],
                  [.2, -.5, .9]]).cuda()

x_i  = LazyTensor( x.view(N, 1, 1, D) )
x_k  = LazyTensor( x.view(N, 1, D).repeat(1, M, 1).view(1, 1, N*M, D) )

y_j  = LazyTensor( y.view(1, M, 1, E) )
y_l  = LazyTensor( y.view(1, M, E).repeat(N, 1, 1).view(1, 1, N*M, E) )

# PyTorch equivalents
x_i_t  = torch.clone(x.view(N, 1, 1, D))
x_k_t  = torch.clone(x.view(N, 1, D).repeat(1, M, 1).view(1, 1, N*M, D))

y_j_t  = torch.clone(y.view(1, M, 1, E))
y_l_t  = torch.clone(y.view(1, M, E).repeat(N, 1, 1).view(1, 1, N*M, D))

Then running

print(d(x_i, x_k).sum(dim=1).squeeze())     # 1st time this quantity is computed
print(d(x_i_t, x_k_t).sum(dim=1).squeeze()) # the corresponding PyTorch quantity
print(d(x_i, x_k).sum(dim=1).squeeze())     # 2nd time it's computed

returns

tensor([ 2.0500,  0.5000, -0.3000,  0.1000], device='cuda:0')
tensor([2.0500, 1.9500, 2.2500, 4.0500], device='cuda:0')
tensor([2.0500, 1.9500, 2.2500, 4.0500], device='cuda:0')

So I even get a negative value for a sum of squared Euclidean distances, and recomputing the same quantity after computing the PyTorch version returns a different value. I'm probably missing something obvious, but I can't figure out why. Have any of you encountered a similar problem? (Happy to open a different issue if this isn't the place to ask this question.)

Note that for the corresponding computations with the y variables, or when reshaping the x variables as done for the y's, everything works fine.
