Extracting a band diagonal with KeOps? #24
@bcharlier, @jeanfeydy, reading more about the syntax here https://www.kernel-operations.io/keops/api/math-operations.html, it looks like the operation I need doesn't fit the format you expect. Is there a way around that? Also, does keops support fp16? Thanks
Hi @ibeltagy , Happy to see that you could install everything!

```python
for j in range(-c, c):
    r[:, j] = (t1 * t2[ j : M+j, :]).sum(dim=1)
```

would be a much simpler option, and reasonably efficient. Of course, you would need to zero-pad `t2` appropriately.

As for your second question: today, KeOps only supports fp32 ("float") and fp64 ("double") precisions. Adding support for other float numbers shouldn't be too hard, but we simply never thought about it. What do you think, @bcharlier ?

Best regards, Jean
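A minimal, self-contained version of this zero-padded loop (hypothetical sizes; the column index is shifted so that the diagonal at offset `d` lands in column `c + d`) might look like:

```python
import torch

# Hypothetical sizes; r[i, c + d] = <t1[i], t2[i + d]> for d in [-c, c],
# with zeros whenever i + d falls outside [0, M).
M, D, c = 1000, 64, 16
t1 = torch.randn(M, D)
t2 = torch.randn(M, D)

t2_pad = torch.nn.functional.pad(t2, (0, 0, c, c))   # zero-pad the rows: (M + 2c, D)
r = torch.empty(M, 2 * c + 1)
for k, d in enumerate(range(-c, c + 1)):
    # t2_pad[c + d + i] == t2[i + d] (or zeros outside the valid range)
    r[:, k] = (t1 * t2_pad[c + d : c + d + M, :]).sum(dim=1)
```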
This PyTorch solution is easy, but it is super slow because of the large number of slicing operations.
Another way of doing the same thing is using
As far as I can tell,
But would it be fast? Also, about this:
I saw one example in your tutorials that has loops. Can this solution be implemented in keops? I guess it will result in a very long formula (with lots of sub-formulas, one for each value of
Hi @ibeltagy , I've been thinking about it during the night: thanks for the interesting problem!

```python
import torch
from pykeops.torch import LazyTensor

M, D, c = 100000, 64, 100
t1 = torch.randn(M, D).cuda()
t2 = torch.randn(M, D).cuda()
indices = torch.arange(M).float().cuda()

i = LazyTensor( indices[:,None,None] )
j = LazyTensor( indices[None,:,None] )
x = LazyTensor( t1[:,None,:] )
y = LazyTensor( t2[None,:,:] )

K_ij = (j - i + c).one_hot(2*c+1) * (x | y)
K_ij.ranges = blablabla  # block-sparsity pattern around the band diagonal
r = K_ij.sum(dim=1)
```

will probably be good enough, with streamlined computations and relatively few wasted operations.

Best regards, Jean
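To see what this formula computes, a hypothetical dense PyTorch check (small sizes only, since it materializes the full `M × M` product) can mimic the `one_hot` trick: multiplying by `(j - i + c).one_hot(2*c+1)` and summing over `j` scatters each dot product `<t1[i], t2[j]>` into column `c + j - i` of the result.

```python
import torch

# Small, hypothetical sizes for a brute-force check of the one_hot trick.
M, D, c = 512, 16, 8
t1 = torch.randn(M, D)
t2 = torch.randn(M, D)

dots = t1 @ t2.t()                                                   # (M, M): <t1[i], t2[j]>
offsets = torch.arange(M)[None, :] - torch.arange(M)[:, None] + c   # j - i + c
mask = (offsets >= 0) & (offsets < 2 * c + 1)                        # keep |j - i| <= c
r_dense = torch.zeros(M, 2 * c + 1)
r_dense.scatter_add_(1, offsets.clamp(0, 2 * c), dots * mask)        # scatter into column c + j - i

I, J = 34, 38                          # any pair with |I - J| <= c
print(r_dense[I, c + J - I])           # band entry
print(torch.dot(t1[I], t2[J]))         # should match the direct dot product
```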
@jeanfeydy, that's very cool. Thanks for helping. Let me add two more requirements to make sure that your solution is going to address my use case.

1- A way to handle the boundary conditions. For example,

2- The real use case is that the diagonals I need are not necessarily a contiguous band around the main diagonal. Would it be possible to provide a list of diagonal indices that I care about? The list is fixed and known at compilation time. This is probably your

Thanks again for doing this.
Hi @ibeltagy , I think that 26b4829 does the trick! I'm pretty busy right now and could not fully test everything, but I think that the code below answers your question:

```python
import torch
from pykeops.torch import LazyTensor
use_cuda = torch.cuda.is_available()
load = lambda x : x.cuda() if use_cuda else x
M, D, c = 100000, 64, 100
t1 = load(torch.randn(M, D))
t2 = load(torch.randn(M, D))
indices = load(torch.arange(M).float())
i = LazyTensor( indices[:,None,None] )
j = LazyTensor( indices[None,:,None] )
x = LazyTensor( t1[:,None,:] )
y = LazyTensor( t2[None,:,:] )
K_ij = (j - i + c).one_hot(2*c+1) * (x | y) # "Kernel" matrix
#r = K_ij.sum(dim=1) # Bruteforce, quadratic computation
if True: # Use a block-sparsity pattern around the band diagonal
    from math import ceil
    Mc = ceil(M / c)  # Number of blocks of size (c,c)
    I = load(torch.arange(Mc).int())
    # The lines below describe a block-sparse pattern of width
    # 3 around the diagonal, with blocks of size (c,c):
    # 1. "i" indices are grouped in clusters of size c:
    ranges_i = torch.stack( (I * c, (I+1) * c) ).clamp(0, M).t().contiguous()  # [[0, c], [c, 2*c], ...]
    # 2. each "i" cluster is associated to one range over "j":
    slices_i = load(torch.arange(1, Mc + 1).int())  # [1, 2, ..., Mc]
    # 3. the reduction ranges are given by [[0, 2*c], [0, 3*c], [c, 4*c], ...]:
    ranges_j = torch.stack( ((I-1) * c, (I+2) * c) ).clamp(0, M).t().contiguous()
    # N.B.: here, because the matrix is square, the pattern is symmetric for the transpose.
    K_ij.ranges = (ranges_i, slices_i, ranges_j, ranges_i, slices_i, ranges_j)
r = K_ij.sum(dim=1) # Actual computation
# Test:
I, J = 34, 58
print(r[I, c + J - I])
print(torch.dot( t1[I], t2[J] ))
```

You can try it on Google Colab with the following imports in your first and second cells:

```python
# Install the dependencies for sphinx and KeOps
!pip install numpy GPUtil cmake ninja > install.log
!pip install sphinx-gallery recommonmark sphinxcontrib-httpdomain sphinx_rtd_theme >> install.log
# Download KeOps... And don't forget to update pybind11.
!git clone --recursive https://github.com/getkeops/keops.git keops/ >> install.log
```

```python
# Put KeOps in the current environment
import sys
sys.path.append('/content/keops/')
```

As far as I can tell, this runs pretty fast. What do you think?

Best,
Oooops, I just noticed a small bug in my C++ implementation:
Very cool. I will try this today and get back to you. As for the diagonals, they are evenly spaced and the spacing is an argument. The example above is with spacing=0.
I see! I have updated the instructions in the post above: everything should be fine now.
I ran the code above and it does work. I will test the gradients and do some profiling for the performance and get back to you.
I ran the code vs the naive pytorch solution mentioned earlier, and it seems that

```python
import torch
from math import ceil
from pykeops.torch import LazyTensor
use_cuda = torch.cuda.is_available()
load = lambda x : x.cuda() if use_cuda else x
import time
# this function is called many times
def compute_mm_diagonals_keops(t1, t2, c, M, D, indices, ranges_i, slices_i, ranges_j):
    i = LazyTensor( indices[:,None,None] )
    j = LazyTensor( indices[None,:,None] )
    x = LazyTensor( t1[:,None,:] )
    y = LazyTensor( t2[None,:,:] )
    K_ij = (j - i + c).one_hot(2 * c + 1) * (x | y)  # "Kernel" matrix
    # N.B.: here, because the matrix is square, the pattern is symmetric for the transpose.
    K_ij.ranges = (ranges_i, slices_i, ranges_j, ranges_i, slices_i, ranges_j)
    r = K_ij.sum(dim=1)  # Actual computation
    return r

def compute_mm_diagonals_pytorch(t1, t2, c):
    seq_len = t1.size()[0]
    diagonals_list = []
    for i, d in enumerate(range(-c, c+1)):
        if d >= 0:
            slice_1 = t1.narrow(dim=0, start=0, length=seq_len - d)
            slice_2 = t2.narrow(dim=0, start=d, length=seq_len - d)
            pad_left = 0
            pad_right = d
        else:
            slice_1 = t1.narrow(dim=0, start=-d, length=seq_len + d)
            slice_2 = t2.narrow(dim=0, start=0, length=seq_len + d)
            pad_left = -d
            pad_right = 0
        diagonals_list.append(torch.nn.functional.pad((slice_1 * slice_2).sum(-1), pad=(pad_left, pad_right), value=0))
    r = torch.cat(diagonals_list, dim=-1).view(c * 2 + 1, seq_len).transpose(dim0=0, dim1=1)
    return r
# building the indices and slices happens only once
M, D, c = 16000, 64, 128
indices = load(torch.arange(M).float())
Mc = ceil(M / c) # Number of blocks of size (c,c)
I = load(torch.arange(Mc).int())
# The lines below describe a block-sparse pattern of width
# 3 around the diagonal, with blocks of size (c,c):
# 1. "i" indices are grouped in clusters of size c:
ranges_i = torch.stack( (I * c, (I+1) * c) ).clamp(0, M).t().contiguous() # [[0, c], [c, 2*c], ...]
# 2. each "i" cluster is associated to one range over "j":
slices_i = load(torch.arange(1, Mc + 1).int()) # [1, 2, ..., Mc]
# 3. the reduction ranges are given by [[0, 2*c], [0, 3*c], [c, 4*c], ...]:
ranges_j = torch.stack( ((I-1) * c, (I+2) * c) ).clamp(0, M).t().contiguous()
# call it once to compile
t1 = load(torch.randn(M, D))
t2 = load(torch.randn(M, D))
r = compute_mm_diagonals_keops(t1, t2, c, M, D, indices, ranges_i, slices_i, ranges_j)
# call many times and measure time
time1 = 0
time2 = 0
for i in range(100):
    t1 = load(torch.randn(M, D))
    t2 = load(torch.randn(M, D))
    torch.cuda.synchronize()
    start = time.time()
    r = compute_mm_diagonals_keops(t1, t2, c, M, D, indices, ranges_i, slices_i, ranges_j)
    torch.cuda.synchronize()
    time1 += time.time() - start
    start = time.time()
    r2 = compute_mm_diagonals_pytorch(t1, t2, c)
    torch.cuda.synchronize()
    time2 += time.time() - start
    if not torch.allclose(r, r2, atol=1e-05):
        print('not equal')
print(f'Time keops: {time1} s')
print(f'Time pytorch: {time2} s')
```
Hi @ibeltagy , Indeed! Running some benchmarks (

Best regards,
That would be cool. Thanks
Hi @ibeltagy ,

```python
def compute_mm_diagonals_keops(t1, t2, c, M, D, indices, ranges_i, slices_i, ranges_j):
    i = LazyTensor( indices[:,None,None] )
    j = LazyTensor( indices[None,:,None] )
    x = LazyTensor( t1[:,None,:] )
    y = LazyTensor( t2[None,:,:] )
    K_ij = (x | y)
    # N.B.: here, because the matrix is square, the pattern is symmetric for the transpose.
    K_ij.ranges = (ranges_i, slices_i, ranges_j, ranges_i, slices_i, ranges_j)
    r = K_ij.scattered_sum(j - i + c, 2*c+1, dim=1)  # Actual computation
    return r
```

Performances vary with

```
Time keops: 0.7824671268463135 s
Time pytorch: 4.656574964523315 s
```

Now, I "just" have to document all this, implement the gradient and think a bit about the

Best regards,
The gradient is implemented now (9c61d63): I just have to document everything properly, add a unit test and implement the spacing!
The problem seems to be solved. Feel free to re-open the issue if there is any question regarding band diagonal extraction.
Hi @bcharlier and @jeanfeydy. Are you planning to merge the scattered_sum branch into master and add it to the next release?
Hi @awav , Indeed! The main cause for this delay has been the CUDA 11 release this summer, which introduced a significant regression in compilation times for KeOps formulas and made our continuous integration pipeline really impractical to run (from ~1 hour to ~5 hours per commit at some point in September...). From September to December, we prioritized this issue and the release of our NeurIPS paper over the integration of new operations. Out of curiosity, could you maybe tell us a little bit more about your intended use case? I understand that you are mostly working with TensorFlow: if you believe that writing a KeOps binder for TF is doable and worth doing, we'd be glad to get your opinion on this!

Best regards,
Hey @jeanfeydy , I'm trying to get the same diagonal band operation as @ibeltagy . @ibeltagy has in the meantime found a solution using plain pytorch by chunking matrices. This, however, uses 2x more memory than necessary and requires padding the matrices to the window size. I've now tried to get the

Would you recommend using the old branch, or have the changes brought considerable improvements? If so, what would the effort be for integrating the scattered sum reduction in the new codebase? Could you give me pointers on where to start?

Best regards, Ferdinand
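For reference, one padding-based way to compute the same band in plain PyTorch (a sketch only; not necessarily the chunked implementation mentioned above) is to take a strided view of a zero-padded `t2` with `unfold` and contract it against `t1`:

```python
import torch

def band_mm_unfold(t1, t2, c):
    # r[i, c + d] = <t1[i], t2[i + d]> for d in [-c, c], zeros outside bounds.
    M, D = t1.shape
    t2_pad = torch.nn.functional.pad(t2, (0, 0, c, c))  # zero-pad the rows: (M + 2c, D)
    windows = t2_pad.unfold(0, 2 * c + 1, 1)             # strided view: (M, D, 2c+1)
    return torch.einsum('md,mdk->mk', t1, windows)       # (M, 2c+1)
```

`unfold` returns a strided view, so the main extra memory is the padded copy of `t2`; whether the `einsum` materializes the windows depends on the backend, so this is only a starting point for profiling.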
Additionally, when testing the benchmarking code above, I get a "stack smashing detected" error:
I want to do matrix multiplication of 2D tensors where I only care about a few diagonals of the resulting matrix, and I want to run this on GPU in PyTorch. Is this something that can be done with `keops`? For illustration, here's the numpy code, but what I really need is PyTorch/GPU.
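A minimal sketch of the kind of computation intended (hypothetical sizes, assuming a band of half-width `c` around the main diagonal) would form the full `M × M` product and then keep only the central diagonals, which is exactly the wasteful pattern to avoid:

```python
import numpy as np

# Hypothetical sizes: M rows, D features, band half-width c.
M, D, c = 1000, 64, 16
t1 = np.random.randn(M, D)
t2 = np.random.randn(M, D)

full = t1 @ t2.T                          # full (M, M) product -- wasteful
r = np.zeros((M, 2 * c + 1))              # only the 2c+1 central diagonals are kept
for d in range(-c, c + 1):
    diag = np.diagonal(full, offset=d)             # entries full[i, i + d]
    rows = np.arange(max(0, -d), min(M, M - d))    # valid row indices i
    r[rows, d + c] = diag
```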
PS: I am running into compilation errors with the tutorial examples, but will ask about that later.