
Research on L1 distance metric #12

Closed

vmarkovtsev opened this issue Mar 2, 2017 · 11 comments
@vmarkovtsev

Hi!

This project kicks our https://github.com/src-d/kmcuda's ass to a major extent. I think the best way for us to move on is to integrate the missing (for source{d}) parts into Faiss. Those are:

  1. Proper K-means centroid initialization instead of random sampling. This includes some high-level API improvements.
  2. arccos over the scalar product, a.k.a. angular / proper "cosine" distance. The extra arccos has no noticeable performance cost in CUDA.
  3. Python 3 support (well, that should be easy with Swig).

If you agree with these, I will start making PRs. If you don't, I will have to incorporate Faiss inside kmcuda as the second non-free backend. Of course, I would prefer 1.

@mdouze
Contributor

mdouze commented Mar 2, 2017

Hi!

Thanks for your interest in Faiss. Could you consider the following points:

  1. The k-means can be initialized with arbitrary centroids by setting them on input in the Clustering object (see the sketch after this list). Also, in our experiments with kmeans++ initialization, we have not found it to be more useful than random initialization in high dimension with many points. What are the tasks for which you think it would be useful?

  2. There is a "spherical k-means" option in the Clustering object that normalizes the centroids at each iteration. Is this what you are referring to?

  3. Python 3 is already supported (although the examples are written in Python 2).
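
For point 1, here is a minimal sketch of passing your own initialization through the Python bindings (assuming that centroids pre-set on the Clustering object are picked up by train, and that the SWIG helpers copy_array_to_vector / vector_to_array are available in your build; treat it as an outline, not a recipe):

    import numpy as np
    import faiss

    d, k = 64, 256
    x = np.random.rand(100000, d).astype('float32')
    # any initialization you like, e.g. centroids produced by your own kmeans++
    init = x[np.random.choice(len(x), k, replace=False)]

    clus = faiss.Clustering(d, k)
    clus.niter = 20
    # clus.spherical = True  # the spherical k-means option mentioned in point 2
    # seed the clustering with the chosen centroids instead of random sampling
    faiss.copy_array_to_vector(init.ravel(), clus.centroids)

    index = faiss.IndexFlatL2(d)
    clus.train(x, index)
    centroids = faiss.vector_to_array(clus.centroids).reshape(k, d)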

@vmarkovtsev
Author

vmarkovtsev commented Mar 2, 2017

  1. Sure, they can be imported. kmeans++ takes some time to finish and can be properly accelerated. We are using K-means for dimensionality reduction with a large number of clusters, and kmeans++ shows substantially (up to 30%) faster convergence on our data.

  2. Kind of, but it's not enough to normalize the vectors. The problem is that the angular distance works better (again, in our experiments). The naked scalar product (that is, cos \alpha) is not a distance metric by definition: it does not satisfy the triangle inequality (a small numeric check follows this list). https://en.wikipedia.org/wiki/Cosine_similarity

  3. Swig generates a Python 2 native *.so by default, and in that case I still have to manually edit the makefile; at the very least there should be a py3 rule. I do understand that Facebook production is based on Python 2, but the rest of the folks have already switched. E.g. Python 3 compatibility? google/grumpy#1
    The symptom is _swigfaiss.so: undefined symbol: PyCObject_Type and friends.
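
To make the triangle-inequality point in 2. concrete, a tiny numpy check (my own toy example, not taken from either library):

    import numpy as np

    a = np.array([1.0, 0.0])
    b = np.array([1.0, 1.0]) / np.sqrt(2)
    c = np.array([0.0, 1.0])

    def cos_dist(u, v):
        return 1.0 - np.dot(u, v)                       # "cosine distance" on unit vectors

    def ang_dist(u, v):
        return np.arccos(np.clip(np.dot(u, v), -1, 1))  # angular distance

    # cosine distance violates the triangle inequality:
    print(cos_dist(a, b) + cos_dist(b, c), cos_dist(a, c))  # ~0.586 < 1.0
    # angular distance satisfies it (equality here because b lies on the arc between a and c):
    print(ang_dist(a, b) + ang_dist(b, c), ang_dist(a, c))  # ~1.571 >= ~1.571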

@jegou
Contributor

jegou commented Mar 2, 2017

Regarding 1.: Matthijs and I are the authors of the Yael library (http://yael.gforge.inria.fr/, a CPU library), which included a k-means++ initialization. Our observation was that, on all our experimental data, the initialization was actually costly and not BLAS3-friendly, even though we put some effort into its parallelization. As Matthijs mentioned, the (very small) gain from k-means++ was not worth this cost in our experiments. For a fixed amount of time, we found that it was better (w.r.t. loss) to add a few iterations to the regular k-means, which is why random selection was the default choice in our Matlab interface.
But I understand that this does not have to be generally true, especially if you look at the data at convergence (which we normally don't do considering our typical parameters, i.e., many centroids and input points).

Regarding 2.: as mentioned by Matthijs, Faiss includes a spherical k-means, which aims to maximize \sum_i <q(x_i) | x_i>:

  1. Assignment step: on the sphere, cos = inner product. I assume we agree that taking the arccos has no effect on the assignment of points to centroids because this function is monotonic: the closest centroid w.r.t. L2, the maximum cosine similarity and the maximum inner product are all equivalent on the sphere (with both the input vectors and the centroids on the sphere).
  2. Update step: take the centroid that minimizes the L2 loss, i.e., the mean of all vectors assigned to the centroid, and project it back onto the sphere.

So, this is the algorithm described (Figure 1) in
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.111.8125&rep=rep1&type=pdf
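
In numpy terms, a toy sketch of those two steps (my own illustration, not the Faiss implementation):

    import numpy as np

    def spherical_kmeans(x, k, niter=25, seed=0):
        # both the input vectors and the centroids live on the unit sphere
        rng = np.random.default_rng(seed)
        x = x / np.linalg.norm(x, axis=1, keepdims=True)
        c = x[rng.choice(len(x), k, replace=False)].copy()
        for _ in range(niter):
            # assignment step: max inner product == max cosine == min L2 on the sphere
            assign = (x @ c.T).argmax(axis=1)
            # update step: the L2-optimal centroid (the mean), projected back onto the sphere
            for j in range(k):
                members = x[assign == j]
                if len(members):
                    m = members.mean(axis=0)
                    c[j] = m / np.linalg.norm(m)
        return c, assign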

We are aware that there are some variants of this algorithm, including one I co-authored that modifies the cost because the objective function is different (see Section 6.5 in https://arxiv.org/pdf/1412.3328.pdf). To which algorithm do you refer?

Thanks for your feedback!

@mdouze
Contributor

mdouze commented Mar 3, 2017

Hi @vmarkovtsev,

Indeed the port to Python 3 was not complete. I added instructions to the INSTALL and makefile.inc.Linux on how to compile for py3. Thanks for pointing this out.

There are three possible levels of integration of other kmeans algorithms:

  1. re-use the Clustering and input another initialization
  2. re-use the Index object and the km_update_centroids function that does the M-step
  3. re-use only the Index object for fast assignment.

At which level do you think your changes to the clustering could be integrated?

Thanks

@vmarkovtsev
Author

@jegou,

  1. I don't know what to say... As I said, we observed substantial (subjective) improvements in achieved quality and convergence. I guess that if the K-means iterations are blazingly fast, it can indeed be more profitable to simply add more iterations instead of improving the initialization to reach the same metrics. Our implementation is slower, so it made sense to run kmeans++. Do you strictly target only very large, uniformly distributed datasets? I could prepare a notebook to show what I mean at some point in the future.

  2. Correct, I am referring to exactly the described algorithm. I hadn't read this paper, but it is implemented word-for-word in our library, albeit we take the arccos. Indeed, since arccos is monotonic, we can swap arccos(a \cdot b) for -(a \cdot b).

@mdouze,
I forgot that I've also got L1 besides L2 and angular, though it is not quite production-ready. As far as I understand, L1 has nothing to do with inner products. I looked through the code and got the impression that L1 and L2 mean something else there (1-D and 2-D input). Am I right?

I guess my proposed additions from point 1 fall into your integration level 1.

@jegou
Contributor

jegou commented Mar 3, 2017

@vmarkovtsev
We do not target uniformly distributed datasets but real-life features (large ones). At the time we ran the tests with k-means++, we had mostly experimented with various features extracted from images (SIFT, GIST, etc.), such as those provided in the following benchmarks (widely used for approximate-search benchmarking):
http://corpus-texmex.irisa.fr/

If you have done some experiments with k-means++ vs random init (large-scale; we are not much interested in datasets comprising less than 1 million input vectors and less than 10k centroids), I am of course curious about the results.
One technical point: one of the strengths of k-means++ is to avoid poorly placed clusters in the first iterations. Random init is bad by default for spiky distributions, but this is easily compensated for by detecting the case and employing a smart (and costless) splitting strategy, which we do in Faiss. I assume that, in the comparison between k-means++ and random init, you carefully handled this point; otherwise, IMO, the comparison would not be conclusive.

@vmarkovtsev
Author

I see, thanks. Our datasets are somewhat NLP-ish; https://data.world/vmarkovtsev has some examples.

@wickedfoo
Contributor

Any implementation on the GPU aiming for pairwise L1 distance will likely be a lot slower, since there would need to be an equivalent to the GEMM kernel that calculates d_mn = sum_k |x_mk - y_kn| instead of d_mn = sum_k (x_mk * y_kn). The GEMM strategy is efficient because there is high memory reuse in the register file and shared memory via tiling.
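
In numpy terms (just to illustrate the difference; this is not how the CUDA kernels are written), L2 factors into norms plus a single GEMM, while L1 has to touch every (m, n, k) combination explicitly:

    import numpy as np

    x = np.random.rand(512, 64).astype('float32')   # m x k
    y = np.random.rand(256, 64).astype('float32')   # n x k

    # L2: ||x_m - y_n||^2 = ||x_m||^2 + ||y_n||^2 - 2 <x_m, y_n>, so the heavy part is one GEMM
    d_l2 = (x ** 2).sum(1)[:, None] + (y ** 2).sum(1)[None, :] - 2.0 * (x @ y.T)

    # L1: sum_k |x_mk - y_nk| admits no such factorization; every pair is expanded explicitly
    d_l1 = np.abs(x[:, None, :] - y[None, :, :]).sum(-1)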

However, cuBLAS GEMM itself is quite inefficient for small k (k being the dimension of the vectors). Full arithmetic throughput is only seen for really large k. Small k is also the regime where GEMM becomes memory-bandwidth / latency bound rather than compute bound.

A kernel could probably be written for L1 using a GEMM-like strategy, but it would take a lot of time to get right or tune appropriately for different architectures.

@jegou
Contributor

jegou commented Mar 3, 2017

@vmarkovtsev: L1 is not derivable from an inner product.
However, many other kernels, such as the intersection kernel (often a good replacement for L1), can be approximated with explicit embeddings, a strategy which is more and more popular and which you may be aware of. This can be done with Nystrom-like methods / KPCA. Personally, I am a big fan of the ones proposed by Andrea Vedaldi:
https://www.robots.ox.ac.uk/~vedaldi/assets/pubs/vedaldi11efficient.pdf

Employing such embeddings, you can map a vector x -> phi(x) such that the inner product phi(x)^T phi(y) approximates your kernel k(x, y). Then you can simply search/cluster based on the vectors phi(x), for instance for a search application such as the one proposed in
https://hal.inria.fr/hal-00722635v2/document
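
A rough sketch of that pipeline (using scikit-learn's Nystroem approximation of the L1-based Laplacian kernel as a stand-in for Vedaldi's explicit feature maps; the parameters are made up):

    import numpy as np
    import faiss
    from sklearn.kernel_approximation import Nystroem

    x = np.random.rand(10000, 64).astype('float32')   # database vectors
    q = np.random.rand(100, 64).astype('float32')     # query vectors

    # explicit embedding phi such that <phi(x), phi(y)> ~ exp(-gamma * ||x - y||_1)
    emb = Nystroem(kernel='laplacian', gamma=0.5, n_components=256, random_state=0).fit(x)
    phi_x = emb.transform(x).astype('float32')
    phi_q = emb.transform(q).astype('float32')

    # then plain inner-product search / clustering on the embedded vectors
    index = faiss.IndexFlatIP(phi_x.shape[1])
    index.add(phi_x)
    D, I = index.search(phi_q, 10)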

@vmarkovtsev
Author

Thank you everybody!

So, to summarize my initial 3 points.

  1. As soon as I provide comprehensive evidence that kmeans++ improves upon random init on datasets with >1M points and 10k centroids, the maintainers will be glad to accept the contribution. I doubt that I can provide this evidence for 10k centroids, as my own experience was based on 8M samples (OK) and 1-2k centroids (not OK). So I withdraw this.

  2. Cosine distance can be handled without taking the arccos and without losing any precision, so the initial (2) is withdrawn. The alternative proposal of L1 is interesting to the maintainers, but it will take much work. I am keeping it in the hope that I find the time to experiment with it (after all, I almost finished the implementation in kmcuda). I need time to read this fancy stuff about Nystrom-like methods / KPCA to understand it properly.

  3. Python 3 support was added as a result of the discussion - closed.

Thus I am renaming the issue so that it better reflects the intent.

@wickedfoo
Contributor

Duplicate of #848

Merging into a more general task; we might support some of this on the GPU over the next couple of months.
