PERF: pyemd to POT for EMD computation in wmdistance (#3327)
* PERF: switch from pyemd to POT for EMD computation

* Adapt citations

* Adapt dependency

* Adapt tests

* Update cache for gallery

Co-authored-by: TLouf <[email protected]>
TLouf and TLouf committed Nov 3, 2022
1 parent a435f24 commit fdf40eb
Showing 32 changed files with 866 additions and 370 deletions.
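
For orientation, an EMD-based Word Mover's Distance amounts to handing two normalized bag-of-words histograms and a word-to-word distance matrix to an exact optimal-transport solver such as POT's `ot.emd2`. The sketch below is an illustration under assumptions, not the code from this commit: `nbow`, `cost_matrix`, `wmd_sketch`, and the `vectors` mapping are hypothetical names, and POT is assumed to be installed (`pip install POT`) and importable as `ot`.

```python
import math
from collections import Counter


def nbow(tokens, vocab):
    """Normalized bag-of-words histogram of `tokens` over `vocab` (hypothetical helper)."""
    counts = Counter(tokens)
    total = sum(counts[w] for w in vocab)
    return [counts[w] / total for w in vocab]


def cost_matrix(vocab, vectors):
    """Pairwise Euclidean distances between the word vectors of `vocab` (hypothetical helper)."""
    return [[math.dist(vectors[u], vectors[v]) for v in vocab] for u in vocab]


def wmd_sketch(doc1, doc2, vectors):
    """Word Mover's Distance between two token lists, solved exactly with POT.

    `vectors` is assumed to map every token to a sequence of floats.
    """
    # Imported lazily so the helpers above work without NumPy or POT installed.
    import numpy as np
    import ot

    vocab = sorted(set(doc1) | set(doc2))
    a = np.asarray(nbow(doc1, vocab), dtype=np.float64)
    b = np.asarray(nbow(doc2, vocab), dtype=np.float64)
    M = np.asarray(cost_matrix(vocab, vectors), dtype=np.float64)
    # ot.emd2 returns the minimal total transport cost between a and b under M.
    return ot.emd2(a, b, M)
```

With real embeddings, `vectors` would map each token to its word vector, and `wmd_sketch(tokens1, tokens2, vectors)` would return a single non-negative cost. Both documents are embedded in a shared vocabulary here so the two histograms have the same length and each sums to 1, which the exact solver expects.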
2 changes: 1 addition & 1 deletion .travis.yml
@@ -26,7 +26,7 @@ env:
# them here for now. They'll get picked up by the multibuild stuff
# running in multibuild/common_utils.sh.
#
- TEST_DEPENDS="pytest mock cython nmslib pyemd testfixtures python-levenshtein==0.12.0 visdom==0.1.8.9 scikit-learn"
- TEST_DEPENDS="pytest mock cython nmslib POT testfixtures python-levenshtein==0.12.0 visdom==0.1.8.9 scikit-learn"

matrix:
#
5 changes: 2 additions & 3 deletions docs/notebooks/WMD_tutorial.ipynb
@@ -30,7 +30,7 @@
"\n",
"## Running this notebook\n",
"\n",
"You can download this [iPython Notebook](http://ipython.org/notebook.html), and run it on your own computer, provided you have installed Gensim, PyEMD, NLTK, and downloaded the necessary data.\n",
"You can download this [iPython Notebook](http://ipython.org/notebook.html), and run it on your own computer, provided you have installed Gensim, POT, NLTK, and downloaded the necessary data.\n",
"\n",
"The notebook was run on an Ubuntu machine with an Intel core i7-4770 CPU 3.40GHz (8 cores) and 32 GB memory. Running the entire notebook on this machine takes about 3 minutes.\n",
"\n",
@@ -524,8 +524,7 @@
"source": [
"## References\n",
"\n",
"1. Ofir Pele and Michael Werman, *A linear time histogram metric for improved SIFT matching*, 2008.\n",
"* Ofir Pele and Michael Werman, *Fast and robust earth mover's distances*, 2009.\n",
"1. * Rémi Flamary et al. *POT: Python Optimal Transport*, 2021.\n",
"* Matt Kusner et al. *From Embeddings To Document Distances*, 2015.\n",
"* Thomas Mikolov et al. *Efficient Estimation of Word Representations in Vector Space*, 2013."
]
46 changes: 23 additions & 23 deletions docs/notebooks/soft_cosine_tutorial.ipynb
@@ -30,7 +30,7 @@
">\n",
"\n",
"## Running this notebook\n",
"You can download this [Jupyter notebook](http://jupyter.org/), and run it on your own computer, provided you have installed the `gensim`, `jupyter`, `sklearn`, `pyemd`, and `wmd` Python packages.\n",
"You can download this [Jupyter notebook](http://jupyter.org/), and run it on your own computer, provided you have installed the `gensim`, `jupyter`, `sklearn`, `POT`, and `wmd` Python packages.\n",
"\n",
"The notebook was run on an Ubuntu machine with an Intel core i7-6700HQ CPU 3.10GHz (4 cores) and 16 GB memory. Assuming all resources required by the notebook have already been downloaded, running the entire notebook on this machine takes about 30 minutes."
]
@@ -357,7 +357,7 @@
"metadata": {},
"outputs": [],
"source": [
"!pip install pyemd"
"!pip install POT"
]
},
{
@@ -404,7 +404,7 @@
" return similarities\n",
"\n",
"def wmd_gensim(query, documents):\n",
" # Compute Word Mover's Distance as implemented in PyEMD by William Mayner\n",
" # Compute Word Mover's Distance as implemented in POT\n",
" # between the query and the documents.\n",
" index = WmdSimilarity(documents, w2v_model)\n",
" similarities = index[query]\n",
@@ -532,26 +532,26 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Dataset | Strategy | MAP score | Elapsed time (sec)\n",
":---|:---|:---|---:\n",
"2016-test|softcossim|78.52 ±11.18|6.00 ±0.79\n",
"2016-test|**Winner (UH-PRHLT-primary)**|76.70 ±0.00|\n",
"2016-test|cossim|76.45 ±10.40|0.64 ±0.08\n",
"2016-test|wmd-gensim|76.23 ±11.42|5.37 ±0.64\n",
"2016-test|**Baseline 1 (IR)**|74.75 ±0.00|\n",
"2016-test|wmd-relax|71.05 ±11.06|1.11 ±0.09\n",
"2016-test|**Baseline 2 (random)**|46.98 ±0.00|\n",
"\n",
"\n",
"Dataset | Strategy | MAP score | Elapsed time (sec)\n",
":---|:---|:---|---:\n",
"2017-test|**Winner (SimBow-primary)**|47.22 ±0.00|\n",
"2017-test|softcossim|45.88 ±16.22|7.08 ±1.49\n",
"2017-test|cossim|44.38 ±14.71|0.74 ±0.10\n",
"2017-test|wmd-gensim|44.06 ±15.92|6.20 ±0.87\n",
"2017-test|wmd-relax|43.52 ±16.30|1.30 ±0.18\n",
"2017-test|**Baseline 1 (IR)**|41.85 ±0.00|\n",
"2017-test|**Baseline 2 (random)**|29.81 ±0.00|"
"Dataset | Strategy | MAP score | Elapsed time (sec)\n",
":---|:---|:---|---:\n",
"2016-test|softcossim|78.52 ±11.18|6.00 ±0.79\n",
"2016-test|**Winner (UH-PRHLT-primary)**|76.70 ±0.00|\n",
"2016-test|cossim|76.45 ±10.40|0.64 ±0.08\n",
"2016-test|wmd-gensim|76.23 ±11.42|5.37 ±0.64\n",
"2016-test|**Baseline 1 (IR)**|74.75 ±0.00|\n",
"2016-test|wmd-relax|71.05 ±11.06|1.11 ±0.09\n",
"2016-test|**Baseline 2 (random)**|46.98 ±0.00|\n",
"\n",
"\n",
"Dataset | Strategy | MAP score | Elapsed time (sec)\n",
":---|:---|:---|---:\n",
"2017-test|**Winner (SimBow-primary)**|47.22 ±0.00|\n",
"2017-test|softcossim|45.88 ±16.22|7.08 ±1.49\n",
"2017-test|cossim|44.38 ±14.71|0.74 ±0.10\n",
"2017-test|wmd-gensim|44.06 ±15.92|6.20 ±0.87\n",
"2017-test|wmd-relax|43.52 ±16.30|1.30 ±0.18\n",
"2017-test|**Baseline 1 (IR)**|41.85 ±0.00|\n",
"2017-test|**Baseline 2 (random)**|29.81 ±0.00|"
]
},
{
98 changes: 98 additions & 0 deletions docs/src/auto_examples/core/index.rst
@@ -0,0 +1,98 @@


.. _sphx_glr_auto_examples_core:

Core Tutorials: New Users Start Here!
-------------------------------------

If you're new to gensim, we recommend going through all core tutorials in order.
Understanding this functionality is vital for using gensim effectively.



.. raw:: html

<div class="sphx-glr-thumbnails">


.. raw:: html

<div class="sphx-glr-thumbcontainer" tooltip="This tutorial introduces Documents, Corpora, Vectors and Models: the basic concepts and terms n...">

.. only:: html

.. image:: /auto_examples/core/images/thumb/sphx_glr_run_core_concepts_thumb.png
:alt: Core Concepts

:ref:`sphx_glr_auto_examples_core_run_core_concepts.py`

.. raw:: html

<div class="sphx-glr-thumbnail-title">Core Concepts</div>
</div>


.. raw:: html

<div class="sphx-glr-thumbcontainer" tooltip="Demonstrates transforming text into a vector space representation.">

.. only:: html

.. image:: /auto_examples/core/images/thumb/sphx_glr_run_corpora_and_vector_spaces_thumb.png
:alt: Corpora and Vector Spaces

:ref:`sphx_glr_auto_examples_core_run_corpora_and_vector_spaces.py`

.. raw:: html

<div class="sphx-glr-thumbnail-title">Corpora and Vector Spaces</div>
</div>


.. raw:: html

<div class="sphx-glr-thumbcontainer" tooltip="Introduces transformations and demonstrates their use on a toy corpus.">

.. only:: html

.. image:: /auto_examples/core/images/thumb/sphx_glr_run_topics_and_transformations_thumb.png
:alt: Topics and Transformations

:ref:`sphx_glr_auto_examples_core_run_topics_and_transformations.py`

.. raw:: html

<div class="sphx-glr-thumbnail-title">Topics and Transformations</div>
</div>


.. raw:: html

<div class="sphx-glr-thumbcontainer" tooltip="Demonstrates querying a corpus for similar documents.">

.. only:: html

.. image:: /auto_examples/core/images/thumb/sphx_glr_run_similarity_queries_thumb.png
:alt: Similarity Queries

:ref:`sphx_glr_auto_examples_core_run_similarity_queries.py`

.. raw:: html

<div class="sphx-glr-thumbnail-title">Similarity Queries</div>
</div>


.. raw:: html

</div>


.. toctree::
:hidden:

/auto_examples/core/run_core_concepts
/auto_examples/core/run_corpora_and_vector_spaces
/auto_examples/core/run_topics_and_transformations
/auto_examples/core/run_similarity_queries

97 changes: 97 additions & 0 deletions docs/src/auto_examples/howtos/index.rst
@@ -0,0 +1,97 @@


.. _sphx_glr_auto_examples_howtos:

How-to Guides: Solve a Problem
------------------------------

These **goal-oriented guides** demonstrate how to **solve a specific problem** using gensim.



.. raw:: html

<div class="sphx-glr-thumbnails">


.. raw:: html

<div class="sphx-glr-thumbcontainer" tooltip="Demonstrates simple and quick access to common corpora and pretrained models.">

.. only:: html

.. image:: /auto_examples/howtos/images/thumb/sphx_glr_run_downloader_api_thumb.png
:alt: How to download pre-trained models and corpora

:ref:`sphx_glr_auto_examples_howtos_run_downloader_api.py`

.. raw:: html

<div class="sphx-glr-thumbnail-title">How to download pre-trained models and corpora</div>
</div>


.. raw:: html

<div class="sphx-glr-thumbcontainer" tooltip="How to author documentation for Gensim.">

.. only:: html

.. image:: /auto_examples/howtos/images/thumb/sphx_glr_run_doc_thumb.png
:alt: How to Author Gensim Documentation

:ref:`sphx_glr_auto_examples_howtos_run_doc.py`

.. raw:: html

<div class="sphx-glr-thumbnail-title">How to Author Gensim Documentation</div>
</div>


.. raw:: html

<div class="sphx-glr-thumbcontainer" tooltip="Shows how to reproduce results of the &quot;Distributed Representation of Sentences and Documents&quot; p...">

.. only:: html

.. image:: /auto_examples/howtos/images/thumb/sphx_glr_run_doc2vec_imdb_thumb.png
:alt: How to reproduce the doc2vec 'Paragraph Vector' paper

:ref:`sphx_glr_auto_examples_howtos_run_doc2vec_imdb.py`

.. raw:: html

<div class="sphx-glr-thumbnail-title">How to reproduce the doc2vec 'Paragraph Vector' paper</div>
</div>


.. raw:: html

<div class="sphx-glr-thumbcontainer" tooltip="Demonstrates how you can visualize and compare trained topic models.">

.. only:: html

.. image:: /auto_examples/howtos/images/thumb/sphx_glr_run_compare_lda_thumb.png
:alt: How to Compare LDA Models

:ref:`sphx_glr_auto_examples_howtos_run_compare_lda.py`

.. raw:: html

<div class="sphx-glr-thumbnail-title">How to Compare LDA Models</div>
</div>


.. raw:: html

</div>


.. toctree::
:hidden:

/auto_examples/howtos/run_downloader_api
/auto_examples/howtos/run_doc
/auto_examples/howtos/run_doc2vec_imdb
/auto_examples/howtos/run_compare_lda

24 changes: 12 additions & 12 deletions docs/src/auto_examples/index.rst
@@ -152,35 +152,35 @@ Learning-oriented lessons that introduce a particular gensim feature, e.g. a mod

.. raw:: html

<div class="sphx-glr-thumbcontainer" tooltip="Introduces Gensim&#x27;s EnsembleLda model">
<div class="sphx-glr-thumbcontainer" tooltip="Introduces Gensim&#x27;s fastText model and demonstrates its use on the Lee Corpus.">

.. only:: html

.. image:: /auto_examples/tutorials/images/thumb/sphx_glr_run_ensemblelda_thumb.png
:alt: Ensemble LDA
.. image:: /auto_examples/tutorials/images/thumb/sphx_glr_run_fasttext_thumb.png
:alt: FastText Model

:ref:`sphx_glr_auto_examples_tutorials_run_ensemblelda.py`
:ref:`sphx_glr_auto_examples_tutorials_run_fasttext.py`

.. raw:: html

<div class="sphx-glr-thumbnail-title">Ensemble LDA</div>
<div class="sphx-glr-thumbnail-title">FastText Model</div>
</div>


.. raw:: html

<div class="sphx-glr-thumbcontainer" tooltip="Introduces Gensim&#x27;s fastText model and demonstrates its use on the Lee Corpus.">
<div class="sphx-glr-thumbcontainer" tooltip="Introduces Gensim&#x27;s EnsembleLda model">

.. only:: html

.. image:: /auto_examples/tutorials/images/thumb/sphx_glr_run_fasttext_thumb.png
:alt: FastText Model
.. image:: /auto_examples/tutorials/images/thumb/sphx_glr_run_ensemblelda_thumb.png
:alt: Ensemble LDA

:ref:`sphx_glr_auto_examples_tutorials_run_fasttext.py`
:ref:`sphx_glr_auto_examples_tutorials_run_ensemblelda.py`

.. raw:: html

<div class="sphx-glr-thumbnail-title">FastText Model</div>
<div class="sphx-glr-thumbnail-title">Ensemble LDA</div>
</div>


@@ -220,7 +220,7 @@ Learning-oriented lessons that introduce a particular gensim feature, e.g. a mod

.. raw:: html

<div class="sphx-glr-thumbcontainer" tooltip="Demonstrates using Gensim&#x27;s implemenation of the SCM.">
<div class="sphx-glr-thumbcontainer" tooltip="Demonstrates using Gensim&#x27;s implemenation of the WMD.">

.. only:: html

@@ -237,7 +237,7 @@ Learning-oriented lessons that introduce a particular gensim feature, e.g. a mod

.. raw:: html

<div class="sphx-glr-thumbcontainer" tooltip="Demonstrates using Gensim&#x27;s implemenation of the WMD.">
<div class="sphx-glr-thumbcontainer" tooltip="Demonstrates using Gensim&#x27;s implemenation of the SCM.">

.. only:: html
