Additional documentation fixes #2121

Merged (1 commit) on Jul 31, 2018

100 changes: 100 additions & 0 deletions docs/src/_index.rst.unused
@@ -0,0 +1,100 @@

:github_url: https://github.com/RaRe-Technologies/gensim

Gensim documentation
====================

============
Introduction
============

Gensim is a free Python library designed to automatically extract semantic
topics from documents, as efficiently (computer-wise) and painlessly (human-wise) as possible.

Gensim is designed to process raw, unstructured digital texts ("plain text").

The algorithms in Gensim, such as **Word2Vec**, **FastText**, **Latent Semantic Analysis**, **Latent Dirichlet Allocation** and **Random Projections**, discover the semantic structure of documents by examining statistical co-occurrence patterns within a corpus of training documents. These algorithms are **unsupervised**, which means no human input is necessary -- you only need a corpus of plain text documents.

Once these statistical patterns are found, any plain text documents can be succinctly
expressed in the new, semantic representation and queried for topical similarity
against other documents, words or phrases.
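
For example, once a dictionary and a TF-IDF model have been built, a similarity query
takes only a few lines. A minimal illustrative sketch, using the toy ``common_texts``
corpus from ``gensim.test.utils``:

>>> from gensim import corpora, models, similarities
>>> from gensim.test.utils import common_texts
>>>
>>> dictionary = corpora.Dictionary(common_texts)  # map each token to an integer id
>>> bow_corpus = [dictionary.doc2bow(text) for text in common_texts]  # bag-of-words vectors
>>> tfidf = models.TfidfModel(bow_corpus)  # fit a TF-IDF weighting on the corpus
>>> index = similarities.MatrixSimilarity(tfidf[bow_corpus], num_features=len(dictionary))
>>>
>>> query = dictionary.doc2bow("human computer interaction".split())
>>> sims = index[tfidf[query]]  # cosine similarity of the query against every document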

.. note::
If the previous paragraphs left you confused, you can read more about the `Vector
Space Model <http://en.wikipedia.org/wiki/Vector_space_model>`_ and `unsupervised
document analysis <http://en.wikipedia.org/wiki/Latent_semantic_indexing>`_ on Wikipedia.


.. _design:

Features
--------

* **Memory independence** -- there is no need for the whole training corpus to
reside fully in RAM at any one time (can process large, web-scale corpora).
* **Memory sharing** -- trained models can be persisted to disk and loaded back via mmap. Multiple processes can share the same data, cutting down RAM footprint.
* Efficient implementations of several popular vector space algorithms,
  including Word2Vec, Doc2Vec, FastText, TF-IDF, Latent Semantic Analysis (LSI/LSA),
  Latent Dirichlet Allocation (LDA) and Random Projections.
* I/O wrappers and readers for several popular data formats.
* Fast similarity queries for documents in their semantic representation.

The **principal design objectives** behind Gensim are:

1. Straightforward interfaces and low API learning curve for developers. Good for prototyping.
2. Memory independence with respect to the size of the input corpus; all intermediate
   steps and algorithms operate in a streaming fashion, accessing one document
   at a time (see the sketch below).
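
As an illustration of the streaming design, a corpus can be any object that yields one
document at a time, so even data too large for RAM can be read from disk on the fly.
A minimal sketch (the file ``mycorpus.txt``, one document per line, is hypothetical):

>>> from gensim import corpora
>>>
>>> class StreamedCorpus(object):
...     """Yield one bag-of-words document at a time; the corpus never sits in RAM."""
...     def __init__(self, path, dictionary):
...         self.path = path
...         self.dictionary = dictionary
...     def __iter__(self):
...         with open(self.path) as infile:
...             for line in infile:  # one document per line
...                 yield self.dictionary.doc2bow(line.lower().split())
>>>
>>> corpus = StreamedCorpus('mycorpus.txt', dictionary)  # 'dictionary' is a corpora.Dictionary built beforehand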

.. seealso::

    We built a high-performance server for NLP, document analysis, indexing, search and clustering: https://scaletext.ai.
    ScaleText is a commercial product, available both on-prem and as SaaS.
    Reach out at [email protected] if you need an industry-grade tool with professional support.

.. _availability:

Availability
------------

Gensim is licensed under the OSI-approved `GNU LGPLv2.1 license <http://www.gnu.org/licenses/old-licenses/lgpl-2.1.en.html>`_ and can be downloaded either from its `GitHub repository <https://github.com/piskvorky/gensim/>`_ or from the `Python Package Index <http://pypi.python.org/pypi/gensim>`_.

.. seealso::

See the :doc:`install <install>` page for more info on Gensim deployment.


.. toctree::
:glob:
:maxdepth: 1
:caption: Getting started

install
intro
support
about
license
citing


.. toctree::
:maxdepth: 1
:caption: Tutorials

tutorial
tut1
tut2
tut3


.. toctree::
:maxdepth: 1
:caption: API Reference

apiref

Indices and tables
==================

* :ref:`genindex`
* :ref:`modindex`
26 changes: 26 additions & 0 deletions docs/src/_license.rst.unused
@@ -0,0 +1,26 @@
:orphan:

.. _license:

Licensing
---------

Gensim is licensed under the OSI-approved `GNU LGPLv2.1 license <http://www.gnu.org/licenses/old-licenses/lgpl-2.1.en.html>`_.

This means that it's free for both personal and commercial use, but if you make any
modifications to Gensim and distribute them to other people, you have to disclose
the source code of those modifications.

Apart from that, you are free to redistribute Gensim in any way you like, though you're
not allowed to modify its license (doh!).

My intent here is to **get more help and community involvement** with the development of Gensim.
The legalese is therefore less important to me than your input and contributions.

`Contact me <mailto:[email protected]>`_ if the LGPL doesn't fit the bill and you'd like its restrictions lifted.

.. seealso::

    We built a high-performance server for NLP, document analysis, indexing, search and clustering: https://scaletext.ai.
    ScaleText is a commercial product, available both on-prem and as SaaS.
    Reach out at [email protected] if you need an industry-grade tool with professional support.
14 changes: 7 additions & 7 deletions gensim/models/doc2vec.py
@@ -20,21 +20,21 @@
<https://github.com/RaRe-Technologies/gensim/blob/develop/docs/notebooks/doc2vec-lee.ipynb>`_.

**Make sure you have a C compiler before installing Gensim, to use the optimized doc2vec routines** (70x speedup
-compared to plain NumPy implementation <https://rare-technologies.com/parallelizing-word2vec-in-python/>`_).
+compared to plain NumPy implementation, https://rare-technologies.com/parallelizing-word2vec-in-python/).


-Examples
---------
+Usage examples
+==============

-Initialize & train a model
+Initialize & train a model:

>>> from gensim.test.utils import common_texts
>>> from gensim.models.doc2vec import Doc2Vec, TaggedDocument
>>>
>>> documents = [TaggedDocument(doc, [i]) for i, doc in enumerate(common_texts)]
>>> model = Doc2Vec(documents, vector_size=5, window=2, min_count=1, workers=4)

-Persist a model to disk
+Persist a model to disk:

>>> from gensim.test.utils import get_tmpfile
>>>
@@ -43,11 +43,11 @@
>>> model.save(fname)
>>> model = Doc2Vec.load(fname) # you can continue training with the loaded model!

-If you're finished training a model (=no more updates, only querying, reduce memory usage), you can do
+If you're finished training a model (=no more updates, only querying, reduce memory usage), you can do:

>>> model.delete_temporary_training_data(keep_doctags_vectors=True, keep_inference=True)

-Infer vector for new document
+Infer vector for a new document:

>>> vector = model.infer_vector(["system", "response"])
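
The inferred vector can then be compared against the document vectors learned during
training. A sketch, not part of this diff (``most_similar`` here accepts raw vectors
in its ``positive`` list):

>>> similar_docs = model.docvecs.most_similar(positive=[vector], topn=3)  # (tag, cosine similarity) pairs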

13 changes: 7 additions & 6 deletions gensim/models/fasttext.py
@@ -13,6 +13,7 @@

This module contains a fast native C implementation of Fasttext with Python interfaces. It is **not** only a wrapper
around Facebook's implementation.
+
For a tutorial see `this notebook
<https://github.com/RaRe-Technologies/gensim/blob/develop/docs/notebooks/FastText_Tutorial.ipynb>`_.

@@ -22,14 +23,14 @@
Usage examples
--------------

-Initialize and train a model
+Initialize and train a model:

>>> from gensim.test.utils import common_texts
>>> from gensim.models import FastText
>>>
>>> model = FastText(common_texts, size=4, window=3, min_count=1, iter=10)

-Persist a model to disk with
+Persist a model to disk with:

>>> from gensim.test.utils import get_tmpfile
>>>
@@ -38,7 +39,7 @@
>>> model.save(fname)
>>> model = FastText.load(fname) # you can continue training with the loaded model!

-Retrieve word-vector for vocab and out-of-vocab word
+Retrieve word-vector for vocab and out-of-vocab word:

>>> existent_word = "computer"
>>> existent_word in model.wv.vocab
@@ -50,7 +51,7 @@
False
>>> oov_vec = model.wv[oov_word] # numpy vector for OOV word
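
Because FastText builds word vectors from character n-grams, the OOV vector above is
assembled from the word's subwords, so it can be compared with in-vocabulary words like
any other vector. A sketch, not part of this diff:

>>> sim_to_known = model.wv.similarity(existent_word, oov_word)  # cosine similarity via subword n-grams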

-You can perform various NLP word tasks with the model, some of them are already built-in
+You can perform various NLP word tasks with the model; some of them are already built-in:

>>> similarities = model.wv.most_similar(positive=['computer', 'human'], negative=['interface'])
>>> most_similar = similarities[0]
@@ -62,13 +63,13 @@
>>>
>>> sim_score = model.wv.similarity('computer', 'human')

-Correlation with human opinion on word similarity
+Correlation with human opinion on word similarity:

>>> from gensim.test.utils import datapath
>>>
>>> similarities = model.wv.evaluate_word_pairs(datapath('wordsim353.tsv'))

-And on word analogies
+And on word analogies:

>>> analogies_result = model.wv.accuracy(datapath('questions-words.txt'))

13 changes: 7 additions & 6 deletions gensim/models/word2vec.py
@@ -27,12 +27,12 @@
visit https://rare-technologies.com/word2vec-tutorial/.

**Make sure you have a C compiler before installing Gensim, to use the optimized word2vec routines**
-(70x speedup compared to plain NumPy implementation, https://rare-technologies.com/parallelizing-word2vec-in-python/.
+(70x speedup compared to plain NumPy implementation, https://rare-technologies.com/parallelizing-word2vec-in-python/).

Usage examples
==============

-Initialize a model with e.g.
+Initialize a model with e.g.:

>>> from gensim.test.utils import common_texts, get_tmpfile
>>> from gensim.models import Word2Vec
@@ -45,13 +45,13 @@
The training is streamed, meaning `sentences` can be a generator, reading input data
from disk on-the-fly, without loading the entire corpus into RAM.
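
For instance, ``sentences`` may be a class whose ``__iter__`` re-reads a tokenized file
on each training pass (a plain generator would be exhausted after the first pass).
A sketch, not part of this diff; the path ``my_corpus.txt`` is hypothetical:

>>> class MySentences(object):
...     def __iter__(self):
...         for line in open('my_corpus.txt'):  # one whitespace-tokenized sentence per line
...             yield line.lower().split()
>>>
>>> streamed_model = Word2Vec(MySentences(), size=100, window=5, min_count=5, workers=4)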

-It also means you can continue training the model later
+It also means you can continue training the model later:

>>> model = Word2Vec.load("word2vec.model")
>>> model.train([["hello", "world"]], total_examples=1, epochs=1)
(0, 2)

-The trained word vectors are stored in a :class:`~gensim.models.KeyedVectors` instance in `model.wv`:
+The trained word vectors are stored in a :class:`~gensim.models.keyedvectors.KeyedVectors` instance in `model.wv`:

>>> vector = model.wv['computer'] # numpy vector of a word

@@ -68,7 +68,8 @@
>>> wv = KeyedVectors.load("model.wv", mmap='r')
>>> vector = wv['computer'] # numpy vector of a word

-Gensim can also load word vectors in the "word2vec C format", as this :class:`~gensim.models.KeyedVectors` instance::
+Gensim can also load word vectors in the "word2vec C format", as a
+:class:`~gensim.models.keyedvectors.KeyedVectors` instance::

>>> from gensim.test.utils import datapath
>>>
@@ -84,7 +85,7 @@
are already built-in - you can see it in :mod:`gensim.models.keyedvectors`.

If you're finished training a model (i.e. no more updates, only querying),
-you can switch to the :class:`~gensim.models.KeyedVectors` instance
+you can switch to the :class:`~gensim.models.keyedvectors.KeyedVectors` instance:

>>> word_vectors = model.wv
>>> del model