quickwit-oss · cjrh · Aug 4, 2023 · Jul 21, 2023
diff --git a/docs/about.md b/docs/about.md
@@ -0,0 +1 @@
+# About
diff --git a/docs/explanation.md b/docs/explanation.md
@@ -0,0 +1 @@
+# Explanation
diff --git a/docs/howto.md b/docs/howto.md
@@ -0,0 +1,46 @@
+# How-to Guides
+
+## Installation
+
+tantivy-py can be installed from [PyPI](https://pypi.org) using pip:
+
+    pip install tantivy
+
+If no binary wheel is available for your operating system, the bindings will be
+built from source, which means that Rust must be installed before the build
+can succeed.
+
+Note that the bindings use [PyO3](https://github.com/PyO3/pyo3), which
+only supports Python 3.
+
+## Set up a development environment to work on tantivy-py itself
+
+A development environment can be set up either in a virtual environment using
+[`nox`](https://nox.thea.codes) or with local packages using the provided `Makefile`.
+
+For the `nox` setup, install the virtual environment and build the bindings using:
+
+    python3 -m pip install nox
+    nox
+
+For the `Makefile` based setup run:
+
+    make
+
+Running the tests is done using:
+
+    make test
+
+## Working on tantivy-py documentation
+
+Please be aware that this documentation is structured using the [Diátaxis](https://diataxis.fr/) framework. In very simple terms, this framework will suggest the correct location for different kinds of documentation. Please make sure you gain a basic understanding of the goals of the framework before making large pull requests with new documentation.
+
+This documentation uses the [MkDocs](https://mkdocs.readthedocs.io/en/stable/) framework. This package is specified as an optional dependency in the `pyproject.toml` file. To install all optional dev dependencies into your virtual env, run the following command:
+
+    pip install .[dev]
+
+The [MkDocs](https://mkdocs.readthedocs.io/en/stable/) documentation itself is comprehensive. MkDocs provides some additional context and help around [writing with markdown](https://mkdocs.readthedocs.io/en/stable/user-guide/writing-your-docs/#writing-with-markdown).
+
+If all you want to do is make a few edits right away, the documentation content is in the `/docs` directory and consists of [Markdown](https://www.markdownguide.org/) files, which can be edited with any text editor.
+
+The most efficient way to work is to run a MkDocs livereload server in the background. This will launch a local web server on your dev machine, serve the docs (by default at `http://localhost:8000`), and automatically reload the page after you save any changes to the documentation files.
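+
+A minimal way to start that server, assuming MkDocs was installed through the `dev` extra above and the command is run from the repository root (where `mkdocs.yml` lives):
+
+```shell
+mkdocs serve
+```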
diff --git a/docs/index.md b/docs/index.md
@@ -0,0 +1,22 @@
+# Welcome to tantivy-py
+
+tantivy-py is a wrapper for the [tantivy](https://github.com/quickwit-oss/tantivy) full-text search engine, which is inspired by Apache Lucene.
+
+tantivy-py is [licensed](https://github.com/quickwit-oss/tantivy-py/blob/master/LICENSE) under the [MIT License](https://www.tldrlegal.com/license/mit-license).
+
+## Important links
+
+- [tantivy-py code repository](https://github.com/quickwit-oss/tantivy-py)
+- [tantivy code repository](https://github.com/quickwit-oss/tantivy)
+- [tantivy Documentation](https://docs.rs/crate/tantivy/latest)
+- [tantivy query language](https://docs.rs/tantivy/latest/tantivy/query/struct.QueryParser.html#method.parse_query)
+
+## How to use this documentation
+
+This documentation uses the [Diátaxis](https://diataxis.fr/) framework. The following sections are clearly separated:
+
+- [Tutorials](tutorials.md): when you want to learn
+- [How-to Guides](howto.md): when you need to accomplish a task
+- [Explanation](explanation.md): when you need a broader understanding and the thinking behind why certain things are set up in a particular way
+- [Reference](reference.md): when you need precise, detailed information
+
diff --git a/docs/reference.md b/docs/reference.md
@@ -0,0 +1,38 @@
+# Reference
+
+## Valid Query Formats
+
+tantivy-py supports the [query language](https://docs.rs/tantivy/latest/tantivy/query/struct.QueryParser.html#method.parse_query) used in tantivy.
+A few basic query formats are shown below:
+
+ - AND and OR conjunctions.
+```python
+query = index.parse_query('(Old AND Man) OR Stream', ["title", "body"])
+(best_score, best_doc_address) = searcher.search(query, 3).hits[0]
+best_doc = searcher.doc(best_doc_address)
+```
+
+ - `+` (includes) and `-` (excludes) operators.
+```python
+query = index.parse_query('+Old +Man chef -fished', ["title", "body"])
+(best_score, best_doc_address) = searcher.search(query, 3).hits[0]
+best_doc = searcher.doc(best_doc_address)
+```
+Note: in a query like the one above, a word with no `+`/`-` prefix acts like an OR.
+
+ - phrase search.
+```python
+query = index.parse_query('"eighty-four days"', ["title", "body"])
+(best_score, best_doc_address) = searcher.search(query, 3).hits[0]
+best_doc = searcher.doc(best_doc_address)
+```
+
+ - integer search.
+```python
+query = index.parse_query('1', ["doc_id"])
+(best_score, best_doc_address) = searcher.search(query, 3).hits[0]
+best_doc = searcher.doc(best_doc_address)
+```
+Note: for integer search, the integer field should be indexed.
+
+For more query formats and query options, see the [Tantivy Query Parser docs](https://docs.rs/tantivy/latest/tantivy/query/struct.QueryParser.html).
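+
+The `+`/`-` behaviour described above can be illustrated with a toy matcher in plain Python. This is a simplified model for intuition only: it ignores fields, scoring and tokenization details, it is not how tantivy parses or executes queries, and `toy_matches` is a hypothetical helper, not part of tantivy-py:
+
+```python
+import re
+
+def toy_matches(query, text):
+    """Toy model of +required / -excluded / optional terms."""
+    words = set(re.findall(r"[a-z0-9]+", text.lower()))
+    required, excluded, optional = [], [], []
+    for term in query.lower().split():
+        if term.startswith("+"):
+            required.append(term[1:])
+        elif term.startswith("-"):
+            excluded.append(term[1:])
+        else:
+            optional.append(term)
+    # Any excluded term rules the text out.
+    if any(t in words for t in excluded):
+        return False
+    # With required terms present, all of them must occur;
+    # in this model, optional terms then do not affect matching.
+    if required:
+        return all(t in words for t in required)
+    # Without required terms, optional terms behave like an OR.
+    return any(t in words for t in optional)
+
+print(toy_matches("+old +man chef -fished", "The Old Man and the Sea"))
+```
+
+In this model, `chef` is optional because `+Old` and `+Man` are present, while any text containing `fished` is rejected.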
diff --git a/docs/tutorials.md b/docs/tutorials.md
@@ -0,0 +1,82 @@
+# Tutorials
+
+## Building an index and populating it
+
+```python
+import tantivy
+
+# Declaring our schema.
+schema_builder = tantivy.SchemaBuilder()
+schema_builder.add_text_field("title", stored=True)
+schema_builder.add_text_field("body", stored=True)
+# indexed=True makes the integer field searchable.
+schema_builder.add_integer_field("doc_id", stored=True, indexed=True)
+schema = schema_builder.build()
+
+# Creating our index (in memory)
+index = tantivy.Index(schema)
+```
+
+To create a persistent index, use the `path`
+parameter to store the index on disk, e.g.:
+
+```python
+import os
+
+index = tantivy.Index(schema, path=os.path.join(os.getcwd(), "index"))
+```
+
+By default, tantivy offers the following tokenizers,
+which can be used in tantivy-py:
+
+- `default`: the tokenizer used if you do not assign a specific tokenizer
+  to your text field. It splits your text on punctuation and whitespace,
+  removes tokens longer than 40 characters, and lowercases your text.
+
+- `raw`: does not actually tokenize your text; it keeps it entirely
+  unprocessed. This can be useful for indexing UUIDs or URLs, for instance.
+
+- `en_stem`: in addition to what `default` does, the `en_stem` tokenizer
+  also applies stemming to your tokens. Stemming consists of trimming words
+  to remove their inflection. This tokenizer is slower than the default
+  one, but is recommended to improve recall.
+
+To use one of the above tokenizers, pass its name to `add_text_field`, e.g.:
+```python
+schema_builder.add_text_field("body", stored=True, tokenizer_name='en_stem')
+```
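+
+To make the description of `default` above concrete, here is a rough pure-Python approximation of its three steps (split, drop long tokens, lowercase). It is only a sketch of the behaviour described in this tutorial, not tantivy's implementation, and the name `approx_default_tokenizer` is made up for illustration:
+
+```python
+import re
+
+MAX_TOKEN_LEN = 40  # assumption: limit taken from the description above
+
+def approx_default_tokenizer(text):
+    """Approximate `default`: split, drop long tokens, lowercase."""
+    # Split on anything that is not a letter or digit.
+    tokens = re.findall(r"[^\W_]+", text)
+    # Drop tokens longer than the limit, then lowercase the rest.
+    return [t.lower() for t in tokens if len(t) <= MAX_TOKEN_LEN]
+
+print(approx_default_tokenizer("The Old Man and the Sea!"))
+```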
+
+## Adding one document
+
+```python
+writer = index.writer()
+writer.add_document(tantivy.Document(
+    doc_id=1,
+    title=["The Old Man and the Sea"],
+    body=["""He was an old man who fished alone in a skiff in the Gulf Stream and he had gone eighty-four days now without taking a fish."""],
+))
+# ... and committing
+writer.commit()
+```
+
+## Building and Executing Queries
+
+First, you need to get a searcher for the index:
+
+```python
+# Reload the index to ensure it points to the last commit.
+index.reload()
+searcher = index.searcher()
+```
+
+Then you need to get a valid query object by parsing your query on the index.
+
+```python
+query = index.parse_query("fish days", ["title", "body"])
+(best_score, best_doc_address) = searcher.search(query, 3).hits[0]
+best_doc = searcher.doc(best_doc_address)
+assert best_doc["title"] == ["The Old Man and the Sea"]
+print(best_doc)
+```
+
diff --git a/mkdocs.yml b/mkdocs.yml
@@ -0,0 +1,15 @@
+site_name: tantivy-py
+# site_url: https://example.com
+nav:
+  - Home: index.md
+  - Tutorials: tutorials.md
+  - How-to Guides: howto.md
+  - Explanation: explanation.md
+  - Reference: reference.md
+  - About: about.md
+theme: readthedocs
+
+# Can nest documents under above sections
+# - 'User Guide':
+#     - 'Writing your docs': 'writing-your-docs.md'
+#     - 'Styling your docs': 'styling-your-docs.md'
diff --git a/pyproject.toml b/pyproject.toml
@@ -6,5 +6,11 @@ build-backend = "maturin"
 name = "tantivy"
 requires-python = ">=3.7"
 
+[project.optional-dependencies]
+dev = [
+    "nox",
+    "mkdocs",
+]
+
 [tool.maturin]
 bindings = "pyo3"