STRV repository with useful functions to ease (y)our work with OpenSearch.
osman stands for OpenSearch MANager.
- Create a new virtual environment.
- Run `pip install osmanager`.
### Create an Osman instance

The environment variables are read by `OsmanConfig`:

```python
from osman import Osman, OsmanConfig

os_man = Osman(OsmanConfig())
```

Environment variables can be overridden:

```python
os_man = Osman(OsmanConfig(host_url=<OpenSearch_host_url>))
```
### Create an index

```python
mapping = {
    "mappings": {
        "properties": {
            "age": {"type": "integer"},
            "id": {"type": "integer"},
            "name": {"type": "text"},
        }
    }
}

settings = {
    "settings": {
        "number_of_shards": 3
    }
}

os_man.create_index(
    name=<index_name>, mapping=mapping, settings=settings
)
```
### Upload a search template

```python
source = {
    "query": {
        "match": {
            "age": "{{age}}"
        }
    }
}

params = {
    "age": 10
}

os_man.upload_search_template(
    source=source, name=<template_name>, index=<index_name>, params=params
)
```
### Upload a painless script

```python
source = "doc['first.keyword'].value + ' ' + doc['last.keyword'].value"

os_man.upload_painless_script(source=source, name=<script_name>)
```
### Delete a search template or a painless script

Removes either a painless script or a search template:

```python
os_man.delete_script(name=<script_or_template_name>)
```
### Debug a painless script

Executes a painless script with the provided data and parameters, then checks whether the expected result is returned. The `context_type` must be provided and is either `score` or `filter`; this refers to the score and filter query contexts described in the OpenSearch documentation.

```python
context_type = "score"

documents = {"id": 1, "container": [1, 2, 3]}

expected_result = 0

os_man.debug_painless_script(
    source=<source>,
    index=<index_name>,
    params=<params>,
    context_type=context_type,
    documents=documents,
    expected_result=expected_result
)
```
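For example, a hypothetical score-context script that weights the number of values in the `container` field by a parameter could be debugged as follows; the script source, index name, and parameters are illustrative:

```python
# Expected score: params.weight * number of values in "container" = 2 * 3 = 6.
source = "params.weight * doc['container'].size()"

os_man.debug_painless_script(
    source=source,
    index="test-index",
    params={"weight": 2},
    context_type="score",
    documents={"id": 1, "container": [1, 2, 3]},
    expected_result=6,
)
```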
### Debug a search template

Executes a search template against an index with the given parameters, then checks whether the expected document ids are returned:

```python
expected_ids = ["123", "10"]

os_man.debug_search_template(
    source=source, name=<template_name>, index=<index_name>, params=params, expected_ids=expected_ids
)
```
### Reindex

Reindexes an existing index with a new mapping and/or settings. In order to reindex, this function appends a suffix (1 or 2) to the index name. Afterwards, the index should be referenced by its alias rather than its name.

For example: an index named test-index is reindexed and its name becomes test-index-1. When reindexed again, its name becomes test-index-2. Hence, it should be referenced by its unchanging alias test-index.

```python
os_man.reindex(
    name=<index_name>, mapping=<new_mapping>, settings=<new_settings>
)
```
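As a sketch (hypothetical names, reusing the mapping shape from "Create an index"), changing the `name` field from `text` to `keyword` could look like this:

```python
new_mapping = {
    "mappings": {
        "properties": {
            "age": {"type": "integer"},
            "id": {"type": "integer"},
            "name": {"type": "keyword"},  # changed from "text"
        }
    }
}

new_settings = {"settings": {"number_of_shards": 3}}

os_man.reindex(name="test-index", mapping=new_mapping, settings=new_settings)
# The physical index is now test-index-1 (or test-index-2 on the next reindex);
# keep searching through the alias test-index.
```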
Launch the Docker daemon in advance of the following steps. You can develop the code using virtualenv, but the local OpenSearch instance requires Docker.

The command `make help` shows a brief description of possible targets. A typical workflow scenario is:

- Run `make docker-run-opensearch` to launch an OpenSearch instance on port 9200 and OpenSearch Dashboards on port 5601, as described in docker-compose-opensearch.yml.
- Run `make dev-env` to get a bash shell in the development environment defined in Dockerfile. Under Linux, set the user/group in `.env` in advance, see below.
- Develop.
- Run `make docker-clean-all` to clean unused containers and volumes.
Note the following:

- Under Linux you may encounter a problem where the user in the Docker guest container doesn't have permission to write to the local directory. This leads to `pytest` being unable to create cache directories. The solution is to have the following variables in the `.env` file:

  ```
  DEV_USER_ID=1000
  DEV_GROUP_ID=1000
  ```

  Substitute 1000 with the `uid` and `gid` of the user on the host machine; both can be obtained by running the `id` command. With Docker under MacOS or Windows you won't need this.
- You can browse the Dashboards from a web browser. When running Docker on localhost, the Dashboards are available at http://localhost:5601.
- All indexed data and dashboards are persistent in a Docker volume, i.e. the data are not lost when you stop the OpenSearch containers.
In order to run a notebook inside the Docker container, use the following command (and ensure `notebook` is in your dependencies):

```bash
jupyter notebook --ip 0.0.0.0 --no-browser --allow-root
```
- Create a virtual environment called venv: `virtualenv --python=python3.8 venv`
- Activate it: `. ./venv/bin/activate`
- Install the python package: `pip install -e .`

In order to deactivate the environment, run the `deactivate` command.

You can also delete the environment as follows: `rm -r ./venv/`
Run `pytest` in your devel environment to run all tests.
The `OsmanConfig` class can be initialized from the environment by the following variables:

| Variable | Default value | Type | Description |
|---|---|---|---|
| AUTH_METHOD | `http` | string | `http` for username/password authentication, `awsauth` for authentication with AWS user credentials |
| OPENSEARCH_HOST | None | string | address of the OpenSearch host |
| OPENSEARCH_PORT | 443 | int | port number |
| OPENSEARCH_SSL_ENABLED | True | bool | use SSL? |
| OPENSEARCH_USER | None | string | username, for the `http` AUTH_METHOD |
| OPENSEARCH_SECRET | None | string | password, for the `http` AUTH_METHOD |
| AWS_ACCESS_KEY_ID | None | string | access key id for the `awsauth` AUTH_METHOD, see AWS4Auth |
| AWS_SECRET_ACCESS_KEY | None | string | secret key for the `awsauth` AUTH_METHOD |
| AWS_REGION | `us-east-1` | string | AWS region for the `awsauth` AUTH_METHOD |
| AWS_SERVICE | `es` | string | AWS service for the `awsauth` AUTH_METHOD |
You can add these variables to your `.env` file; `make dev-env` will pass them to the devel Docker image. There is a test in test_osman.py that creates an `Osman` instance using environment variables, so you can use any OpenSearch instance for testing.
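For instance, a quick local check against the Docker instance from `make docker-run-opensearch` might set the variables in code before creating the instance. A minimal sketch; the values are hypothetical local-development settings:

```python
import os

from osman import Osman, OsmanConfig

# Hypothetical values for a local, non-SSL OpenSearch on port 9200.
os.environ["AUTH_METHOD"] = "http"
os.environ["OPENSEARCH_HOST"] = "localhost"
os.environ["OPENSEARCH_PORT"] = "9200"
os.environ["OPENSEARCH_SSL_ENABLED"] = "False"
os.environ["OPENSEARCH_USER"] = "admin"
os.environ["OPENSEARCH_SECRET"] = "admin"

# OsmanConfig reads the variables above from the environment.
os_man = Osman(OsmanConfig())
```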
For running the linters from GitHub actions locally, you need to do the following:

- Install the `pre-commit` library.
- From the root project directory, run: `pre-commit run --all-files`
For information on semantic versioning, see semver.org.
Given a version number MAJOR.MINOR.PATCH, increment the:
- MAJOR version when you make incompatible API changes
- MINOR version when you add functionality in a backwards compatible manner
- PATCH version when you make backwards compatible bug fixes
When incrementing the MAJOR version, reset the MINOR and PATCH versions to 0. When incrementing the MINOR version, reset the PATCH version to 0.
When a version is released, a tag should be created in the format `vMAJOR.MINOR.PATCH`.
A new version is not automatically released upon merging into master. Follow the steps below to create a new release:

- Add the tag to the current branch: `git tag -a v1.0.0 -m "Release version 1.0.0"`
- Push the tag to the remote repository: `git push origin --tags`
This section contains a brief overview of OpenSearch. It is by no means complete or exhaustive, but it contains a few tips and tricks that might be useful for the development of your OpenSearch project. For more information, visit the OpenSearch documentation.

Analyzers are used to process text fields during indexing and search time. An analyzer is composed of a tokenizer and zero or more token filters: the tokenizer breaks the text into individual terms, and the token filters can modify or remove the resulting tokens. A field can be indexed in more than one way (multi-fields), so different analyzers can be utilised for different situations.
This section contains a small overview of the analyzers available in OpenSearch. For analyzers in general, see Analyzers in OpenSearch.
- Standard
  - The standard analyzer is used to index and search for complete words. It is composed of the standard tokenizer and the lowercase token filter.
  - The standard tokenizer breaks text into terms on word boundaries: on whitespace as well as on many special characters.
  - The lowercase token filter converts all tokens to lowercase.
  - The standard analyzer might not be useful for fields like email addresses. An email address like "[email protected]" is broken down into "123", "strv", "2", "com".
- Whitespace
  - The whitespace analyzer is also used to index and search for complete words. It uses the whitespace tokenizer, which breaks text into terms on whitespace only.
  - Because the whitespace analyzer does not break words on special characters, it is more suitable for data like email addresses or URLs.
- N-gram
  - N-gram analyzers are used to index and search for partial words. The tokenizer breaks the text into individual terms and the token filter creates n-grams for each term.
  - N-gram analyzers should not be used for large text fields, because the n-grams can become very numerous and consume a lot of disk space.
  - Words lose their meaning when they are broken into n-grams. For example, the word "search" is broken into "se", "ea", "ar", "rc", "ch", so a search for "sea" will match the word "search".
  - N-gram analyzers can be useful for a "username" field: usernames are of limited length, and users should be findable by searching for just a middle part of their name. For example, "xSuperUser" should be found by searching for "sup".
| Not analyzed | Standard | Whitespace | N-gram |
|---|---|---|---|
| [email protected] | [123, strv, 2, com] | [[email protected]] | [1, 12, 2, 23, 3...] |
| How's it going? | [how, s, it, going] | [How's, it, going?] | [H, Ho, o, ow, w...] |
| xUser-123 | [xuser, 123] | [xUser-123] | [x, xu, u, us, s...] |
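To tie the pieces together, a custom analyzer is declared under the `analysis` key of the index settings and referenced from the mapping. A minimal sketch using `os_man` from earlier; the index, tokenizer, and analyzer names are hypothetical:

```python
# A custom n-gram analyzer: an ngram tokenizer plus a lowercase token filter.
settings = {
    "settings": {
        "analysis": {
            "tokenizer": {
                "username_tokenizer": {
                    "type": "ngram",
                    "min_gram": 2,
                    "max_gram": 3,
                }
            },
            "analyzer": {
                "username_ngram": {
                    "type": "custom",
                    "tokenizer": "username_tokenizer",
                    "filter": ["lowercase"],
                }
            },
        }
    }
}

mapping = {
    "mappings": {
        "properties": {
            "username": {"type": "text", "analyzer": "username_ngram"}
        }
    }
}

os_man.create_index(name="users", mapping=mapping, settings=settings)
```

With `min_gram=2` and `max_gram=3`, "xSuperUser" is indexed as lowercase 2- and 3-grams, so a query for "sup" matches it.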
This section contains a small overview of the field types that are available in OpenSearch. For field types that are not discussed here, please refer to the official documentation.
- Keyword
  - Keyword fields are used to store data that is not meant to be analyzed. They are not analyzed and support exact-value searches.
  - Keyword fields are not used for full-text search.
  - Keyword fields are useful for fields like "id".
- Text
  - Text fields are used to store textual data, such as the body of an email or the description of a product in an e-commerce store. They are analyzed and support full-text search.
  - Text fields are used when exact-value searches are not wanted.
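A common pattern combining both types is a multi-field: the same value is indexed as text for full-text search and as keyword for exact matches. A minimal sketch with hypothetical index and field names:

```python
# "description" is analyzed for full-text search; "description.raw" keeps the
# exact value for keyword-style filtering and aggregations.
mapping = {
    "mappings": {
        "properties": {
            "id": {"type": "keyword"},
            "description": {
                "type": "text",
                "fields": {
                    "raw": {"type": "keyword"}
                },
            },
        }
    }
}

settings = {"settings": {"number_of_shards": 1}}

os_man.create_index(name="products", mapping=mapping, settings=settings)
```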
The goal is to extend the opensearch-py library with features that we found useful.
- Connect to OpenSearch using url (user/pass) or a service account
- Upload & remove a search template
- Upload & remove a function
- Create & drop an index with a mapping
- Load a json file with the template source element
- Load a json file with a function
- Load a json file with a mapping
- Run a local search template with sample parameters to check that everything works (without upload)
- Run a local function with sample parameters (without upload)
- Run a local index with a sample doc with the given data types (without upload)
- Check a local file vs the OpenSearch version and show the differences:
  - search template
  - function
  - index mapping
The goal is to allow teams to manage OpenSearch instances easily, with a templated project setup.

- Have a yaml config, a yaml list of files
- Create a sample json structure for all files
- Sync things based on the yaml file:
  - sync between local json files and the yaml list
  - sync between the local yaml list and OpenSearch
- Split everything between envs: have a yaml file with env definitions
- Nice to have: lint templates?
  - vscode's default json linter fails to lint templates due to the parameters syntax: {{#bla}}

Tasks are kept in the GitHub project Osman.