SeaQuBe

Semantic Quality Benchmark for Word Embeddings, i.e. Natural Language Models, in Python. The acronym is SeaQuBe (or seaqube).

This Python framework provides several text augmentation implementations and word embedding quality evaluation methods. It is designed to fit into your machine learning pipeline. The BaseAugmentation class provides the same API as the Python package nlpaug, so that the two packages can be used together smoothly. BaseAugmentation also provides additional methods. See the detailed examples below.

SeaQuBe also provides a toolkit to wrap a trained NLP model in an interactive tool.

Features

Text Data Augmentation
Chaining and Reducing of Text Data Augmentations
Word Embedding Quality Methods
Interactive NLM Model Wrapper

Demo

Augmentation

Level	Augmenter	Description
Character	QwertyAugmentation	Simulate keyboard distance error
Corpus	UnigramAugmentation	Replace ubiquitous words with other ubiquitous words
Word	Active2PassiveAugmentation	Change surface of document using a simple active-to-passive transformer
Word	EDAAugmentation	Augment document using the EDA algorithm
Word	EmbeddingAugmentation	Replace words with similar words using WordNet
Word	TranslationAugmentation	Change surface of document using translation and back-translation (with GoogleTranslate)
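
To make the character-level idea concrete, here is a minimal, self-contained sketch of a keyboard-distance typo augmentation. It is not SeaQuBe's QwertyAugmentation implementation: the function `qwerty_typo`, its parameters `rate` and `seed`, and the partial `QWERTY_NEIGHBORS` map are all hypothetical names chosen for illustration.

```python
import random

# Hypothetical, partial QWERTY adjacency map (illustration only, not SeaQuBe's data).
QWERTY_NEIGHBORS = {
    "a": "qwsz",
    "e": "wsdr",
    "i": "ujko",
    "o": "iklp",
    "s": "awedxz",
    "t": "rfgy",
}


def qwerty_typo(tokens, rate=0.1, seed=None):
    """Return a copy of `tokens` where characters are swapped for a
    neighboring key with probability `rate` (case is not preserved)."""
    rng = random.Random(seed)
    augmented = []
    for token in tokens:
        chars = []
        for ch in token:
            neighbors = QWERTY_NEIGHBORS.get(ch.lower())
            # Only characters present in the map can be replaced.
            if neighbors and rng.random() < rate:
                chars.append(rng.choice(neighbors))
            else:
                chars.append(ch)
        augmented.append("".join(chars))
    return augmented


doc = ["This", "is", "a", "tokenized", "corpus"]
noisy = qwerty_typo(doc, rate=0.3, seed=42)
```

The sketch keeps the number of tokens and the length of each token unchanged, so the noisy document stays aligned with the original.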

Augmentation Chainer

The streaming feature of augmentation is implemented in the AugmentationStreamer class. One reducing class exists; more can be implemented by extending the BaseReduction class.

Action	Class	Description
Streaming	AugmentationStreamer	Run augmentation for each document through all chained augmentations.
Reducing	UniqueCorpusReduction	Takes a list of documents and returns only the unique documents.
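
The streaming and reducing steps in the table above can be sketched in plain Python. This is an assumption about the general pattern, not the actual AugmentationStreamer or UniqueCorpusReduction code: `stream`, `unique_reduction`, and the two toy augmentations `lowercase` and `drop_first` are hypothetical helpers.

```python
def lowercase(doc):
    """Toy augmentation: keep the document and add a lower-cased variant."""
    return [doc, [token.lower() for token in doc]]


def drop_first(doc):
    """Toy augmentation: keep the document and add a variant without its first token."""
    return [doc, doc[1:]] if len(doc) > 1 else [doc]


def stream(corpus, augmentations):
    """Pass every document through each augmentation in turn.

    The output of one augmentation becomes the input of the next.
    """
    docs = list(corpus)
    for augment in augmentations:
        next_docs = []
        for doc in docs:
            next_docs.extend(augment(doc))
        docs = next_docs
    return docs


def unique_reduction(docs):
    """Drop duplicate documents while keeping the first occurrence of each."""
    seen = set()
    unique = []
    for doc in docs:
        key = tuple(doc)  # lists are not hashable, tuples are
        if key not in seen:
            seen.add(key)
            unique.append(doc)
    return unique


corpus = [["Hello", "world"], ["hello", "world"]]
augmented = stream(corpus, [lowercase, drop_first])
reduced = unique_reduction(augmented)
```

Chained augmentations tend to produce many identical documents, which is why a reduction step after streaming is useful.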

Word Embedding Evaluation

Method	Description
WordAnalogyBenchmark	This method benchmarks how well relations of the type `a is to b as c is to d` can be solved.
WordSimilarityBenchmark	This method compares the similarity of a word pair, as calculated by a model, with a human-estimated similarity score.
WordOutliersBenchmark	This method benchmarks how well an outlier in a group of words can be detected.
SemanticWordnetBenchmark	Based on the WordNet graph, the quality of the semantic similarity of an NLP model is benchmarked.
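
As an illustration of the `a is to b as c is to d` task, the following sketch solves analogies with the common vector-offset approach (`b - a + c`, then nearest neighbor by cosine similarity) and reports the accuracy. It does not use SeaQuBe's WordAnalogyBenchmark, whose scoring may differ; the toy `VECTORS` and the functions `solve_analogy` and `analogy_accuracy` are hypothetical.

```python
import math

# Hand-made toy embeddings, chosen so that the analogies below work out.
VECTORS = {
    "man": [1.0, 0.0, 0.1],
    "woman": [1.0, 1.0, 0.1],
    "king": [1.0, 0.0, 0.9],
    "queen": [1.0, 1.0, 0.9],
    "apple": [0.0, 0.2, 0.0],
}


def cosine(u, v):
    """Cosine similarity of two equally long vectors (0.0 for zero vectors)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def solve_analogy(a, b, c, vectors):
    """Return the word closest to b - a + c, excluding the three query words."""
    target = [vb - va + vc for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    candidates = (word for word in vectors if word not in (a, b, c))
    return max(candidates, key=lambda word: cosine(vectors[word], target))


def analogy_accuracy(questions, vectors):
    """Fraction of (a, b, c, d) questions for which the predicted word equals d."""
    correct = sum(solve_analogy(a, b, c, vectors) == d for a, b, c, d in questions)
    return correct / len(questions)


questions = [
    ("man", "woman", "king", "queen"),
    ("king", "queen", "man", "woman"),
]
accuracy = analogy_accuracy(questions, VECTORS)
```

With real embeddings the vocabulary is much larger, so the accuracy is usually well below the perfect score these toy vectors produce.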

Installation

SeaQuBe can be installed from PyPI using: pip install seaqube, or from source by running python setup.py install in the main directory.

External Dependencies

Some external dependencies are not installed automatically. When one is missing, seaqube or nltk may raise an error that includes instructions on what to do. For example, seaqube might ask you to run:

python -c "from seaqube import download;download('vec4ir')"

Quick Demo

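# Import the word-level augmentations (only TranslationAugmentation is used below).
from seaqube.augmentation.word import Active2PassiveAugmentation, EDAAugmentation, TranslationAugmentation, EmbeddingAugmentation
# Back-translation augmentation; it relies on GoogleTranslate (see the Augmentation table).
translate = TranslationAugmentation(max_length=2)
# Augment a single, already tokenized document.
translate.doc_augment(['This', 'is', 'a', 'tokenized', 'corpus'])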

Setup Dev Environment

TODO
