dylan-profiler/visions

And these visions of data types, they kept us up past the dawn.

The Semantic Data Library

Visions provides a set of tools for defining and using semantic data types.

  • Semantic type detection & inference on sequence data.

  • Automated data processing.

  • Completely customizable. Visions makes it easy to build and modify semantic data types for domain-specific purposes.

  • Out-of-the-box support for multiple backend implementations, including pandas, Spark, NumPy, and pure Python.

  • A robust set of default types and typesets covering the most common use cases.

Check out the complete documentation here.

Installation

The source code is available on GitHub, and binary installers are available via pip.

# Pip
pip install visions

Complete installation instructions (including extras) are available in the docs.

Quick Start Guide

If you want to play immediately, check out the examples folder in the repository. Otherwise, let's get some data:

import pandas as pd

df = pd.read_csv("https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv")
df.head(2)
   PassengerId  Survived  Pclass                                                  Name     Sex   Age  SibSp  Parch     Ticket     Fare Cabin Embarked
0            1         0       3                               Braund, Mr. Owen Harris    male  22.0      1      0  A/5 21171   7.2500   NaN        S
1            2         1       1  Cumings, Mrs. John Bradley (Florence Briggs Thayer)  female  38.0      1      0   PC 17599  71.2833   C85        C

The most important abstractions in visions are Types - these represent semantic notions about data. You have access to a range of well-tested types like Integer, Float, and File covering the most common software development use cases. Types can be bundled together into typesets. Behind the scenes, visions builds a traversable graph for any collection of types.

from visions import types, typesets

# CompleteSet is a builtin typeset covering the default types (StandardSet is the basic one)
typeset = typesets.CompleteSet()
typeset.plot_graph()

Note: Plots require pygraphviz to be installed.

Because of the special relationship between types, these graphs can be used to detect the type of your data or to infer a more appropriate one.

# Detection looks like this
typeset.detect_type(df)

# While inference looks like this
typeset.infer_type(df)

# Inference works well even if we monkey with the data, say by converting everything to strings
typeset.infer_type(df.astype(str))
>> {
    'PassengerId': Integer,
    'Survived': Integer,
    'Pclass': Integer,
    'Name': String,
    'Sex': String,
    'Age': Float,
    'SibSp': Integer,
    'Parch': Integer,
    'Ticket': String,
    'Fare': Float,
    'Cabin': String,
    'Embarked': String
}
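
To make the distinction concrete: detection reports the type the data currently has, while inference reports the type it could safely become. A minimal sketch on a single series, reusing the typeset from above (the commented results are the behaviour we'd expect, not captured output):

import pandas as pd

s = pd.Series(["1", "2", "3"])

typeset.detect_type(s)  # String  - what the data is right now
typeset.infer_type(s)   # Integer - what the data could safely become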

Visions solves many of the most common problems encountered when working with tabular data. For example, sequences of integers are still recognized as integers whether they have trailing decimal zeros from being cast to float, missing values, or something else altogether. Much of this cleaning is performed automatically, providing nicely cleaned and processed data as well.

cleaned_df = typeset.cast_to_inferred(df)
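
For instance, a column that arrives as strings with trailing decimal zeros and a missing value can be inferred back to an integer type and cast accordingly. A minimal, self-contained sketch (the commented results are the behaviour we'd expect, not captured output):

import pandas as pd
from visions import typesets

typeset = typesets.CompleteSet()

# Strings with trailing decimal zeros plus a missing value
messy = pd.DataFrame({"age": ["22.0", "38.0", None]})

typeset.infer_type(messy)                # expected: {'age': Integer}
clean = typeset.cast_to_inferred(messy)  # 'age' becomes a nullable integer column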

This is only a small taste of everything visions can do, including building your own domain-specific types and typesets, so please check out the API documentation or the examples/ directory for more info!
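
As a hedged sketch of what building your own typeset might look like, here is a small typeset assembled from builtin types. The class name MinimalSet is made up for illustration; the pattern mirrors how the builtin typesets are composed and assumes VisionsTypeset accepts a set of types:

from visions.typesets import VisionsTypeset
from visions.types import Generic, Boolean, Integer, Float, String


class MinimalSet(VisionsTypeset):
    """A small, domain-specific typeset covering only the types a pipeline cares about."""

    def __init__(self):
        types = {Generic, Boolean, Integer, Float, String}
        super().__init__(types)


my_typeset = MinimalSet()
my_typeset.infer_type(df)  # e.g. on the Titanic frame from the quick start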

Supported frameworks

Thanks to its dispatch-based implementation, visions is able to exploit framework-specific capabilities offered by libraries like pandas and Spark. It currently works with the following backends by default.

  • Pandas (feature complete)
  • NumPy (boolean, complex, date time, float, integer, string, time delta, object)
  • Spark (boolean, categorical, date, date time, float, integer, numeric, object, string)
  • Python (string, float, integer, date time, time delta, boolean, categorical, object, complex - other data types are untested)

If you're using pandas, it will also take advantage of parallelization tools like swifter if available.

It also offers a simple annotation-based API for registering new implementations as needed. For example, if you wished to extend the Categorical data type to include a Dask-specific implementation, you might do something like:

from visions.types.categorical import Categorical
from pandas.api import types as pdt
import dask.dataframe as dd


@Categorical.contains_op.register
def categorical_contains(series: dd.Series, state: dict) -> bool:
    # A Dask series is categorical when its underlying dtype is a pandas categorical dtype
    return pdt.is_categorical_dtype(series.dtype)

Contributing and support

Contributions to visions are welcome. For more information, please visit the community contributions page and join us on Slack. The GitHub issues tracker is used for reporting bugs, feature requests, and support questions.

Also, please check out some of the other companies and packages using visions.

If you're currently using visions or would like to be featured here, please let us know.

Acknowledgements

This package is part of the dylan-profiler project. The package is a core component of pandas-profiling. More information can be found here. This work was partially supported by SIDN Fonds.