Nourish is a Python API that enables data consumers and distributors to easily use and share datasets, and establishes a standard for exchanging data assets. It enables:
- a data scientist to have a simpler and more unified way to begin working with a wide range of datasets, and
- a data distributor to have a consistent, safe, and open source way to share datasets with interested communities.
Quick Example
>>> import nourish
>>> nourish.list_all_datasets()
{'claim_sentences_search': ('1.0.2',),
..., 'wikitext103': ('1.0.1',)}
>>> nourish.load_dataset('wikitext103')
{...} # Content of the dataset
To install the latest version of Nourish, run
$ pip install nourish
Alternatively, if you have downloaded the source, switch to the source directory (the same directory as this README file, cd /path/to/nourish-source) and run
$ pip install -U .
Import the package and load a dataset. Nourish will download the WikiText-103 dataset (version 1.0.1) if it's not already downloaded, and then load it.
import nourish
wikitext103_data = nourish.load_dataset('wikitext103')
View available Nourish datasets and their versions.
>>> nourish.list_all_datasets()
{'claim_sentences_search': ('1.0.2',), ..., 'wikitext103': ('1.0.1',)}
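The output above suggests that nourish.list_all_datasets returns a mapping from each dataset name to a tuple of version strings. Given that shape, the newest version of each dataset can be picked with a small helper. The following is a minimal sketch and not part of Nourish: latest_versions and version_key are hypothetical helpers, the 'example_dataset' entry is invented purely for illustration, and comparing versions as dot-separated integers is an assumption about the version format.

```python
from typing import Dict, Tuple


def version_key(version: str) -> Tuple[int, ...]:
    """Turn '1.0.2' into (1, 0, 2) so that versions compare numerically."""
    return tuple(int(part) for part in version.split('.'))


def latest_versions(datasets: Dict[str, Tuple[str, ...]]) -> Dict[str, str]:
    """Map each dataset name to its highest version (hypothetical helper)."""
    return {
        name: max(versions, key=version_key)
        for name, versions in datasets.items()
    }


# A dictionary shaped like the output of nourish.list_all_datasets().
# 'example_dataset' and its versions are made up for this illustration.
sample = {
    'claim_sentences_search': ('1.0.2',),
    'example_dataset': ('1.0.2', '1.0.10'),
}

latest = latest_versions(sample)
print(latest)
```

Comparing integer tuples rather than raw strings matters once a component reaches two digits: as plain strings, '1.0.2' would sort after '1.0.10'.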
To view your globally set configs for Nourish, such as your default data directory, use nourish.get_config.
>>> nourish.get_config()
Config(DATADIR=PosixPath('dir/to/download/load/from'), ..., DATASET_SCHEMATA_URL='file/to/load/datasets/from')
By default, nourish.load_dataset downloads to and loads from ~/.nourish/data/<dataset-name>/<dataset-version>/. To change the default data directory, use nourish.init.
nourish.init(DATADIR='new/dir/to/download/load/from')
Load a previously downloaded dataset using nourish.load_dataset. With the new default data dir set, Nourish now searches for the Groningen Meaning Bank dataset (version 1.0.2) in new/dir/to/download/load/from/gmb/1.0.2/.
gmb_data = nourish.load_dataset('gmb', version='1.0.2', download=False)  # assuming GMB dataset was already downloaded
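The directory layout described above, <data-dir>/<dataset-name>/<dataset-version>/, can be reproduced with pathlib, for example to check whether a dataset is already on disk before loading it with download=False. This is a rough sketch that only mirrors the documented layout; it does not call Nourish, and dataset_dir, DEFAULT_DATADIR, and the 'new/data/dir' path are hypothetical names chosen for illustration.

```python
from pathlib import Path
from typing import Optional

# Default root as documented above: ~/.nourish/data
DEFAULT_DATADIR = Path.home() / '.nourish' / 'data'


def dataset_dir(name: str, version: str, datadir: Optional[Path] = None) -> Path:
    """Build <datadir>/<name>/<version> following the documented layout.

    Hypothetical helper: Nourish may resolve paths differently internally.
    """
    root = Path(datadir) if datadir is not None else DEFAULT_DATADIR
    return root / name / version


default_path = dataset_dir('wikitext103', '1.0.1')
custom_path = dataset_dir('gmb', '1.0.2', datadir=Path('new/data/dir'))

print(default_path)
print(custom_path)
# is_dir() reports whether the directory exists on this machine.
print(custom_path.is_dir())
```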
For a more extensive look at Nourish functionality, check out these notebooks: