Add an IbisInterface #4517

Merged
merged 68 commits into from
Nov 20, 2020
Commits
68d1fb5
Cleaned up core/data/ module
philippjfr Jul 9, 2020
9bb5b5f
Add IbisInterface skeleton
philippjfr Jul 9, 2020
6165f9e
Add more methods the ibis interface. values, shape, array, dtype, sor…
Jul 26, 2020
5ea39e9
add groupby method
tonyfast Jul 30, 2020
6ae1c42
Add iloc and groupby method
tonyfast Jul 31, 2020
55d7949
add add dimension
tonyfast Jul 31, 2020
8261449
add selection methods
tonyfast Aug 4, 2020
db41679
add sample
tonyfast Aug 4, 2020
4791ca0
drop old doc string
tonyfast Aug 4, 2020
d59fd7b
add an aggregating method
tonyfast Aug 13, 2020
ccc2b1a
Work on getting tests passing
philippjfr Aug 13, 2020
13ccf98
Further test fixes
philippjfr Aug 14, 2020
5204c49
Simplified iloc
philippjfr Aug 14, 2020
36a3b5a
update iloc
tonyfast Aug 18, 2020
9f2f9f4
iloc work
tonyfast Aug 18, 2020
74246a3
flakes
tonyfast Aug 19, 2020
f080a8e
flake
tonyfast Aug 19, 2020
30e54b9
fix omission
tonyfast Aug 19, 2020
7efc090
test
tonyfast Aug 19, 2020
99ec2a8
add ibis to examples requirements
tonyfast Aug 19, 2020
6f6a4b6
try different dir for temporary file
tonyfast Aug 19, 2020
df56135
vscode does some questionable copy paste thing ssometimes
tonyfast Aug 19, 2020
62e8b91
install py37 in env
tonyfast Aug 19, 2020
a865733
pyenv
tonyfast Aug 19, 2020
12907a3
try using a contextmanager https://github.com/appveyor/ci/issues/2547
tonyfast Aug 21, 2020
347472b
h8 flake
tonyfast Aug 21, 2020
2f0ed92
Fix some iloc tests
tonyfast Aug 25, 2020
b08a904
resolve an error in redim
tonyfast Aug 26, 2020
80f8702
if error in iloc
tonyfast Aug 26, 2020
729920a
fixed more iloc issues
tonyfast Aug 26, 2020
f23b111
move indexing to function
tonyfast Aug 27, 2020
008e3e5
allow masking even though it is a weird case
tonyfast Aug 27, 2020
37a5e70
tune iloc
tonyfast Aug 27, 2020
4f72510
update travis
tonyfast Aug 31, 2020
571c56d
print debugging like a pro
tonyfast Aug 31, 2020
40f9d27
remove flake problem
tonyfast Aug 31, 2020
6743e0c
rm type annotations
tonyfast Aug 31, 2020
5eb2e0e
pin version on ibis, we are likely installing an old version
tonyfast Sep 1, 2020
4de48a5
add conda forge to travis channels like appveyor does
tonyfast Sep 1, 2020
cbd4db9
channels
tonyfast Sep 1, 2020
d2daf41
skips iloc problems
tonyfast Sep 2, 2020
f8da9f9
binder env
tonyfast Sep 2, 2020
bf8cde5
increase timeout
tonyfast Sep 2, 2020
67153f1
load ibis as extras
tonyfast Sep 8, 2020
e1476f0
rm annotation
tonyfast Sep 8, 2020
788fe22
Cleanup and review
philippjfr Sep 21, 2020
2660814
Implement optimized IbisInterface.dframe
philippjfr Sep 24, 2020
0367c2e
Defer HeatMap aggregation
philippjfr Sep 24, 2020
d474283
Skip range calculation for unsupported types
philippjfr Oct 4, 2020
b7ceccf
Support more aggregations
philippjfr Oct 4, 2020
7747842
Implement caching of lazy Interfaces
philippjfr Oct 4, 2020
e2170ba
Use nonzero implementation in Dataset.aggregate instead of length
philippjfr Oct 4, 2020
4ce7a84
Implement Dataset persist and compute methods
philippjfr Oct 5, 2020
0ab36ff
Fixed flake
philippjfr Oct 5, 2020
9cb5bc5
Implement histogram operation for Ibis
philippjfr Oct 5, 2020
391aedd
Fix Dataset.compute
philippjfr Oct 5, 2020
3cf3501
Fix IbisInterface.histogram
philippjfr Oct 5, 2020
4d98711
Minor histogram fix
philippjfr Oct 7, 2020
d367d03
Small selection fix
philippjfr Oct 7, 2020
f742f67
Fixed flakes
philippjfr Nov 20, 2020
95c3911
Replace typing
philippjfr Nov 20, 2020
6fde742
Fixed histogram operation for dask
philippjfr Nov 20, 2020
92503b3
Fixed Iterable import in py2
philippjfr Nov 20, 2020
2a2fc47
Fixed spreadfn insertion order
philippjfr Nov 20, 2020
cd89256
Require recent bokeh version
philippjfr Nov 20, 2020
f663fde
Drop bokeh channel
philippjfr Nov 20, 2020
c2f0ebe
Fix bokeh for py2
philippjfr Nov 20, 2020
abde038
Skip macos
philippjfr Nov 20, 2020
10 changes: 9 additions & 1 deletion .github/workflows/test.yml
@@ -20,6 +20,8 @@ jobs:
exclude:
- os: windows-latest
python-version: 2.7
- os: macos-latest
python-version: 3.7
timeout-minutes: 30
defaults:
run:
@@ -28,7 +30,7 @@ jobs:
DESC: "Python ${{ matrix.python-version }} tests"
HV_REQUIREMENTS: "unit_tests"
PYTHON_VERSION: ${{ matrix.python-version }}
CHANS_DEV: "-c pyviz/label/dev -c bokeh"
CHANS_DEV: "-c pyviz/label/dev"
CHANS: "-c pyviz"
MPLBACKEND: "Agg"
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
@@ -61,6 +63,12 @@ jobs:
git describe
echo "======"
conda list
- name: bokeh update
if: startsWith(matrix.python-version, 3.)
run: |
eval "$(conda shell.bash hook)"
conda activate test-environment
conda install "bokeh>=2.2"
- name: matplotlib patch
if: startsWith(matrix.python-version, 3.)
run: |
3 changes: 3 additions & 0 deletions .gitignore
@@ -28,3 +28,6 @@ holoviews.rc
ghostdriver.log
holoviews/.version
.dir-locals.el
.doit.db
.vscode/settings.json
holoviews/.vscode/settings.json
4 changes: 2 additions & 2 deletions .travis.yml
@@ -22,7 +22,7 @@ notifications:
env:
global:
- PKG_TEST_PYTHON="--test-python=py37 --test-python=py27"
- CHANS_DEV="-c pyviz/label/dev -c bokeh"
- CHANS_DEV="-c pyviz/label/dev -c bokeh -c conda-forge"
- CHANS="-c pyviz"
- MPLBACKEND="Agg"
- PYTHON_VERSION=3.7
@@ -69,7 +69,7 @@ jobs:
install:
- doit env_create $CHANS_DEV --python=$PYTHON_VERSION
- source activate test-environment
- travis_wait 30 doit develop_install $CHANS_DEV -o $HV_REQUIREMENTS
- travis_wait 45 doit develop_install $CHANS_DEV -o $HV_REQUIREMENTS
- if [ "$PYTHON_VERSION" == "3.6" ]; then conda uninstall matplotlib matplotlib-base --force; conda install $CHANS_DEV matplotlib=3.0.3 --no-deps; fi;
- doit env_capture
- hash -r
1 change: 1 addition & 0 deletions binder/environment.yml
@@ -32,3 +32,4 @@ dependencies:
- bzip2
- dask
- scipy
- ibis-framework >= 1.3
122 changes: 57 additions & 65 deletions holoviews/core/data/__init__.py
@@ -10,6 +10,7 @@

import numpy as np
import param
import pandas as pd # noqa

from param.parameterized import add_metaclass, ParameterizedMetaclass

@@ -21,57 +22,24 @@
from ..element import Element
from ..ndmapping import OrderedDict, MultiDimensionalMapping
from ..spaces import HoloMap, DynamicMap
from .interface import Interface, iloc, ndloc
from .array import ArrayInterface
from .dictionary import DictInterface
from .grid import GridInterface

from .array import ArrayInterface # noqa (API import)
from .cudf import cuDFInterface # noqa (API import)
from .dask import DaskInterface # noqa (API import)
from .dictionary import DictInterface # noqa (API import)
from .grid import GridInterface # noqa (API import)
from .ibis import IbisInterface # noqa (API import)
from .interface import Interface, iloc, ndloc # noqa (API import)
from .multipath import MultiInterface # noqa (API import)
from .image import ImageInterface # noqa (API import)
from .pandas import PandasInterface # noqa (API import)
from .spatialpandas import SpatialPandasInterface # noqa (API import)
from .xarray import XArrayInterface # noqa (API import)

default_datatype = 'dictionary'
datatypes = ['dictionary', 'grid']

try:
import pandas as pd # noqa (Availability import)
from .pandas import PandasInterface
default_datatype = 'dataframe'
datatypes.insert(0, 'dataframe')
DFColumns = PandasInterface
except ImportError:
pd = None
except Exception as e:
pd = None
param.main.param.warning('Pandas interface failed to import with '
'following error: %s' % e)

try:
from .spatialpandas import SpatialPandasInterface # noqa (API import)
datatypes.append('spatialpandas')
except ImportError:
pass

try:
from .xarray import XArrayInterface # noqa (Conditional API import)
datatypes.append('xarray')
except ImportError:
pass
default_datatype = 'dataframe'

try:
from .cudf import cuDFInterface # noqa (Conditional API import)
datatypes.append('cuDF')
except ImportError:
pass

try:
from .dask import DaskInterface # noqa (Conditional API import)
datatypes.append('dask')
except ImportError:
pass

if 'array' not in datatypes:
datatypes.append('array')
if 'multitabular' not in datatypes:
datatypes.append('multitabular')
datatypes = ['dataframe', 'dictionary', 'grid', 'xarray', 'dask',
'cuDF', 'spatialpandas', 'array', 'multitabular', 'ibis']


def concat(datasets, datatype=None):
@@ -370,6 +338,10 @@ def __init__(self, data, kdims=None, vdims=None, **kwargs):
)
self._transforms = input_transforms or []

# On lazy interfaces this allows keeping an evaluated version
# of the dataset in memory
self._cached = None

# Handle initializing the dataset property.
self._dataset = input_dataset
if self._dataset is None and isinstance(input_data, Dataset) and not dataset_provided:
@@ -403,7 +375,6 @@ def dataset(self):
return Dataset(self, _validate_vdims=False, **self._dataset)
return self._dataset


@property
def pipeline(self):
"""
@@ -413,6 +384,34 @@ def pipeline(self):
"""
return self._pipeline

def compute(self):
"""
Computes the data to a data format that stores the data in
memory, e.g. a Dask dataframe or array is converted to a
Pandas DataFrame or NumPy array.

Returns:
Dataset with the data stored in in-memory format
"""
return self.interface.compute(self)

def persist(self):
"""
Persists the results of a lazy data interface to memory to
speed up data manipulation and visualization. If the
particular data backend already holds the data in memory
this is a no-op. Unlike the compute method this maintains
the same data type.

Returns:
Dataset with the data persisted to memory
"""
persisted = self.interface.persist(self)
if persisted.interface is self.interface:
return persisted
self._cached = persisted
return self
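
The caching branch of `persist` can be illustrated with a minimal, self-contained sketch. `ToyDataset`, `EagerInterface` and `LazyInterface` are hypothetical stand-ins invented for this example, not HoloViews classes; only the body of `persist` mirrors the method added above.

```python
class ToyDataset:
    """Hypothetical stand-in for Dataset; only persist() mirrors the diff."""

    def __init__(self, data, interface):
        self.data = data
        self.interface = interface
        self._cached = None  # evaluated copy kept for lazy interfaces

    def persist(self):
        persisted = self.interface.persist(self)
        if persisted.interface is self.interface:
            # The backend kept its own data type: return its result directly.
            return persisted
        # The backend changed data type: keep the evaluated copy on the side
        # and hand back the original object so its type is unchanged.
        self._cached = persisted
        return self


class EagerInterface:
    """Hypothetical backend whose data already lives in memory."""

    @classmethod
    def persist(cls, dataset):
        return dataset  # nothing to do


class LazyInterface:
    """Hypothetical backend whose data is a zero-argument callable."""

    @classmethod
    def persist(cls, dataset):
        # Evaluating the callable yields in-memory data on another backend.
        return ToyDataset(dataset.data(), EagerInterface)


eager = ToyDataset([1, 2, 3], EagerInterface)
lazy = ToyDataset(lambda: [1, 2, 3], LazyInterface)
eager_result = eager.persist()
lazy_result = lazy.persist()
```

In this sketch both calls return the object they were called on, but only the lazy dataset ends up with a populated `_cached` attribute.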

def closest(self, coords=[], **kwargs):
"""Snaps coordinate(s) to closest coordinate in Dataset

@@ -441,7 +440,7 @@ def closest(self, coords=[], **kwargs):
if xs.dtype.kind in 'SO':
raise NotImplementedError("Closest only supported for numeric types")
idxs = [np.argmin(np.abs(xs-coord)) for coord in coords]
return [xs[idx] for idx in idxs]
return [type(s)(xs[idx]) for s, idx in zip(coords, idxs)]
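
The effect of casting each result back to the type of the requested coordinate can be sketched outside HoloViews. The free function `closest` below is a hypothetical reduction of the method to plain NumPy, assuming numeric coordinates.

```python
import numpy as np


def closest(xs, coords):
    """Snap each coordinate to the nearest value in xs, keeping its input type."""
    idxs = [np.argmin(np.abs(xs - coord)) for coord in coords]
    # Without the cast, each result would be a NumPy scalar such as np.float64.
    return [type(coord)(xs[idx]) for coord, idx in zip(coords, idxs)]


xs = np.array([0.0, 1.0, 2.5])
snapped = closest(xs, [1.2, 2])
```

One consequence visible in this sketch: an integer coordinate is cast back to `int`, so a non-integer match such as `2.5` is truncated to `2`.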


def sort(self, by=None, reverse=False):
@@ -594,15 +593,13 @@ def select(self, selection_expr=None, selection_specs=None, **selection):
# Handle selection dim expression
if selection_expr is not None:
mask = selection_expr.apply(self, compute=False, keep_index=True)
dataset = self[mask]
else:
dataset = self
selection = {'selection_mask': mask}

# Handle selection kwargs
if selection:
data = dataset.interface.select(dataset, **selection)
data = self.interface.select(self, **selection)
else:
data = dataset.data
data = self.data

if np.isscalar(data):
return data
@@ -678,7 +675,7 @@ def __getitem__(self, slices):
if not len(slices) == len(self):
raise IndexError("Boolean index must match length of sliced object")
return self.clone(self.select(selection_mask=slices))
elif slices in [(), Ellipsis]:
elif (isinstance(slices, tuple) and len(slices) == 0) or slices is Ellipsis:
return self
if not isinstance(slices, tuple): slices = (slices,)
value_select = None
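
A standalone sketch of a no-op index check that relies only on type and identity tests. It assumes the intent of the branch is the same as the replaced `slices in [(), Ellipsis]` test, namely to return the dataset unchanged for an empty tuple or `Ellipsis`; `is_noop_index` is a hypothetical helper name. Membership testing with `in` goes through `==`, which array-like or expression objects may overload, whereas the checks below never compare values.

```python
import numpy as np


def is_noop_index(slices):
    """True for an empty tuple or Ellipsis, using only type and identity checks."""
    return (isinstance(slices, tuple) and len(slices) == 0) or slices is Ellipsis


checks = {
    "empty tuple": is_noop_index(()),
    "ellipsis": is_noop_index(Ellipsis),
    "one-element tuple": is_noop_index((0,)),
    "boolean array": is_noop_index(np.array([True, False])),
}
```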
@@ -770,7 +767,7 @@ def sample(self, samples=[], bounds=None, closest=True, **kwargs):
# may be replaced with more general handling
# see https://github.com/ioam/holoviews/issues/1173
from ...element import Table, Curve
datatype = ['dataframe', 'dictionary', 'dask']
datatype = ['dataframe', 'dictionary', 'dask', 'ibis']
if len(samples) == 1:
sel = {kd.name: s for kd, s in zip(self.kdims, samples[0])}
dims = [kd for kd, v in sel.items() if not np.isscalar(v)]
@@ -879,7 +876,7 @@ def aggregate(self, dimensions=None, function=None, spreadfn=None, **kwargs):

# Handle functions
kdims = [self.get_dimension(d, strict=True) for d in dimensions]
if not len(self):
if not self:
if spreadfn:
spread_name = spreadfn.__name__
vdims = [d for vd in self.vdims for d in [vd, vd.clone('_'.join([vd.name, spread_name]))]]
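
Why `not self` can be cheaper than `not len(self)` on a lazy backend: Python's truth test calls `__bool__` when it is defined and only falls back to `__len__` otherwise. The class below is a hypothetical illustration, not the HoloViews implementation; it assumes that counting rows requires a full scan while an emptiness check only needs the first row.

```python
class LazyRows:
    """Hypothetical lazy container: rows come from a zero-argument factory."""

    def __init__(self, make_rows):
        self._make_rows = make_rows
        self.full_scans = 0  # how many times every row had to be visited

    def __len__(self):
        self.full_scans += 1
        return sum(1 for _ in self._make_rows())

    def __bool__(self):
        # Stop at the first row instead of counting all of them.
        for _ in self._make_rows():
            return True
        return False


rows = LazyRows(lambda: iter(range(1000)))
empty = LazyRows(lambda: iter(()))
rows_truthy = bool(rows)
empty_truthy = bool(empty)
scans_after_truth_tests = rows.full_scans + empty.full_scans
```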
@@ -905,7 +902,9 @@ def aggregate(self, dimensions=None, function=None, spreadfn=None, **kwargs):
for i, d in enumerate(vdims):
dim = d.clone('_'.join([d.name, spread_name]))
dvals = error.dimension_values(d, flat=False)
combined = combined.add_dimension(dim, ndims+i, dvals, True)
idx = vdims.index(d)
combined = combined.add_dimension(dim, idx+1, dvals, True)
vdims = combined.vdims
return combined.clone(new_type=Dataset if generic_type else type(self))
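
The revised insertion order places each spread column directly after the value column it describes. A minimal sketch with plain strings standing in for dimensions (`interleave_spread` is a hypothetical helper, and lists replace the `add_dimension` calls):

```python
def interleave_spread(vdims, spread_name):
    """Insert '<name>_<spread>' right after each value dimension name."""
    combined = list(vdims)
    for d in vdims:
        # Look the position up in the growing list, as the loop above does
        # by refreshing vdims after every insertion.
        idx = combined.index(d)
        combined.insert(idx + 1, "_".join([d, spread_name]))
    return combined


ordered = interleave_spread(["a", "b"], "std")
```

With the earlier `ndims+i` position the spread columns would instead be appended after all value columns.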

if np.isscalar(aggregated):
@@ -1241,10 +1240,3 @@ def ndloc(self):
dataset.ndloc[[1, 2, 3], [0, 2, 3]]
"""
return ndloc(self)


# Aliases for pickle backward compatibility
Columns = Dataset
ArrayColumns = ArrayInterface
DictColumns = DictInterface
GridColumns = GridInterface
16 changes: 13 additions & 3 deletions holoviews/core/data/dask.py
@@ -70,6 +70,14 @@ def init(cls, eltype, data, kdims, vdims):
data = reset
return data, dims, extra

@classmethod
def compute(cls, dataset):
return dataset.clone(dataset.data.compute())

@classmethod
def persist(cls, dataset):
return dataset.clone(dataset.data.persist())

@classmethod
def shape(cls, dataset):
return (len(dataset.data), len(dataset.data.columns))
@@ -263,9 +271,11 @@ def add_dimension(cls, dataset, dimension, dim_pos, values, vdim):
data = dataset.data
if dimension.name not in data.columns:
if not np.isscalar(values):
err = ('Dask dataframe does not support assigning '
'non-scalar value.')
raise NotImplementedError(err)
if len(values):
err = ('Dask dataframe does not support assigning '
'non-scalar value.')
raise NotImplementedError(err)
values = None
data = data.assign(**{dimension.name: values})
return data
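
The relaxed guard in `add_dimension` can be isolated as a small function: scalars pass through, an empty non-scalar is replaced by `None`, and only a non-empty non-scalar is rejected. `resolve_column_value` is a hypothetical name used for this sketch, which leaves out the Dask `assign` call itself.

```python
import numpy as np


def resolve_column_value(values):
    """Return the value to assign, mirroring the guard in the diff above."""
    if not np.isscalar(values):
        if len(values):
            raise NotImplementedError(
                "Dask dataframe does not support assigning non-scalar value."
            )
        values = None  # empty input: assign a column of missing values
    return values


scalar_result = resolve_column_value(1.5)
empty_result = resolve_column_value(np.array([]))
try:
    resolve_column_value([1, 2])
    rejected = False
except NotImplementedError:
    rejected = True
```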

11 changes: 11 additions & 0 deletions holoviews/core/data/grid.py
@@ -409,6 +409,17 @@ def ndloc(cls, dataset, indices):
selected[d.name] = arr[tuple(adjusted_inds)]
return tuple(selected[d.name] for d in dataset.dimensions())

@classmethod
def persist(cls, dataset):
da = dask_array_module()
return {k: v.persist() if da and isinstance(v, da.Array) else v
for k, v in dataset.data.items()}

@classmethod
def compute(cls, dataset):
da = dask_array_module()
return {k: v.compute() if da and isinstance(v, da.Array) else v
for k, v in dataset.data.items()}
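
Both grid methods follow the same pattern: walk the dictionary of columns and evaluate only the values that are lazy. The sketch below shows that pattern without requiring Dask; `LazyArray` is a hypothetical stand-in for a Dask array and `compute_columns` is a hypothetical helper.

```python
class LazyArray:
    """Hypothetical stand-in for a dask array; compute() returns a plain list."""

    def __init__(self, values):
        self._values = values

    def compute(self):
        return list(self._values)


def compute_columns(data, lazy_type=LazyArray):
    """Evaluate lazy values and pass in-memory values through untouched."""
    return {k: v.compute() if isinstance(v, lazy_type) else v
            for k, v in data.items()}


columns = {"x": [0, 1, 2], "z": LazyArray(range(3))}
computed = compute_columns(columns)
```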

@classmethod
def values(cls, dataset, dim, expanded=True, flat=True, compute=True,