Make HeatMap more general #849
Changes from 20 commits
@@ -1,10 +1,24 @@
import itertools

import param
import numpy as np

from ..core import Dataset, OrderedDict
from ..core.operation import ElementOperation
from ..core.util import (pd, is_nan, sort_topologically,
                         cartesian_product, is_cyclic, one_to_one)

try:
    import dask
except ImportError:
    dask = None

try:
    import xarray as xr
except ImportError:
    xr = None


def toarray(v, index_value=False):
    """
    Interface helper function to turn dask Arrays into numpy arrays as

@@ -30,3 +44,98 @@ def compute_edges(edges):
        raise ValueError('Centered bins have to be of equal width.')
    edges -= width/2.
    return np.concatenate([edges, [edges[-1]+width]])
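The hunk above shows only the tail of compute_edges, which converts evenly spaced bin centers into bin edges. A hypothetical standalone reconstruction, consistent with the three visible lines (the width computation at the top is an assumption, not part of the diff):

```python
import numpy as np

def compute_edges(edges):
    """Convert evenly spaced bin centers into bin edges.

    Hypothetical sketch: only the final three lines appear in the
    diff above; the width computation here is assumed.
    """
    edges = np.asarray(edges, dtype='float64')
    width = edges[1] - edges[0]
    if not np.allclose(np.diff(edges), width):
        raise ValueError('Centered bins have to be of equal width.')
    edges = edges - width/2.
    return np.concatenate([edges, [edges[-1]+width]])

print(compute_edges([1, 2, 3]))  # centers 1, 2, 3 -> edges [0.5 1.5 2.5 3.5]
```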


def reduce_fn(x):
    """
    Aggregation function to get the first non-NaN value.
    """
    values = x.values if pd and isinstance(x, pd.Series) else x
    for v in values:
        if not is_nan(v):
            return v
    return np.NaN
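reduce_fn simply scans the aggregated group for its first non-NaN entry. A minimal standalone sketch, with a simplified stand-in for holoviews' is_nan helper (the real one handles more corner cases):

```python
import numpy as np

def is_nan(x):
    # Simplified stand-in for holoviews.core.util.is_nan
    try:
        return bool(np.isnan(x))
    except (TypeError, ValueError):
        return False

def reduce_fn(values):
    # Return the first non-NaN value, or NaN if there is none
    for v in values:
        if not is_nan(v):
            return v
    return np.nan

print(reduce_fn([np.nan, np.nan, 58.89]))  # -> 58.89
```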


class categorical_aggregate2d(ElementOperation):
    """
    Generates a gridded Dataset of 2D aggregate arrays indexed by the
    first two dimensions of the passed Element, turning all remaining
    dimensions into value dimensions. The key dimensions of the
    gridded array are treated as categorical indices. Useful for data
    indexed by two independent categorical variables such as a table
    of population values indexed by country and year. Data that is
    indexed by continuous dimensions should be binned before
    aggregation. The aggregation will retain the global sorting order
    of both dimensions.

    >> table = Table([('USA', 2000, 282.2), ('UK', 2005, 58.89)],
                     kdims=['Country', 'Year'], vdims=['Population'])
    >> categorical_aggregate2d(table)
    Dataset({'Country': ['USA', 'UK'], 'Year': [2000, 2005],
             'Population': [[282.2, np.NaN], [np.NaN, 58.89]]},
            kdims=['Country', 'Year'], vdims=['Population'])
    """

Review comment: Looks great! I was just wondering if you want to keep this class in …
Reply: It's imported there but can't be moved, cyclical imports again.
Reply: Ok, having it available for …

Review comment: Perhaps this would be better expressed as an operation? Then maybe it could have a minimal docstring example in the class docstring?

    datatype = param.List(['xarray', 'grid'] if xr else ['grid'], doc="""
        The grid interface types to use when constructing the gridded Dataset.""")

    def _process(self, obj, key=None):
        """
        Generates a categorical 2D aggregate by inserting NaNs at all
        cross-product locations that do not already have a value assigned.
        Returns a 2D gridded Dataset object.
        """

Review comment: Quite a long method... if you see chunks that could be split up into helper methods, that might be sensible. Up to you though!
Reply: Happy to split it up.

        if isinstance(obj, Dataset) and obj.interface.gridded:
            return obj
        elif obj.ndims > 2:
            raise ValueError("Cannot aggregate more than two dimensions")
        elif len(obj.dimensions()) < 3:
            raise ValueError("Must have at least two dimensions to aggregate over "
                             "and one value dimension to aggregate on.")

        dim_labels = obj.dimensions(label=True)
        dims = obj.dimensions()
        kdims, vdims = dims[:2], dims[2:]
        xdim, ydim = dim_labels[:2]
        nvdims = len(dims) - 2
        d1keys = obj.dimension_values(xdim, False)
        d2keys = obj.dimension_values(ydim, False)
        shape = (len(d2keys), len(d1keys))
        nsamples = np.product(shape)

        # Determine global orderings of y-values using topological sort
        grouped = obj.groupby(xdim, container_type=OrderedDict,
                              group_type=Dataset).values()
        orderings = OrderedDict()
        for group in grouped:
            vals = group.dimension_values(ydim)
            if len(vals) == 1:
                orderings[vals[0]] = [vals[0]]
            else:
                for i in range(len(vals)-1):
                    p1, p2 = vals[i:i+2]
                    orderings[p1] = [p2]
        if one_to_one(orderings, d2keys):
            d2keys = np.sort(d2keys)
        elif not is_cyclic(orderings):
            d2keys = list(itertools.chain(*sort_topologically(orderings)))

        # Pad data with NaNs
        ys, xs = cartesian_product([d2keys, d1keys])
        data = {xdim: xs.flatten(), ydim: ys.flatten()}
        for vdim in vdims:
            values = np.empty(nsamples)
            values[:] = np.NaN
            data[vdim.name] = values
        dtype = 'dataframe' if pd else 'dictionary'
        dense_data = Dataset(data, kdims=obj.kdims, vdims=obj.vdims, datatype=[dtype])
        concat_data = obj.interface.concatenate([dense_data, Dataset(obj)], datatype=dtype)
        agg = concat_data.reindex([xdim, ydim]).aggregate([xdim, ydim], reduce_fn)

        # Convert data to a gridded dataset
        grid_data = {xdim: d1keys, ydim: d2keys}
        for vdim in vdims:
            grid_data[vdim.name] = agg.dimension_values(vdim).reshape(shape)
        return agg.clone(grid_data, datatype=self.p.datatype)
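The NaN padding plus first-non-NaN aggregation above is conceptually a pivot of the table onto its two categorical key dimensions. A sketch of the same result using plain pandas (column names borrowed from the docstring example; this is an analogy, not what _process calls internally):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([('USA', 2000, 282.2), ('UK', 2005, 58.89)],
                  columns=['Country', 'Year', 'Population'])

# Pivot onto the two categorical key dimensions; cross-product
# cells with no observation are filled with NaN automatically.
grid = df.pivot(index='Year', columns='Country', values='Population')
print(grid)
```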

Review comment: What is the representation of the graph? A list of edges as tuples? Would be good to mention in the docstring.
Reply: I'm guessing the representation is similar to the one in one_to_one... even so, probably worth mentioning.
Reply: Right, all three methods here (sort_topologically, is_cyclic and one_to_one) use the same representation, which is a mapping between nodes and edges; will add it to the docstring.
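For reference, the representation the reviewers settle on is an adjacency mapping from each node to the list of nodes that must follow it, as built in the orderings loop above. A minimal sketch of a topological sort over that representation (a hypothetical simplification for illustration, not the holoviews sort_topologically implementation):

```python
from collections import OrderedDict

def topo_sort(graph):
    # Depth-first topological sort over an adjacency mapping
    # {node: [successor, ...]} -- the same shape of input that
    # sort_topologically, is_cyclic and one_to_one consume.
    visited, order = set(), []

    def visit(node):
        if node in visited:
            return
        visited.add(node)
        for succ in graph.get(node, []):
            visit(succ)
        order.append(node)

    for node in graph:
        visit(node)
    return order[::-1]

# Partial y-orderings observed across groups, e.g. years per country
orderings = OrderedDict([(2000, [2005]), (2005, [2010])])
print(topo_sort(orderings))  # -> [2000, 2005, 2010]
```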