Handle empty aggregation in datashader operations #1281
Ready to review.
Reviewing now. Planning on adding some unit tests?
I suppose, there are currently no datashader unit tests.
elif isinstance(obj, Element):
    glyph = 'line' if isinstance(obj, Curve) else 'points'
    paths.append(PandasInterface.as_dframe(obj))

if dims is None or len(dims) != 2:
    return None, None, None, None
Purely a syntactic preference, but I would wrap the return tuple in parentheses, i.e. (None, None, None, None).
for d in (x, y):
    if df[d].dtype.kind == 'M':
        param.warning('Casting %s dimension data to integer '
                      'datashader cannot process datetime data ')
How interpretable would the rest be after such datetime to int casting? I suppose it might work out but maybe it doesn't really make sense?
You can add a datetime formatter to your axis and it will work. I think it's fine with the warning.
Sure.
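A minimal sketch of what the datetime-to-integer cast discussed above amounts to, using NumPy only. The sample timestamps are hypothetical and not taken from the PR; the point is that the integers are nanoseconds since the Unix epoch, so spacing is preserved and the cast can be reversed when formatting an axis.

```python
import numpy as np

# Hypothetical sample data (not from the PR): three consecutive days.
times = np.array(['2017-01-01', '2017-01-02', '2017-01-03'],
                 dtype='datetime64[ns]')

# dtype.kind == 'M' identifies datetime64 data, as in the check in the diff.
is_datetime = times.dtype.kind == 'M'

# Cast to integer nanoseconds since the Unix epoch.
as_int = times.astype('int64')

# The cast is lossless: converting back recovers the original timestamps,
# which is what a datetime axis formatter relies on.
roundtrip = as_int.astype('datetime64[ns]')
```

Because the integer values keep the same ordering and spacing as the timestamps, an aggregate computed on them can still be labelled with dates afterwards.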
@@ -176,6 +194,15 @@ def _process(self, element, key=None):
category = agg_fn.column if isinstance(agg_fn, ds.count_cat) else None
x, y, data, glyph = self.get_agg_data(element, category)

if x is None or y is None:
    x0, x1 = self.p.x_range or (-0.5, 0.5)
    y0, y1 = self.p.y_range or (-0.5, 0.5)
Seems like unit range is the default. Guess that is fine as long as this is sensible default behavior when the data is missing.
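The fallback in the diff above relies on Python's `or` returning its right-hand operand when the left one is falsy. A small standalone sketch of that pattern follows; `resolve_range` is a hypothetical helper name used only for illustration and is not part of the PR.

```python
def resolve_range(requested, default=(-0.5, 0.5)):
    """Return the requested (low, high) range, or the default if none was given.

    Hypothetical helper mirroring the `self.p.x_range or (-0.5, 0.5)` idiom.
    Note that `or` also replaces an empty tuple, not only None.
    """
    return requested or default


# No range supplied: fall back to the unit-width default centred on zero.
x0, x1 = resolve_range(None)

# An explicit range is passed through unchanged.
y0, y1 = resolve_range((0, 10))
```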
@@ -307,7 +334,12 @@ def _process(self, element, key=None):

with warnings.catch_warnings():
    warnings.filterwarnings('ignore', r'invalid value encountered in true_divide')
    img = tf.shade(array, **shade_opts)
    if np.isnan(array.data).all():
Seems like it might be a little inefficient to compute this predicate on large arrays but I think it is okay for now. No need to optimize anything just yet.
I worried about that too, but 100 ms for a 10000x10000 array (which is considerably larger than we'll ever use) is okay.
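For anyone wanting to check the cost on their own machine, here is a rough sketch of the all-NaN predicate with a simple timing harness. `is_all_nan` is a hypothetical helper name and the 400x400 array size is an arbitrary assumption; no timing figure is claimed here.

```python
import time

import numpy as np


def is_all_nan(values):
    """Return True if every entry in the array is NaN (hypothetical helper)."""
    return bool(np.isnan(values).all())


# An aggregate with no data at all, and one with a single valid cell.
empty = np.full((400, 400), np.nan, dtype=np.float32)
partial = empty.copy()
partial[0, 0] = 1.0

# Time the predicate; the elapsed value depends on hardware and array size.
start = time.perf_counter()
result_empty = is_all_nan(empty)
elapsed = time.perf_counter() - start

result_partial = is_all_nan(partial)
```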
xc = np.linspace(x0, x1, self.p.width)
yc = np.linspace(y0, y1, self.p.height)
xarray = xr.DataArray(np.full((self.p.height, self.p.width), np.NaN, dtype=np.float32),
                      dims=['y', 'x'], coords={'x': xc, 'y': yc})
Isn't this all np.NaNs? If so you could set an is_all_nans switch and use it later...
Not unless you want to add is_all_nan switches to Image Elements. These are two distinct operations.
Ok, maybe file an issue about that then (referencing this PR) and we can address it later.
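As a reference for that follow-up, the empty-grid construction from the diff can be sketched with NumPy alone. This is only an illustration under stated assumptions: `empty_grid` is a hypothetical function name, the `xr.DataArray` wrapping is left out, and `np.nan` is used because the `np.NaN` alias was removed in NumPy 2.0.

```python
import numpy as np


def empty_grid(x_range, y_range, width, height):
    """Build coordinates and an all-NaN float32 grid (hypothetical helper).

    Mirrors the shape convention in the diff: rows follow y, columns follow x.
    """
    x0, x1 = x_range
    y0, y1 = y_range
    xc = np.linspace(x0, x1, width)    # one x coordinate per column
    yc = np.linspace(y0, y1, height)   # one y coordinate per row
    values = np.full((height, width), np.nan, dtype=np.float32)
    return xc, yc, values


# Small grid over the default ranges used when no data is available.
xc, yc, values = empty_grid((-0.5, 0.5), (-0.5, 0.5), width=4, height=3)
```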
Made a few comments but otherwise I'm happy to merge.
Let's just merge, I'll open an issue about unit tests for datashader operations.
Looks good. Merging.
Looks good, thanks. If you are working around behavior in datashader that you think should be fixed (e.g. if it should be raising a more sensible exception in some of these cases) then please file an issue there.
As the title says, this handles empty aggregates gracefully in the datashader operations and ensures that both the interfaces and the plotting backends handle the resulting aggregates correctly.