first_n, last_n, max_n and min_n reductions #1184

Merged: 8 commits merged into holoviz:main on Mar 6, 2023

Conversation

ianthomas23 (Member)

This is further work on improved inspection reductions (issue #1126), adding first_n, last_n, max_n and min_n reductions. Each accepts a column name and a value for n, the number of results to return for each pixel. For example, max_n("value", n=3) will return a DataArray of shape (ny, nx, n) containing the 3 highest values of column "value" for each pixel.
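
A minimal standalone sketch of one of these reductions outside of where (my own illustration, not code from this PR; the NaN padding for under-filled pixels is my assumption based on the behaviour described below):

import datashader as ds
import pandas as pd

df = pd.DataFrame(dict(x=[0, 0, 1, 2], y=[0, 0, 1, 1], value=[9, 8, 7, 6]))
canvas = ds.Canvas(plot_height=2, plot_width=3)

# max_n keeps the 3 largest "value" entries per pixel
agg = canvas.points(df, 'x', 'y', agg=ds.max_n("value", n=3))
print(agg.shape)  # expected (2, 3, 3), i.e. (ny, nx, n)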

Demo code using these within a where reduction to return either the corresponding row indexes or the values from a different column:

import datashader as ds
import pandas as pd

df = pd.DataFrame(dict(
    x     = [ 0,  0,  1,  1,  0,  0,  2,  2],
    y     = [ 0,  0,  0,  0,  1,  1,  1,  1],
    value = [ 9,  8,  7,  6,  2,  3,  4,  5],
    other = [11, 12, 13, 14, 15, 16, 17, 18],
    #index    0   1   2   3   4   5   6   7
))

canvas = ds.Canvas(plot_height=2, plot_width=3)

reductions = [
    ("where first_n index", ds.where(ds.first_n("value", 3))),
    ("where first_n other", ds.where(ds.first_n("value", 3), "other")),
    ("where max_n index", ds.where(ds.max_n("value", 3))),
    ("where max_n other", ds.where(ds.max_n("value", 3), "other")),
]

for name, reduction in reductions:
    agg = canvas.points(df, 'x', 'y', agg=reduction)
    print(name, agg.data.dtype)
    print(agg.data)

which outputs

where first_n index int64
[[[ 0  1 -1]
  [ 2  3 -1]
  [-1 -1 -1]]

 [[ 4  5 -1]
  [-1 -1 -1]
  [ 6  7 -1]]]
where first_n other float64
[[[11. 12. nan]
  [13. 14. nan]
  [nan nan nan]]

 [[15. 16. nan]
  [nan nan nan]
  [17. 18. nan]]]
where max_n index int64
[[[ 0  1 -1]
  [ 2  3 -1]
  [-1 -1 -1]]

 [[ 5  4 -1]
  [-1 -1 -1]
  [ 7  6 -1]]]
where max_n other float64
[[[11. 12. nan]
  [13. 14. nan]
  [nan nan nan]]

 [[16. 15. nan]
  [nan nan nan]
  [18. 17. nan]]]

where, as usual, -1 means no row index and nan means no data to return.
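
When post-processing these aggregates, the sentinels can be masked out. A hedged sketch (my own, reusing df, canvas and ds from the demo above) that looks the returned row indexes back up in the original frame:

import numpy as np

idx = canvas.points(df, 'x', 'y', agg=ds.where(ds.first_n("value", 3))).data
valid = idx != -1                                  # -1 marks empty slots
other = np.full(idx.shape, np.nan)
other[valid] = df["other"].to_numpy()[idx[valid]]  # should match the "where first_n other" output above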

This allows us to do some complicated combinations such as

ds.summary(
    count=ds.count(),
    min_n=ds.where(ds.min_n("value", n=3)),
    max_n=ds.where(ds.max_n("value", n=3)),
)

to return count plus min_n and max_n (or first_n and last_n) in a single datashader pass.
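
A hedged sketch of consuming such a summary aggregate, reusing df and canvas from the demo above and assuming the result is an xarray Dataset keyed by the names given to ds.summary:

agg = canvas.points(df, 'x', 'y', agg=ds.summary(
    count=ds.count(),
    min_n=ds.where(ds.min_n("value", n=3)),
    max_n=ds.where(ds.max_n("value", n=3)),
))
print(agg["count"].data)  # (ny, nx) counts
print(agg["max_n"].data)  # (ny, nx, n) row indexes, -1 where empty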

max_n and min_n work with dask but not CUDA (issue #1177 needs to be solved for that). first_n and last_n only work on the CPU and without dask, the same as first and last (#1177 and #1182 are needed to fix that).

Using antialiased lines the results looked OK in some situations and not others, so I am raising a NotImplementedError for all of these when they are used with antialiasing, and I will separately consider what reasonable behaviour is here. This includes where(first_n) and so on as well.

There is one issue here that needs a decision. I've called the third dimension of the DataArray returned by such a reduction "n", to fit in with the names first_n, etc. You can put multiple whatever_n reductions in a single summary reduction, as shown above. If they all have the same n then everything works out as expected, but we need a policy for labelling the third dimension when the whatever_n reductions have different n values. We could keep the first as n and, if subsequent n values are different, call them n1, n2, etc.?


codecov bot commented Feb 16, 2023

Codecov Report

Merging #1184 (b6ed7a5) into main (229cea3) will increase coverage by 0.09%.
The diff coverage is 89.83%.

@@            Coverage Diff             @@
##             main    #1184      +/-   ##
==========================================
+ Coverage   85.39%   85.48%   +0.09%     
==========================================
  Files          35       35              
  Lines        8023     8232     +209     
==========================================
+ Hits         6851     7037     +186     
- Misses       1172     1195      +23     
Impacted Files Coverage Δ
datashader/glyphs/line.py 92.84% <ø> (-0.12%) ⬇️
datashader/reductions.py 86.17% <86.97%> (+0.17%) ⬆️
datashader/compiler.py 95.74% <100.00%> (+0.12%) ⬆️
datashader/core.py 88.38% <100.00%> (+0.02%) ⬆️
datashader/utils.py 79.25% <100.00%> (+2.40%) ⬆️


ianthomas23 (Member, Author)

After discussion, we've decided to allow multiple *_n reductions only if they all have the same n value. This allows us to keep the new coordinate label as n.
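
A small hedged check of that behaviour (the names min3 and max3 and the exact dimension ordering are my own assumptions), reusing df and canvas from the demo above:

agg = canvas.points(df, 'x', 'y', agg=ds.summary(
    min3=ds.where(ds.min_n("value", n=3)),
    max3=ds.where(ds.max_n("value", n=3)),
))
print(agg["min3"].dims)  # expected ('y', 'x', 'n')
print(agg["max3"].dims)  # expected ('y', 'x', 'n')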

jbednar (Member) left a comment

Looks good. Just need user docs, which can be a separate PR.

ianthomas23 merged commit 1806726 into holoviz:main on Mar 6, 2023
ianthomas23 deleted the first_last_max_min_n_reductions branch on Mar 6, 2023, 09:45
ianthomas23 added this to the v0.14.5 milestone on Mar 6, 2023