Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds resample_when; the limit before automatically applying datashade/rasterize/downsample #1103

Merged
merged 17 commits into from
Oct 12, 2023

Conversation

ahuang11
Copy link
Collaborator

Closes #1101

    The threshold before applying the operation (datashade / rasterize);
    if the number of points is below this value, the plot will
    not be rasterized or datashaded.

Needs more testing on various types of charts, but this is such a cool feature!!

Screen.Recording.2023-07-13.at.9.54.17.PM.mov

@ahuang11 ahuang11 changed the title Adds op_threshold; the limit before applying datashade/rasterize Adds op_threshold; the limit before automatically applying datashade/rasterize Jul 14, 2023
@hoxbro
Copy link
Member

hoxbro commented Jul 14, 2023

Could a case be made for having rasterize/datashade also accept integers as inputs, and if it was an integer it will use apply_when?

Also is your example running the current code? I can't see why op_threshold is enabled in it.

@ahuang11
Copy link
Collaborator Author

Oops the video I set a default op_threshold of 5000 but I decided not to modify the default.

Having rasterize and datashade accept integers could work. What are others preferences?

@ahuang11
Copy link
Collaborator Author

ahuang11 commented Jul 14, 2023

I personally prefer it separate as it's more explicit.

For example, this reads better:

df.hvplot.points('0', '1', rasterize=True, op_threshold=1000)

Than this in my opinion:

df.hvplot.points('0', '1', rasterize=1000)

Implementation wise, it'd be slightly cleaner too I think:

if op_threshold is not None:
   apply_when(...)

vs

if isinstance(rasterize, int) and rasterize > 0 # and also handle datashade:
   apply_when(...)

@hoxbro
Copy link
Member

hoxbro commented Jul 14, 2023

Then I would suggest going in the other direction and having two variables, rasterize_threshold and datashade_threshold, to be more explicit. Mainly because op does not mean anything to an average user.

An alternative could be another prefix.

@ahuang11
Copy link
Collaborator Author

ahuang11 commented Jul 14, 2023

I do not like having both keywords like that; one keyword internally does the same thing to both operations so it's a good balance I think; if not it's another if else:

if rasterize_threshold is not None:
    threshold = rasterize_threshold
elif datashade_threshold is not None:
    threshold = datashade_threshold

or

threshold = rasterize_threshold or datashade_threshold

Perhaps operation_threshold or rasterization_threshold or aggregation_threshold then? It'll be listed under https://hvplot.holoviz.org/user_guide/Customization.html#datashading-options so it should be pretty clear it's related to datashade options.

I like aggregation_threshold, or if a really long name, aggregation_toggle_threshold

@ahuang11 ahuang11 marked this pull request as ready for review July 17, 2023 12:55
@ahuang11 ahuang11 changed the title Adds op_threshold; the limit before automatically applying datashade/rasterize Adds aggregation_threshold; the limit before automatically applying datashade/rasterize Jul 17, 2023
@maximlt
Copy link
Member

maximlt commented Jul 18, 2023

Generally when it's about API design I ask @jbednar, @philippjfr or @jlstevens what they think about the suggestions. They probably already thought about the suggested API a number of years ago, I also like the idea that it can help ensuring consistency with the existing APIs (name, style, etc.) across HoloViz as they've designed most of that.

@jbednar
Copy link
Member

jbednar commented Jul 18, 2023

I like aggregation_threshold=1e8, or rasterize_threshold=1e8, or aggregate_over=1e8 or rasterize_if_over=1e8. I agree it should be one argument to cover all aggregation operations. I think there should also be a way to configure this as a default at the holoviews level, BTW, not just hvplot.

@ahuang11
Copy link
Collaborator Author

Having a default will cause it to always return an overlay of the original chart element + Image (rasterized) so I'm hesitant on that.

@maximlt maximlt added this to the 0.9.0 milestone Sep 11, 2023
@maximlt maximlt self-requested a review October 2, 2023 08:32
Copy link
Member

@maximlt maximlt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ahuang11 do you think aggregator_threshold could also be used to apply the downsample operation (lttb by default now but will certainly be extended to nth for some tabular plot types and regrid for gridded types)? If so, should it be renamed to reflect that it just doesn't apply to aggregator operations like rasterize and datashade (assuming for instance that lttb isn't an aggregating operation, which I think is correct to say?).

We're seriously lacking a user guide that demonstrates how to plot large data and where we could demonstrate that feature and how other operations work, not your fault though!

hvplot/converter.py Outdated Show resolved Hide resolved
hvplot/converter.py Outdated Show resolved Hide resolved
hvplot/converter.py Outdated Show resolved Hide resolved
hvplot/converter.py Outdated Show resolved Hide resolved
@ahuang11 ahuang11 requested a review from maximlt October 6, 2023 17:19
@maximlt maximlt changed the title Adds aggregation_threshold; the limit before automatically applying datashade/rasterize Adds resample_when; the limit before automatically applying datashade/rasterize Oct 8, 2023
@maximlt
Copy link
Member

maximlt commented Oct 8, 2023

After a discussion with Andrew we're suggesting resample_when instead of aggregator_threshold, resample is feels more understandable and general and when is more inline with apply_when. I also liked resample_above but thought that maybe it's too specific and might limit the type of value it could accept in the future (e.g. some predicate function), when is more future-proof. Interested in feedback from others on this name, naming is hard!

@ahuang11 the tests are now failing, were they passing before? I would also suggest making resample_when apply to the downsample op and not just rasterize/datashade.

@ahuang11
Copy link
Collaborator Author

Added support for downsample

@maximlt
Copy link
Member

maximlt commented Oct 12, 2023

Added support for downsample

Thanks a lot for that! I did a little bit of refactoring, in particular I found out that calling df.hvplot.points(resample_when=100, rasterize=True) would lead to errors as the generated plot element has no vdims but two kdims x and y, so plot.vdims[0] raised an error. So I decided to leverage the fact that elements implement __len__ and len(plot) to obtain the size. I've also reworked the docstring as I just couldn't wrap my head around it.

I'm going to merge now, don't hesitate opening new PRs to refine the functionality if you thing it needs it.

@maximlt maximlt changed the title Adds resample_when; the limit before automatically applying datashade/rasterize Adds resample_when; the limit before automatically applying datashade/rasterize/downsample Oct 12, 2023
@maximlt maximlt merged commit 2a538b3 into main Oct 12, 2023
7 checks passed
@maximlt maximlt deleted the add_op_threshold branch October 12, 2023 07:38
@ahuang11 ahuang11 mentioned this pull request Oct 23, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add threshold kwarg; show individual points when zoomed; else datashade
5 participants