Optimized datashader aggregation of NdOverlays #1430
Conversation
Excellent, thanks! I'm too tired to try to parse the code, but is it using the ability of datashader to compute multiple aggregations in a single pass?
No, even so it's still faster, which is perhaps a bit surprising. I'll do some more profiling tomorrow.
I was wrong, it's slightly slower, but including the concatenation step it still wins out massively both on performance and memory load.
Means it's perhaps still worth optimizing.
Here are some benchmarks. The data here are 12 curves of increasing length, where 1 minute is equivalent to 60000*60 samples. The four conditions compare line aggregation of multiple curves either by summing the aggregates (the new approach) or by aggregating over concatenated curves separated by NaNs. You can see that the new approach is generally slightly slower than aggregating over already concatenated lines, but it scales much better when using dask.
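For reference, the NaN-separated concatenation baseline used in these benchmarks can be sketched roughly as below; the curve data and column names are made up for illustration, not taken from the actual benchmark code:

```python
import numpy as np
import pandas as pd

# Hypothetical curves; in the benchmark these would be much longer.
curves = [
    pd.DataFrame({'x': [0, 1, 2], 'y': [0.0, 1.0, 0.5]}),
    pd.DataFrame({'x': [0, 1, 2], 'y': [1.0, 0.5, 1.5]}),
]

# Insert a NaN row between curves so the line renderer breaks between
# them instead of drawing a connecting segment.
nan_row = pd.DataFrame({'x': [np.nan], 'y': [np.nan]})
parts = []
for df in curves:
    parts.append(df)
    parts.append(nan_row)
concatenated = pd.concat(parts[:-1], ignore_index=True)
```

The concatenated frame can then be passed to the line aggregation in one go, at the cost of materializing the combined dataframe in memory.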
Force-pushed from 744548d to e898c0c.
@philippjfr Thanks for fixing the warning! Is it now ready to merge, or is there something else you wish to do first?
Yes, this is ready to merge now. Further optimizations can come in later PRs.
Great! Merging.
This PR provides major optimizations when using the datashader operations to aggregate multiple objects in an NdOverlay with the `count`, `sum`, and `mean` operations. Each Element is aggregated separately and the individual aggregates are summed. A small complication is that `NaN`s have to be replaced by zeros and then masked back in at the end. `mean` is supported by dividing the summed `sum` aggregate by the summed `count` aggregate. This avoids the large memory and performance overhead of concatenating multiple dataframes together. I'm still working on adding an optimization for `count_cat`, but it should also be fairly straightforward.
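The per-element summing and NaN masking described above can be sketched with plain NumPy; the arrays here are hypothetical stand-ins for the per-curve aggregates datashader would produce:

```python
import numpy as np

# Hypothetical per-curve "sum" and "count" aggregates; NaN marks pixels
# a curve never touched.
agg_sums = [
    np.array([[1.0, np.nan], [2.0, 3.0]]),
    np.array([[np.nan, np.nan], [4.0, 1.0]]),
]
agg_counts = [
    np.array([[1.0, np.nan], [1.0, 2.0]]),
    np.array([[np.nan, np.nan], [2.0, 1.0]]),
]

def combine(aggs):
    # Replace NaNs with zeros so they do not poison the sum, then mask
    # pixels that were empty in every aggregate back to NaN at the end.
    stacked = np.stack([np.nan_to_num(a, nan=0.0) for a in aggs])
    total = stacked.sum(axis=0)
    all_nan = np.all([np.isnan(a) for a in aggs], axis=0)
    total[all_nan] = np.nan
    return total

total_sum = combine(agg_sums)
total_count = combine(agg_counts)
# "mean" is derived by dividing the combined sum by the combined count.
mean = total_sum / total_count
```

Pixels covered by no curve stay NaN in both combined aggregates, so the division leaves them NaN in the mean as well.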