Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

lasso / poly_select selections on datashaded charts in holoviews seems to trigger two similar operations in both spatialpandas and shapely #5466

Closed
GeoVizNow opened this issue Oct 2, 2022 · 2 comments · Fixed by #5468
Assignees
Labels
type: bug Something isn't correct or isn't working
Milestone

Comments

@GeoVizNow
Copy link
Contributor

GeoVizNow commented Oct 2, 2022

Hi,
first of all: Thanks for the great software that continuing to being developed under the holoviz umbrella!

I really value the datashade and rasterize operations blended into holoviews. Now I wanted to notify on one concern I have had lately. I have found it hard/impossible to get a performant datashade lasso selection on relatively large datashaded (using holoviews.operation.datashader) datasets (millions of rows) after holoviews version 1.14.5. By this reason I have needed to stick to 1.14.5, but would really like to update to take advantage of all the great new features. I have been digging a bit in the code (though I struggle to have well behaving editable installs) and have now stumbled accross one thing that I wanted to flag.
I am happy to try making a pull request if you agree on my observations.

In holoviews.element.selection.py (after version 1.14.5) there is a code section that seems to affect performance if you happen to have both spatialpandas and shapely installed in same environment as holoviews when running versions after 1.14.5.

Indeed it seems the issue might have been introduced in commit '668008dd' where there were some changes in the 'spatial_select_columnar' function. It seems the fact that the return is lifted outside the try except clauses makes the code do the same thing with both spatialpandas and shapely if both are installed. I appreciate the much faster performance of spatialpandas, but that seems to be accidentally hidden since shapely seems to do the same operation afterwards (again if both spatialpandas and shapely are installed).

ALL software version info

library version
holoviews 1.15.0
pandas 1.4.3
pyarrow 8.0.0
spatialpandas 0.4.4
shapely 1.8.4

Description of expected behavior and the observed behavior

Expected behaviour would be that spatialpandas did the task if it is installed and then that shapely is not triggered.

Example where the issue can be experienced

import pandas as pd
import holoviews as hv
from holoviews.operation.datashader import datashade
from holoviews.selection import link_selections
from holoviews import opts
hv.extension('bokeh')

df = pd.read_parquet(
    'https://s3.amazonaws.com/datashader-data/nyc_taxi_wide.parq',
    columns = ['dropoff_x', 'dropoff_y']
)

ls = link_selections.instance()

plot = datashade(hv.Scatter(df, 'dropoff_x', 'dropoff_y'),
                             cmap='viridis').opts(frame_height=400,
                             data_aspect=1, bgcolor='darkslategray')

ls(plot).opts(opts.RGB(tools=['poly_select']))

# Then do either lasso selection or poly_select selection and experience a long waiting time if a large number of points are selected. 

line 106-122 in function 'spatial_select_columnar':

    try:
        from spatialpandas.geometry import Polygon, PointArray
        points = PointArray((masked_xvals.astype('float'), masked_yvals.astype('float')))
        poly = Polygon([np.concatenate([geometry, geometry[:1]]).flatten()])
        geom_mask = points.intersects(poly)
    except Exception:
        pass
    try:
        from shapely.geometry import Point, Polygon
        points = (Point(x, y) for x, y in zip(masked_xvals, masked_yvals))
        poly = Polygon(geometry)
        geom_mask = np.array([poly.contains(p) for p in points])
    except ImportError:
        raise ImportError("Lasso selection on tabular data requires "
                          "either spatialpandas or shapely to be available.")
    mask[np.where(mask)[0]] = geom_mask
    return mask

I think this might be fixed by either making and returning the 'mask' in both try clauses (similar to how it was before commit '668008dd'. Looking forward to adjusting this if you agree in the observation.

@jbednar jbednar added this to the 1.15.1 milestone Oct 4, 2022
@maximlt maximlt added the type: bug Something isn't correct or isn't working label Oct 4, 2022
@hoxbro
Copy link
Member

hoxbro commented Oct 4, 2022

Thank you for thoroughly investigating the problem and finding the solution.

We would have loved for you to contribute, but the next release of Holoviews is expected to be released today, and we wanted to have this fix in it.

@GeoVizNow
Copy link
Contributor Author

GeoVizNow commented Oct 4, 2022

Thanks a lot for your swift fix of the issue! Really good and timely if you are able to include this for the upcoming release.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type: bug Something isn't correct or isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants