You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Hi,
first of all: Thanks for the great software that continuing to being developed under the holoviz umbrella!
I really value the datashade and rasterize operations blended into holoviews. Now I wanted to notify on one concern I have had lately. I have found it hard/impossible to get a performant datashade lasso selection on relatively large datashaded (using holoviews.operation.datashader) datasets (millions of rows) after holoviews version 1.14.5. By this reason I have needed to stick to 1.14.5, but would really like to update to take advantage of all the great new features. I have been digging a bit in the code (though I struggle to have well behaving editable installs) and have now stumbled accross one thing that I wanted to flag.
I am happy to try making a pull request if you agree on my observations.
In holoviews.element.selection.py (after version 1.14.5) there is a code section that seems to affect performance if you happen to have both spatialpandas and shapely installed in same environment as holoviews when running versions after 1.14.5.
Indeed it seems the issue might have been introduced in commit '668008dd' where there were some changes in the 'spatial_select_columnar' function. It seems the fact that the return is lifted outside the try except clauses makes the code do the same thing with both spatialpandas and shapely if both are installed. I appreciate the much faster performance of spatialpandas, but that seems to be accidentally hidden since shapely seems to do the same operation afterwards (again if both spatialpandas and shapely are installed).
ALL software version info
library
version
holoviews
1.15.0
pandas
1.4.3
pyarrow
8.0.0
spatialpandas
0.4.4
shapely
1.8.4
Description of expected behavior and the observed behavior
Expected behaviour would be that spatialpandas did the task if it is installed and then that shapely is not triggered.
Example where the issue can be experienced
import pandas as pd
import holoviews as hv
from holoviews.operation.datashader import datashade
from holoviews.selection import link_selections
from holoviews import opts
hv.extension('bokeh')
df = pd.read_parquet(
'https://s3.amazonaws.com/datashader-data/nyc_taxi_wide.parq',
columns = ['dropoff_x', 'dropoff_y']
)
ls = link_selections.instance()
plot = datashade(hv.Scatter(df, 'dropoff_x', 'dropoff_y'),
cmap='viridis').opts(frame_height=400,
data_aspect=1, bgcolor='darkslategray')
ls(plot).opts(opts.RGB(tools=['poly_select']))
# Then do either lasso selection or poly_select selection and experience a long waiting time if a large number of points are selected.
line 106-122 in function 'spatial_select_columnar':
try:
from spatialpandas.geometry import Polygon, PointArray
points = PointArray((masked_xvals.astype('float'), masked_yvals.astype('float')))
poly = Polygon([np.concatenate([geometry, geometry[:1]]).flatten()])
geom_mask = points.intersects(poly)
except Exception:
pass
try:
from shapely.geometry import Point, Polygon
points = (Point(x, y) for x, y in zip(masked_xvals, masked_yvals))
poly = Polygon(geometry)
geom_mask = np.array([poly.contains(p) for p in points])
except ImportError:
raise ImportError("Lasso selection on tabular data requires "
"either spatialpandas or shapely to be available.")
mask[np.where(mask)[0]] = geom_mask
return mask
I think this might be fixed by either making and returning the 'mask' in both try clauses (similar to how it was before commit '668008dd'. Looking forward to adjusting this if you agree in the observation.
The text was updated successfully, but these errors were encountered:
Hi,
first of all: Thanks for the great software that continuing to being developed under the holoviz umbrella!
I really value the datashade and rasterize operations blended into holoviews. Now I wanted to notify on one concern I have had lately. I have found it hard/impossible to get a performant datashade lasso selection on relatively large datashaded (using holoviews.operation.datashader) datasets (millions of rows) after holoviews version 1.14.5. By this reason I have needed to stick to 1.14.5, but would really like to update to take advantage of all the great new features. I have been digging a bit in the code (though I struggle to have well behaving editable installs) and have now stumbled accross one thing that I wanted to flag.
I am happy to try making a pull request if you agree on my observations.
In holoviews.element.selection.py (after version 1.14.5) there is a code section that seems to affect performance if you happen to have both spatialpandas and shapely installed in same environment as holoviews when running versions after 1.14.5.
Indeed it seems the issue might have been introduced in commit '668008dd' where there were some changes in the 'spatial_select_columnar' function. It seems the fact that the return is lifted outside the try except clauses makes the code do the same thing with both spatialpandas and shapely if both are installed. I appreciate the much faster performance of spatialpandas, but that seems to be accidentally hidden since shapely seems to do the same operation afterwards (again if both spatialpandas and shapely are installed).
ALL software version info
Description of expected behavior and the observed behavior
Expected behaviour would be that spatialpandas did the task if it is installed and then that shapely is not triggered.
Example where the issue can be experienced
line 106-122 in function 'spatial_select_columnar':
I think this might be fixed by either making and returning the 'mask' in both try clauses (similar to how it was before commit '668008dd'. Looking forward to adjusting this if you agree in the observation.
The text was updated successfully, but these errors were encountered: