Add blob feature detectors (blob_dog, blob_log, blob_doh) #413

monzelr · 2022-09-26T08:20:02Z

Hi again,
due to renaming my branch to add-blob-detection, the old PR #411 had been closed.

This PR solves issue #398.

Basically, I followed the suggestions by @grlee77 and here is the difference to PR #411:

replaced the draw.disk and draw.ellipsoid GPU functions by the CPU functions (skimage.draw.disk / ellipsoid) in the unit tests, the matrices moved to GPU via cupy.asarray
removed the submodule cucim.skimage.draw becaused it is not needed anymore
deleted two unit tests because _blob_overlap is implemented in c++/cuda and not in python anymore

Again thanks to @annendominik who helped with the implementation.

And also thank you for your help @grlee77 - it is the first time I do a PR in github. ;-)

…kimage.feature.draw.disk / ellipsoid functins and moved those matrices to gpu with cp.asarray

…s now implemented as cuda / c++ file

…partial implemented

GPUtester · 2022-09-26T08:20:04Z

Can one of the admins verify this patch?

Admins can comment ok to test to allow this one PR to run or add to allowlist to allow all future PRs from the same author to run.

jakirkham · 2022-09-26T08:29:01Z

ok to test

jjacobelli · 2022-09-26T09:00:06Z

ok to test

vbrow29 · 2022-10-05T05:00:15Z

Just leaving a comment to say I have been monitoring this PR (interested in using GPU-accelerated blob detection algorithms from skimage). I have taken an attempt at building cucim using the implementation in this PR, and have found it to provide the speedup I am looking for, at least for blob_log in 3D images. However, I thought it may be useful to note that while the spots' XYZ coordinates for my images are consistent for this implementation and the base skimage method, the sigma values disagree. I am not sure if this is expected due to the difference in implementation or if the functions (esp. prune_blobs) still need some tweaking. Looking forward to seeing this merged with cucim!!

monzelr · 2022-10-05T08:21:12Z

Just leaving a comment to say I have been monitoring this PR (interested in using GPU-accelerated blob detection algorithms from skimage). I have taken an attempt at building cucim using the implementation in this PR, and have found it to provide the speedup I am looking for, at least for blob_log in 3D images. However, I thought it may be useful to note that while the spots' XYZ coordinates for my images are consistent for this implementation and the base skimage method, the sigma values disagree. I am not sure if this is expected due to the difference in implementation or if the functions (esp. prune_blobs) still need some tweaking. Looking forward to seeing this merged with cucim!!

@vbrow29
Thank you for this hint - I could not appropriate test the sigma values in 3D because I do not have any 3D data (normally, this is covered in the unit tests which I can't run because I am on windows). The sigma values should match between both implementation. I guess I have an error in the Cuda code which I have to debug.

Is it possible for you to provide any 3D image data so I can have a look at those sigma values?

vbrow29 · 2022-10-05T19:39:18Z

@monzelr sure thing. I am not sure what the best method is for sending the files to you, if you reach out to me at [email protected] I can transfer the files there. Or if preferred, I can create a repo / google drive folder containing the 3D test image I was using and the blob_log parameters. Let me know which is best!

monzelr · 2022-10-05T20:08:30Z

@vbrow29 : just sent a request via mail

monzelr · 2022-10-06T10:24:48Z

@vbrow29 :
I just looked at your provided 3D data and the differences in the sigma values.

The problem is a kind of error propagation, here a summary:

cucim.skimage.util.img_as_float returns a np.float32 while skimage.uti.img_as_float returns np.float64 - this is maybe a bug, but it is not the reason for the difference in sigma values (I will do a bug report to clarify this)
cupyx.scipy.ndimage.gaussian_laplace behaves different to scipy.ndimage.gaussian_laplace. This is maybe due to the different implementation and the different architecture between CPU / GPU - I did not further investigate this. During debugging, I see different values in the image_cube variable in cucim.skimage.feature.blob.blob_log which results from the gaussian_laplace function
cucim.skimage.feature.peak.peak_local_max also behaves also a bit different than skimage.feature.peak.peak_local_max. I guess this is like the gaussian_laplace function a problem with the different architecture and floating precision or maybe also because of slightly different implementation

Overall, I can see that the function blob_log finds the blobs with different sigma values. The blob detection algorithm chooses always the blob with the highest sigma values. The cucim.skimage.feature.blob.blob_log implementation tends to find larger blobs then the skimage implementation.

I know this is a kind of frustrated answer because the cucim-algorithm is not exactly like the skimage-algorithm, even we use equal-named functions...

What you can do to get the right sigma values is to look for blobs with the cucim implementation and then iterate over the blob regions to determine the skimage sigma values.

Also note: the sigma values are generated like this:

sigma_list = cupy.linspace(min_sigma, max_sigma, num_sigma)

So basically we have discrete sigma values and every blob detection algorithm returns sigma values from the sigma_list.

…tection

replace 'long long' with INT_T, using 'int' instead when possible

grlee77

Thanks again @monzelr, this is looking good now and all tests are passing. We will get this in for the upcoming 2022.12.00 release.

I pushed some minor changes, mainly consisting of:

1.) pep8 / style fixes
2.) declare an INT_T type in the kernels and use int instead of long long when possible
3.) memoize the RawKernel generating functions
4.) removed unused kdtree-related code

monzelr · 2022-11-17T09:15:56Z

Thank you @grlee77!

better to use math operations on tuples directly either numpy or cupy have substantial overhead for working with such small arrays

improves memory efficiency of blob-detection methods

grlee77 · 2022-11-17T15:29:47Z

Made a few performance-related changes to:

avoid various intermediate arrays to reduce memory overhead
create sigma_list using basic tuple operations to avoid the overhead calling numpy or cupy functions on tiny arrays of length image.ndim

Initial benchmarking looks great. The results below can be generated running the script in the benchmarks/skimage folder. e.g. to benchmark blob_dog for shape (3840, 2160) and float32 dtype, with repeated runs over a 5 second interval:

python cucim_feature_bench.py -f blob_dog -i 3840,2160 -d float32 -t 5

2D results

Seeing 100x, 132x, and 41x speedups for blob_dog, blob_log and blob_doh, respectively on a 4k float32 image.

shape                                                 (3840, 2160)
function_name                                             blob_dog
dtype                                                      float32
ndim                                                           2.0
GPU accel                                               100.031775
CPU: host (mean)                                          6.822972
CPU: host (std)                                                0.0
GPU: host (mean)                                          0.068201
GPU: host (std)                                            0.00036
GPU: device (mean)                                        0.068208
GPU: device (std)                                          0.00036
GPU: DEV Name                                   [NVIDIA RTX A6000]
CPU: DEV Name         [Intel(R) Core(TM) i9-10980XE CPU @ 3.00GHz]

shape                                                 (3840, 2160)
function_name                                             blob_log
dtype                                                      float32
ndim                                                           2.0
GPU accel                                               132.400334
CPU: host (mean)                                         35.929339
CPU: host (std)                                                0.0
GPU: host (mean)                                          0.271333
GPU: host (std)                                           0.017035
GPU: device (mean)                                        0.271369
GPU: device (std)                                         0.017038
GPU: DEV Name                                   [NVIDIA RTX A6000]
CPU: DEV Name         [Intel(R) Core(TM) i9-10980XE CPU @ 3.00GHz]

shape                                                 (3840, 2160)
function_name                                             blob_doh
dtype                                                      float32
ndim                                                           2.0
GPU accel                                                41.072756
CPU: host (mean)                                           5.61557
CPU: host (std)                                                0.0
GPU: host (mean)                                          0.136709
GPU: host (std)                                           0.001834
GPU: device (mean)                                        0.136722
GPU: device (std)                                         0.001838
GPU: DEV Name                                   [NVIDIA RTX A6000]
CPU: DEV Name         [Intel(R) Core(TM) i9-10980XE CPU @ 3.00GHz]

3D results

Seeing >130x speedups for blob_dog and blob_log on a (192, 192, 192) float32 volume

shape                                              (192, 192, 192)
function_name                                             blob_dog
dtype                                                      float32
ndim                                                           3.0
GPU accel                                               133.993833
CPU: host (mean)                                         15.755103
CPU: host (std)                                                0.0
GPU: host (mean)                                          0.117573
GPU: host (std)                                           0.001517
GPU: device (mean)                                        0.117581
GPU: device (std)                                         0.001517
GPU: DEV Name                                   [NVIDIA RTX A6000]
CPU: DEV Name         [Intel(R) Core(TM) i9-10980XE CPU @ 3.00GHz]

shape                                              (192, 192, 192)
function_name                                             blob_log
dtype                                                      float32
ndim                                                           3.0
GPU accel                                               136.752506
CPU: host (mean)                                         59.750352
CPU: host (std)                                                0.0
GPU: host (mean)                                          0.436915
GPU: host (std)                                            0.01022
GPU: device (mean)                                        0.436923
GPU: device (std)                                          0.01022
GPU: DEV Name                                   [NVIDIA RTX A6000]
CPU: DEV Name         [Intel(R) Core(TM) i9-10980XE CPU @ 3.00GHz]

grlee77 · 2022-11-17T18:41:25Z

rerun tests

grlee77 · 2022-11-17T18:53:31Z

CI failures are unrelated (apparently related to a new version of CMake released today)

…tection

grlee77 · 2022-11-18T13:53:39Z

@gpucibot merge

monzelr added 5 commits September 24, 2022 17:22

initial commit of blob detection algorithm

fcc9580

added blob functions in init file

0137d36

replaced cucim.skimage.feature.draw.disk / ellipsoid functions with s…

75078c6

…kimage.feature.draw.disk / ellipsoid functins and moved those matrices to gpu with cp.asarray

removed unit tests for function _blob_overlap because this function i…

8ecc99b

…s now implemented as cuda / c++ file

deleted submodule draw because it is not needed anymore and was only …

649b404

…partial implemented

monzelr requested a review from a team as a code owner September 26, 2022 08:20

jakirkham added feature request New feature or request non-breaking Introduces a non-breaking change labels Sep 26, 2022

gigony added this to the v22.12.00 milestone Oct 4, 2022

monzelr mentioned this pull request Oct 6, 2022

[BUG] cucim.skimage.util.img_as_float returns np.float32 while skimage.util.img_as_float returns np.float64 #417

Closed

grlee77 added 2 commits November 16, 2022 16:51

Merge remote-tracking branch 'upstream/branch-22.12' into add-blob-de…

393d15a

…tection

fix conflict and flake8/style issues

6e1cc33

grlee77 requested a review from a team as a code owner November 16, 2022 22:29

grlee77 changed the base branch from branch-22.10 to branch-22.12 November 16, 2022 22:30

grlee77 added 3 commits November 16, 2022 17:36

additional style fixes

d5ecf6a

memoize raw module

03c8a14

replace 'long long' with INT_T, using 'int' instead when possible

remove _prune_blobs_kdtree for now since it is currently unused

48399ea

grlee77 approved these changes Nov 17, 2022

View reviewed changes

grlee77 added 4 commits November 17, 2022 09:13

reduce overhead in sigma_list calculations

d537812

better to use math operations on tuples directly either numpy or cupy have substantial overhead for working with such small arrays

preallocate image cube and reduce number of intermediate arrays

6f77470

improves memory efficiency of blob-detection methods

fix docstring examples

bd77b21

add blob detection functions to the benchmarking scripts

cf77499

Merge remote-tracking branch 'upstream/branch-22.12' into add-blob-de…

ad30f7c

…tection

grlee77 changed the title ~~Adding scikit-image' Blob detection (second try)~~ Add blob feature detectors (blob_dog, blob_log, blob_doh) Nov 18, 2022

rapids-bot bot merged commit 8b92401 into rapidsai:branch-22.12 Nov 18, 2022

This was referenced Nov 30, 2022

[FEA] Implementing Scikit-Image's Blob Detection #398

Closed

Reduce memory consumption in blob_* functions with preallocation scikit-image/scikit-image#6597

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add blob feature detectors (blob_dog, blob_log, blob_doh) #413

Add blob feature detectors (blob_dog, blob_log, blob_doh) #413

monzelr commented Sep 26, 2022 •

edited

Loading

GPUtester commented Sep 26, 2022

jakirkham commented Sep 26, 2022

jjacobelli commented Sep 26, 2022

vbrow29 commented Oct 5, 2022

monzelr commented Oct 5, 2022

vbrow29 commented Oct 5, 2022

monzelr commented Oct 5, 2022

monzelr commented Oct 6, 2022

grlee77 left a comment

monzelr commented Nov 17, 2022

grlee77 commented Nov 17, 2022

grlee77 commented Nov 17, 2022

grlee77 commented Nov 17, 2022

grlee77 commented Nov 18, 2022

Add blob feature detectors (blob_dog, blob_log, blob_doh) #413

Add blob feature detectors (blob_dog, blob_log, blob_doh) #413

Conversation

monzelr commented Sep 26, 2022 • edited Loading

GPUtester commented Sep 26, 2022

jakirkham commented Sep 26, 2022

jjacobelli commented Sep 26, 2022

vbrow29 commented Oct 5, 2022

monzelr commented Oct 5, 2022

vbrow29 commented Oct 5, 2022

monzelr commented Oct 5, 2022

monzelr commented Oct 6, 2022

grlee77 left a comment

Choose a reason for hiding this comment

monzelr commented Nov 17, 2022

grlee77 commented Nov 17, 2022

2D results

3D results

grlee77 commented Nov 17, 2022

grlee77 commented Nov 17, 2022

grlee77 commented Nov 18, 2022

monzelr commented Sep 26, 2022 •

edited

Loading