Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add blob feature detectors (blob_dog, blob_log, blob_doh) #413

Merged
merged 15 commits into from
Nov 18, 2022

Conversation

monzelr
Copy link
Contributor

@monzelr monzelr commented Sep 26, 2022

Hi again,
due to renaming my branch to add-blob-detection, the old PR #411 had been closed.

This PR solves issue #398.

Basically, I followed the suggestions by @grlee77 and here is the difference to PR #411:

  • replaced the draw.disk and draw.ellipsoid GPU functions by the CPU functions (skimage.draw.disk / ellipsoid) in the unit tests, the matrices moved to GPU via cupy.asarray
  • removed the submodule cucim.skimage.draw becaused it is not needed anymore
  • deleted two unit tests because _blob_overlap is implemented in c++/cuda and not in python anymore

Again thanks to @annendominik who helped with the implementation.

And also thank you for your help @grlee77 - it is the first time I do a PR in github. ;-)

@monzelr monzelr requested a review from a team as a code owner September 26, 2022 08:20
@GPUtester
Copy link
Contributor

Can one of the admins verify this patch?

Admins can comment ok to test to allow this one PR to run or add to allowlist to allow all future PRs from the same author to run.

@jakirkham
Copy link
Member

ok to test

@jakirkham jakirkham added feature request New feature or request non-breaking Introduces a non-breaking change labels Sep 26, 2022
@jjacobelli
Copy link
Contributor

ok to test

@gigony gigony added this to the v22.12.00 milestone Oct 4, 2022
@vbrow29
Copy link

vbrow29 commented Oct 5, 2022

Just leaving a comment to say I have been monitoring this PR (interested in using GPU-accelerated blob detection algorithms from skimage). I have taken an attempt at building cucim using the implementation in this PR, and have found it to provide the speedup I am looking for, at least for blob_log in 3D images. However, I thought it may be useful to note that while the spots' XYZ coordinates for my images are consistent for this implementation and the base skimage method, the sigma values disagree. I am not sure if this is expected due to the difference in implementation or if the functions (esp. prune_blobs) still need some tweaking. Looking forward to seeing this merged with cucim!!

@monzelr
Copy link
Contributor Author

monzelr commented Oct 5, 2022

Just leaving a comment to say I have been monitoring this PR (interested in using GPU-accelerated blob detection algorithms from skimage). I have taken an attempt at building cucim using the implementation in this PR, and have found it to provide the speedup I am looking for, at least for blob_log in 3D images. However, I thought it may be useful to note that while the spots' XYZ coordinates for my images are consistent for this implementation and the base skimage method, the sigma values disagree. I am not sure if this is expected due to the difference in implementation or if the functions (esp. prune_blobs) still need some tweaking. Looking forward to seeing this merged with cucim!!

@vbrow29
Thank you for this hint - I could not appropriate test the sigma values in 3D because I do not have any 3D data (normally, this is covered in the unit tests which I can't run because I am on windows). The sigma values should match between both implementation. I guess I have an error in the Cuda code which I have to debug.

Is it possible for you to provide any 3D image data so I can have a look at those sigma values?

@vbrow29
Copy link

vbrow29 commented Oct 5, 2022

@monzelr sure thing. I am not sure what the best method is for sending the files to you, if you reach out to me at [email protected] I can transfer the files there. Or if preferred, I can create a repo / google drive folder containing the 3D test image I was using and the blob_log parameters. Let me know which is best!

@monzelr
Copy link
Contributor Author

monzelr commented Oct 5, 2022

@vbrow29 : just sent a request via mail

@monzelr
Copy link
Contributor Author

monzelr commented Oct 6, 2022

@vbrow29 :
I just looked at your provided 3D data and the differences in the sigma values.

The problem is a kind of error propagation, here a summary:

  • cucim.skimage.util.img_as_float returns a np.float32 while skimage.uti.img_as_float returns np.float64 - this is maybe a bug, but it is not the reason for the difference in sigma values (I will do a bug report to clarify this)
  • cupyx.scipy.ndimage.gaussian_laplace behaves different to scipy.ndimage.gaussian_laplace. This is maybe due to the different implementation and the different architecture between CPU / GPU - I did not further investigate this. During debugging, I see different values in the image_cube variable in cucim.skimage.feature.blob.blob_log which results from the gaussian_laplace function
  • cucim.skimage.feature.peak.peak_local_max also behaves also a bit different than skimage.feature.peak.peak_local_max. I guess this is like the gaussian_laplace function a problem with the different architecture and floating precision or maybe also because of slightly different implementation

Overall, I can see that the function blob_log finds the blobs with different sigma values. The blob detection algorithm chooses always the blob with the highest sigma values. The cucim.skimage.feature.blob.blob_log implementation tends to find larger blobs then the skimage implementation.

I know this is a kind of frustrated answer because the cucim-algorithm is not exactly like the skimage-algorithm, even we use equal-named functions...

What you can do to get the right sigma values is to look for blobs with the cucim implementation and then iterate over the blob regions to determine the skimage sigma values.


Also note: the sigma values are generated like this:

sigma_list = cupy.linspace(min_sigma, max_sigma, num_sigma)

So basically we have discrete sigma values and every blob detection algorithm returns sigma values from the sigma_list.

@grlee77 grlee77 requested a review from a team as a code owner November 16, 2022 22:29
@grlee77 grlee77 changed the base branch from branch-22.10 to branch-22.12 November 16, 2022 22:30
Copy link
Contributor

@grlee77 grlee77 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks again @monzelr, this is looking good now and all tests are passing. We will get this in for the upcoming 2022.12.00 release.

I pushed some minor changes, mainly consisting of:

1.) pep8 / style fixes
2.) declare an INT_T type in the kernels and use int instead of long long when possible
3.) memoize the RawKernel generating functions
4.) removed unused kdtree-related code

@monzelr
Copy link
Contributor Author

monzelr commented Nov 17, 2022

Thank you @grlee77!

better to use math operations on tuples directly
either numpy or cupy have substantial overhead for working with such small arrays
improves memory efficiency of blob-detection methods
@grlee77
Copy link
Contributor

grlee77 commented Nov 17, 2022

Made a few performance-related changes to:

  • avoid various intermediate arrays to reduce memory overhead
  • create sigma_list using basic tuple operations to avoid the overhead calling numpy or cupy functions on tiny arrays of length image.ndim

Initial benchmarking looks great. The results below can be generated running the script in the benchmarks/skimage folder. e.g. to benchmark blob_dog for shape (3840, 2160) and float32 dtype, with repeated runs over a 5 second interval:

python cucim_feature_bench.py -f blob_dog -i 3840,2160 -d float32 -t 5

2D results

Seeing 100x, 132x, and 41x speedups for blob_dog, blob_log and blob_doh, respectively on a 4k float32 image.

shape                                                 (3840, 2160)
function_name                                             blob_dog
dtype                                                      float32
ndim                                                           2.0
GPU accel                                               100.031775
CPU: host (mean)                                          6.822972
CPU: host (std)                                                0.0
GPU: host (mean)                                          0.068201
GPU: host (std)                                            0.00036
GPU: device (mean)                                        0.068208
GPU: device (std)                                          0.00036
GPU: DEV Name                                   [NVIDIA RTX A6000]
CPU: DEV Name         [Intel(R) Core(TM) i9-10980XE CPU @ 3.00GHz]

shape                                                 (3840, 2160)
function_name                                             blob_log
dtype                                                      float32
ndim                                                           2.0
GPU accel                                               132.400334
CPU: host (mean)                                         35.929339
CPU: host (std)                                                0.0
GPU: host (mean)                                          0.271333
GPU: host (std)                                           0.017035
GPU: device (mean)                                        0.271369
GPU: device (std)                                         0.017038
GPU: DEV Name                                   [NVIDIA RTX A6000]
CPU: DEV Name         [Intel(R) Core(TM) i9-10980XE CPU @ 3.00GHz]

shape                                                 (3840, 2160)
function_name                                             blob_doh
dtype                                                      float32
ndim                                                           2.0
GPU accel                                                41.072756
CPU: host (mean)                                           5.61557
CPU: host (std)                                                0.0
GPU: host (mean)                                          0.136709
GPU: host (std)                                           0.001834
GPU: device (mean)                                        0.136722
GPU: device (std)                                         0.001838
GPU: DEV Name                                   [NVIDIA RTX A6000]
CPU: DEV Name         [Intel(R) Core(TM) i9-10980XE CPU @ 3.00GHz]

3D results

Seeing >130x speedups for blob_dog and blob_log on a (192, 192, 192) float32 volume

shape                                              (192, 192, 192)
function_name                                             blob_dog
dtype                                                      float32
ndim                                                           3.0
GPU accel                                               133.993833
CPU: host (mean)                                         15.755103
CPU: host (std)                                                0.0
GPU: host (mean)                                          0.117573
GPU: host (std)                                           0.001517
GPU: device (mean)                                        0.117581
GPU: device (std)                                         0.001517
GPU: DEV Name                                   [NVIDIA RTX A6000]
CPU: DEV Name         [Intel(R) Core(TM) i9-10980XE CPU @ 3.00GHz]

shape                                              (192, 192, 192)
function_name                                             blob_log
dtype                                                      float32
ndim                                                           3.0
GPU accel                                               136.752506
CPU: host (mean)                                         59.750352
CPU: host (std)                                                0.0
GPU: host (mean)                                          0.436915
GPU: host (std)                                            0.01022
GPU: device (mean)                                        0.436923
GPU: device (std)                                          0.01022
GPU: DEV Name                                   [NVIDIA RTX A6000]
CPU: DEV Name         [Intel(R) Core(TM) i9-10980XE CPU @ 3.00GHz]

@grlee77
Copy link
Contributor

grlee77 commented Nov 17, 2022

rerun tests

@grlee77
Copy link
Contributor

grlee77 commented Nov 17, 2022

CI failures are unrelated (apparently related to a new version of CMake released today)

@grlee77 grlee77 changed the title Adding scikit-image' Blob detection (second try) Add blob feature detectors (blob_dog, blob_log, blob_doh) Nov 18, 2022
@grlee77
Copy link
Contributor

grlee77 commented Nov 18, 2022

@gpucibot merge

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature request New feature or request non-breaking Introduces a non-breaking change
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

7 participants