tests(pruning): Tests TopoStatsPrune and convPrune #848
Conversation
+ Adds a smaller (30x30) test skeleton for testing pruning. This makes it easier to work with; however, it does raise a question, as the resulting skeleton, generated by `skimage.morphology.skeletonize()`, does have a T-shaped junction, which it has been suggested should never arise (more on this below).
+ Tests for `TopoStatsPrune` and `convPrune` attempt to demonstrate the effect varying parameters has on pruning, to which end the following parameters are varied...
  + `max_length`
  + `height_threshold`
  + `method_values`
  + `method_outliers`

There are some issues identified so far...

**`convPrune`**

As noted in the `reason` for the first parameterised test failing, which arises when `max_height = None` (since `int` can not be compared to `None`), I think there is a problem with the default height being used here.

Line 454 tries to set the default `max_branch_length` to be 15% of the total number of elements in the skeleton array. `self.skeleton.size` counts the number of elements in an array (and because we are working with 2D images this will always be the product of the number of rows and columns). But since the total number of elements in the skeleton will always be an integer, taking its `len()` can never work. Thus if `max_length == -1` we never get a usable default `max_branch_length`; instead an error is raised...

```python
max_length = -1
simple_array = np.asarray([[0, 0], [0, 0], [0, 0]])
total_points = simple_array.size
max_branch_length = max_length if max_length != -1 else int(len(total_points) * 0.15)
# TypeError: object of type 'int' has no len()
```

It's also unclear why the `convPrune._prune_by_length()` method takes the argument `max_length` when it's an attribute of the class which could be used instead.

This doesn't happen in the `topostatsPrune` class because the length of the molecule is based on the co-ordinates of the skeleton rather than the size of the array.
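The failure and the presumed fix can both be shown in a few lines of plain NumPy (a sketch; `skeleton` and `total_points` here are stand-ins for the attributes used in `convPrune`, not the actual code):

```python
import numpy as np

# Reproduce the default max_branch_length failure: `.size` is a plain int,
# so calling len() on it raises TypeError, as seen in the test output.
skeleton = np.zeros((30, 30), dtype=bool)
total_points = skeleton.size  # 900, an int, not an array
try:
    max_branch_length = int(len(total_points) * 0.15)
except TypeError as err:
    print(err)  # object of type 'int' has no len()

# The presumably intended computation drops the len() call entirely:
max_branch_length = int(total_points * 0.15)
print(max_branch_length)  # 135
```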
**Excessive Complexity/Code duplication**

The splitting of functionality here into multiple classes means there is some code duplication. Both `topostatsPrune` and `convPrune` have methods to `_prune_by_length()` conditionally, based on the value of `self.max_length` not being `None`, but they do so in different manners. Both methods then go on to call `heightPruning()` in the same manner.

The two classes could be combined into one, with an option of whether to use the original length-based pruning or the convolutional approach, calling the appropriate method (renaming the method from `convPrune` to `prune_length_by_conv()` and the one from `topostatsPrune` to `prune_length_by_neighbours()`). In turn this would mean there is no need to have the `pruneSkeleton()` factory class (and having that as a class in itself seems like overkill when a series of functions would suffice, which would also address the complaints from Pylint on `too-few-public-methods`).

**Handling multiple grains**

The refactoring done previously in #600 removed the loops from every method/function in the tracing module so that each only handles a single grain. The looping over multiple grains is handled by code in `process_scan.py`; this means the `prune_all_skeletons()` methods can be removed too, further simplifying the code base. These will be simple to remove in due course once we have robust tests in place.

**T-shaped junctions**

Previous work in #835 originally tested whether `pruning.rm_nibs()` would "Remove single pixel branches (nibs) not identified by nearest neighbour algorithms as there may be >2 neighbours", and the current tests show that these are still left by the pruning methods being tested, even though the last step is to re-run the plain skeletonisation method from `skimage` (Zhang's).
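The class merge proposed under "Excessive Complexity/Code duplication" above could look roughly like this (a hypothetical sketch: the function bodies are placeholders, and `prune_length_by_conv`/`prune_length_by_neighbours` are just the renames suggested there, not existing API):

```python
def prune_length_by_neighbours(img, skeleton, max_length):
    """Placeholder for the coordinate/neighbour-based length pruning."""
    return skeleton


def prune_length_by_conv(img, skeleton, max_length):
    """Placeholder for the convolution-based length pruning."""
    return skeleton


def prune_skeleton(img, skeleton, method="neighbours", max_length=-1):
    """Dispatch to the requested length-pruning implementation."""
    pruners = {
        "neighbours": prune_length_by_neighbours,
        "conv": prune_length_by_conv,
    }
    if method not in pruners:
        raise ValueError(f"Unknown pruning method: {method}")
    return pruners[method](img, skeleton, max_length)
```

A plain function dispatching on a string option removes the need for both the duplicated classes and the `pruneSkeleton()` factory.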
The following is a reproducible example that shows such a nib remains after pruning (the `npt` import was missing and the final call referenced the wrong name; both fixed here):

```python
import matplotlib.pyplot as plt
import numpy as np
import numpy.typing as npt
from skimage import draw, filters
from skimage.morphology import skeletonize

# topostatsPrune is assumed to be importable from the TopoStats pruning module


def _generate_heights(skeleton: npt.NDArray, scale: float = 100, sigma: float = 5.0, cval: float = 20.0) -> npt.NDArray:
    """Generate heights from skeletons by scaling image and applying Gaussian blurring.

    Uses scikit-image 'skimage.filters.gaussian()' to generate heights from skeletons.

    Parameters
    ----------
    skeleton : npt.NDArray
        Binary array of skeleton.
    scale : float
        Factor to scale heights by. Boolean arrays are 0/1 and so the factor will be the height of the skeleton ridge.
    sigma : float
        Standard deviation for Gaussian kernel passed to 'skimage.filters.gaussian()'.
    cval : float
        Value to fill past edges of input, passed to 'skimage.filters.gaussian()'.

    Returns
    -------
    npt.NDArray
        Array with heights of image based on skeleton which will be the backbone and target.
    """
    return filters.gaussian(skeleton * scale, sigma=sigma, cval=cval)


def _generate_random_skeleton(**extra_kwargs):
    """Generate random skeletons and heights using skimage.draw's random_shapes()."""
    kwargs = {
        "image_shape": (128, 128),
        "max_shapes": 20,
        "channel_axis": None,
        "shape": None,
        "allow_overlap": True,
    }
    heights = {"scale": 100, "sigma": 5.0, "cval": 20.0}
    kwargs = {**kwargs, **extra_kwargs}
    random_image, _ = draw.random_shapes(**kwargs)
    mask = random_image != 255
    skeleton = skeletonize(mask)
    return {"original": mask, "img": _generate_heights(skeleton, **heights), "skeleton": skeleton}


def pruned_plot(gen_shape: dict) -> None:
    """Plot the original skeleton, its derived height and the pruned skeleton."""
    img_skeleton = gen_shape
    pruned = topostatsPrune(
        img_skeleton["img"],
        img_skeleton["skeleton"],
        max_length=-1,
        height_threshold=90,
        method_values="min",
        method_outlier="abs",
    )
    pruned_skeleton = pruned._prune_by_length(pruned.skeleton, pruned.max_length)
    fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2)
    ax1.imshow(img_skeleton["original"])
    ax1.set_title("Original mask")
    ax2.imshow(img_skeleton["skeleton"])
    ax2.set_title("Skeleton")
    ax3.imshow(img_skeleton["img"])
    ax3.set_title("Gaussian Blurring")
    ax4.imshow(pruned_skeleton)
    ax4.set_title("Pruned Skeleton")
    plt.show()


def pruning_skeleton() -> dict:
    """Smaller skeleton for testing parameters of prune_all_skeletons()."""
    return _generate_random_skeleton(rng=69432138, min_size=15, image_shape=(30, 30))


pruned_plot(pruning_skeleton())
```

The parameterised test with `id="Length pruning enabled (10), removes two ends leaving nibs on both."` includes the resulting array after pruning, and there is a two-pronged fork of nibs when using `convPrune()`. I think it would be useful if `remove_nibs()` were able to handle such T-junctions as well.
From @MaxGamill-Sheffield
In light of this I will undertake the following...
+ Remove `convPrune` class.
+ Remove static methods; this makes intermediary steps/data available and avoids creating and passing around objects, as they are instead class attributes.
+ Remove `remove_bridges()` method of the `heightPrune` class as it is identical to the `height_prune()` method.
+ Split the splitting and labelling of branches into a new static method (test yet to be written).
+ Remove associated/redundant unit-tests for removed classes and methods.
Force-pushed from c4fd2f5 to 119b0d0
@MaxGamill-Sheffield It's not just renaming of variables that is causing the current test suite to fail; after doing some digging and scratching my head a lot, I worked out that we now need to pass all of the new options to

I've therefore marked a bunch of tests in

It's compounded by the hassle of having to take a list of co-ordinates and revert them to a 2D array to plot (I've left a note in 119b0d0 with some hacky code of how to do this, as much for my own reference when I return to this later next week). If you have any time/capacity to start investigating this it would be really useful. I was under the mistaken impression that you might have been using the existing test suite as you were refactoring and ensured that these passed or were updated as required (that is one of the major benefits of tests), but that isn't the case, so we'll have to work through this now.
Work on getting the test suite to work with the refactored code. Long way off working at the moment though.

+ Pass `default_config.yaml` into the `dnatrace_linear`/`dnatrace_circular` fixtures so we have all options in place.
+ Skip a bunch of tests that are contingent on the pruning having been done correctly; this will require tweaking of the `skeletonise_params` so that correctly pruned skeletons are returned.
+ `self.grain` > `self.mask` : I found it confusing that the original image after smoothing was called `self.smoothed_grain` when it was heights and the mask was called `self.grain`. Naming the mask as such avoids confusion and makes the attributes clearer.

A major hindrance to debugging this is when the skeleton is converted from an array to a list of co-ordinates that define the location of points. This makes it a bit of a pain to plot; seeing what we have can be achieved with the following messy code...

```python
import matplotlib.pyplot as plt
import numpy as np

original_skeleton_coords = np.asarray([[28, 47], [28, 48], [29, 46], [29, 48], [30, 47]])
skeleton = np.zeros((original_skeleton_coords.max() + 2, original_skeleton_coords.max() + 2))
skeleton[original_skeleton_coords[:, 0], original_skeleton_coords[:, 1]] = 1
plt.imshow(skeleton)
plt.show()
```
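The snippet above can be tidied into a small helper (hypothetical, not from the codebase; unlike the snippet it sizes each axis separately, and `shape` can be passed explicitly to match the original image):

```python
import numpy as np


def coords_to_mask(coords, shape=None):
    """Return a binary 2D array with 1s at each (row, col) coordinate."""
    coords = np.asarray(coords)
    if shape is None:
        # Pad each axis by 2 beyond the largest coordinate, as in the snippet
        shape = (coords[:, 0].max() + 2, coords[:, 1].max() + 2)
    mask = np.zeros(shape, dtype=np.uint8)
    mask[coords[:, 0], coords[:, 1]] = 1
    return mask


coords = np.asarray([[28, 47], [28, 48], [29, 46], [29, 48], [30, 47]])
mask = coords_to_mask(coords)
# np.argwhere(mask == 1) recovers the coordinates in row-major order
```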
Force-pushed from 119b0d0 to c1ebb61
I've been looking at why many of the current set of tests in

**Passing two sets of heights**

We've already identified and corrected a problem with

I've found the use of

**Pruning not as expected**

The

This function in the current refactored code calls, in order...

```python
def get_disordered_trace(self):
    """
    Derive the disordered trace coordinates from the binary mask and image via skeletonisation and pruning.
    """
    self.skeleton = getSkeleton(
        self.smoothed_grain,
        self.mask,
        method=self.skeletonisation_params["method"],
        height_bias=self.skeletonisation_params["height_bias"],
    ).get_skeleton()
    self.pruned_skeleton = prune_skeleton(self.smoothed_grain, self.skeleton, **self.pruning_params.copy())
    self.pruned_skeleton = self.remove_touching_edge(self.pruned_skeleton)
    self.disordered_trace = np.argwhere(self.pruned_skeleton == 1)
```

There are eight parameterised tests, as there are four skeletonisation methods.

To investigate I've plotted the returned skeleton after pruning from the existing code on

NB - In light of the artifacts in the images on this branch not being clearly binary (yellow) plots, I've double checked that the skeletons returned on this branch are in fact binary arrays, and they are.

The newly introduced

I suspect some of these parameters, though, are different from the hard-coded method that is used on

The branches on the skeletons from

**Repeated Pruning**

Looking at the

This method (see here) repeatedly calls the

In the current refactored code this repeated pruning doesn't appear to occur. Skeletonisation and pruning have been separated out and within
Perhaps the reason we're not seeing the same skeletons is because only a single round of pruning is being undertaken? 🤔
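The repeated pruning described here amounts to a prune-until-converged loop, which can be sketched generically (a sketch only; `prune_once` stands in for a single pass of whatever pruning method is in use, and none of these names are from the codebase):

```python
import numpy as np


def prune_until_stable(skeleton, prune_once, max_iter=100):
    """Repeatedly apply `prune_once` until the skeleton no longer changes."""
    previous = skeleton
    for _ in range(max_iter):
        current = prune_once(previous)
        if np.array_equal(current, previous):
            # Converged: a further pass removes nothing
            break
        previous = current
    return previous
```

If the refactored code calls its pruning pass only once, branches longer than one pass can remove would survive, which would be consistent with the unpruned branches seen in the tests.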
Looked at this more this morning and have found that in the current refactored code, when

However, within this loop the

In contrast the

I think this might be the reason some of the branches are not pruned when running the test suite on the refactored code (branch

I shall be investigating further today.
Currently on In
This isn't the behaviour in the current
Not sure why the length is one coordinate longer to start with 🫤 I can understand why the length might be reassessed after pruning, so am unsure if this change is deliberate @MaxGamill-Sheffield

However, we still have not pruned all of the branches that the test expected to be pruned (perhaps as we only have three iterations still). The logic for controlling
Looking deeper into this I've found that the initial skeletonisation using the TopoStats method is perhaps the underlying cause of the problems we are seeing with pruning.

NB is a branch off of
Okay, this is strange; at the very least I would expect the Zhang/Lee/Thin methods to return the exact same initial skeletons. Perhaps the masks that are being passed in are different.
This shows that the masks the refactored code is passing in to each of the skeletonisation methods are different, which would go some way to explaining why we are seeing different disordered traces, these being the result of skeletonisation and pruning^[1].

Now it is worth noting the discovery above that found that a Gaussian smoothed image of heights was being passed into

Currently this has been switched to

On the
...and

Thus we need to apply

Somewhat confusingly there is the
This method is applied to the

Another and perhaps simpler approach might be to have

^[1]: On the
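The binary dilation step under discussion can be illustrated with a numpy-only single-pass dilation using a cross-shaped structuring element (purely illustrative; the real code would presumably use `scipy.ndimage` or `skimage.morphology` utilities rather than this):

```python
import numpy as np


def dilate_once(mask):
    """One pass of binary dilation with a 4-connected (cross) structuring element."""
    padded = np.pad(np.asarray(mask, dtype=bool), 1)  # guard against wrap-around from np.roll
    out = padded.copy()
    out |= np.roll(padded, 1, axis=0) | np.roll(padded, -1, axis=0)
    out |= np.roll(padded, 1, axis=1) | np.roll(padded, -1, axis=1)
    return out[1:-1, 1:-1]
```

Applying this to the binary mask before skeletonisation (rather than passing in Gaussian-smoothed heights) is the kind of pre-processing being proposed above.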
Having now tested the above and ensured that Linear
Looking in detail at the final skeletons that are produced by the refactored code now that the skeletonisation method is passed a binary dilated mask and comparing these to the skeletons returned on the The
We can see that for the

This is an improvement and I would be happy to update the tests in light of these changes as from the

For the

It would be useful to hear your thoughts on what might underpin these changes @MaxGamill-Sheffield and whether what is currently returned is acceptable before I dig deeper. I do appreciate you undertook this work some time ago.

I'm going to go through my current branch and remove reference to
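For branch-to-branch comparisons like the one above, a small helper along these lines (purely illustrative, not from the codebase) can quantify where two binary skeletons differ:

```python
import numpy as np


def skeleton_diff(a, b):
    """Return (pixels only in a, pixels only in b) for two binary skeletons."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    return int((a & ~b).sum()), int((b & ~a).sum())
```

A `(0, 0)` result means the two skeletons are identical; non-zero counts locate where one branch prunes pixels the other keeps.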
Sorry for the late reply, here are my comments:
Thanks for having a look @MaxGamill-Sheffield. The existing test suite (which is where a bunch of tests all fail and which I'm currently investigating) breaks the processing steps down and tests them individually and sequentially (so we can identify at what point breakages occur when undertaking refactoring)...

```
dnatrace.gaussian_filter()
dnatrace.get_disordered_trace()
<optional additional steps if required>
```

...as that was how the workflow ran when the tests were written and because

It's fine to have introduced

That might be true in your refactoring of

The unit-tests are failing at

I've not got to checking the tests for this yet, but looking at the above mentioned PR the

...(both on

We can likely therefore do away with the line...

...as it is only used in the

I'll work on tidying this up, removing redundant code and getting the tests back on track.
The `dnaTrace.gaussian_filter()` method is obsolete and has been replaced by `dnaTrace.smooth_mask()`. The obsolete function has therefore been removed, and the related test adapted and expanded to `test_smooth_mask()` with parameters that vary the number of `dilation_iterations` and the value for `gaussian_sigma`.

The new `dnaTrace.smooth_mask()` performs binary dilation of the mask and compares this to a Gaussian blur of the mask. Whichever results in the smallest difference compared to the original mask then has "holes re-added", and `self.smoothed_mask` is updated. It is important to note that this differs from previous methods, where the Gaussian filter was applied to the original images of heights. Why this switch was made is unclear.

**Additional work**

+ Refactoring to remove mention of `grains`, as the term is used in multiple places, sometimes with the original height array and sometimes with the mask array. This ambiguity can lead to confusion; instead the terms `image` and `mask` are used.
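The selection logic described for `smooth_mask()` can be sketched as follows (a hedged sketch, not the real implementation: `dilate` and `gaussian_blur` are placeholder callables, and the `threshold` used to re-binarise the blur is an assumption):

```python
import numpy as np


def choose_smoothing(mask, dilate, gaussian_blur, threshold=0.5):
    """Smooth the binary mask two ways; keep whichever stays closest to the original."""
    mask = np.asarray(mask, dtype=bool)
    dilated = np.asarray(dilate(mask), dtype=bool)
    blurred = gaussian_blur(mask) > threshold  # re-binarise the float blur
    diff_dilated = np.logical_xor(dilated, mask).sum()
    diff_blurred = np.logical_xor(blurred, mask).sum()
    return dilated if diff_dilated <= diff_blurred else blurred
```

The "holes re-added" step and the class attribute update would follow whichever array is returned.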
Converting use of `grain` to `mask` in `nodestats` to align with nomenclature in `dnatracing`. It's more consistent and descriptive of what the arrays are in terms of NumPy parlance (they are all binary masks) and avoids confusion where variables labelled `grain` historically contained height values and more recently contain masked data that is binary.
X-post from Slack: My guess for why the new `smooth_mask()` function no longer operates as expected and gives different results is that the hardcoded sigma value (! not defined at init but in `smooth_grains()` !) is too small due to the new input being a grain with a shape smaller than the older input on the cats branch, which was an image, and thus nothing is smoothed.
Converting use of `grain` to `mask` in `nodestats` to align with nomenclature in `dnatracing`. It's more consistent and descriptive of what the arrays are in terms of NumPy parlance (they are all binary masks) and avoids confusion where variables labelled `grain` historically contained height values and more recently contain masked data that is binary. Improves type hints for NumPy arrays.
Updates the test to reflect the changes introduced by switching from applying `gaussian_filter()` to the original image heights to the `smooth_mask()`, which uses either binary dilation or Gaussian filtering applied to the binary mask.

Tests updated

+ `test_get_disordered_trace()`
This method is no longer used so it has been removed along with tests.
This reverts commit 594a3f1d739b12ca0654e36ee41355d532d95430.
This reverts commit cd2e3f6.
Commits reverted so
Happy for this to be merged, I just fixed a few little bits that prevented the workflow from working:
I have not updated the dnatracing tests as I thought this would be more relevant in the dnatracing PR, not this focused prune/skeletonisation PR.
Thanks, the

I must work harder on keeping my commits atomic!
Merged 4d5bad0 into maxgamill-sheffield/800-better-tracing