Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve linear splining #653

Merged
merged 31 commits into from
Oct 24, 2023
Merged

Conversation

SylviaWhittle
Copy link
Collaborator

@SylviaWhittle SylviaWhittle commented Aug 7, 2023

Closes #607

This PR aims to improve the splining of linear molecules. Previously we just skipped splining for linear molecules, however @Jean-Du got it working through use of the smoothing and periodicity parameters to scipy's splprep (splining) function.

I wanted to make get_splined_traces() into a function rather than a method, since it doesn't really need the self keyword in it and could just as easily take all its arguments as parameters (better for documentation too!) so I've left it in a functional style, but still being a method in case we want to switch it later on.

I've also added unit tests for get_splined_traces() 😄

I've had to update a lot of tests for this change. I have NOT sanity checked all of them, but I have plotted the effects on the splined traces of minicircle.spm:

Previous splines:
image

Updated splines:
image

Thanks to @Jean-Du for the code here, I just copied it across, applied pylint adherence and updated / created tests.

@Jean-Du
Copy link
Collaborator

Jean-Du commented Aug 8, 2023

Closes #607

Copy link
Collaborator

@ns-rse ns-rse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some probably not very well formed thoughts.

I've made a note to ensure I rebase for the branch I'm working on the pruning on.

topostats/tracing/dnatracing.py Outdated Show resolved Hide resolved
topostats/tracing/dnatracing.py Outdated Show resolved Hide resolved
topostats/tracing/dnatracing.py Outdated Show resolved Hide resolved
topostats/tracing/dnatracing.py Outdated Show resolved Hide resolved
@MaxGamill-Sheffield
Copy link
Collaborator

Apologies I've not gotten around to looking at this but hope too soon.

I just want to check if you've managed to solve the oscillating issue causing whacky traces? I'll try to find a test file to see if this still occurs.

@codecov-commenter
Copy link

codecov-commenter commented Aug 21, 2023

Codecov Report

Attention: 3 lines in your changes are missing coverage. Please review.

Comparison is base (cd0cf4d) 83.53% compared to head (daebcdc) 83.82%.
Report is 1 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #653      +/-   ##
==========================================
+ Coverage   83.53%   83.82%   +0.28%     
==========================================
  Files          21       21              
  Lines        3061     3072      +11     
==========================================
+ Hits         2557     2575      +18     
+ Misses        504      497       -7     
Files Coverage Δ
topostats/processing.py 93.42% <100.00%> (ø)
topostats/validation.py 100.00% <ø> (ø)
topostats/tracing/dnatracing.py 72.87% <95.00%> (+2.90%) ⬆️

... and 1 file with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@MaxGamill-Sheffield
Copy link
Collaborator

Sorry if this has been waiting on me, I've been testing to see if we see unwanted oscillations in the linear splines like we do in the nodeStats branch, i.e. the following which give incorrect contour lengths:
20230601_Nicked1D_6ng_MgNi_eph 0_00009_above_splines

In the nodestats branch it may be due to the height-biased skeletonisation differences. I'll ignore this issue for now and I've not seen it occur for this branch but just a note for the future the solution may be to just use the ordered traces as these will already be height-aligned.

Copy link
Collaborator

@MaxGamill-Sheffield MaxGamill-Sheffield left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code looks good and clean. Thanks for making this change so quick and sorry for the lack of review on this until now.

One thing I want to ask is why some of the circular molecule lengths have changed? My thought is that it shouldn't be touched as only the params (and therefore splining) of the linear mols have changed? The amounts are small enough that it shouldn't be too much of an issue - is it the step_size?.

I also am curious about the shepherd hooks you have for the minicircles image and why despite they don't terminate on or nearer the trace - is it because of the spline averaging where the last coordinate is only obtained for 1 of 4 splines and as such has less weighting? Or am I overcomplicating this hahaha

tests/tracing/test_dnatracing_methods.py Show resolved Hide resolved
tests/tracing/test_dnatracing_methods.py Show resolved Hide resolved
topostats/tracing/dnatracing.py Outdated Show resolved Hide resolved
topostats/tracing/dnatracing.py Outdated Show resolved Hide resolved
topostats/tracing/dnatracing.py Outdated Show resolved Hide resolved
@SylviaWhittle
Copy link
Collaborator Author

Just a note, mainly for future me: The linear molecules look different because previously we just returned the non-splined, fitted traces since we didn't spline the linear traces. We now spline them so they are different.

@SylviaWhittle
Copy link
Collaborator Author

SylviaWhittle commented Oct 16, 2023

I discovered and fixed a bug that caused splining to fail in some cases where there are duplicate, consecutive points in the sampled points arrays. Added a function to remove these duplicates and added tests for it. This results in grains succeeding and hence the change in output columns.

@SylviaWhittle
Copy link
Collaborator Author

Oh and @ns-rse I added a contingency (plus test) to your trace_mask function, where it will skip grains that have pixels outside the bounds of the image. This happens to some splined traces and so to utilise your function I needed to add a contingency to it. (Tested of course).

@SylviaWhittle
Copy link
Collaborator Author

Sorry if this has been waiting on me, I've been testing to see if we see unwanted oscillations in the linear splines like we do in the nodeStats branch, i.e. the following which give incorrect contour lengths: 20230601_Nicked1D_6ng_MgNi_eph 0_00009_above_splines

In the nodestats branch it may be due to the height-biased skeletonisation differences. I'll ignore this issue for now and I've not seen it occur for this branch but just a note for the future the solution may be to just use the ordered traces as these will already be height-aligned.

I don't think this has been fixed, as the actual splining functionality has not changed. This could be a new issue if it needs addressing.

@ns-rse
Copy link
Collaborator

ns-rse commented Oct 17, 2023

@MaxGamill-Sheffield the difference in circular molecule stats is due to a suggested change by @Jean-Du to set the smoothing of circular molecules to 2.0 (previously 0.0). I have reverted it back to 0.0 now just for simplicity. We can always change it later.

I've not digged in to check but if this is a parameter it would make sense for it to be configurable, with a sensible default.

@SylviaWhittle
Copy link
Collaborator Author

@MaxGamill-Sheffield the difference in circular molecule stats is due to a suggested change by @Jean-Du to set the smoothing of circular molecules to 2.0 (previously 0.0). I have reverted it back to 0.0 now just for simplicity. We can always change it later.

I've not digged in to check but if this is a parameter it would make sense for it to be configurable, with a sensible default.

It's in the config now 👍

Copy link
Collaborator

@ns-rse ns-rse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I've taken so long to get round to reviewing this, its a big PR.

Some minor comments but nothing blocking this from being merged.

topostats/tracing/dnatracing.py Outdated Show resolved Hide resolved
tests/test_processing.py Show resolved Hide resolved
tests/tracing/test_dnatracing_methods.py Show resolved Hide resolved
tests/tracing/test_dnatracing_multigrain.py Show resolved Hide resolved
tests/tracing/test_dnatracing_multigrain.py Show resolved Hide resolved
tests/tracing/test_dnatracing_methods.py Show resolved Hide resolved
Copy link
Collaborator

@ns-rse ns-rse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good to go ✅ 👍

@ns-rse ns-rse dismissed MaxGamill-Sheffield’s stale review October 24, 2023 16:51

No substantive code changes requested, @SylviaWhittle has addressed aspects of changes in length in other comments.

@SylviaWhittle SylviaWhittle added this pull request to the merge queue Oct 24, 2023
Merged via the queue into main with commit 19b2e0a Oct 24, 2023
13 checks passed
@SylviaWhittle SylviaWhittle deleted the SylviaWhittle/607-linear-splining branch October 24, 2023 16:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Splining Linear Molecules
5 participants