Changes to --tedpca option #1066
Changes to facilitate the selection of all non-void PCA components by requesting 100% of variance.
Thanks @Lestropie. No problem. I will check the behaviour when |
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@ Coverage Diff @@
## main #1066 +/- ##
==========================================
+ Coverage 89.76% 89.80% +0.04%
==========================================
Files 26 26
Lines 3506 3520 +14
Branches 620 621 +1
==========================================
+ Hits 3147 3161 +14
Misses 211 211
Partials 148 148
|
The code looks fine to me. I don't love that My only code critique is that Once @BahmanTahayori confirms this works on your data, make it ready for review and I'll be fine approving. |
I agree that the One aspect I've thought about in this regard is having the user feedback on what the PCA is doing be a bit more upfront. I don't get my hands dirty with TEDANA myself---I'm rather just supporting @BahmanTahayori's changes---so am not familiar with the user feedback, but in my own software, wherever I see a reasonable risk of a user input being interpreted in a manner different to what they intended, I'd escalate it to a console terminal notification. This would reduce the risk of such an error being lost in lengthy log files that not all users would diligently check. Regarding
, my code is already doing just the opposite of this: if it's already an integer or a float, then it does the native checking of the value, otherwise it proceeds as though it is a string. Either way, the relevant
, I think that should already be satisfied? There's just a subtle difference in that the string->float and string->integer conversions are done separately, to avoid one of the prior ambiguous behaviours. |
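The dispatch described in this comment (native checks for values that are already numeric, and separate string-to-integer and string-to-float conversions otherwise) can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `parse_tedpca_sketch` is a hypothetical name, and the range limits are assumptions (integers of at least 1, floats inside the open interval (0.0, 1.0) that the thread later settles on).

```python
def parse_tedpca_sketch(value):
    """Hypothetical stand-in for the dispatch described above.

    Not tedana's check_tedpca_value(); the range limits are assumptions.
    """
    def check_int(n):
        if n < 1:
            raise ValueError(f"integer component count must be >= 1, got {n}")
        return n

    def check_float(x):
        if not 0.0 < x < 1.0:
            raise ValueError(f"variance fraction must be in (0.0, 1.0), got {x}")
        return x

    # Native numeric types are checked directly, with no string round trip.
    # bool is excluded because it is a subclass of int in Python.
    if isinstance(value, int) and not isinstance(value, bool):
        return check_int(value)
    if isinstance(value, float):
        return check_float(value)

    text = str(value)
    # Integer conversion is attempted on its own first: int("1.0") raises
    # ValueError, so "1.0" can never be mistaken for the integer 1.
    try:
        as_int = int(text)
    except ValueError:
        as_int = None
    if as_int is not None:
        return check_int(as_int)

    # Float conversion is a separate second attempt.
    try:
        as_float = float(text)
    except ValueError:
        # Non-numeric text is passed through unchanged (assumed to name a
        # selection method that is validated elsewhere).
        return text
    return check_float(as_float)
```

Keeping the two conversions separate means the type the user wrote decides the interpretation: "1" is a component count, while "1.0" is a variance fraction and is rejected under the assumed open interval.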
Every time I look at the code, I think of ways to do things better, but I think this PR is fine. Otherwise, if @BahmanTahayori is fine with how this is running on your data, mark it as "Ready for review" and I'll approve. |
The PR as it currently stands, as it turns out, does not provide the desired result. Error message as reported by @BahmanTahayori:
So the downstream requirement is specifically (0.0, 1.0) in being non-inclusive of 1.0. We would nevertheless like for there to be a convenient mechanism by which to instruct the PCA to retain all components. Also, I had misunderstood what was happening in the TEDANA PCA when a prior PCA denoising step had already been applied: the later components can be very close to zero power, but aren't actually exactly zero. So my desire to "select all components for which the variance explained is non-zero" doesn't actually make sense. Here are the possible alternatives that have come to mind (not necessarily mutually exclusive):
I believe that my own preference would be:
|
I don't think your idea for My understanding is the whole reason for this proposed change is, if there are 200 volumes, after denoising, it should be possible to model 100% of the variance using only X PCA components where X<<200. As such, to replicate the functionality you're going for here, your option 1 would be the best approach, but I'm not sure we need a special case to convert Thoughts? |
Splitting PR intended software changes from intended scientific usage of such:
|
Have had a bit more discussion with @BahmanTahayori today. I think we've got a plan for how to progress.
|
I'm a bit confused on wanting |
The upstream denoising step has a pretty drastic effect on the PCA decomposition. So it's quite probable that there's a concomitant effect on ICA efficacy with large rank. Can have a more informed discussion once @BahmanTahayori presents some relevant data. |
I have analysed over 240 subjects from our centre with MPPCA applied beforehand and when |
Partial reversion of 55aee07 as discussed in ME-ICA#1066. These changes forbid the request to preserve 100% of variance by specifying "--tedpca 1.0"; the intention is to facilitate this sort of operation through other means in the future.
209a01d to e81badf
On looking at the code again, I decided to refine this PR rather than creating a new one. The intent is nevertheless consistent with point 1 from comment above: to fix inconsistencies between documentation and implementation, and to ensure that |
Partial reversion of 55aee07 in that requesting a floating-point value of 1.0, being 100% of variance, is no longer permitted.
PR is complete and ready for review. |
Closes #956.
Related to #1013.
@BahmanTahayori I hijacked your fork rather than creating another one; hope you don't mind.
The primary goal here is to permit the use of "--tedpca 1.0" to preclude the PCA from removing any components, rather than using an unclean and potentially threshold-sensitive value such as "--tedpca 0.999". I tried to fix up a few related things as I was doing that.
The (now deleted) line:
if floatarg != int(floatarg):
yields False for input "1.0", which resulted in interpretation as an integer despite very explicitly being a float.
Documentation and code were inconsistent regarding whether an integer of 1 was or was not a permissible value. I've tried to make selection of a single component permissible; it's hard to justify permitting 2 but not 1. While there's the issue with interpretation of "1" vs. "1.0", I'm hoping that the tweaks to the help page and docstrings will help convey the distinction. I didn't want to make the help string even longer than it already is, so focused instead on upfront assertion of different interpretation based on type.
The input to function check_tedpca_value() was called "string", even though that function is (and is clearly designed to be) invoked outside of a command line parsing context where the type of that argument may not be a str.
To progress from draft:
--tedpca 1.0 is specified
Ideally what I think should happen is that all non-empty components should be selected. @BahmanTahayori: you indicated that with an upstream PCA denoising, TEDANA's PCA can yield zero-filled components. Can you confirm with real data that the behaviour here will be as intended; i.e. those are excluded rather than "100% of variance" being interpreted as "keep every single component regardless of content"? I can sit down with you and refine code if it turns out to be necessary.
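The ambiguity caused by the deleted comparison can be reproduced in isolation. The snippet below is a small sketch: `old_check_treats_as_float` mimics only the quoted line, and `parses_as_int` is a hypothetical helper showing one way to tell "1" from "1.0" using the text itself rather than the numeric value.

```python
def old_check_treats_as_float(text):
    """Mimics the deleted test: float only if the value has a fractional part."""
    floatarg = float(text)
    return floatarg != int(floatarg)


def parses_as_int(text):
    """True only when the text itself is a valid integer literal."""
    try:
        int(text)
    except ValueError:
        return False
    return True


for text in ("1", "1.0", "0.5"):
    print(text, old_check_treats_as_float(text), parses_as_int(text))
```

Under the old comparison "1" and "1.0" are indistinguishable, since both give False, whereas `int("1.0")` raises `ValueError`, so a direct integer parse keeps them apart.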