Statistical analyses #16

Open · mathieuboudreau opened this issue Nov 23, 2020 · 8 comments

@mathieuboudreau (Collaborator)

This issue originated from a meeting between me, @agahkarakuzu, @matteomancini, and @stikov, which we're moving here so that anyone can get involved with the discussion.

The idea is to identify whether the data collected by the challenge lends itself to some potentially interesting statistical analyses, and to discuss how best to implement these (in particular, using open-source tools). We could also identify statistical analyses that would be interesting but for which we don't have sufficient data, and leave those as an open challenge for people to collect more data.

I think we should start by describing the datasets we have at hand and the remaining corrections or post-processing steps that should be done, and then explore some statistical analysis ideas that are well suited to this dataset and don't overlap with other similar studies (such as Bane et al. 2018, which used the NIST phantom at multiple sites but with much stricter protocol implementation rules than our current challenge, whose aim was to investigate the differences or robustness across cross-site implementations). We also have some human datasets to compare with, which could also be explored (human<->human and/or NIST<->human).

@agahkarakuzu changed the title from "Statistical analysises" to "Statistical analyses" on Nov 23, 2020
@mathieuboudreau (Collaborator, Author)

@agahkarakuzu until @matteomancini accepts the invitation to the repo, would you mind writing down some of your thoughts (and those that were discussed at our initial meeting)? Anything that you remember or that comes to mind would be fine.

@matteomancini

I think that the structure of the data (T1 values estimated across sites and scanners, with several additional details available) would be well suited to a mixed/fixed (depending on the hypothesis) effects linear model.
Depending on how many aspects we want to take into account, we would formulate the fundamental model as:

`measuredT1 ~ groundtruthT1 + scannerModel + (1|researchSite)`

In this case (which is one of the possible implementations, in an R-esque syntax), scannerModel is a fixed factor (i.e. we expect it to influence the measuredT1 outcome) and researchSite is a random one (i.e. a grouping factor we want to account for). This kind of framework would allow several kinds of analysis: seeing how much the goodness of fit changes when going from e.g. taking into account just the measured and ground-truth T1 values to the full model; studying interactions; etc.
A not-so-serious example to see how it works out in the wild (it's R, but there are ready-to-use tools in MATLAB and Python too):
https://ourcodingclub.github.io/tutorials/mixed-models/
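
A minimal sketch of what this could look like with lme4, assuming a data frame `df` with one row per sphere measurement and the column names from the formula above (illustrative names, not the actual database fields):

```r
# Minimal sketch, assuming a data frame `df` with columns measuredT1,
# groundtruthT1, scannerModel (factor), and researchSite (factor).
library(lme4)

# Full model: fixed effects for ground-truth T1 and scanner model,
# plus a random intercept per research site.
fit <- lmer(measuredT1 ~ groundtruthT1 + scannerModel + (1 | researchSite),
            data = df)
summary(fit)

# Reduced model without scannerModel, to gauge how much that factor
# contributes (anova() refits with ML and runs a likelihood-ratio test).
fit0 <- lmer(measuredT1 ~ groundtruthT1 + (1 | researchSite), data = df)
anova(fit0, fit)
```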

@mathieuboudreau (Collaborator, Author)

mathieuboudreau commented Jan 6, 2021

Summary of what we have so far:

NIST

  • Registered & labelled ROIs.
  • Database with site info, temp, acquisition info, ROI voxels, etc
  • Many of the datasets in the database are duplicated (magnitude + complex)
  • Different scanners
  • 2 phantoms
  • Not temperature corrected (yet)
  • Some of the datasets are the same phantom scanned at different sites
  • Some of the datasets were acquired at different sites with different protocols
  • Some of the datasets were acquired at different sites but with the same protocol
  • Some sites acquired multiple acquisitions of the same phantom, but with different protocols/times (e.g. see below: scan-rescan, 4-point vs 14-point, short TR vs long TR, different scanners, etc.)
  • A few outliers that may need to be either cleaned (label ROIs), corrected (Philips), or removed.

Human

  • Labelled manual ROIs (not registered)
  • Database with site info, temp, acquisition info, ROI voxels, etc
  • Many of the datasets in the database are duplicated (magnitude + complex)
  • Different scanners
  • Some sites acquired multiple acquisitions of the same subjects, but with different protocols/times (e.g. see below: large multi-subject GE vs Philips, 20-channel vs 64-channel, more?)

Interesting dataset combinations, but maybe not enough time to analyse them yet

  • Intra- and inter-vendor analyses
  • (NIST) Philips' large multi-site NIST dataset
  • (NIST) mrel_usc
    • Day1/Day2 scan-rescan
    • Same day, two MR systems
    • Short TR long TR
  • (NIST) niloufar_hfmc
    • 4 point vs 14 point
  • (NIST) wang_MDanderson
    • Day1/Day2 scan-rescan
  • (NIST) Ngmaforo_ucla
    • Prisma vs Skyra
  • (Human) mrel_usc
    • 6 subjects
  • (Human) jorgejovicich_cimec
    • 20-channel vs 64-channel
  • (Human) luisconcha_UNAM
    • Large multi-subject GE vs Philips datasets

@mathieuboudreau (Collaborator, Author)

Language to use

Very likely R + RShiny for visualisations

@mathieuboudreau (Collaborator, Author)

mathieuboudreau commented Jan 6, 2021

Statistical analyses proposals

  • Question 1: Compare all the scans from Philips Germany (same phantom, same scanner, copied protocols) with all the scans from the Montreal sites (same phantom, variable protocol implementations & scanners).
  • Question 2: Take 1 scan from each submission (or even each site, maybe) and compare them to determine if the T1 values for each sphere agree reasonably well with the reference.
    • And/or compare the worst scans from each site, for fairness
  • Question 3: Dependency of systematic deviations on reference T1 values (i.e. is there a common distribution pattern across all the combinations when we plot them per scan?); see the sketch after this list.
    • Investigate if simulations based on the protocol used can explain the shared variance observed across measured T1s vs the ground truth.
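
For Question 3, a rough sketch of the per-scan deviation plot (the data frame `t1_data` and its columns are assumptions, not the actual database schema):

```r
# Rough sketch, assuming a long-format data frame `t1_data` with one row per
# sphere per scan and hypothetical columns measuredT1, groundtruthT1, scanID.
library(dplyr)
library(ggplot2)

deviations <- t1_data %>%
  mutate(deviation = 100 * (measuredT1 - groundtruthT1) / groundtruthT1)

# One line per scan: a shared shape across scans would suggest a common
# dependency of the systematic deviation on the reference T1.
ggplot(deviations, aes(x = groundtruthT1, y = deviation, group = scanID)) +
  geom_line(alpha = 0.3) +
  labs(x = "Reference T1 (ms)", y = "Deviation from reference (%)")
```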

To do

  • Reformulate above questions into well-written statistical hypotheses.
  • Propose how to analyse them in R (what tool, method, etc)

@mathieuboudreau (Collaborator, Author)

> I think that the structure of the data (T1 values estimated across sites and scanners, with several additional details available) would be well suited to a mixed/fixed (depending on the hypothesis) effects linear model. [...]

@agahkarakuzu an idea for you: maybe replace Figure 7 with this, or supplement Figure 7 with it. This is something that neither Juan nor I felt comfortable doing, but since you have more stats experience it could be easy to do with the pre-saved pandas ROI & config-details databases I have in this repo.

@agahkarakuzu (Collaborator)

Beyond the comfort in implementing this, I think the main issue is whether we need this or what it adds to our analysis.

Ground-truth T1 is a direct determinant of the T1 measured using the gold-standard IR (the two are strongly correlated), so they should not be on opposite sides of a linear model. `t1_deviation ~ scanner + site` would be an alternative. Is there any data that can extend this beyond scanner and site to tell us something grouped plots cannot, given the limited number of samples?
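
A sketch of that alternative, again with assumed object and column names:

```r
# Sketch of the suggested alternative: scanner and site as fixed factors
# on the per-sphere T1 deviation (t1_data and its columns are assumptions).
fit_alt <- lm(t1_deviation ~ scanner + site, data = t1_data)
anova(fit_alt)  # which factors explain variance in the deviations?
```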

@mathieuboudreau (Collaborator, Author)

> Beyond the comfort in implementing this, I think the main issue is whether we need this or what it adds to our analysis.

So, when considering this question, I think there are 4 main things to consider that I'm aware of currently:

  1. Nikola had it in mind that it was something he'd like to see done with this dataset when we initially met about doing statistical analyses a few years ago with you and Matteo.
  2. A co-author asked if we had something similar to this during the review of the ISMRM abstract.

[Screenshot of the co-author's comment, 2023-02-16]

  3. [Bane et al. 2018](https://onlinelibrary.wiley.com/doi/10.1002/mrm.26903)'s multicenter standard phantom study on T1 mapping used a general linear mixed model to assess factors influencing the accuracy of the measurements (looking at scanner/vendor/protocols as potential factors). [Keenan et al.'s 2021 multi-center phantom study](https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0252966) also looked into whether manufacturer was a predictor, using ANOVAs.
  4. We can easily anticipate that reviewers will ask for this during the first round of reviews.

> Is there any data that can extend this beyond scanner and site to tell us something grouped plots cannot, given the limited number of samples?

Phantom version could be another (hypothesizing that there might actually be a difference between phantoms; the other two studies mentioned above used a single phantom). Maybe also the "submitter"/"implementer" of the protocol (since some phantoms were shared between submissions). One thing I didn't collect in the JSON, but that may be present in the DICOMs, was the pre-scan settings, which, as you know, would likely be a significant factor if the wrong settings were used.
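
If those fields can be recovered, the extended model might look something like this sketch (phantomVersion and submitter are hypothetical column names):

```r
# Hypothetical extension of the deviation model with the extra candidate
# factors discussed above; all object and column names are assumptions.
library(lme4)

fit_ext <- lmer(t1_deviation ~ scanner + phantomVersion +
                  (1 | site) + (1 | submitter),
                data = t1_data)
summary(fit_ext)
```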
