Subtype and Stage Inference, or SuStaIn, is an algorithm for discovery of data-driven groups or "subtypes" in chronic disorders. This repository is the Python implementation of SuStaIn, with the option to describe the subtype progression patterns using either the event-based model, the piecewise linear z-score model or the scored events model.
If you use pySuStaIn, please cite the following core papers:
Please also cite the corresponding progression pattern model you use:
- The piecewise linear z-score model (i.e. ZscoreSustain)
- The event-based model (i.e. MixtureSustain), with Gaussian mixture modelling or kernel density estimation
- The scored events model (i.e. OrdinalSustain)
Thanks a lot for supporting this project.
Install option 1 (for installing the pySuStaIn code in a chosen directory): clone repository, install locally
Navigate to the main pySuStaIn directory (where you see setup.py, README.txt, LICENSE.txt, and all subfolders), then run:
pip install .
Alternatively, you can run:
pip install -e .
where the -e flag allows you to make edits to the code without reinstalling.
Either way, it will install everything listed in requirements.txt, including the awkde package (used for mixture modelling). During the installation of awkde, an error may appear, but the installation should then continue and complete successfully. Note that you need pip version 18.1+ for this installation to work.
Install option 2: direct install from the repository
Run the following command to directly install pySuStaIn:
pip install git+https://github.com/ucl-pond/pySuStaIn
Note that you must already have numpy (1.18+) installed to do this. To create a new environment, follow the instructions in the Troubleshooting section below.
If the above install breaks, you may have some interfering packages installed. One way around this would be to create a new Anaconda environment that uses Python 3.7+, then activate it and repeat the installation steps above. To do this, download and install Anaconda/Miniconda, then run:
conda create --name sustain_env python=3.7
conda activate sustain_env
conda install numpy
These commands create an environment named sustain_env and install numpy in it. Then, follow the installation instructions as normal.
- Python >= 3.7
- NumPy >= 1.18
- SciPy
- Matplotlib
- Scikit-learn for cross-validation
- kde_ebm for mixture modelling (KDE and GMM included)
- pathos for parallelization
- awkde for KDE mixture modelling
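As a quick sanity check of the requirements above, here is a minimal sketch (not part of pySuStaIn) that reports which dependencies can be found in the current environment. The helper names meets_minimum and check_requirements are hypothetical, and the import names (sklearn for Scikit-learn, plus kde_ebm, pathos and awkde) are assumed to match the packages listed above.

```python
import importlib.util
import sys

# Import names assumed for the packages listed above (illustrative, not taken from pySuStaIn).
REQUIRED_MODULES = ["numpy", "scipy", "matplotlib", "sklearn", "kde_ebm", "pathos", "awkde"]


def parse_version(text):
    """Convert a version string such as '1.18.5' into a tuple of ints, e.g. (1, 18, 5)."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # stop at the first component with no leading digits (e.g. 'dev0')
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Return True if `installed` is at least `minimum`, comparing numerically."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so that '1.18' and '1.18.0' compare as equal.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_requirements():
    """Return a dict mapping each requirement to True (found/satisfied) or False."""
    report = {"python>=3.7": sys.version_info >= (3, 7)}
    for name in REQUIRED_MODULES:
        # find_spec locates a top-level module without importing it.
        report[name] = importlib.util.find_spec(name) is not None
    if report["numpy"]:
        import numpy
        report["numpy>=1.18"] = meets_minimum(numpy.__version__, "1.18")
    return report


if __name__ == "__main__":
    for requirement, ok in check_requirements().items():
        print(f"{requirement}: {'OK' if ok else 'MISSING'}")
```

Versions are compared as integer tuples rather than strings because a plain string comparison would wrongly rank 1.9 above 1.18.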
If you want to check that the installation was successful, you can run the end-to-end tests. For this, you will need to navigate to the tests/ subfolder (wherever pySuStaIn has been installed on your system). Then, you can use the following command to run all SuStaIn variants (this may take a bit of time!):
python validation.py -f
For a quicker run (using just MixtureSustain), just use:
python validation.py
instead. Testing of single classes is possible using the -c flag, e.g. python validation.py -c ordinal. To see all options, run python validation.py --help.
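If you want to launch the tests from a script, the following is a minimal sketch of how the command line could be assembled. The helper build_validation_command is hypothetical and not part of pySuStaIn; it uses only the -f and -c flags mentioned above, and because it is not stated here whether the two can be combined, the sketch refuses that combination rather than guessing.

```python
import sys


def build_validation_command(full=False, sustain_class=None):
    """Build the argument list for running validation.py from the tests/ subfolder."""
    if full and sustain_class is not None:
        # Whether -f and -c may be combined is not documented here, so do not guess.
        raise ValueError("choose either full=True or sustain_class, not both")
    command = [sys.executable, "validation.py"]
    if full:
        command.append("-f")  # run all SuStaIn variants (slower)
    elif sustain_class is not None:
        command.extend(["-c", sustain_class])  # test a single class, e.g. "ordinal"
    return command


if __name__ == "__main__":
    # Print the command instead of executing it; to run it, pass the list to
    # subprocess.run(...) with cwd set to the tests/ subfolder.
    print(" ".join(build_validation_command(sustain_class="ordinal")))
```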
- Added parallelized startpoints
sustainType can be set to:
- mixture_GMM: SuStaIn with an event-based model progression pattern, with Gaussian mixture modelling of normal/abnormal.
- mixture_KDE: SuStaIn with an event-based model progression pattern, with Kernel Density Estimation (KDE) mixture modelling of normal/abnormal.
- zscore: SuStaIn with a piecewise linear z-score model progression pattern.
See simrun.py for examples of how to run these different implementations.
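To make the relationship between the sustainType values and the model classes named earlier explicit, here is a small illustrative lookup. The table SUSTAIN_TYPES and the helper describe_sustain_type are hypothetical and not part of pySuStaIn's API; the sketch only restates the pairing described in this README and does not show how simrun.py handles the setting.

```python
# Mapping from the sustainType strings described above to the class names used in this README.
# The tuple layout (class name, mixture style) is an illustrative choice, not pySuStaIn's API.
SUSTAIN_TYPES = {
    "mixture_GMM": ("MixtureSustain", "GMM"),  # event-based model, Gaussian mixture modelling
    "mixture_KDE": ("MixtureSustain", "KDE"),  # event-based model, kernel density estimation
    "zscore": ("ZscoreSustain", None),         # piecewise linear z-score model
}


def describe_sustain_type(sustain_type):
    """Return (class name, mixture style) for a sustainType string, or raise ValueError."""
    try:
        return SUSTAIN_TYPES[sustain_type]
    except KeyError:
        valid = ", ".join(sorted(SUSTAIN_TYPES))
        raise ValueError(
            f"unknown sustainType {sustain_type!r}; expected one of: {valid}"
        ) from None


if __name__ == "__main__":
    for name in SUSTAIN_TYPES:
        print(name, "->", describe_sustain_type(name))
```

Failing early on an unrecognised value avoids starting a long run with a mistyped setting.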
See the Jupyter notebook in the notebooks folder for a tutorial on using SuStaIn with simulated data. We also have a set of tutorial videos on YouTube, which you can find here.
Methods:
- The SuStaIn algorithm: Young et al. 2018
- The pySuStaIn software paper: Aksman, Wijeratne et al. 2021
- The event-based model: Fonteijn et al. 2012 (with Gaussian mixture modelling: Young et al. 2014, or non-parametric kernel density estimation: Firth et al. 2020)
- The piecewise linear z-score model: Young et al. 2018
- The scored events model ('Ordinal SuStaIn'): Young et al. 2021
Applications:
- Multiple sclerosis (predicting treatment response): Eshaghi et al. 2021. The trained model is available here.
- Tau PET data in Alzheimer's disease: Vogel et al. 2021
- COPD: Young and Bragman et al. 2020
- Frontotemporal dementia: Young et al. 2021
This project has received funding from the European Union’s Horizon 2020 Research and Innovation Programme under Grant Agreement 666992. Application of SuStaIn to multiple sclerosis was supported by the International Progressive MS Alliance (IPMSA, award reference number PA-1603-08175).
(The authors) have also persuaded me that (SuStaIn is) as clever as e.g. Heiko Braak's brain, (and) can infer longitudinal trajectories based on cross-sectional observations.
- Anonymous reviewer