MSBooster

Last updated: 9/30/2024

Overview

MSBooster is a tool for incorporating spectral library predictions into peptide-spectrum match (PSM) rescoring for bottom-up liquid chromatography-tandem mass spectrometry proteomics data. It works in roughly four steps:

1. Peptide extraction from PSMs in search results, and formatting for machine/deep learning (ML/DL) predictors' input files
2. Calling the prediction model(s) and saving the output
3. Feature calculation
4. Addition of new features to the search results file

MSBooster is compatible with many types of database searches, including HLA immunopeptidomics, DDA and DIA, and single cell proteomics. It is incorporated into FragPipe and is included in many of its workflows. MSBooster was developed with other FragPipe tools in mind, such as FragPipe-PDV.

Accepted inputs and models

MSBooster is equipped to handle multiple input file formats and models:

Mass spectrometer output
.mzML
.mgf

PSM file
.pin
.pepXML (in progress)

Prediction model
DIA-NN
Koina models

Installation and running guide

In FragPipe

MSBooster runs on Windows and Linux. If you use FragPipe, no installation is needed beyond installing FragPipe itself. MSBooster is located in the "Validation" tab. Enable retention time features with "Predict RT" and MS/MS spectral features with "Predict spectra". Please refer to the FragPipe documentation for how to run an analysis.

On the command line

To run standalone MSBooster on the command line, download the latest jar file from Releases. MSBooster also requires DIA-NN for MS/MS and RT prediction. Install DIA-NN and take note of the path to the DIA-NN executable (e.g., DiaNN.exe on Windows, diann-1.8.1.8 on Linux).

You can run MSBooster using a command similar to the following:

java -jar MSBooster-1.2.1.jar --paramsList msbooster_params.txt

The minimum required parameters are:

- DiaNN (String): path to DIA-NN executable (if using DIA-NN model, which is the MSBooster default)
- mzmlDirectory (String): path to mzML/mgf files. Accepts multiple space-separated folders and files
- pinPepXMLDirectory (String): path to pin files. Accepts multiple space-separated folders and files.
  If using in FragPipe, place the pin and pepXML files in the same folder

While you can individually pass these parameters, it is easier to place one on each line of the paramsList file. Please refer to msbooster_params.txt for a template.
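As a minimal sketch of the paramsList approach, the Python below renders the three minimum parameters as one entry per line and builds the command shown above. The `name = value` line syntax is an assumption, so check msbooster_params.txt for the exact format; the helper names and all paths are hypothetical placeholders.

```python
def render_params(params):
    """Render one parameter per line.

    The `name = value` syntax is an assumption; consult the
    msbooster_params.txt template for the exact format.
    """
    return "\n".join(f"{name} = {value}" for name, value in params.items()) + "\n"


def build_command(jar, params_file):
    """Return the argument list for the java command shown above."""
    return ["java", "-jar", jar, "--paramsList", params_file]


# Hypothetical paths, for illustration only.
minimum_params = {
    "DiaNN": "/opt/diann/diann-1.8.1.8",
    "mzmlDirectory": "/data/run1/mzml",
    "pinPepXMLDirectory": "/data/run1/pin",
}

params_text = render_params(minimum_params)
command = build_command("MSBooster-1.2.1.jar", "msbooster_params.txt")
```

Writing `params_text` to msbooster_params.txt and passing `command` to `subprocess.run` would then launch the run, provided Java and the jar are available.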

Optional parameters

The parameters below are for general use. Koina-specific parameters are described in the Koina documentation.

General input/output and processing

paramsList (String): path to the text file containing parameters for this run
fragger (String): file path of fragger.params file from the MSFragger run. MSBooster will read in multiple parameters and adjust internal parameters based on them, such as fragment mass error tolerance and mass offsets
outputDirectory (String): where to output the new files
editedPin (String): MSBooster will name the new file based on the ones provided. For example, A.pin will have a counterpart called A_edited.pin. To change from the default of "edited", provide a new string here
renamePin (int): whether to generate a new pin file or rewrite the old one. Default here is 1, which will not overwrite. Setting this to 0 will overwrite the old pin file
deletePreds (boolean): whether to delete the files storing model predictions after a successful run. By default, set to false. Set to true if you wish to delete these
loadingPercent (int): how often to report progress on tasks using a progress reporter. By default, set to 10, meaning an update will be printed every 10%.
numThreads (int): number of threads to use. By default set to 0, which uses all available threads minus 1
splitPredInputFile (int): only used when DIA-NN predictions fail due to an out-of-memory error (137). By default, set to 1, but you can increase this to specify how many smaller files the DIA-NN input file should be broken up into. Each file will then be predicted sequentially, easing the memory burden
plotExtension (String): what file format plots should be in. png by default, and pdf is also allowed
features (String): list of features to be calculated. Case-sensitive, comma-separated without spaces in between. Default is "predRTrealUnits,unweightedSpectralEntropy,deltaRTLOESS"
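The defaults described above can be illustrated with a short Python sketch. It is not MSBooster's code: the function names are hypothetical, and clamping the thread count to at least 1 is an assumption for machines with a single core.

```python
import os
from pathlib import Path


def edited_pin_path(pin_path, edited_pin="edited", rename_pin=1):
    """Name of the output pin file: A.pin -> A_edited.pin unless overwriting."""
    path = Path(pin_path)
    if rename_pin == 0:
        return path  # renamePin = 0 overwrites the original pin file
    return path.with_name(f"{path.stem}_{edited_pin}{path.suffix}")


def resolve_threads(num_threads=0, available=None):
    """numThreads = 0 means all available threads minus 1."""
    if num_threads > 0:
        return num_threads
    if available is None:
        available = os.cpu_count() or 1
    return max(1, available - 1)  # the floor of 1 is an assumption


def parse_features(features):
    """Split the case-sensitive, comma-separated feature list (no spaces)."""
    return features.split(",")


default_features = parse_features("predRTrealUnits,unweightedSpectralEntropy,deltaRTLOESS")
```

For example, `edited_pin_path("A.pin", edited_pin="rescored")` gives `A_rescored.pin`.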

Enabling, specifying, and loading predictions

spectraPredFile (String): if you are reusing old spectral predictions (e.g. from DIA-NN or Koina), you can specify the file location here
RTPredFile (String): same as spectraPredFile, but for RT predictions
IMPredFile (String): same as spectraPredFile, but for IM predictions
spectraModel (String): which spectral prediction model to use
rtModel (String): same as spectraModel, but for RT
imModel (String): same as spectraModel, but for IM
useSpectra (boolean): whether to use spectral prediction-based features. Set to true by default
useRT (boolean): whether to use RT prediction-based features. Set to true by default
useIM (boolean): whether to use IM prediction-based features. Set to false by default

MS/MS spectral processing

ppmTolerance (float): fragment error ppm tolerance (default 20ppm)
matchWithDaltons (boolean): whether to match predicted and observed fragments in Daltons (default false)
DaTolerance (float): how many daltons around the predicted peak to look for experimental peak (default 0.05)
useTopFragments (boolean): whether to filter spectral prediction to the N highest intensity peaks (default true)
topFragments (int): up to how many predicted fragments should be used for feature calculation (default 20). Only applied if useTopFragments is true
removeRankPeaks (boolean): Set to true by default, which filters out fragments from the experimental spectra once matched. If false, experimental fragments can be matched by multiple PSMs from the same scan
useBasePeak (boolean): whether a lower limit should be applied to MS2 predictions to only use fragments with higher intensity (default true)
percentBasePeak (float): minimum intensity, expressed as a percentage of the base peak intensity, that a predicted fragment must have to be included in the similarity calculation (default 1). Only applied if useBasePeak is true
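To make the interplay of these parameters concrete, here is a hedged Python sketch of filtering predicted fragments and matching them to an experimental spectrum. The function names, the order of the two filters, and the choice of the most intense experimental peak within the window are assumptions for illustration, not a description of MSBooster's internals.

```python
def filter_predicted(peaks, use_top=True, top_n=20,
                     use_base_peak=True, percent_base_peak=1.0):
    """peaks: list of (mz, intensity) predicted fragments."""
    kept = list(peaks)
    if use_base_peak and kept:
        # Keep fragments at or above percent_base_peak % of the base peak.
        floor = max(i for _, i in kept) * percent_base_peak / 100.0
        kept = [(mz, i) for mz, i in kept if i >= floor]
    if use_top:
        # Keep up to top_n fragments with the highest predicted intensity.
        kept = sorted(kept, key=lambda p: p[1], reverse=True)[:top_n]
    return sorted(kept)  # return in m/z order


def tolerance_window(mz, ppm_tolerance=20.0,
                     match_with_daltons=False, da_tolerance=0.05):
    """Return the (low, high) m/z window around a predicted fragment."""
    half = da_tolerance if match_with_daltons else mz * ppm_tolerance / 1e6
    return mz - half, mz + half


def match_peaks(predicted, observed, **tolerance):
    """For each predicted peak, return the matched experimental intensity.

    Assumption: the most intense experimental peak inside the window is
    taken; 0.0 is returned when nothing falls inside the window.
    """
    matched = []
    for mz, _ in predicted:
        low, high = tolerance_window(mz, **tolerance)
        in_window = [i for omz, i in observed if low <= omz <= high]
        matched.append(max(in_window, default=0.0))
    return matched
```

At 20 ppm, a fragment at m/z 1000 is matched within ±0.02, while one at m/z 500 is matched within ±0.01; with matchWithDaltons set to true the window is a fixed ±DaTolerance instead.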

RT/IM prediction

loessEscoreCutoff (float): expectation value cutoff used for first pass at collecting PSMs for RT/IM calibration. Default is 10^-3.5, or approximately 0.000316
rtLoessRegressionSize (int): maximum number of PSMs used for RT LOESS calibration (default 5000)
imLoessRegressionSize (int): same as rtLoessRegressionSize but for IM (default 1000)
minLoessRegressionSize (int): minimum number of PSMs needed to attempt LOESS RT/IM calibration (default 100). If fewer than this number of PSMs are available, linear regression is used instead
minLinearRegressionSize (int): minimum number of PSMs needed to attempt linear regression RT/IM calibration (default 10). If fewer than this number of PSMs are available, no calibration is attempted
loessBandwidth (String): list of bandwidths to try for RT/IM LOESS calibration (default 0.01,0.05,0.1,0.2). This must be comma-separated with no spaces in between
regressionSplits (int): number of cross validations used for RT/IM LOESS calibration (default 5)
massesForLoessCalibration (String): masses for mass shifts that should be fit to their own calibration curves. List is comma-separated with no spaces in between. The masses should be written to the same number of digits as in the PIN file
loessScatterOpacity (float): opacity of scatter plots in LOESS calibration figures, from 0 to 1 (default 0.35)
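The thresholds above imply a simple decision rule for which calibration is attempted. The Python sketch below illustrates it under stated assumptions: the function names are hypothetical, PSMs at exactly the cutoff are treated as passing, and "top" PSMs are taken to mean those with the lowest expectation values.

```python
def select_calibration_psms(psms, escore_cutoff=10 ** -3.5, max_size=5000):
    """psms: list of (expectation_value, experimental_rt, predicted_rt).

    Keeps PSMs passing the expectation value cutoff, best first, capped
    at max_size (rtLoessRegressionSize or imLoessRegressionSize).
    """
    passing = sorted(p for p in psms if p[0] <= escore_cutoff)
    return passing[:max_size]


def choose_calibration(n_psms, min_loess=100, min_linear=10):
    """Pick the calibration method from the number of available PSMs."""
    if n_psms >= min_loess:
        return "loess"
    if n_psms >= min_linear:
        return "linear"
    return "none"
```

With the defaults, 250 passing PSMs lead to LOESS calibration, 40 fall back to linear regression, and 5 result in no calibration.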

Output files

.pin file with new features. By default, new pin files will be produced ending in "_edited.pin". The default features used are "unweighted_spectral_entropy", "delta_RT_loess", and "pred_RT_real_units". If ion mobility features are enabled, "delta_IM_loess" and "ion_mobility" will also be included
spectraRT.tsv and spectraRT_full.tsv: input files for DIA-NN prediction model
spectraRT.predicted.bin: a binary file with predictions from DIA-NN to be used by MSBooster for feature calculation. If using FragPipe-PDV, these files are used to generate mirror plots of experimental and predicted spectra
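For readers unfamiliar with the "unweighted_spectral_entropy" feature, the Python sketch below computes the published unweighted spectral entropy similarity between two aligned intensity vectors (predicted fragments and their matched experimental intensities). It assumes the feature follows that general definition and that unmatched experimental peaks are ignored; MSBooster's exact implementation may differ, and the function names are hypothetical.

```python
import math


def shannon_entropy(intensities):
    """Shannon entropy (natural log) of intensities normalized to sum to 1."""
    total = sum(intensities)
    probs = [i / total for i in intensities if i > 0]
    return -sum(p * math.log(p) for p in probs)


def unweighted_entropy_similarity(predicted, matched):
    """Entropy similarity of two aligned intensity vectors, in [0, 1].

    1 - (2 * S_mixed - S_predicted - S_matched) / ln(4), where the mixed
    spectrum is the 1:1 average of the two normalized spectra.
    """
    if sum(predicted) <= 0 or sum(matched) <= 0:
        return 0.0  # nothing matched, so no similarity
    p = [i / sum(predicted) for i in predicted]
    q = [i / sum(matched) for i in matched]
    mixed = [(a + b) / 2 for a, b in zip(p, q)]
    divergence = 2 * shannon_entropy(mixed) - shannon_entropy(p) - shannon_entropy(q)
    return 1 - divergence / math.log(4)
```

Identical intensity patterns score 1, and patterns with no shared fragments score 0.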

Graphical output files

MSBooster produces multiple graphs that can be used to further examine how your data compares to model predictions.

MSBooster_plots folder:
- RT_calibration_curves: up to the top 5000 PSMs will be used for calibration between the experimental and predicted RT scales. These top PSMs are presented in the graph, not all PSMs. One graph will be produced per pin file
- IM_calibration_curves: up to the top 1000 PSMs will be used for calibration between the experimental and predicted IM scales. These top PSMs are presented in the graph, not all PSMs. A separate curve will be learned for each charge state. The figure below is an example for charge 2 precursors
- score_histograms: overlaid histograms of all target and decoy PSMs for each pin file. Some features are plotted here on a log scale to better show the bimodal distribution of true and false positives, but the pin files contain the original values, not the log-scaled ones. Shown here are histograms for the unweighted spectral entropy and delta RT scores, but similar ones are produced for all features

Tutorials

Use peptide prediction models from Koina for MSBooster feature generation: https://fragpipe.nesvilab.org/docs/tutorial_koina.html
Reading in predictions from any model via MGF files

TODO

Documentation on all allowed features and how to QC them with graphical output

How to cite

Please cite the following when using MSBooster: https://www.nature.com/articles/s41467-023-40129-9
