DIA-Umpire is an open source Java program for computational analysis of data-independent acquisition (DIA) mass spectrometry-based proteomics data. It enables untargeted peptide and protein identification and quantitation from DIA data, and also incorporates targeted extraction to reduce the number of missing quantitation values. For more details about the algorithms and their performance evaluation, please refer to the following publications:
- Tsou CC, Avtonomov D, Larsen B, Tucholska M, Choi H, Gingras AC, Nesvizhskii AI, “DIA-Umpire: comprehensive computational framework for data independent acquisition proteomics,” Nature Methods 12:258-64 (2015).
- Tsou CC, Tsai CF, Teo G, Chen YJ, Nesvizhskii AI, “Untargeted, spectral library-free analysis of data independent acquisition proteomics data generated using Orbitrap mass spectrometers,” Proteomics [Epub ahead of print] PubMed PMID: 27246681 (2016).
Overview of DIA-Umpire modules
The complete DIA-Umpire workflow is shown in Fig 1. The analysis starts with a signal extraction algorithm that detects all possible precursor and fragment ion features in the MS1 and MS2 data, determining their monoisotopic masses and elution profile shapes. Precursor and fragment signals are then grouped based on the correlation of their elution profiles (Step A). From each MS1 feature and its grouped fragments, the tool generates “pseudo MS/MS spectra” that are submitted to an untargeted MS/MS database search to identify peptides and proteins (Step B). Please note that the MS/MS database search analysis is currently not provided by DIA-Umpire; we recommend the Trans-Proteomic Pipeline (TPP) for this purpose. An optional step (Step C) adds targeted identifications by taking the peptides confidently identified in the untargeted MS/MS search and building an internal spectral library from them. This library (built using all DIA data in the analyzed experiment) is then used for targeted extraction of quantitation information from each DIA run, which improves identification and quantitation coverage across all samples. All IDs, whether from the untargeted MS/MS database search or from targeted re-extraction, are linked to the corresponding precursor-fragment groups, which carry quantitative information in the form of precursor and fragment ion intensities. This quantitative information is stored at different levels (fragment → peptide → protein) and is reported in Step D.
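To make the grouping idea in Step A more concrete, the following is a minimal Java sketch, not DIA-Umpire's actual code. It assumes that elution profiles are intensity traces sampled at the same scan times, that similarity is measured with the Pearson correlation, and that a fragment is grouped with a precursor when the correlation reaches a threshold. The class name `ElutionProfileGrouping`, the method names, and the threshold of 0.9 are all hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: names and threshold are assumptions, not DIA-Umpire's API.
class ElutionProfileGrouping {

    /** Pearson correlation between two intensity traces of equal length. */
    static double pearson(double[] a, double[] b) {
        if (a.length != b.length || a.length < 2) {
            throw new IllegalArgumentException("profiles must have equal length >= 2");
        }
        double meanA = 0.0, meanB = 0.0;
        for (int i = 0; i < a.length; i++) {
            meanA += a[i];
            meanB += b[i];
        }
        meanA /= a.length;
        meanB /= b.length;

        double cov = 0.0, varA = 0.0, varB = 0.0;
        for (int i = 0; i < a.length; i++) {
            double da = a[i] - meanA;
            double db = b[i] - meanB;
            cov += da * db;
            varA += da * da;
            varB += db * db;
        }
        // A flat trace has no shape to compare; treat it as uncorrelated.
        if (varA == 0.0 || varB == 0.0) {
            return 0.0;
        }
        return cov / Math.sqrt(varA * varB);
    }

    /** Indices of the fragment profiles whose correlation with the precursor is at least minCorr. */
    static List<Integer> groupFragments(double[] precursor, double[][] fragments, double minCorr) {
        List<Integer> grouped = new ArrayList<>();
        for (int i = 0; i < fragments.length; i++) {
            if (pearson(precursor, fragments[i]) >= minCorr) {
                grouped.add(i);
            }
        }
        return grouped;
    }

    public static void main(String[] args) {
        double[] precursor = {1, 4, 9, 4, 1};
        double[][] fragments = {
            {2, 8, 18, 8, 2}, // same peak shape at twice the intensity
            {9, 4, 1, 4, 9},  // inverted shape, so it should not be grouped
        };
        System.out.println(groupFragments(precursor, fragments, 0.9)); // prints [0]
    }
}
```

Because the correlation depends only on profile shape and not on absolute intensity, a low-abundance fragment that co-elutes with its precursor can still be grouped with it.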
Different applications require different combinations of DIA-Umpire modules, depending on their scale. We describe three scenarios here:
- Identification-only analysis (Steps A→B): For users who need protein and peptide identifications but do not need quantitation.
- Small-scale identification and quantitation analysis with minimal computational cost (Steps A→B→D): For users who have one or a few samples to identify and quantify but do not wish to spend computational time on the targeted re-extraction step. In this case, skip Step C and go directly to the quantitation step.
- Complete DIA-Umpire identification and quantitation analysis (Steps A→B→C→D): For larger-scale datasets with multiple replicates or samples, we recommend the complete analysis workflow.
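The three scenarios above can be summarized as a small decision rule. The Java sketch below is purely illustrative: `WorkflowPlanner`, the `Step` enum, and `stepsFor` are hypothetical names that do not correspond to any DIA-Umpire class or command-line option. It only encodes which steps each scenario runs.

```java
import java.util.EnumSet;
import java.util.Set;

// Illustrative only: these names are assumptions, not part of DIA-Umpire.
class WorkflowPlanner {

    /** The four workflow steps, declared in execution order. */
    enum Step {
        A_SIGNAL_EXTRACTION,
        B_DATABASE_SEARCH,
        C_TARGETED_REEXTRACTION,
        D_QUANTITATION
    }

    /**
     * Returns the steps to run. Steps A and B are always needed.
     * Step C is optional and only meaningful when quantitation (Step D) is requested.
     */
    static Set<Step> stepsFor(boolean needQuantitation, boolean useTargetedReExtraction) {
        EnumSet<Step> steps = EnumSet.of(Step.A_SIGNAL_EXTRACTION, Step.B_DATABASE_SEARCH);
        if (needQuantitation) {
            if (useTargetedReExtraction) {
                steps.add(Step.C_TARGETED_REEXTRACTION);
            }
            steps.add(Step.D_QUANTITATION);
        }
        return steps;
    }

    public static void main(String[] args) {
        System.out.println(stepsFor(false, false)); // identification only
        System.out.println(stepsFor(true, false));  // small-scale quantitation
        System.out.println(stepsFor(true, true));   // complete workflow
    }
}
```

An `EnumSet` iterates in declaration order, so the returned set always lists the steps in the order A, B, C, D regardless of the order in which they were added.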