Results 1 - 20 of 29
1.
J Proteomics ; : 105246, 2024 Jul 02.
Article in English | MEDLINE | ID: mdl-38964537

ABSTRACT

The 2023 European Bioinformatics Community for Mass Spectrometry (EuBIC-MS) Developers Meeting was held from January 15th to January 20th, 2023, in Congressi Stefano Franscin at Monte Verità in Ticino, Switzerland. The participants were scientists and developers working in computational mass spectrometry (MS), metabolomics, and proteomics. The 5-day program was split between introductory keynote lectures and parallel hackathon sessions focusing on "Artificial Intelligence in proteomics" to stimulate future directions in the MS-driven omics areas. During the latter, the participants developed bioinformatics tools and resources addressing outstanding needs in the community. The hackathons allowed less experienced participants to learn from more advanced computational MS experts and actively contribute to highly relevant research projects. We successfully produced several new tools applicable to the proteomics community by improving data analysis and facilitating future research.

2.
Cell Rep ; 43(6): 114272, 2024 Jun 25.
Article in English | MEDLINE | ID: mdl-38795348

ABSTRACT

Lysine deacetylase inhibitors (KDACis) are approved drugs for cutaneous T cell lymphoma (CTCL), peripheral T cell lymphoma (PTCL), and multiple myeloma, but many aspects of their cellular mechanism of action (MoA) and substantial toxicity are not well understood. To shed more light on how KDACis elicit cellular responses, we systematically measured dose-dependent changes in acetylation, phosphorylation, and protein expression in response to 21 clinical and pre-clinical KDACis. The resulting 862,000 dose-response curves revealed, for instance, limited cellular specificity of histone deacetylase (HDAC) 1, 2, 3, and 6 inhibitors; strong cross-talk between acetylation and phosphorylation pathways; localization of most drug-responsive acetylation sites to intrinsically disordered regions (IDRs); an underappreciated role of acetylation in protein structure; and a shift in EP300 protein abundance between the cytoplasm and the nucleus. This comprehensive dataset serves as a resource for the investigation of the molecular mechanisms underlying KDACi action in cells and can be interactively explored online in ProteomicsDB.


Subject(s)
Histone Deacetylase Inhibitors , Proteomics , Humans , Histone Deacetylase Inhibitors/pharmacology , Proteomics/methods , Acetylation/drug effects , Phosphorylation/drug effects , Lysine/metabolism , Protein Processing, Post-Translational/drug effects , Cell Line, Tumor , Dose-Response Relationship, Drug , E1A-Associated p300 Protein/metabolism , Histone Deacetylases/metabolism
3.
Methods Mol Biol ; 2758: 457-483, 2024.
Article in English | MEDLINE | ID: mdl-38549030

ABSTRACT

Liquid chromatography-coupled mass spectrometry (LC-MS/MS) is the primary method to obtain direct evidence for the presentation of disease- or patient-specific human leukocyte antigen (HLA). However, compared to the analysis of tryptic peptides in proteomics, the analysis of HLA peptides still poses computational and statistical challenges. Recently, fragment ion intensity-based matching scores assessing the similarity between predicted and observed spectra were shown to substantially increase the number of confidently identified peptides, particularly in use cases where non-tryptic peptides are analyzed. In this chapter, we describe in detail three procedures on how to benefit from state-of-the-art deep learning models to analyze and validate single spectra, single measurements, and multiple measurements in mass spectrometry-based immunopeptidomics. For this, we explain how to use the Universal Spectrum Explorer (USE), online Oktoberfest, and offline Oktoberfest. For intensity-based scoring, Oktoberfest uses fragment ion intensity and retention time predictions from the deep learning framework Prosit, a deep neural network trained on a very large number of synthetic peptides and tandem mass spectra generated within the ProteomeTools project. The examples shown highlight how deep learning-assisted analysis can increase the number of identified HLA peptides, facilitate the discovery of confidently identified neo-epitopes, or provide assistance in the assessment of the presence of cryptic peptides, such as spliced peptides.


Subject(s)
Deep Learning , Humans , Chromatography, Liquid , Tandem Mass Spectrometry/methods , Peptides/analysis , Histocompatibility Antigens Class I , HLA Antigens
4.
Mol Syst Biol ; 20(1): 28-55, 2024 Jan.
Article in English | MEDLINE | ID: mdl-38177929

ABSTRACT

Kinase inhibitors (KIs) are important cancer drugs but often feature polypharmacology that is not understood at the molecular level. This disconnect is particularly apparent in cancer entities such as sarcomas for which the oncogenic drivers are often not clear. To investigate more systematically how the cellular proteotypes of sarcoma cells shape their response to molecularly targeted drugs, we profiled the proteomes and phosphoproteomes of 17 sarcoma cell lines and screened the same against 150 cancer drugs. The resulting 2550 phenotypic profiles revealed distinct drug responses and the cellular activity landscapes derived from deep (phospho)proteomes (9,000-10,000 proteins and 10,000-27,000 phosphorylation sites per cell line) enabled several lines of analysis. For instance, connecting the (phospho)proteomic data with drug responses revealed known and novel mechanisms of action (MoAs) of KIs and identified markers of drug sensitivity or resistance. All data is publicly accessible via an interactive web application that enables exploration of this rich molecular resource for a better understanding of active signalling pathways in sarcoma cells, identifying treatment response predictors and revealing novel MoA of clinical KIs.


Subject(s)
Antineoplastic Agents , Sarcoma , Humans , Proteomics/methods , Proteome , Protein Kinase Inhibitors/pharmacology , Protein Kinase Inhibitors/therapeutic use , Sarcoma/drug therapy , Antineoplastic Agents/pharmacology , Antineoplastic Agents/therapeutic use , Cell Line, Tumor
5.
Proteomics ; 24(8): e2300112, 2024 Apr.
Article in English | MEDLINE | ID: mdl-37672792

ABSTRACT

Machine learning (ML) and deep learning (DL) models for peptide property prediction such as Prosit have enabled the creation of high quality in silico reference libraries. These libraries are used in various applications, ranging from data-independent acquisition (DIA) data analysis to data-driven rescoring of search engine results. Here, we present Oktoberfest, an open source Python package of our spectral library generation and rescoring pipeline originally only available online via ProteomicsDB. Oktoberfest is largely search engine agnostic and provides access to online peptide property predictions, promoting the adoption of state-of-the-art ML/DL models in proteomics analysis pipelines. We demonstrate its ability to reproduce and even improve our results from previously published rescoring analyses on two distinct use cases. Oktoberfest is freely available on GitHub (https://github.com/wilhelm-lab/oktoberfest) and can easily be installed locally through the cross-platform PyPI Python package.


Subject(s)
Proteomics , Software , Proteomics/methods , Peptides , Algorithms
6.
EMBO J ; 42(23): e114665, 2023 Dec 01.
Article in English | MEDLINE | ID: mdl-37916885

ABSTRACT

Substantial efforts are underway to deepen our understanding of human brain morphology, structure, and function using high-resolution imaging as well as high-content molecular profiling technologies. The current work adds to these approaches by providing a comprehensive and quantitative protein expression map of 13 anatomically distinct brain regions covering more than 11,000 proteins. This was enabled by the optimization, characterization, and implementation of a high-sensitivity and high-throughput microflow liquid chromatography timsTOF tandem mass spectrometry system (LC-MS/MS) capable of analyzing more than 2,000 consecutive samples prepared from formalin-fixed paraffin embedded (FFPE) material. Analysis of this proteomic resource highlighted brain region-enriched protein expression patterns and functional protein classes, protein localization differences between brain regions and individual markers for specific areas. To facilitate access to and ease further mining of the data by the scientific community, all data can be explored online in a purpose-built R Shiny app (https://brain-region-atlas.proteomics.ls.tum.de).


Subject(s)
Proteomics , Tandem Mass Spectrometry , Humans , Chromatography, Liquid/methods , Proteomics/methods , Paraffin Embedding/methods , Tandem Mass Spectrometry/methods , Proteins/metabolism , Brain/metabolism , Proteome/metabolism
7.
Nat Commun ; 14(1): 7902, 2023 Nov 30.
Article in English | MEDLINE | ID: mdl-38036588

ABSTRACT

Dose-response curves are key metrics in pharmacology and biology to assess phenotypic or molecular actions of bioactive compounds in a quantitative fashion. Yet, it is often unclear whether or not a measured response significantly differs from a curve without regulation, particularly in high-throughput applications or unstable assays. Treating potency and effect size estimates from random and true curves with the same level of confidence can lead to incorrect hypotheses and issues in training machine learning models. Here, we present CurveCurator, an open-source software that provides reliable dose-response characteristics by computing p-values and false discovery rates based on a recalibrated F-statistic and a target-decoy procedure that considers dataset-specific effect size distributions. The application of CurveCurator to three large-scale datasets enables a systematic drug mode of action analysis and demonstrates its scalable utility across several application areas, facilitated by a performant, interactive dashboard for fast data exploration.
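
The idea of testing whether a measured response differs from a curve without regulation can be illustrated with a classical nested-model F-statistic. The following is a minimal sketch under stated assumptions, not CurveCurator's implementation: the function name f_statistic, the default of four curve parameters, and the use of the mean response as the "no regulation" model are illustrative choices, and the recalibration and target-decoy steps described in the abstract are not reproduced.

```python
def f_statistic(observed, fitted, n_curve_params=4):
    """Classical F-statistic comparing a fitted curve with a flat (mean-only) model.

    observed       -- measured responses, one per dose
    fitted         -- responses predicted by the fitted dose-response curve
    n_curve_params -- number of free curve parameters (assumed 4, e.g. log-logistic)
    """
    n = len(observed)
    if n != len(fitted):
        raise ValueError("observed and fitted must have the same length")
    if n <= n_curve_params:
        raise ValueError("need more data points than curve parameters")

    # Null model: no regulation, every dose is explained by the mean response.
    mean = sum(observed) / n
    rss_null = sum((y - mean) ** 2 for y in observed)

    # Alternative model: the fitted dose-response curve.
    rss_curve = sum((y - f) ** 2 for y, f in zip(observed, fitted))
    if rss_curve == 0:
        return float("inf")  # perfect fit, the statistic is unbounded

    df_num = n_curve_params - 1  # extra parameters relative to the flat model
    df_den = n - n_curve_params  # residual degrees of freedom of the curve fit
    return ((rss_null - rss_curve) / df_num) / (rss_curve / df_den)


# Hypothetical example: six doses with a clear downward response.
observed = [1.0, 1.0, 1.0, 0.0, 0.0, 0.0]
fitted = [1.0, 1.0, 0.75, 0.25, 0.0, 0.0]
print(f_statistic(observed, fitted))
```

A large value indicates that the curve explains the data much better than a flat line. Converting the statistic into a p-value requires a reference distribution, which is the step the abstract describes as recalibrated.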

8.
Science ; 380(6640): 93-101, 2023 04 07.
Article in English | MEDLINE | ID: mdl-36926954

ABSTRACT

Although most cancer drugs modulate the activities of cellular pathways by changing posttranslational modifications (PTMs), little is known regarding the extent and the time- and dose-response characteristics of drug-regulated PTMs. In this work, we introduce a proteomic assay called decryptM that quantifies drug-PTM modulation for thousands of PTMs in cells to shed light on target engagement and drug mechanism of action. Examples range from detecting DNA damage by chemotherapeutics, to identifying drug-specific PTM signatures of kinase inhibitors, to demonstrating that rituximab kills CD20-positive B cells by overactivating B cell receptor signaling. DecryptM profiling of 31 cancer drugs in 13 cell lines demonstrates the broad applicability of the approach. The resulting 1.8 million dose-response curves are provided as an interactive molecular resource in ProteomicsDB.


Subject(s)
Antineoplastic Agents , Apoptosis , Protein Processing, Post-Translational , Proteomics , Antigens, CD20/metabolism , Antineoplastic Agents/pharmacology , Apoptosis/drug effects , B-Lymphocytes/drug effects , Cell Line, Tumor , DNA Damage , Protein Processing, Post-Translational/drug effects , Proteomics/methods , Receptors, Antigen, B-Cell/metabolism , Signal Transduction , Humans
9.
J Proteome Res ; 22(4): 1359-1366, 2023 04 07.
Article in English | MEDLINE | ID: mdl-36988210

ABSTRACT

A frequent goal, or subgoal, when processing data from a quantitative shotgun proteomics experiment is a list of proteins that are differentially abundant under the examined experimental conditions. Unfortunately, obtaining such a list is a challenging process, as the mass spectrometer analyzes the proteolytic peptides of a protein rather than the proteins themselves. We have previously designed a Bayesian hierarchical probabilistic model, Triqler, for combining peptide identification and quantification errors into probabilities of proteins being differentially abundant. However, the model was developed for data from data-dependent acquisition. Here, we show that Triqler is also compatible with data-independent acquisition data after applying minor alterations for the missing value distribution. Furthermore, we find that it has better performance than a set of compared state-of-the-art protein summarization tools when evaluated on data-independent acquisition data.


Subject(s)
Peptides , Proteins , Bayes Theorem , Proteins/analysis , Peptides/analysis , Mass Spectrometry/methods , Proteomics/methods
10.
Methods Mol Biol ; 2426: 91-117, 2023.
Article in English | MEDLINE | ID: mdl-36308686

ABSTRACT

Protein quantification for shotgun proteomics is a complicated process where errors can be introduced in each of the steps. Triqler is a Python package that estimates and integrates errors of the different parts of the label-free protein quantification pipeline into a single Bayesian model. Specifically, it weighs the quantitative values by the confidence we have in the correctness of the corresponding PSM. Furthermore, it treats missing values in a way that reflects their uncertainty relative to observed values. Finally, it combines these error estimates in a single differential abundance FDR that not only reflects the errors and uncertainties in quantification but also in identification. In this tutorial, we show how to (1) generate input data for Triqler from quantification packages such as MaxQuant and Quandenser, (2) run Triqler and what the different options are, (3) interpret the results, (4) investigate the posterior distributions of a protein of interest in detail, and (5) verify that the hyperparameter estimations are sensible.


Subject(s)
Proteins , Proteomics , Bayes Theorem , Uncertainty , Proteomics/methods , Software
11.
Mol Cell Proteomics ; 21(12): 100437, 2022 12.
Article in English | MEDLINE | ID: mdl-36328188

ABSTRACT

Estimating false discovery rates (FDRs) of protein identification continues to be an important topic in mass spectrometry-based proteomics, particularly when analyzing very large datasets. One performant method for this purpose is the Picked Protein FDR approach which is based on a target-decoy competition strategy on the protein level that ensures that FDRs scale to large datasets. Here, we present an extension to this method that can also deal with protein groups, that is, proteins that share common peptides such as protein isoforms of the same gene. To obtain well-calibrated FDR estimates that preserve protein identification sensitivity, we introduce two novel ideas: first, the picked group target-decoy strategy and, second, the rescued subset grouping strategy. Using entrapment searches and simulated data for validation, we demonstrate that the new Picked Protein Group FDR method produces accurate protein group-level FDR estimates regardless of the size of the data set. The validation analysis also uncovered that applying the commonly used Occam's razor principle leads to anticonservative FDR estimates for large datasets. This is not the case for the Picked Protein Group FDR method. Reanalysis of deep proteomes of 29 human tissues showed that the new method identified up to 4% more protein groups than MaxQuant. Applying the method to the reanalysis of the entire human section of ProteomicsDB led to the identification of 18,000 protein groups at 1% protein group-level FDR. The analysis also showed that about 1250 genes were represented by ≥2 identified protein groups. To make the method accessible to the proteomics community, we provide a software tool including a graphical user interface that enables merging results from multiple MaxQuant searches into a single list of identified and quantified protein groups.


Subject(s)
Peptides , Tandem Mass Spectrometry , Humans , Tandem Mass Spectrometry/methods , Databases, Protein , Software , Proteome , Algorithms
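
The protein-level target-decoy competition mentioned in the abstract above can be sketched as follows. This is a simplified illustration of the picked idea for single proteins, assuming one score per target and per paired decoy; the function name picked_protein_qvalues, the tie rule, and the input layout are hypothetical, and the protein grouping and rescued subset steps of the published method are not reproduced.

```python
def picked_protein_qvalues(scores):
    """Estimate q-values with a picked target-decoy competition.

    scores -- dict mapping protein id to (target_score, decoy_score);
              higher scores are better (an assumption of this sketch).
    Returns a dict mapping each surviving target protein to its q-value.
    """
    # Step 1: within each target/decoy pair keep only the better-scoring one.
    picked = []
    for protein, (target_score, decoy_score) in scores.items():
        if target_score >= decoy_score:  # ties go to the target (assumption)
            picked.append((target_score, False, protein))
        else:
            picked.append((decoy_score, True, protein))

    # Step 2: rank the survivors and compute a running decoy/target ratio.
    picked.sort(key=lambda entry: entry[0], reverse=True)
    running = []
    n_targets = n_decoys = 0
    for score, is_decoy, protein in picked:
        if is_decoy:
            n_decoys += 1
        else:
            n_targets += 1
        running.append(n_decoys / max(n_targets, 1))

    # Step 3: turn FDRs into q-values (minimum FDR at this rank or any lower rank).
    qvalues = {}
    best = float("inf")
    for (score, is_decoy, protein), fdr in zip(reversed(picked), reversed(running)):
        best = min(best, fdr)
        if not is_decoy:
            qvalues[protein] = best
    return qvalues


# Hypothetical scores: protein C loses against its decoy and is dropped.
example = {"A": (10.0, 2.0), "B": (8.0, 3.0), "C": (4.0, 6.0), "D": (5.0, 1.0)}
print(picked_protein_qvalues(example))
```

Because each pair contributes at most one entry to the ranked list, the number of decoys cannot grow faster than the number of targets as datasets get larger, which is the scaling property the abstract refers to.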
12.
Nat Methods ; 19(7): 803-811, 2022 07.
Article in English | MEDLINE | ID: mdl-35710609

ABSTRACT

The laboratory mouse ranks among the most important experimental systems for biomedical research and molecular reference maps of such models are essential informational tools. Here, we present a quantitative draft of the mouse proteome and phosphoproteome constructed from 41 healthy tissues and several lines of analyses exemplify which insights can be gleaned from the data. For instance, tissue- and cell-type resolved profiles provide protein evidence for the expression of 17,000 genes, thousands of isoforms and 50,000 phosphorylation sites in vivo. Proteogenomic comparison of mouse, human and Arabidopsis reveals common and distinct mechanisms of gene expression regulation and, despite many similarities, numerous differentially abundant orthologs that likely serve species-specific functions. We leverage the mouse proteome by integrating phenotypic drug (n > 400) and radiation response data with the proteomes of 66 pancreatic ductal adenocarcinoma (PDAC) cell lines to reveal molecular markers for sensitivity and resistance. This unique atlas complements other molecular resources for the mouse and can be explored online via ProteomicsDB and PACiFIC.


Subject(s)
Arabidopsis , Carcinoma, Pancreatic Ductal , Pancreatic Neoplasms , Animals , Arabidopsis/genetics , Carcinoma, Pancreatic Ductal/metabolism , Mass Spectrometry , Mice , Pancreatic Neoplasms/genetics , Proteome/analysis
13.
Anal Chem ; 94(20): 7181-7190, 2022 05 24.
Article in English | MEDLINE | ID: mdl-35549156

ABSTRACT

The prediction of fragment ion intensities and retention time of peptides has gained significant attention over the past few years. However, the progress shown in the accurate prediction of such properties focused primarily on unlabeled peptides. Tandem mass tags (TMT) are chemical peptide labels that are coupled to free amine groups usually after protein digestion to enable the multiplexed analysis of multiple samples in bottom-up mass spectrometry. It is a standard workflow in proteomics ranging from single-cell to high-throughput proteomics. Particularly for TMT, increasing the number of confidently identified spectra is highly desirable as it provides identification and quantification information with every spectrum. Here, we report on the generation of an extensive resource of synthetic TMT-labeled peptides as part of the ProteomeTools project and present the extension of the deep learning model Prosit to predict the retention time and fragment ion intensities of TMT-labeled peptides with high accuracy. Prosit-TMT supports CID and HCD fragmentation and ion trap and Orbitrap mass analyzers in a single model. Reanalysis of published TMT data sets shows that this single model extracts substantial additional information. Applying Prosit-TMT, we discovered that the expression of many proteins in human breast milk follows a distinct daily cycle which may prime the newborn for nutritional or environmental cues.


Subject(s)
Deep Learning , Tandem Mass Spectrometry , Humans , Infant, Newborn , Peptides/chemistry , Proteolysis , Proteomics/methods , Tandem Mass Spectrometry/methods
14.
Mol Cell Proteomics ; 21(8): 100238, 2022 08.
Article in English | MEDLINE | ID: mdl-35462064

ABSTRACT

Isobaric stable isotope labeling techniques such as tandem mass tags (TMTs) have become popular in proteomics because they enable the relative quantification of proteins with high precision from up to 18 samples in a single experiment. While missing values in peptide quantification are rare in a single TMT experiment, they rapidly increase when combining multiple TMT experiments. As the field moves toward analyzing ever higher numbers of samples, tools that reduce missing values also become more important for analyzing TMT datasets. To this end, we developed SIMSI-Transfer (Similarity-based Isobaric Mass Spectra 2 [MS2] Identification Transfer), a software tool that extends our previously developed software MaRaCluster (© Matthew The) by clustering similar tandem MS2 from multiple TMT experiments. SIMSI-Transfer is based on the assumption that similarity-clustered MS2 spectra represent the same peptide. Therefore, peptide identifications made by database searching in one TMT batch can be transferred to another TMT batch in which the same peptide was fragmented but not identified. To assess the validity of this approach, we tested SIMSI-Transfer on masked search engine identification results and recovered >80% of the masked identifications while controlling errors in the transfer procedure to below 1% false discovery rate. Applying SIMSI-Transfer to six published full proteome and phosphoproteome datasets from the Clinical Proteomic Tumor Analysis Consortium led to an increase of 26 to 45% of identified MS2 spectra with TMT quantifications. This significantly decreased the number of missing values across batches and, in turn, increased the number of peptides and proteins identified in all TMT batches by 43 to 56% and 13 to 16%, respectively.


Subject(s)
Proteomics , Tandem Mass Spectrometry , Cluster Analysis , Isotope Labeling , Peptides , Proteome , Software
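
The transfer step described in the abstract above, where an identification made in one TMT batch is passed on to unidentified spectra in the same similarity cluster, can be sketched as follows. This is a minimal sketch under stated assumptions: the function name transfer_identifications, the dictionary inputs, and the rule of skipping clusters with conflicting identifications are illustrative choices, and neither the MaRaCluster clustering itself nor the error control of SIMSI-Transfer is reproduced.

```python
def transfer_identifications(clusters, identifications):
    """Propagate peptide identifications within spectrum clusters.

    clusters        -- dict mapping cluster id to a list of spectrum ids
    identifications -- dict mapping spectrum id to the peptide found by
                       database searching (unidentified spectra are absent)
    Returns a dict of newly transferred spectrum id -> peptide assignments.
    """
    transferred = {}
    for spectra in clusters.values():
        # Collect the distinct peptides already identified in this cluster.
        peptides = {identifications[s] for s in spectra if s in identifications}

        # Transfer only when the cluster points at exactly one peptide;
        # empty or conflicting clusters are skipped (assumption of this sketch).
        if len(peptides) != 1:
            continue
        (peptide,) = peptides

        for spectrum in spectra:
            if spectrum not in identifications:
                transferred[spectrum] = peptide
    return transferred


# Hypothetical spectra from three batches (b1, b2, b3).
clusters = {
    "c1": ["b1_s1", "b2_s7"],           # one identification, one gap
    "c2": ["b1_s2", "b2_s3", "b3_s9"],  # conflicting identifications
    "c3": ["b3_s4"],                    # nothing identified
}
identifications = {"b1_s1": "PEPTIDEK", "b1_s2": "AAAK", "b2_s3": "CCCK"}
print(transfer_identifications(clusters, identifications))
```

Each transferred assignment fills a value that would otherwise be missing when batches are combined, which is how the approach reduces missing values across TMT experiments.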
15.
Nucleic Acids Res ; 50(D1): D1541-D1552, 2022 01 07.
Article in English | MEDLINE | ID: mdl-34791421

ABSTRACT

ProteomicsDB (https://www.ProteomicsDB.org) is a multi-omics and multi-organism resource for life science research. In this update, we present our efforts to continuously develop and expand ProteomicsDB. The major focus over the last two years was improving the findability, accessibility, interoperability and reusability (FAIR) of the data as well as its implementation. For this purpose, we release a new application programming interface (API) that provides systematic access to essentially all data in ProteomicsDB. Second, we release a new open-source user interface (UI) and show the advantages the scientific community gains from such software. With the new interface, two new visualizations of protein primary, secondary and tertiary structure as well as an updated spectrum viewer were added. Furthermore, we integrated ProteomicsDB with our deep-neural-network Prosit that can predict the fragmentation characteristics and retention time of peptides. The result is an automatic processing pipeline that can be used to reevaluate database search engine results stored in ProteomicsDB. In addition, we extended the data content with experiments investigating different human biology as well as a newly supported organism.


Subject(s)
Databases, Protein , Proteins/classification , Proteomics/classification , Software , Biological Science Disciplines , Humans , Neural Networks, Computer , Proteins/chemistry
16.
J Proteome Res ; 20(12): 5402-5411, 2021 12 03.
Article in English | MEDLINE | ID: mdl-34735149

ABSTRACT

Proteomic biomarker discovery using formalin-fixed paraffin-embedded (FFPE) tissue requires robust workflows to support the analysis of large cohorts of patient samples. It also requires finding a reasonable balance between achieving a high proteomic depth and limiting the overall analysis time. To this end, we evaluated the merits of online coupling of single-use disposable trap column nanoflow liquid chromatography, high-field asymmetric-waveform ion-mobility spectrometry (FAIMS), and tandem mass spectrometry (nLC-FAIMS-MS/MS). The data show that ≤600 ng of peptide digest should be loaded onto the chromatographic part of the system. Careful characterization of the FAIMS settings enabled the choice of optimal combinations of compensation voltages (CVs) as a function of the employed LC gradient time. We found nLC-FAIMS-MS/MS to be on par with StageTip-based off-line basic pH reversed-phase fractionation in terms of proteomic depth and reproducibility of protein quantification (coefficient of variation ≤15% for 90% of all proteins) but requiring 50% less sample and substantially reducing sample handling. Using FFPE materials from the lymph node, lung, and prostate tissue as examples, we show that nLC-FAIMS-MS/MS can identify 5000-6000 proteins from the respective tissue within a total of 3 h of analysis time.


Subject(s)
Proteomics , Tandem Mass Spectrometry , Apoptosis Regulatory Proteins , Chromatography, Liquid/methods , Humans , Ion Mobility Spectrometry/methods , Male , Proteomics/methods , Reproducibility of Results , Tandem Mass Spectrometry/methods
17.
Anal Chem ; 93(25): 8687-8692, 2021 06 29.
Article in English | MEDLINE | ID: mdl-34124897

ABSTRACT

A current trend in proteomics is to acquire data in a "single-shot" by LC-MS/MS because it simplifies workflows and promises better throughput and quantitative accuracy than schemes that involve extensive sample fractionation. However, single-shot approaches can suffer from limited proteome coverage when performed by data dependent acquisition (ssDDA) on nanoflow LC systems. For applications where sample quantities are not scarce, this study shows that high proteome coverage can be obtained using a microflow LC-MS/MS system operating a 1 mm i.d. × 150 mm column, at a flow-rate of 50 µL/min and coupled to an Orbitrap HF-X mass spectrometer. The results demonstrate the identification of ∼9 000 proteins from 50 µg of protein digest from Arabidopsis roots, 7 500 from mouse thymus, and 7 300 from human breast cancer cells in 3 h of analysis time in a single run. The dynamic range of protein quantification measured by the iBAQ approach spanned 5 orders of magnitude and replicate analysis showed that the median coefficient of variation was below 20%. Together, this study shows that ssDDA by µLC-MS/MS is a robust method for comprehensive and large-scale proteome analysis that may be further extended to more rapid chromatography and data independent acquisition approaches in the future.


Subject(s)
Chromatography, Liquid , Proteomics , Tandem Mass Spectrometry , Animals , Arabidopsis , Cell Line , Humans , Mice , Proteome
18.
J Proteome Res ; 20(4): 2062-2068, 2021 04 02.
Article in English | MEDLINE | ID: mdl-33661646

ABSTRACT

Error estimation for differential protein quantification by label-free shotgun proteomics is challenging due to the multitude of error sources, each contributing uncertainty to the final results. We have previously designed a Bayesian model, Triqler, to combine such error terms into one combined quantification error. Here we present an interface for Triqler that takes MaxQuant results as input, allowing quick reanalysis of already processed data. We demonstrate that Triqler outperforms the original processing for a large set of both engineered and clinically or biologically relevant data sets. Triqler and its interface to MaxQuant are available as a Python module under an Apache 2.0 license from https://pypi.org/project/triqler/.


Subject(s)
Proteomics , Software , Bayes Theorem , Proteins
19.
Sci Adv ; 7(8)2021 02.
Article in English | MEDLINE | ID: mdl-33608280

ABSTRACT

Induction of the one-carbon cycle is an early hallmark of mitochondrial dysfunction and cancer metabolism. Vital intermediary steps are localized to mitochondria, but it remains unclear how one-carbon availability connects to mitochondrial function. Here, we show that the one-carbon metabolite and methyl group donor S-adenosylmethionine (SAM) is pivotal for energy metabolism. A gradual decline in mitochondrial SAM (mitoSAM) causes hierarchical defects in fly and mouse, comprising loss of mitoSAM-dependent metabolites and impaired assembly of the oxidative phosphorylation system. Complex I stability and iron-sulfur cluster biosynthesis are directly controlled by mitoSAM levels, while other protein targets are predominantly methylated outside of the organelle before import. The mitoSAM pool follows its cytosolic production, establishing mitochondria as responsive receivers of one-carbon units. Thus, we demonstrate that cellular methylation potential is required for energy metabolism, with direct relevance for pathophysiology, aging, and cancer.

20.
Nat Commun ; 11(1): 3234, 2020 06 26.
Article in English | MEDLINE | ID: mdl-32591519

ABSTRACT

In shotgun proteomics, the analysis of label-free quantification experiments is typically limited by the identification rate and the noise level in the quantitative data. This generally causes a low sensitivity in differential expression analysis. Here, we propose a quantification-first approach for peptides that reverses the classical identification-first workflow, thereby preventing valuable information from being discarded in the identification stage. Specifically, we introduce a method, Quandenser, that applies unsupervised clustering on both MS1 and MS2 level to summarize all analytes of interest without assigning identities. This reduces search time due to the data reduction. We can now employ open modification and de novo searches to identify analytes of interest that would have gone unnoticed in traditional pipelines. Quandenser+Triqler outperforms the state-of-the-art method MaxQuant+Perseus, consistently reporting more differentially abundant proteins for all tested datasets. Software is available for all major operating systems at https://github.com/statisticalbiotechnology/quandenser, under Apache 2.0 license.


Subject(s)
Proteomics , Cluster Analysis , Databases, Protein , Escherichia coli Proteins/metabolism , HeLa Cells , Humans , Peptides/metabolism , Proteasome Endopeptidase Complex/metabolism , Proteome/metabolism , Saccharomyces cerevisiae/metabolism , Software , Ubiquitin/metabolism