Results 1 - 20 of 71
1.
Nat Commun ; 15(1): 3956, 2024 May 10.
Article in English | MEDLINE | ID: mdl-38730277

ABSTRACT

Immunopeptidomics is crucial for immunotherapy and vaccine development. Because immunopeptides are not generated from their parent proteins according to clear-cut rules, known digestion patterns cannot be used; instead, every possible protein subsequence within the human leukocyte antigen (HLA) class-specific length restrictions must be considered during sequence database searching. This inflates the search space and results in lower spectrum annotation rates. Peptide-spectrum match (PSM) rescoring is a powerful enhancement of standard searching that boosts spectrum annotation performance. We analyze 302,105 unique synthesized non-tryptic peptides from the ProteomeTools project on a timsTOF-Pro to generate a ground-truth dataset containing 93,227 MS/MS spectra of 74,847 unique peptides, which is used to fine-tune the deep learning-based fragment ion intensity prediction model Prosit. We demonstrate up to 3-fold improvement in the identification of immunopeptides, as well as increased detection of immunopeptides from low-input samples.
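To make the search-space argument concrete, the Python sketch below compares the number of candidate sequences produced by non-specific cleavage with a fully tryptic digest. The 8 to 12 residue window (typical of HLA class I ligands) and the simplified trypsin rule are assumptions chosen for illustration, not parameters taken from the study.

```python
def count_candidates(protein: str, min_len: int = 8, max_len: int = 12) -> int:
    """Count every subsequence in the length window (non-specific cleavage).

    The 8-12 residue window is an assumed, HLA class I-like restriction.
    """
    n = len(protein)
    return sum(max(0, n - length + 1) for length in range(min_len, max_len + 1))


def tryptic_peptides(protein: str, min_len: int = 7, max_len: int = 30) -> list:
    """Fully tryptic peptides with no missed cleavages.

    Simplified rule (assumption): cleave after K or R unless the next residue is P.
    """
    peptides, start = [], 0
    for i, residue in enumerate(protein):
        next_residue = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    peptides.append(protein[start:])  # C-terminal remainder (may be empty)
    return [p for p in peptides if min_len <= len(p) <= max_len]


# The non-specific count depends only on protein length, not on sequence.
print(count_candidates("A" * 500))  # 2455
print(tryptic_peptides("PEPTIDEK" + "AAAAAAAR" + "PGGGGGGK"))
# ['PEPTIDEK', 'AAAAAAARPGGGGGGK']  (no cleavage before P)
```

For any 500-residue protein the non-specific window yields 2,455 candidates, whereas the tryptic list grows only with the number of K/R cleavage sites.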


Subject(s)
Deep Learning , Peptides , Tandem Mass Spectrometry , Humans , Peptides/chemistry , Peptides/immunology , Tandem Mass Spectrometry/methods , Databases, Protein , Proteomics/methods , HLA Antigens/immunology , HLA Antigens/genetics , Software , Ions
2.
bioRxiv ; 2024 Apr 01.
Article in English | MEDLINE | ID: mdl-38617311

ABSTRACT

Alternative splicing is a major contributor to transcriptomic complexity, but the extent to which transcript isoforms are translated into stable, functional protein isoforms is unclear. Furthermore, detection of relatively scarce isoform-specific peptides is challenging, and many protein isoforms remain uncharted due to technical limitations. Recently, a family of advanced targeted MS strategies, termed internal standard parallel reaction monitoring (IS-PRM), has demonstrated multiplexed, sensitive detection of predefined peptides of interest. Such approaches have not yet been used to confirm the existence of novel peptides. Here, we present a targeted proteogenomic approach that leverages sample-matched long-read RNA sequencing (LR RNAseq) data to predict potential protein isoforms with prior transcript evidence. Predicted tryptic isoform-specific peptides, which are specific to individual gene product isoforms, serve as "triggers" and "targets" in the IS-PRM method Tomahto. Using the model human stem cell line WTC11, LR RNAseq data were generated and used to inform the generation of synthetic standards for 192 isoform-specific peptides (114 isoforms from 55 genes). These synthetic "trigger" peptides were labeled with super heavy tandem mass tags (TMT) and spiked into a TMT-labeled WTC11 tryptic digest predicted to contain the corresponding endogenous "target" peptides. Compared with data-dependent acquisition (DDA), Tomahto increased the detectability of isoforms 3.6-fold, resulting in the identification of five previously unannotated isoforms. Our method detected protein isoform expression for 43 of 55 genes, corresponding to 54 resolved isoforms. This LR RNAseq-informed Tomahto targeted approach, called LRP-IS-PRM, is a new modality for generating protein-level evidence of alternative isoforms, a critical first step in designing functional studies and eventually clinical assays.

3.
Methods Mol Biol ; 2758: 457-483, 2024.
Article in English | MEDLINE | ID: mdl-38549030

ABSTRACT

Liquid chromatography-coupled mass spectrometry (LC-MS/MS) is the primary method to obtain direct evidence for the presentation of disease- or patient-specific human leukocyte antigen (HLA). However, compared to the analysis of tryptic peptides in proteomics, the analysis of HLA peptides still poses computational and statistical challenges. Recently, fragment ion intensity-based matching scores assessing the similarity between predicted and observed spectra were shown to substantially increase the number of confidently identified peptides, particularly in use cases where non-tryptic peptides are analyzed. In this chapter, we describe in detail three procedures on how to benefit from state-of-the-art deep learning models to analyze and validate single spectra, single measurements, and multiple measurements in mass spectrometry-based immunopeptidomics. For this, we explain how to use the Universal Spectrum Explorer (USE), online Oktoberfest, and offline Oktoberfest. For intensity-based scoring, Oktoberfest uses fragment ion intensity and retention time predictions from the deep learning framework Prosit, a deep neural network trained on a very large number of synthetic peptides and tandem mass spectra generated within the ProteomeTools project. The examples shown highlight how deep learning-assisted analysis can increase the number of identified HLA peptides, facilitate the discovery of confidently identified neo-epitopes, or provide assistance in the assessment of the presence of cryptic peptides, such as spliced peptides.


Subject(s)
Deep Learning , Humans , Chromatography, Liquid , Tandem Mass Spectrometry/methods , Peptides/analysis , Histocompatibility Antigens Class I , HLA Antigens
5.
Nat Commun ; 15(1): 151, 2024 Jan 02.
Article in English | MEDLINE | ID: mdl-38167372

ABSTRACT

Unlike for DNA and RNA, accurate and high-throughput sequencing methods for proteins are lacking, hindering the utility of proteomics in applications where the sequences are unknown including variant calling, neoepitope identification, and metaproteomics. We introduce Spectralis, a de novo peptide sequencing method for tandem mass spectrometry. Spectralis leverages several innovations including a convolutional neural network layer connecting peaks in spectra spaced by amino acid masses, proposing fragment ion series classification as a pivotal task for de novo peptide sequencing, and a peptide-spectrum confidence score. On spectra for which database search provided a ground truth, Spectralis surpassed 40% sensitivity at 90% precision, nearly doubling state-of-the-art sensitivity. Application to unidentified spectra confirmed its superiority and showcased its applicability to variant calling. Altogether, these algorithmic innovations and the substantial sensitivity increase in the high-precision range constitute an important step toward broadly applicable peptide sequencing.
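The idea of relating peaks spaced by amino acid masses can be illustrated with a brute-force Python sketch that reports every pair of peaks whose m/z difference matches a monoisotopic residue mass. This is not the convolutional layer used by Spectralis; it assumes singly charged fragments, unmodified residues, and an absolute tolerance of 0.02 m/z units chosen only for illustration.

```python
# Monoisotopic residue masses in Da (I is omitted because it is isobaric with L).
RESIDUE_MASSES = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def residue_gaps(mzs, tol=0.02):
    """Return (i, j, residue) for peak pairs whose m/z gap matches a residue mass.

    Indices refer to the sorted peak list; singly charged fragments are assumed.
    """
    mzs = sorted(mzs)
    hits = []
    for i in range(len(mzs)):
        for j in range(i + 1, len(mzs)):
            gap = mzs[j] - mzs[i]
            for residue, mass in RESIDUE_MASSES.items():
                if abs(gap - mass) <= tol:
                    hits.append((i, j, residue))
    return hits


print(residue_gaps([100.0, 157.02146, 228.05857]))
# [(0, 1, 'G'), (0, 2, 'Q'), (1, 2, 'A')]
```

The three peaks form a G-then-A ladder, and the outer gap also matches Q, whose mass (128.05858 Da) is nearly identical to G plus A; ambiguities of this kind are what a de novo method has to resolve.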


Subject(s)
Deep Learning , Algorithms , Sequence Analysis, Protein/methods , Peptides/chemistry , Amino Acid Sequence
6.
Proteomics ; 24(8): e2300112, 2024 Apr.
Article in English | MEDLINE | ID: mdl-37672792

ABSTRACT

Machine learning (ML) and deep learning (DL) models for peptide property prediction such as Prosit have enabled the creation of high quality in silico reference libraries. These libraries are used in various applications, ranging from data-independent acquisition (DIA) data analysis to data-driven rescoring of search engine results. Here, we present Oktoberfest, an open source Python package of our spectral library generation and rescoring pipeline originally only available online via ProteomicsDB. Oktoberfest is largely search engine agnostic and provides access to online peptide property predictions, promoting the adoption of state-of-the-art ML/DL models in proteomics analysis pipelines. We demonstrate its ability to reproduce and even improve our results from previously published rescoring analyses on two distinct use cases. Oktoberfest is freely available on GitHub (https://github.com/wilhelm-lab/oktoberfest) and can easily be installed locally through the cross-platform PyPI Python package.


Subject(s)
Proteomics , Software , Proteomics/methods , Peptides , Algorithms
7.
medRxiv ; 2023 Nov 09.
Article in English | MEDLINE | ID: mdl-38076997

ABSTRACT

Most heritable diseases are polygenic. To comprehend the underlying genetic architecture, it is crucial to discover the clinically relevant epistatic interactions (EIs) between genomic single nucleotide polymorphisms (SNPs) [1-3]. Existing statistical computational methods for EI detection are mostly limited to pairs of SNPs due to the combinatorial explosion of higher-order EIs. With NeEDL (network-based epistasis detection via local search), we leverage network medicine to inform the selection of EIs that are an order of magnitude more statistically significant than those found by existing tools and that consist, on average, of five SNPs. We further show that this computationally demanding task can be substantially accelerated once quantum computing hardware becomes available. We apply NeEDL to eight different diseases and discover genes (affected by EIs of SNPs) that are partly known to affect the disease; these results are also reproducible across independent cohorts. EIs for these eight diseases can be interactively explored in the Epistasis Disease Atlas (https://epistasis-disease-atlas.com). In summary, NeEDL is the first application that demonstrates the potential of seamlessly integrated quantum computing techniques to accelerate biomedical research. Our network medicine approach detects higher-order EIs with unprecedented statistical and biological evidence, yielding unique insights into polygenic diseases and providing a basis for the development of improved risk scores and combination therapies.

8.
Nat Chem Biol ; 2023 Oct 30.
Article in English | MEDLINE | ID: mdl-37904048

ABSTRACT

Medicinal chemistry has discovered thousands of potent protein and lipid kinase inhibitors. These may be developed into therapeutic drugs or chemical probes to study kinase biology. Because of polypharmacology, a large part of the human kinome currently lacks selective chemical probes. To discover such probes, we profiled 1,183 compounds from drug discovery projects in lysates of cancer cell lines using Kinobeads. The resulting 500,000 compound-target interactions are available in ProteomicsDB and we exemplify how this molecular resource may be used. For instance, the data revealed several hundred reasonably selective compounds for 72 kinases. Cellular assays validated GSK986310C as a candidate SYK (spleen tyrosine kinase) probe and X-ray crystallography uncovered the structural basis for the observed selectivity of the CK2 inhibitor GW869516X. Compounds targeting PKN3 were discovered and phosphoproteomics identified substrates that indicate target engagement in cells. We anticipate that this molecular resource will aid research in drug discovery and chemical biology.

9.
Anal Chem ; 95(37): 13746-13749, 2023 09 19.
Article in English | MEDLINE | ID: mdl-37676919

ABSTRACT

Mass spectrometry coupled to liquid chromatography is one of the most powerful technologies for proteome quantification in biomedical samples. In peptide-centric workflows, protein mixtures are enzymatically digested to peptides prior to their analysis. However, proteome-wide quantification studies rarely identify all potential peptides for any given protein, and targeted proteomics experiments focus on a set of peptides for the proteins of interest. Consequently, proteomics relies on a limited subset of all possible peptides as proxies for protein quantitation. In this work, we evaluated the stability of human proteotypic peptides over 21 days and trained a deep learning model to predict peptide stability directly from tryptic sequences; together, these constitute a resource of broad interest for prioritizing and selecting peptides in proteome quantification experiments.


Subject(s)
Proteome , Proteomics , Humans , Peptides , Chromatography, Liquid , Mass Spectrometry
10.
Nat Commun ; 14(1): 4632, 2023 08 02.
Article in English | MEDLINE | ID: mdl-37532709

ABSTRACT

Systemic pan-tumor analyses may reveal the significance of common features implicated in cancer immunogenicity and patient survival. Here, we provide a comprehensive multi-omics data set for 32 patients across 25 tumor types for proteogenomic-based discovery of neoantigens. Using an optimized computational approach, we discover a large number of tumor-specific and tumor-associated antigens. To create a pipeline for the identification of neoantigens in our cohort, we combine DNA and RNA sequencing with MS-based immunopeptidomics of tumor specimens, followed by assessment of their immunogenicity and an in-depth validation process. We detect a broad variety of non-canonical HLA-binding peptides in the majority of patients, demonstrating partial immunogenicity. Our validation process allows for the selection of 32 potential neoantigen candidates. The majority of neoantigen candidates originate from variants identified in the RNA data set, illustrating the relevance of RNA as a still understudied source of cancer antigens. This study underlines the importance of RNA-centered variant detection for the identification of shared biomarkers and potentially relevant neoantigen candidates.


Subject(s)
Neoplasms , Proteogenomics , Humans , Neoplasms/genetics , Antigens, Neoplasm/genetics , Peptides
11.
J Proteome Res ; 22(9): 2836-2846, 2023 09 01.
Article in English | MEDLINE | ID: mdl-37557900

ABSTRACT

Sample multiplexed quantitative proteomics assays have proved to be a highly versatile means to assay molecular phenotypes. Yet, stochastic precursor selection and precursor coisolation can dramatically reduce the efficiency of data acquisition and quantitative accuracy. To address this, intelligent data acquisition (IDA) strategies have recently been developed to improve instrument efficiency and quantitative accuracy for both discovery and targeted methods. Toward this end, we sought to develop and implement a new real-time spectral library searching (RTLS) workflow that could enable intelligent scan triggering and peak selection within milliseconds of scan acquisition. To ensure ease of use and general applicability, we built an application to read in diverse spectral libraries and file types from both empirical and predicted spectral libraries. We demonstrate that RTLS methods enable improved quantitation of multiplexed samples, particularly with consideration for quantitation from chimeric fragment spectra. We used RTLS to profile proteome responses to small molecule perturbations and were able to quantify up to 15% more significantly regulated proteins in half the gradient time compared to traditional methods. Taken together, the development of RTLS expands the IDA toolbox to improve instrument efficiency and quantitative accuracy for sample multiplexed analyses.


Subject(s)
Peptides , Proteomics , Proteomics/methods , Peptides/analysis , Proteome/analysis , Gene Library , Workflow , Peptide Library
12.
Science ; 380(6640): 93-101, 2023 04 07.
Article in English | MEDLINE | ID: mdl-36926954

ABSTRACT

Although most cancer drugs modulate the activities of cellular pathways by changing posttranslational modifications (PTMs), little is known regarding the extent and the time- and dose-response characteristics of drug-regulated PTMs. In this work, we introduce a proteomic assay called decryptM that quantifies drug-PTM modulation for thousands of PTMs in cells to shed light on target engagement and drug mechanism of action. Examples range from detecting DNA damage by chemotherapeutics, to identifying drug-specific PTM signatures of kinase inhibitors, to demonstrating that rituximab kills CD20-positive B cells by overactivating B cell receptor signaling. DecryptM profiling of 31 cancer drugs in 13 cell lines demonstrates the broad applicability of the approach. The resulting 1.8 million dose-response curves are provided as an interactive molecular resource in ProteomicsDB.


Subject(s)
Antineoplastic Agents , Apoptosis , Protein Processing, Post-Translational , Proteomics , Antigens, CD20/metabolism , Antineoplastic Agents/pharmacology , Apoptosis/drug effects , B-Lymphocytes/drug effects , Cell Line, Tumor , DNA Damage , Protein Processing, Post-Translational/drug effects , Proteomics/methods , Receptors, Antigen, B-Cell/metabolism , Signal Transduction , Humans
13.
J Proteome Res ; 22(3): 681-696, 2023 03 03.
Article in English | MEDLINE | ID: mdl-36744821

ABSTRACT

In recent years, machine learning has made extensive progress in modeling many aspects of mass spectrometry data. We brought together proteomics data generators, repository managers, and machine learning experts in a workshop with the goal of evaluating and exploring machine learning applications for realistic modeling of data from multidimensional mass spectrometry-based proteomics analysis of any sample or organism. Following this sample-to-data roadmap helped identify knowledge gaps and define needs. The ability to generate bespoke and realistic synthetic data has legitimate and important uses in system suitability, method development, and algorithm benchmarking, while also posing critical ethical questions. The interdisciplinary nature of the workshop informed discussions of what is currently possible and of future opportunities and challenges. In the following perspective, we summarize these discussions in the hope of conveying our excitement about the potential of machine learning in proteomics and inspiring future research.


Subject(s)
Machine Learning , Proteomics , Proteomics/methods , Algorithms , Mass Spectrometry
14.
Mol Cell Proteomics ; 21(12): 100437, 2022 12.
Article in English | MEDLINE | ID: mdl-36328188

ABSTRACT

Estimating false discovery rates (FDRs) of protein identification continues to be an important topic in mass spectrometry-based proteomics, particularly when analyzing very large datasets. One performant method for this purpose is the Picked Protein FDR approach, which is based on a target-decoy competition strategy at the protein level that ensures that FDRs scale to large datasets. Here, we present an extension of this method that can also deal with protein groups, that is, proteins that share common peptides, such as protein isoforms of the same gene. To obtain well-calibrated FDR estimates that preserve protein identification sensitivity, we introduce two novel ideas: first, the picked group target-decoy strategy and, second, the rescued subset grouping strategy. Using entrapment searches and simulated data for validation, we demonstrate that the new Picked Protein Group FDR method produces accurate protein group-level FDR estimates regardless of the size of the data set. The validation analysis also revealed that applying the commonly used Occam's razor principle leads to anticonservative FDR estimates for large datasets, which is not the case for the Picked Protein Group FDR method. Reanalysis of deep proteomes of 29 human tissues showed that the new method identified up to 4% more protein groups than MaxQuant. Applying the method to the reanalysis of the entire human section of ProteomicsDB led to the identification of 18,000 protein groups at 1% protein group-level FDR. The analysis also showed that about 1250 genes were represented by ≥2 identified protein groups. To make the method accessible to the proteomics community, we provide a software tool, including a graphical user interface, that enables merging results from multiple MaxQuant searches into a single list of identified and quantified protein groups.
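The picked target-decoy idea at the protein level can be sketched in Python: each protein's target competes with its own decoy, only the higher-scoring member is kept, and the FDR at each score threshold is estimated as the ratio of decoys to targets above it. This simplified sketch handles single proteins only; protein grouping and the rescued subset strategy described in the abstract are not modeled, and the input dictionaries are hypothetical.

```python
def picked_protein_fdr(target_scores, decoy_scores):
    """Picked target-decoy competition at the protein level.

    target_scores / decoy_scores: protein -> score (higher is better).
    Returns (protein, is_decoy, score, q_value) tuples sorted by descending score.
    """
    picked = []
    for protein in set(target_scores) | set(decoy_scores):
        t = target_scores.get(protein, float("-inf"))
        d = decoy_scores.get(protein, float("-inf"))
        # Only the better of the pair survives; ties go to the target (a choice for this sketch).
        picked.append((protein, False, t) if t >= d else (protein, True, d))
    picked.sort(key=lambda row: row[2], reverse=True)

    fdrs, n_targets, n_decoys = [], 0, 0
    for _, is_decoy, _ in picked:
        n_decoys += is_decoy
        n_targets += not is_decoy
        fdrs.append(n_decoys / max(n_targets, 1))

    # q-value: the lowest FDR at this threshold or any less stringent one.
    q_values, running_min = [0.0] * len(fdrs), float("inf")
    for k in range(len(fdrs) - 1, -1, -1):
        running_min = min(running_min, fdrs[k])
        q_values[k] = running_min
    return [(p, dcy, s, q) for (p, dcy, s), q in zip(picked, q_values)]


targets = {"P1": 10.0, "P2": 8.0, "P3": 3.0, "P4": 6.0}
decoys = {"P1": 2.0, "P2": 1.0, "P3": 7.0, "P4": 4.0}
for row in picked_protein_fdr(targets, decoys):
    print(row)
```

Because each protein contributes either its target or its decoy but never both, decoy counts do not accumulate from weak targets as protein lists grow, which is the intuition behind the scaling property mentioned in the abstract.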


Subject(s)
Peptides , Tandem Mass Spectrometry , Humans , Tandem Mass Spectrometry/methods , Databases, Protein , Software , Proteome , Algorithms
16.
Nat Methods ; 19(7): 803-811, 2022 07.
Article in English | MEDLINE | ID: mdl-35710609

ABSTRACT

The laboratory mouse ranks among the most important experimental systems for biomedical research, and molecular reference maps of such models are essential informational tools. Here, we present a quantitative draft of the mouse proteome and phosphoproteome constructed from 41 healthy tissues, and we use several lines of analysis to exemplify the insights that can be gleaned from the data. For instance, tissue- and cell-type-resolved profiles provide protein evidence for the expression of 17,000 genes, thousands of isoforms and 50,000 phosphorylation sites in vivo. Proteogenomic comparison of mouse, human and Arabidopsis reveals common and distinct mechanisms of gene expression regulation and, despite many similarities, numerous differentially abundant orthologs that likely serve species-specific functions. We leverage the mouse proteome by integrating phenotypic drug (n > 400) and radiation response data with the proteomes of 66 pancreatic ductal adenocarcinoma (PDAC) cell lines to reveal molecular markers for sensitivity and resistance. This unique atlas complements other molecular resources for the mouse and can be explored online via ProteomicsDB and PACiFIC.


Subject(s)
Arabidopsis , Carcinoma, Pancreatic Ductal , Pancreatic Neoplasms , Animals , Arabidopsis/genetics , Carcinoma, Pancreatic Ductal/metabolism , Mass Spectrometry , Mice , Pancreatic Neoplasms/genetics , Proteome/analysis
17.
Anal Chem ; 94(20): 7181-7190, 2022 05 24.
Article in English | MEDLINE | ID: mdl-35549156

ABSTRACT

The prediction of fragment ion intensities and retention times of peptides has gained significant attention over the past few years. However, progress in the accurate prediction of such properties has focused primarily on unlabeled peptides. Tandem mass tags (TMT) are chemical peptide labels that are coupled to free amine groups, usually after protein digestion, to enable the multiplexed analysis of multiple samples in bottom-up mass spectrometry. TMT labeling is a standard workflow in proteomics, ranging from single-cell to high-throughput proteomics. For TMT in particular, increasing the number of confidently identified spectra is highly desirable, as every spectrum provides both identification and quantification information. Here, we report on the generation of an extensive resource of synthetic TMT-labeled peptides as part of the ProteomeTools project and present an extension of the deep learning model Prosit that predicts the retention time and fragment ion intensities of TMT-labeled peptides with high accuracy. Prosit-TMT supports CID and HCD fragmentation and ion trap and Orbitrap mass analyzers in a single model. Reanalysis of published TMT data sets shows that this single model extracts substantial additional information. Applying Prosit-TMT, we discovered that the expression of many proteins in human breast milk follows a distinct daily cycle, which may prime the newborn for nutritional or environmental cues.
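Because TMT reagents react with free amines, a fully labeled peptide carries one tag on its N-terminus and one on each lysine side chain. The Python sketch below computes the resulting neutral monoisotopic mass, assuming complete labeling, no other modifications, and the TMT 6/10/11-plex mass shift of 229.162932 Da.

```python
WATER = 18.010565       # monoisotopic mass of H2O (Da)
TMT_DELTA = 229.162932  # TMT 6/10/11-plex mass shift, monoisotopic (Da)

# Monoisotopic residue masses in Da (unmodified residues; I and L share a mass).
RESIDUE_MASSES = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def tmt_peptide_mass(sequence: str) -> float:
    """Neutral monoisotopic mass with a TMT tag on the N-terminus and on every K.

    Assumes complete labeling and no other modifications.
    """
    n_tags = 1 + sequence.count("K")  # N-terminal amine + lysine epsilon-amines
    residues = sum(RESIDUE_MASSES[aa] for aa in sequence)
    return residues + WATER + n_tags * TMT_DELTA


print(round(tmt_peptide_mass("GK"), 6))  # 661.452849
```

Every labeled amine shifts all fragments containing it by the same amount, which is one reason labeled and unlabeled peptides need separately trained intensity and retention time predictors.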


Subject(s)
Deep Learning , Tandem Mass Spectrometry , Humans , Infant, Newborn , Peptides/chemistry , Proteolysis , Proteomics/methods , Tandem Mass Spectrometry/methods
18.
Proteomics ; 22(19-20): e2100257, 2022 10.
Article in English | MEDLINE | ID: mdl-35578405

ABSTRACT

Isobaric labeling increases the throughput of proteomics by enabling the parallel identification and quantification of peptides and proteins. Over the past decades, a variety of isobaric tags have been developed that allow the multiplexed analysis of up to 18 samples. However, experiments using such tags often exhibit reduced identification rates and thus decreased analytical depth. Re-scoring has been shown to rescue otherwise missed identifications but had not yet been systematically applied to isobarically labeled data. Because iTRAQ 4/8-plex and the recently released TMTpro 16/18-plex share similar characteristics with TMT 6/10/11-plex, we hypothesized that Prosit-TMT, trained exclusively on 6/10/11-plex labeled peptides, may be applicable to these isobaric labeling strategies as well. To investigate this, we re-analyzed nine publicly available datasets covering iTRAQ and TMTpro labeling of samples of human and mouse origin. We show that Prosit-TMT performs remarkably well when comparing experimentally acquired and predicted fragmentation spectra (R of 0.84 to 0.9) and retention times (ΔRT95% of 3% to 10% of gradient time) of peptides. Furthermore, re-scoring substantially increases the number of confidently identified spectra, peptides, and proteins.
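The two agreement metrics quoted above can be computed as in the following Python sketch. We assume that R is a Pearson correlation between predicted and observed values and that ΔRT95% is the width of the smallest window containing 95% of the retention-time errors, expressed as a percentage of the gradient length; the exact conventions used in the study may differ.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def delta_rt95(observed, predicted, gradient_length):
    """Width of the smallest window containing 95% of RT errors, in % of the gradient."""
    errors = sorted(p - o for o, p in zip(observed, predicted))
    k = (95 * len(errors) + 99) // 100  # ceil(0.95 * n) without floating-point rounding
    width = min(errors[i + k - 1] - errors[i] for i in range(len(errors) - k + 1))
    return 100.0 * width / gradient_length


observed = [float(i) for i in range(20)]
errors = [0.05 * i for i in range(19)] + [5.0]  # 19 small errors and one outlier
predicted = [o + e for o, e in zip(observed, errors)]
print(round(pearson_r(observed, predicted), 3))
print(round(delta_rt95(observed, predicted, gradient_length=60.0), 2))  # 1.5
```

Using the smallest covering window rather than a fixed symmetric interval lets the metric ignore a constant offset between predicted and observed retention times, while the single outlier in the example falls outside the 95% window.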


Subject(s)
Peptides , Proteomics , Humans , Mice , Animals , Peptides/analysis , Proteins , Indicators and Reagents
19.
Mol Cell Proteomics ; 21(8): 100238, 2022 08.
Article in English | MEDLINE | ID: mdl-35462064

ABSTRACT

Isobaric stable isotope labeling techniques such as tandem mass tags (TMTs) have become popular in proteomics because they enable the relative quantification of proteins with high precision from up to 18 samples in a single experiment. While missing values in peptide quantification are rare in a single TMT experiment, they rapidly increase when multiple TMT experiments are combined. As the field moves toward analyzing ever higher numbers of samples, tools that reduce missing values become increasingly important for analyzing TMT datasets. To this end, we developed SIMSI-Transfer (Similarity-based Isobaric Mass Spectra 2 [MS2] Identification Transfer), a software tool that extends our previously developed software MaRaCluster (© Matthew The) by clustering similar tandem mass (MS2) spectra from multiple TMT experiments. SIMSI-Transfer is based on the assumption that similarity-clustered MS2 spectra represent the same peptide. Therefore, peptide identifications made by database searching in one TMT batch can be transferred to another TMT batch in which the same peptide was fragmented but not identified. To assess the validity of this approach, we tested SIMSI-Transfer on masked search engine identification results and recovered >80% of the masked identifications while controlling errors in the transfer procedure to below 1% false discovery rate. Applying SIMSI-Transfer to six published full proteome and phosphoproteome datasets from the Clinical Proteomic Tumor Analysis Consortium increased the number of identified MS2 spectra with TMT quantifications by 26 to 45%. This significantly decreased the number of missing values across batches and, in turn, increased the number of peptides and proteins identified in all TMT batches by 43 to 56% and 13 to 16%, respectively.
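The transfer step can be sketched in Python, assuming spectra have already been clustered (SIMSI-Transfer uses MaRaCluster for this) and that a cluster passes on an identification only when all of its identified members agree on one peptide. The dictionaries, spectrum IDs, and conflict rule are simplifications for illustration; the tool's actual scoring and FDR control are not reproduced here.

```python
from collections import defaultdict


def transfer_identifications(clusters, identifications):
    """Propagate peptide identifications to unidentified spectra in the same cluster.

    clusters: spectrum_id -> cluster_id (output of a prior spectrum clustering step)
    identifications: spectrum_id -> peptide sequence, for identified spectra only
    A cluster transfers an identification only if all of its identified members
    agree on a single peptide (a simplifying rule assumed for this sketch).
    """
    peptides_per_cluster = defaultdict(set)
    for spectrum, cluster in clusters.items():
        if spectrum in identifications:
            peptides_per_cluster[cluster].add(identifications[spectrum])

    transferred = dict(identifications)
    for spectrum, cluster in clusters.items():
        candidates = peptides_per_cluster.get(cluster, set())
        if spectrum not in transferred and len(candidates) == 1:
            transferred[spectrum] = next(iter(candidates))
    return transferred


# Hypothetical spectrum IDs of the form "batch_scan"
clusters = {"b1_s1": "c1", "b2_s7": "c1",
            "b1_s2": "c2", "b2_s3": "c2", "b2_s9": "c2",
            "b3_s4": "c3"}
identifications = {"b1_s1": "PEPTIDEK", "b1_s2": "AAAK", "b2_s3": "CCCK"}
print(transfer_identifications(clusters, identifications))
# b2_s7 inherits PEPTIDEK; b2_s9 (conflicting cluster) and b3_s4 (no ID) stay unidentified
```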


Subject(s)
Proteomics , Tandem Mass Spectrometry , Cluster Analysis , Isotope Labeling , Peptides , Proteome , Software
20.
J Proteome Res ; 21(5): 1359-1364, 2022 05 06.
Article in English | MEDLINE | ID: mdl-35413196

ABSTRACT

Machine learning has been an integral part of interpreting data from mass spectrometry (MS)-based proteomics for a long time. Relatively recently, Transformers, a machine-learning architecture, have proven successful in other areas of bioinformatics. Furthermore, implementing Transformers within bioinformatics has become relatively convenient due to transfer learning, i.e., adapting a network trained for other tasks to new functionality. Transfer learning makes these relatively large networks more accessible because it generally requires less data and substantially reduces training time. We implemented a Transformer based on the pretrained model TAPE to predict MS2 intensities. TAPE is a general model trained to predict missing residues from protein sequences. Despite its being trained for a different task, we could modify its behavior by adding a prediction head at the end of the TAPE model and fine-tuning it on the spectrum intensities from the training set of the well-known predictor Prosit. We demonstrate that the resulting predictor, which we call Prosit Transformer, outperforms the recurrent neural-network-based predictor Prosit, increasing the median angular similarity on its hold-out set from 0.908 to 0.929. We believe that Transformers will significantly increase prediction accuracy for other types of predictions within MS-based proteomics.
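Angular similarity between predicted and observed spectra is commonly computed as the normalized spectral contrast angle, 1 − 2·arccos(cos θ)/π, where cos θ is the cosine similarity of the two intensity vectors. The Python sketch below assumes the vectors are already aligned on the same annotated fragment ions; whether Prosit Transformer uses exactly this definition is an assumption here, and the example intensities are made up.

```python
import math
from statistics import median


def spectral_angle(observed, predicted):
    """Normalized spectral contrast angle; 1.0 means identical relative intensities.

    Inputs are intensity vectors aligned on the same fragment ions. For
    non-negative intensities the result lies in [0, 1].
    """
    norm_o = math.sqrt(sum(v * v for v in observed))
    norm_p = math.sqrt(sum(v * v for v in predicted))
    if norm_o == 0 or norm_p == 0:
        return 0.0
    cos = sum(o * p for o, p in zip(observed, predicted)) / (norm_o * norm_p)
    cos = max(-1.0, min(1.0, cos))  # guard against rounding just outside [-1, 1]
    return 1.0 - 2.0 * math.acos(cos) / math.pi


# Made-up (observed, predicted) intensity pairs standing in for a hold-out set
pairs = [([1.0, 0.5, 0.0], [0.9, 0.6, 0.1]), ([0.2, 1.0, 0.3], [0.3, 0.8, 0.5])]
print(median(spectral_angle(obs, pred) for obs, pred in pairs))
```

Because both vectors are normalized, the score depends only on relative fragment intensities, not on absolute signal, so a median over a hold-out set summarizes prediction quality independently of spectrum abundance.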


Subject(s)
Machine Learning , Neural Networks, Computer , Amino Acid Sequence , Mass Spectrometry , Proteomics