1.
Nat Commun ; 15(1): 3956, 2024 May 10.
Article in English | MEDLINE | ID: mdl-38730277

ABSTRACT

Immunopeptidomics is crucial for immunotherapy and vaccine development. Because the generation of immunopeptides from their parent proteins does not adhere to clear-cut rules, rather than being able to use known digestion patterns, every possible protein subsequence within human leukocyte antigen (HLA) class-specific length restrictions needs to be considered during sequence database searching. This leads to an inflation of the search space and results in lower spectrum annotation rates. Peptide-spectrum match (PSM) rescoring is a powerful enhancement of standard searching that boosts the spectrum annotation performance. We analyze 302,105 unique synthesized non-tryptic peptides from the ProteomeTools project on a timsTOF-Pro to generate a ground-truth dataset containing 93,227 MS/MS spectra of 74,847 unique peptides, which is used to fine-tune the deep learning-based fragment ion intensity prediction model Prosit. We demonstrate up to 3-fold improvement in the identification of immunopeptides, as well as increased detection of immunopeptides from low input samples.
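
The search-space inflation described in this abstract can be illustrated with a small sketch. It compares a non-specific enumeration of every contiguous subsequence against a fully tryptic digest of the same toy sequence. The 8 to 12 residue window is an assumption (a range often quoted for HLA class I ligands; the abstract gives no numbers), the cleavage rule is the usual "after K or R, not before P" simplification, and all function names are hypothetical.

```python
import re


def n_windows(protein_length: int, min_len: int = 8, max_len: int = 12) -> int:
    """Number of contiguous windows (duplicates included) in the length range."""
    return sum(
        max(protein_length - n + 1, 0) for n in range(min_len, max_len + 1)
    )


def all_subsequences(protein: str, min_len: int = 8, max_len: int = 12) -> set:
    """Every distinct contiguous subsequence in the length range (non-specific search)."""
    return {
        protein[i:i + n]
        for n in range(min_len, max_len + 1)
        for i in range(len(protein) - n + 1)
    }


def tryptic_peptides(protein: str, min_len: int = 8, max_len: int = 12) -> set:
    """Fully tryptic peptides without missed cleavages.

    Assumed rule: cleave after K or R unless the next residue is P.
    """
    pieces = re.split(r"(?<=[KR])(?!P)", protein)
    return {p for p in pieces if min_len <= len(p) <= max_len}


if __name__ == "__main__":
    toy = "AAAAAAAAKCCCCCCCCCRPDDDDDDDK"  # made-up sequence, not a real protein
    print(len(tryptic_peptides(toy)), "tryptic vs", len(all_subsequences(toy)), "non-specific")
```

Even for one short sequence the non-specific candidate set is far larger than the tryptic one, which is the effect the abstract attributes the lower spectrum annotation rates to.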


Subject(s)
Deep Learning , Peptides , Tandem Mass Spectrometry , Humans , Peptides/chemistry , Peptides/immunology , Tandem Mass Spectrometry/methods , Databases, Protein , Proteomics/methods , HLA Antigens/immunology , HLA Antigens/genetics , Software , Ions
2.
Methods Mol Biol ; 2758: 457-483, 2024.
Article in English | MEDLINE | ID: mdl-38549030

ABSTRACT

Liquid chromatography-coupled mass spectrometry (LC-MS/MS) is the primary method to obtain direct evidence for the presentation of disease- or patient-specific human leukocyte antigen (HLA). However, compared to the analysis of tryptic peptides in proteomics, the analysis of HLA peptides still poses computational and statistical challenges. Recently, fragment ion intensity-based matching scores assessing the similarity between predicted and observed spectra were shown to substantially increase the number of confidently identified peptides, particularly in use cases where non-tryptic peptides are analyzed. In this chapter, we describe in detail three procedures on how to benefit from state-of-the-art deep learning models to analyze and validate single spectra, single measurements, and multiple measurements in mass spectrometry-based immunopeptidomics. For this, we explain how to use the Universal Spectrum Explorer (USE), online Oktoberfest, and offline Oktoberfest. For intensity-based scoring, Oktoberfest uses fragment ion intensity and retention time predictions from the deep learning framework Prosit, a deep neural network trained on a very large number of synthetic peptides and tandem mass spectra generated within the ProteomeTools project. The examples shown highlight how deep learning-assisted analysis can increase the number of identified HLA peptides, facilitate the discovery of confidently identified neo-epitopes, or provide assistance in the assessment of the presence of cryptic peptides, such as spliced peptides.


Subject(s)
Deep Learning , Humans , Chromatography, Liquid , Tandem Mass Spectrometry/methods , Peptides/analysis , Histocompatibility Antigens Class I , HLA Antigens
3.
Proteomics ; 24(8): e2300112, 2024 Apr.
Article in English | MEDLINE | ID: mdl-37672792

ABSTRACT

Machine learning (ML) and deep learning (DL) models for peptide property prediction such as Prosit have enabled the creation of high quality in silico reference libraries. These libraries are used in various applications, ranging from data-independent acquisition (DIA) data analysis to data-driven rescoring of search engine results. Here, we present Oktoberfest, an open source Python package of our spectral library generation and rescoring pipeline originally only available online via ProteomicsDB. Oktoberfest is largely search engine agnostic and provides access to online peptide property predictions, promoting the adoption of state-of-the-art ML/DL models in proteomics analysis pipelines. We demonstrate its ability to reproduce and even improve our results from previously published rescoring analyses on two distinct use cases. Oktoberfest is freely available on GitHub (https://github.com/wilhelm-lab/oktoberfest) and can easily be installed locally through the cross-platform PyPI Python package.


Subject(s)
Proteomics , Software , Proteomics/methods , Peptides , Algorithms
4.
J Proteome Res ; 22(9): 2836-2846, 2023 09 01.
Article in English | MEDLINE | ID: mdl-37557900

ABSTRACT

Sample multiplexed quantitative proteomics assays have proved to be a highly versatile means to assay molecular phenotypes. Yet, stochastic precursor selection and precursor coisolation can dramatically reduce the efficiency of data acquisition and quantitative accuracy. To address this, intelligent data acquisition (IDA) strategies have recently been developed to improve instrument efficiency and quantitative accuracy for both discovery and targeted methods. Toward this end, we sought to develop and implement a new real-time spectral library searching (RTLS) workflow that could enable intelligent scan triggering and peak selection within milliseconds of scan acquisition. To ensure ease of use and general applicability, we built an application to read in diverse spectral libraries and file types from both empirical and predicted spectral libraries. We demonstrate that RTLS methods enable improved quantitation of multiplexed samples, particularly with consideration for quantitation from chimeric fragment spectra. We used RTLS to profile proteome responses to small molecule perturbations and were able to quantify up to 15% more significantly regulated proteins in half the gradient time compared to traditional methods. Taken together, the development of RTLS expands the IDA toolbox to improve instrument efficiency and quantitative accuracy for sample multiplexed analyses.


Subject(s)
Peptides , Proteomics , Proteomics/methods , Peptides/analysis , Proteome/analysis , Gene Library , Workflow , Peptide Library
5.
Proteomics ; 22(19-20): e2100257, 2022 10.
Article in English | MEDLINE | ID: mdl-35578405

ABSTRACT

Isobaric labeling increases the throughput of proteomics by enabling the parallel identification and quantification of peptides and proteins. Over the past decades, a variety of isobaric tags have been developed allowing the multiplexed analysis of up to 18 samples. However, experiments utilizing such tags often exhibit reduced identification rates and thus show decreased analytical depth. Re-scoring has been shown to rescue otherwise missed identifications but has not yet been systematically applied to isobarically labeled data. Because iTRAQ 4/8-plex and the recently released TMTpro 16/18-plex share similar characteristics with TMT 6/10/11-plex, we hypothesized that Prosit-TMT, trained exclusively on 6/10/11-plex labeled peptides, may be applicable to these isobaric labeling strategies as well. To investigate this, we re-analyzed nine publicly available datasets covering iTRAQ and TMTpro labeling for samples of human and mouse origin. We highlight that Prosit-TMT shows remarkably good performance when comparing experimentally acquired and predicted fragmentation spectra (R of 0.84 - 0.9) and retention times (ΔRT95% of 3% - 10% gradient time) of peptides. Furthermore, re-scoring substantially increases the number of confidently identified spectra, peptides, and proteins.
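
The two agreement measures quoted in this abstract can be sketched as follows. The Pearson correlation is standard. The retention-time measure is implemented under an assumption: it is taken to be the width of the central interval that holds 95% of the observed-minus-predicted errors (2.5th to 97.5th percentile). The abstract does not define ΔRT95%, so the authors' exact definition may differ, and the function names are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def percentile(sorted_values, q):
    """Percentile q (0-100) of pre-sorted values, with linear interpolation."""
    pos = (len(sorted_values) - 1) * q / 100.0
    lower = math.floor(pos)
    upper = math.ceil(pos)
    frac = pos - lower
    return sorted_values[lower] * (1 - frac) + sorted_values[upper] * frac


def delta_rt95(observed, predicted):
    """Assumed definition: width of the central 95% interval of RT errors."""
    errors = sorted(o - p for o, p in zip(observed, predicted))
    return percentile(errors, 97.5) - percentile(errors, 2.5)
```

Dividing the result of `delta_rt95` by the gradient length would express it as a fraction of gradient time, which is the unit the abstract reports.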


Subject(s)
Peptides , Proteomics , Humans , Mice , Animals , Peptides/analysis , Proteins , Indicators and Reagents
6.
Anal Chem ; 94(20): 7181-7190, 2022 05 24.
Article in English | MEDLINE | ID: mdl-35549156

ABSTRACT

The prediction of fragment ion intensities and retention time of peptides has gained significant attention over the past few years. However, the progress shown in the accurate prediction of such properties focused primarily on unlabeled peptides. Tandem mass tags (TMT) are chemical peptide labels that are coupled to free amine groups usually after protein digestion to enable the multiplexed analysis of multiple samples in bottom-up mass spectrometry. TMT labeling is a standard workflow in proteomics, ranging from single-cell to high-throughput proteomics. Particularly for TMT, increasing the number of confidently identified spectra is highly desirable as it provides identification and quantification information with every spectrum. Here, we report on the generation of an extensive resource of synthetic TMT-labeled peptides as part of the ProteomeTools project and present the extension of the deep learning model Prosit to predict the retention time and fragment ion intensities of TMT-labeled peptides with high accuracy. Prosit-TMT supports CID and HCD fragmentation and ion trap and Orbitrap mass analyzers in a single model. Reanalysis of published TMT data sets shows that this single model extracts substantial additional information. Applying Prosit-TMT, we discovered that the expression of many proteins in human breast milk follows a distinct daily cycle which may prime the newborn for nutritional or environmental cues.


Subject(s)
Deep Learning , Tandem Mass Spectrometry , Humans , Infant, Newborn , Peptides/chemistry , Proteolysis , Proteomics/methods , Tandem Mass Spectrometry/methods
7.
J Proteome Res ; 21(5): 1359-1364, 2022 05 06.
Article in English | MEDLINE | ID: mdl-35413196

ABSTRACT

Machine learning has been an integral part of interpreting data from mass spectrometry (MS)-based proteomics for a long time. Relatively recently, a machine-learning architecture, the Transformer, has proved successful in other areas of bioinformatics. Furthermore, the implementation of Transformers within bioinformatics has become relatively convenient due to transfer learning, i.e., adapting a network trained for other tasks to new functionality. Transfer learning makes these relatively large networks more accessible as it generally requires less data, and the training time improves substantially. We implemented a Transformer based on the pretrained model TAPE to predict MS2 intensities. TAPE is a general model trained to predict missing residues from protein sequences. Despite being trained for a different task, we could modify its behavior by adding a prediction head at the end of the TAPE model and fine-tuning it using the spectrum intensities from the training set of the well-known predictor Prosit. We demonstrate that the predictor, which we call Prosit Transformer, outperforms the recurrent neural-network-based predictor Prosit, increasing the median angular similarity on its hold-out set from 0.908 to 0.929. We believe that Transformers will significantly increase prediction accuracy for other types of predictions within MS-based proteomics.
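
The "angular similarity" reported in this abstract can be sketched as a normalized spectral angle between a predicted and an observed fragment intensity vector. The formula below, 1 - 2·arccos(cosine similarity)/π, is one common definition and is assumed here; the abstract itself does not spell it out, and the function name is hypothetical.

```python
import math


def spectral_angle(predicted, observed):
    """Normalized spectral angle between two equal-length intensity vectors.

    Returns 1.0 for identical direction and 0.0 for orthogonal vectors
    (non-negative intensities assumed).
    """
    dot = sum(p * o for p, o in zip(predicted, observed))
    norm_p = math.sqrt(sum(p * p for p in predicted))
    norm_o = math.sqrt(sum(o * o for o in observed))
    if norm_p == 0 or norm_o == 0:
        return 0.0  # an empty spectrum carries no evidence of similarity
    # Clamp to guard against floating-point values slightly outside [-1, 1].
    cosine = max(-1.0, min(1.0, dot / (norm_p * norm_o)))
    return 1.0 - 2.0 * math.acos(cosine) / math.pi
```

Because both vectors are normalized first, the score depends only on the relative intensity pattern and not on the absolute signal, so a spectrum and a uniformly scaled copy of it score 1.0.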


Subject(s)
Machine Learning , Neural Networks, Computer , Amino Acid Sequence , Mass Spectrometry , Proteomics
8.
Nat Commun ; 13(1): 165, 2022 01 10.
Article in English | MEDLINE | ID: mdl-35013197

ABSTRACT

Proteome-wide measurements of protein turnover have largely ignored the impact of post-translational modifications (PTMs). To address this gap, we employ stable isotope labeling and mass spectrometry to measure the turnover of >120,000 peptidoforms including >33,000 phosphorylated, acetylated, and ubiquitinated peptides for >9,000 native proteins. This site-resolved protein turnover (SPOT) profiling discloses global and site-specific differences in turnover associated with the presence or absence of PTMs. While causal relationships may not always be immediately apparent, we speculate that PTMs with diverging turnover may distinguish states of differential protein stability, structure, localization, enzymatic activity, or protein-protein interactions. We show examples of how the turnover data may give insights into unknown functions of PTMs and provide a freely accessible online tool that allows interrogation and visualisation of all turnover data. The SPOT methodology is applicable to many cell types and modifications, offering the potential to prioritize PTMs for future functional investigations.


Subject(s)
Protein Processing, Post-Translational , Proteins/metabolism , Proteome/metabolism , Software , Acetylation , B-Lymphocytes/cytology , B-Lymphocytes/metabolism , Cell Line, Tumor , Half-Life , HeLa Cells , Humans , Phosphorylation , Protein Binding , Protein Interaction Mapping , Protein Stability , Proteins/genetics , Proteolysis , Proteome/classification , Proteome/genetics , Proteomics/methods , Ubiquitination
9.
J Proteome Res ; 20(4): 2083-2088, 2021 04 02.
Article in English | MEDLINE | ID: mdl-33661648

ABSTRACT

The study of microbiomes has gained in importance over the past few years and has led to the emergence of the fields of metagenomics, metatranscriptomics, and metaproteomics. While initially focused on the study of biodiversity within these communities, the emphasis has increasingly shifted to the study of (changes in) the complete set of functions available in these communities. A key tool to study this functional complement of a microbiome is Gene Ontology (GO) term analysis. However, comparing large sets of GO terms is not an easy task due to the deeply branched nature of GO, which limits the utility of exact term matching. To solve this problem, we here present MegaGO, a user-friendly tool that relies on semantic similarity between GO terms to compute the functional similarity between multiple data sets. MegaGO is high performing: Each set can contain thousands of GO terms, and results are calculated in a matter of seconds. MegaGO is available as a web application at https://megago.ugent.be and is installable via pip as a standalone command line tool and reusable software library. All code is open source under the MIT license and is available at https://github.com/MEGA-GO/.
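
Why semantic similarity helps where exact term matching fails can be shown on a toy ontology. The sketch below is not MegaGO's implementation: it uses Lin similarity based on information content and a best-match average over two term sets, both assumed here as representative choices. The term names, the parent relations, and the frequencies are all made up for illustration.

```python
import math

# Hypothetical five-term ontology: child -> list of parents.
PARENTS = {"root": [], "A": ["root"], "B": ["root"], "A1": ["A"], "A2": ["A"]}
# Hypothetical annotation frequencies (each term includes its descendants).
FREQ = {"root": 1.0, "A": 0.5, "B": 0.5, "A1": 0.25, "A2": 0.25}


def ancestors(term):
    """The term itself plus every term reachable through parent links."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content(term):
    """Rarer terms are more informative: IC = -ln(frequency)."""
    return -math.log(FREQ[term])


def lin_similarity(t1, t2):
    """Lin similarity: 2 * IC(most informative common ancestor) / (IC(t1) + IC(t2))."""
    common = ancestors(t1) & ancestors(t2)
    mica_ic = max(information_content(t) for t in common)
    denom = information_content(t1) + information_content(t2)
    if denom == 0:
        return 1.0 if t1 == t2 else 0.0
    return 2.0 * mica_ic / denom


def best_match_average(set1, set2):
    """Average of each term's best match in the other set, in both directions."""
    forward = sum(max(lin_similarity(a, b) for b in set2) for a in set1) / len(set1)
    backward = sum(max(lin_similarity(b, a) for a in set1) for b in set2) / len(set2)
    return (forward + backward) / 2.0
```

In this toy setup the sibling terms `A1` and `A2` never match exactly, yet they score 0.5 because they share the informative parent `A`, whereas `A1` and `B` share only the root and score 0.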


Subject(s)
Microbiota , Software , Computational Biology , Gene Ontology , Metagenomics , Semantics