1.
PLoS One ; 18(5): e0286312, 2023.
Article in English | MEDLINE | ID: mdl-37235568

ABSTRACT

In cluster analysis, a common first step is to scale the data aiming to better partition them into clusters. Although many different techniques have been introduced to this end over the years, it is probably fair to say that the workhorse in this preprocessing phase has been to divide the data by the standard deviation along each dimension. Like division by the standard deviation, the great majority of scaling techniques can be said to have roots in some sort of statistical take on the data. Here we explore the use of multidimensional shapes of data, aiming to obtain scaling factors for use prior to clustering by some method, like k-means, that makes explicit use of distances between samples. We borrow from the field of cosmology and related areas the recently introduced notion of shape complexity, which in the variant we use is a relatively simple, data-dependent nonlinear function that we show can be used to help with the determination of appropriate scaling factors. Focusing on what might be called "midrange" distances, we formulate a constrained nonlinear programming problem and use it to produce candidate scaling-factor sets that can be sifted on the basis of further considerations of the data, say via expert knowledge. We give results on some iconic data sets, highlighting the strengths and potential weaknesses of the new approach. These results are generally positive across all the data sets used.
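The baseline preprocessing step this abstract refers to, dividing the data by the standard deviation along each dimension, can be sketched as follows. This is only an illustration of that conventional scaling, not of the shape-complexity method the paper proposes; the function name `scale_by_std` and the toy data are assumptions made for the example, and the sketch assumes no dimension is constant (a zero standard deviation would raise a division error).

```python
from statistics import stdev


def scale_by_std(samples):
    """Divide each coordinate by the sample standard deviation of its dimension."""
    columns = list(zip(*samples))            # one tuple of values per dimension
    sds = [stdev(col) for col in columns]    # per-dimension scaling factors
    return [[x / sd for x, sd in zip(row, sds)] for row in samples]


# Hypothetical toy data: the second dimension has a much larger spread.
data = [[1.0, 100.0], [2.0, 300.0], [3.0, 500.0]]
scaled = scale_by_std(data)
```

After scaling, both dimensions have unit standard deviation, so neither dominates the Euclidean distances that a method such as k-means relies on.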


Subject(s)
Algorithms , Cluster Analysis
2.
J Am Soc Mass Spectrom ; 34(4): 794-796, 2023 Apr 05.
Article in English | MEDLINE | ID: mdl-36947430

ABSTRACT

Complex protein mixtures typically generate many tandem mass spectra produced by different peptides coisolated in the gas phase. Widely adopted proteomic data analysis environments usually fail to identify most of these spectra, succeeding at best in identifying only one of the multiple cofragmenting peptides. We present PatternLab V (PLV), an updated version of PatternLab that integrates the YADA 3 deconvolution algorithm to handle such cases efficiently. In general, we expect an increase of 10% in spectral identifications when dealing with complex proteomic samples. PLV is freely available at http://patternlabforproteomics.org.


Subject(s)
Peptides , Proteomics , Peptides/analysis , Proteins/analysis , Algorithms , Tandem Mass Spectrometry , Databases, Protein , Software
3.
J Proteomics ; 277: 104853, 2023 04 15.
Article in English | MEDLINE | ID: mdl-36804625

ABSTRACT

MOTIVATION: There are several well-established paradigms for identifying and pinpointing discriminative peptides/proteins using shotgun proteomic data; examples are peptide-spectrum matching, de novo sequencing, open searches, and even hybrid approaches. Such an arsenal of complementary paradigms can provide deep data coverage, although some discriminative peptides remain unidentified. RESULTS: We present DiagnoMass, a software tool that groups similar spectra into spectral clusters and then shortlists those clusters that are discriminative for biological conditions. DiagnoMass then communicates with proteomic tools to attempt the identification of such clusters. We demonstrate the effectiveness of DiagnoMass by analyzing proteomic data from Escherichia coli, Salmonella, and Shigella, listing many high-quality discriminative spectral clusters that had thus far remained unidentified by widely adopted proteomic tools. DiagnoMass can also classify proteomic profiles. We anticipate the use of DiagnoMass as a vital tool for pinpointing biomarkers. AVAILABILITY: DiagnoMass and related documentation, including a usage protocol, are available at http://www.diagnomass.com.


Subject(s)
Proteomics , Software , Proteomics/methods , Proteins/chemistry , Peptides/chemistry , Escherichia coli , Algorithms , Databases, Protein
4.
Bioinformatics ; 38(22): 5119-5120, 2022 11 15.
Article in English | MEDLINE | ID: mdl-36130273

ABSTRACT

MOTIVATION: Confident deconvolution of proteomic spectra is critical for several applications such as de novo sequencing, cross-linking mass spectrometry and handling chimeric mass spectra. RESULTS: In general, all deconvolution algorithms may eventually report mass peaks that are not compatible with the chemical formula of any peptide. We show how to remove these artifacts by considering their mass defects. We introduce Y.A.D.A. 3.0, a fast deconvolution algorithm that can remove peaks with unacceptable mass defects. Our approach is effective for polypeptides with less than 10 kDa, and its essence can be easily incorporated into any deconvolution algorithm. AVAILABILITY AND IMPLEMENTATION: Y.A.D.A. 3.0 is freely available for academic use at http://patternlabforproteomics.org/yada3. SUPPLEMENTARY INFORMATION: Supplementary information is available at Bioinformatics online.


Subject(s)
Algorithms , Proteomics , Peptides , Mass Spectrometry/methods , Software
5.
Nat Protoc ; 17(7): 1553-1578, 2022 07.
Article in English | MEDLINE | ID: mdl-35411045

ABSTRACT

Shotgun proteomics aims to identify and quantify the thousands of proteins in complex mixtures such as cell and tissue lysates and biological fluids. This approach uses liquid chromatography coupled with tandem mass spectrometry and typically generates hundreds of thousands of mass spectra that require specialized computational environments for data analysis. PatternLab for proteomics is a unified computational environment for analyzing shotgun proteomic data. PatternLab V (PLV) is the most comprehensive and crucial update so far, the result of intensive interaction with the proteomics community over several years. All PLV modules have been optimized and its graphical user interface has been completely updated for improved user experience. Major improvements were made to all aspects of the software, ranging from boosting the number of protein identifications to faster extraction of ion chromatograms. PLV provides modules for preparing sequence databases, protein identification, statistical filtering and in-depth result browsing for both labeled and label-free quantitation. The PepExplorer module can even pinpoint de novo sequenced peptides not already present in the database. PLV is of broad applicability and therefore suitable for challenging experimental setups, such as time-course experiments and data handling from unsequenced organisms. PLV interfaces with widely adopted software and community initiatives, e.g., Comet, Skyline, PEAKS and PRIDE. It is freely available at http://www.patternlabforproteomics.org.


Subject(s)
Proteomics , Software , Databases, Protein , Proteins/chemistry , Proteomics/methods , Tandem Mass Spectrometry
6.
J Proteomics ; 245: 104282, 2021 08 15.
Article in English | MEDLINE | ID: mdl-34089898

ABSTRACT

In proteomics, the identification of peptides from mass spectral data can be mathematically described as the partitioning of mass spectra into clusters (i.e., groups of spectra derived from the same peptide). The way partitions are validated is just as important, having evolved side by side with the clustering algorithms themselves and given rise to many partition assessment measures. An assessment measure is said to have a selection bias if, and only if, the probability of a randomly chosen partition scoring a high value depends on the number of clusters in the partition. In the context of clustering mass spectra, this might mislead the validation process to favor clustering algorithms that generate too many (or few) spectral clusters, regardless of the underlying peptide sequence. A selection bias toward the number of peptides is desirable for proteomics as it estimates the number of peptides in a complex protein mixture. Here, we introduce an assessment measure that is purposely biased toward the number of peptide ion species. We also introduce a partition assessment framework for proteomics, called the Partition Assessment Tool, and demonstrate its importance by evaluating the performance of eight clustering algorithms on seven proteomics datasets while discussing the trade-offs involved. SIGNIFICANCE: Clustering algorithms are widely adopted in proteomics for undertaking several tasks such as speeding up search engines, generating consensus mass spectra, and aiding in the classification of proteomic profiles. Choosing which algorithm is most fit for the task at hand is not simple as each algorithm has advantages and disadvantages; furthermore, specifying clustering parameters is also a necessary and fundamental step. For example, one must decide whether to generate "pure clusters" or fewer clusters at the cost of accepting noise. With this as motivation, we verify the performance of several widely adopted algorithms on proteomic datasets and introduce a theoretical framework for drawing conclusions on which approach is suitable for the task at hand.


Subject(s)
Proteomics , Software , Algorithms , Cluster Analysis , Databases, Protein , Selection Bias , Tandem Mass Spectrometry
7.
Phys Rev E ; 103(1-1): 012403, 2021 Jan.
Article in English | MEDLINE | ID: mdl-33601496

ABSTRACT

Bacterial quorum sensing is the communication that takes place between bacteria as they secrete certain molecules into the intercellular medium that later get absorbed by the secreting cells themselves and by others. Depending on cell density, this uptake has the potential to alter gene expression and thereby affect global properties of the community. We consider the case of multiple bacterial species coexisting, referring to each one of them as a genotype and adopting the usual denomination of the molecules they collectively secrete as public goods. A crucial problem in this setting is characterizing the coevolution of genotypes as some of them secrete public goods (and pay the associated metabolic costs) while others do not but may nevertheless benefit from the available public goods. We introduce a network model to describe genotype interaction and evolution when genotype fitness depends on the production and uptake of public goods. The model comprises a random graph to summarize the possible evolutionary pathways the genotypes may take as they interact genetically with one another, and a system of coupled differential equations to characterize the behavior of genotype abundance in time. We study some simple variations of the model analytically and more complex variations computationally. Our results point to a simple trade-off affecting the long-term survival of those genotypes that do produce public goods. This trade-off involves, on the producer side, the impact of producing and that of absorbing the public good. On the nonproducer side, it involves the impact of absorbing the public good as well, now compounded by the molecular compatibility between the producer and the nonproducer. Depending on how these factors turn out, producers may or may not survive.


Subject(s)
Bacteria/cytology , Biological Evolution , Quorum Sensing , Bacteria/genetics , Models, Biological
8.
J Proteomics ; 222: 103803, 2020 06 30.
Article in English | MEDLINE | ID: mdl-32387712

ABSTRACT

We present the Mixed-Data Acquisition (MDA) strategy for mass spectrometry data acquisition. MDA combines Data-Dependent Acquisition (DDA) and Data-Independent Acquisition (DIA) in the same run, thus doing away with the requirements for separate DDA spectral libraries. MDA is a natural result from advances in mass spectrometry, such as high scan rates and multiple analyzers, and is tailored toward exploiting these features. We demonstrate MDA's effectiveness on a yeast proteome analysis by overcoming a common bottleneck for XIC-based label-free quantitation; namely, the coelution of precursors when m/z values cannot be distinguished. We anticipate that MDA will become the next mainstream data generation approach for proteomics. MDA can also serve as an orthogonal validation approach for DDA experiments. Specialized software for MDA data analysis is made available on the project's website.


Subject(s)
Proteome , Proteomics , Mass Spectrometry , Software
9.
J. Proteomics ; 222: 103803, 2020.
Article in English | Sec. Est. Saúde SP, SESSP-IBPROD, Sec. Est. Saúde SP | ID: but-ib17672


10.
J Proteomics ; 202: 103371, 2019 06 30.
Article in English | MEDLINE | ID: mdl-31034900

ABSTRACT

We present a new module integrated into the widely adopted PatternLab for proteomics to enable analysis of isotope-labeled peptides produced using dimethyl or SILAC. The accurate quantitation of proteins lies at the heart of proteomics; dimethylation has been shown to be reliable, inexpensive, and applicable to any sample type. We validate our algorithm using an M. tuberculosis dataset obtained from two biological conditions; we used three dimethyl labels, one serving as an internal control for labeling a mixture of samples from both biological conditions. This internal control certified the proper functioning of our software. Availability: http://patternlabforproteomics.org, freely available for academic use.


Subject(s)
Algorithms , Bacterial Proteins/metabolism , Databases, Protein , Isotope Labeling , Mycobacterium tuberculosis/metabolism , Peptides/chemistry , Proteomics/standards , Bacterial Proteins/chemistry , Peptides/metabolism
11.
BMC Cancer ; 19(1): 365, 2019 Apr 18.
Article in English | MEDLINE | ID: mdl-30999875

ABSTRACT

BACKGROUND: Worldwide, breast cancer is the main cause of cancer mortality in women. Most cases originate in mammary ductal cells that produce the nipple aspirate fluid (NAF). In cancer patients, this secretome contains proteins associated with the tumor microenvironment. NAF studies are challenging because of inter-individual variability. We introduced a paired-proteomic shotgun strategy that relies on NAF analysis from both breasts of patients with unilateral breast cancer and extended PatternLab for Proteomics software to take advantage of this setup. METHODS: The software is based on a peptide-centric approach and uses the binomial distribution to attribute a probability for each peptide as being linked to the disease; these probabilities are propagated to a final protein p-value according to Stouffer's Z-score method. RESULTS: A total of 1227 proteins were identified and quantified, of which 87 were differentially abundant, being mainly involved in glycolysis (Warburg effect) and immune system activation (activated stroma). Additionally, in the estrogen receptor-positive subgroup, proteins related to the regulation of insulin-like growth factor transport and platelet degranulation displayed higher abundance, confirming the presence of a proliferative microenvironment. CONCLUSIONS: We debuted a differential bioinformatics workflow for the proteomic analysis of NAF, validating this secretome as a treasure-trove for studying a paired-organ cancer type.
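The propagation of peptide-level probabilities to a protein-level p-value mentioned under METHODS can be illustrated with the textbook, unweighted form of Stouffer's Z-score method. The sketch below is not taken from the software described; the function name `stouffer_combine` and the input p-values are hypothetical, the p-values are assumed to be one-sided and strictly between 0 and 1, and any weighting the actual tool may apply is not represented.

```python
from math import sqrt
from statistics import NormalDist


def stouffer_combine(p_values):
    """Combine one-sided p-values with Stouffer's unweighted Z-score method."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    # Convert each p-value to a Z-score; small p-values give large positive Z.
    z_scores = [std_normal.inv_cdf(1.0 - p) for p in p_values]
    # The sum of k independent standard normals has standard deviation sqrt(k).
    z_combined = sum(z_scores) / sqrt(len(z_scores))
    return 1.0 - std_normal.cdf(z_combined)


# Hypothetical peptide-level p-values for a single protein.
protein_p = stouffer_combine([0.05, 0.05])
```

Two peptides that are each only borderline significant yield a combined p-value smaller than either one, which is the behavior that makes the method useful for aggregating peptide evidence per protein.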


Subject(s)
Biomarkers, Tumor/metabolism , Breast Neoplasms/metabolism , Breast Neoplasms/pathology , Nipple Aspirate Fluid/metabolism , Proteome/analysis , Proteomics/methods , Tumor Microenvironment , Aged , Aged, 80 and over , Case-Control Studies , Female , Follow-Up Studies , Humans , Middle Aged , Prognosis , Workflow
12.
Bioinformatics ; 35(18): 3489-3490, 2019 09 15.
Article in English | MEDLINE | ID: mdl-30715205

ABSTRACT

MOTIVATION: We present the first tool for unbiased quality control of top-down proteomics datasets. Our tool can select high-quality top-down proteomics spectra, serve as a gateway for building top-down spectral libraries and, ultimately, improve identification rates. RESULTS: We demonstrate that a twofold rate increase for two E. coli top-down proteomics datasets may be achievable. AVAILABILITY AND IMPLEMENTATION: http://patternlabforproteomics.org/tdgc, freely available for academic use. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Proteomics , Escherichia coli , Software , Tandem Mass Spectrometry
13.
J Theor Biol ; 451: 111-116, 2018 08 14.
Article in English | MEDLINE | ID: mdl-29750998

ABSTRACT

Analyzing the information content of DNA, though holding the promise to help quantify how the processes of evolution have led to information gain throughout the ages, has remained an elusive goal. Paradoxically, one of the main reasons for this has been precisely the great diversity of life on the planet: if on the one hand this diversity is a rich source of data for information-content analysis, on the other hand there is so much variation as to make the task unmanageable. During the past decade or so, however, succinct fragments of the COI mitochondrial gene, which is present in all animal phyla and in a few others, have been shown to be useful for species identification through DNA barcoding. A few million such fragments are now publicly available through the BOLD systems initiative, thus providing an unprecedented opportunity for relatively comprehensive information-theoretic analyses of DNA to be attempted. Here we show how a generalized form of total correlation can yield distinctive information-theoretic descriptors of the phyla represented in those fragments. In order to illustrate the potential of this analysis to provide new insight into the evolution of species, we performed principal component analysis on standardized versions of the said descriptors for 23 phyla. Surprisingly, we found that, though based solely on the species represented in the data, the first principal component correlates strongly with the natural logarithm of the number of all known living species for those phyla. The new descriptors thus constitute clear information-theoretic signatures of the processes whereby evolution has given rise to current biodiversity, which suggests their potential usefulness in further related studies.
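For orientation, the standard (non-generalized) total correlation of a set of aligned sequences is the sum of the per-position entropies minus the joint entropy of the whole sequences. The sketch below computes that standard quantity only; the generalized form used in the paper is not specified in the abstract and is not reproduced here. The function names and toy sequences are assumptions, and the sequences are assumed to be aligned and of equal length.

```python
from collections import Counter
from math import log2


def entropy(items):
    """Shannon entropy, in bits, of the empirical distribution of items."""
    counts = Counter(items)
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values())


def total_correlation(sequences):
    """Sum of per-position entropies minus the joint entropy of whole sequences."""
    columns = list(zip(*sequences))                    # one tuple per position
    marginal = sum(entropy(col) for col in columns)    # sum of H(X_i)
    joint = entropy(tuple(seq) for seq in sequences)   # H(X_1, ..., X_n)
    return marginal - joint


# Hypothetical two-position fragments.
dependent = total_correlation(["AA", "CC"])                # positions always agree
independent = total_correlation(["AA", "AC", "CA", "CC"])  # positions vary freely
```

The first example gives one bit of total correlation because knowing one position determines the other; the second gives zero because the positions are statistically independent in the sample.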


Subject(s)
Biodiversity , DNA Barcoding, Taxonomic/methods , Animals , Biological Evolution , DNA, Mitochondrial/genetics , Electron Transport Complex IV/genetics , Phylogeny , Principal Component Analysis
14.
Nat Protoc ; 13(3): 431-458, 2018 03.
Article in English | MEDLINE | ID: mdl-29388937

ABSTRACT

Cross-linking coupled with mass spectrometry (XL-MS) has emerged as a powerful strategy for the identification of protein-protein interactions, characterization of interaction regions, and obtainment of structural information on proteins and protein complexes. In XL-MS, proteins or complexes are covalently stabilized with cross-linkers and digested, followed by identification of the cross-linked peptides by tandem mass spectrometry (MS/MS). This provides spatial constraints that enable modeling of protein (complex) structures and regions of interaction. However, most XL-MS approaches are not capable of differentiating intramolecular from intermolecular links in multimeric complexes, and therefore they cannot be used to study homodimer interfaces. We have recently developed an approach that overcomes this limitation by stable isotope-labeling of one of the two monomers, thereby creating a homodimer with one 'light' and one 'heavy' monomer. Here, we describe a step-by-step protocol for stable isotope-labeling, followed by controlled denaturation and refolding in the presence of the wild-type protein. The resulting light-heavy dimers are cross-linked, digested, and analyzed by mass spectrometry. We show how to quantitatively analyze the corresponding data with SIM-XL, an XL-MS software with a module tailored toward the MS/MS data from homodimers. In addition, we provide a video tutorial of the data analysis with this protocol. This protocol can be performed in ∼14 d, and requires basic biochemical and mass spectrometry skills.


Subject(s)
Isotope Labeling/methods , Tandem Mass Spectrometry/methods , Amino Acid Sequence , Cross-Linking Reagents , Peptides , Protein Conformation , Proteins , Software
15.
Sci Data ; 4: 170090, 2017 07 11.
Article in English | MEDLINE | ID: mdl-28696408

ABSTRACT

Venoms are a rich source for the discovery of molecules with biotechnological applications, but their analysis is challenging even for state-of-the-art proteomics. Here we report on a large-scale proteomic assessment of the venom of Loxosceles intermedia, the so-called brown spider. Venom was extracted from 200 spiders and fractioned into two aliquots relative to a 10 kDa cutoff mass. Each of these was further fractioned and digested with trypsin (4 h), trypsin (18 h), pepsin (18 h), and chymotrypsin (18 h), then analyzed by MudPIT on an LTQ-Orbitrap XL ETD mass spectrometer fragmenting precursors by CID, HCD, and ETD. Aliquots of undigested samples were also analyzed. Our experimental design allowed us to apply spectral networks, thus enabling us to obtain meta-contig assemblies, and consequently de novo sequencing of practically complete proteins, culminating in a deep proteome assessment of the venom. Data are available via ProteomeXchange, with identifier PXD005523.


Subject(s)
Proteome , Spider Venoms/chemistry , Spiders , Animals , Mass Spectrometry , Peptide Hydrolases , Proteomics
16.
Bioinformatics ; 33(12): 1883-1885, 2017 Jun 15.
Article in English | MEDLINE | ID: mdl-28186229

ABSTRACT

MOTIVATION: Around 75% of all mass spectra remain unidentified by widely adopted proteomic strategies. We present DiagnoProt, an integrated computational environment that can efficiently cluster millions of spectra and use machine learning to shortlist high-quality unidentified mass spectra that are discriminative of different biological conditions. RESULTS: We exemplify the use of DiagnoProt by shortlisting 4366 high-quality unidentified tandem mass spectra that are discriminative of different types of the Aspergillus fungus. AVAILABILITY AND IMPLEMENTATION: DiagnoProt, a demonstration video and a user tutorial are available at http://patternlabforproteomics.org/diagnoprot. CONTACT: andrerfsilva@gmail.com or paulo@pcarvalho.com. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Machine Learning , Proteomics/methods , Sequence Analysis, Protein/methods , Software , Tandem Mass Spectrometry/methods , Aspergillus/metabolism , Fungal Proteins/analysis
17.
Nat Protoc ; 11(1): 102-17, 2016 Jan.
Article in English | MEDLINE | ID: mdl-26658470

ABSTRACT

PatternLab for proteomics is an integrated computational environment that unifies several previously published modules for the analysis of shotgun proteomic data. The contained modules allow for formatting of sequence databases, peptide spectrum matching, statistical filtering and data organization, extracting quantitative information from label-free and chemically labeled data, and analyzing statistics for differential proteomics. PatternLab also has modules to perform similarity-driven studies with de novo sequencing data, to evaluate time-course experiments and to highlight the biological significance of data with regard to the Gene Ontology database. The PatternLab for proteomics 4.0 package brings together all of these modules in a self-contained software environment, which allows for complete proteomic data analysis and the display of results in a variety of graphical formats. All updates to PatternLab, including new features, have been previously tested on millions of mass spectra. PatternLab is easy to install, and it is freely available from http://patternlabforproteomics.org.


Subject(s)
Proteomics/methods , Software , Systems Integration , Databases, Protein , Humans , Peptides/chemistry , Peptides/metabolism , Protein Processing, Post-Translational , Tandem Mass Spectrometry , Time Factors
18.
Curr Protoc Bioinformatics ; 51: 13.27.1-13.27.9, 2015 Sep 03.
Article in English | MEDLINE | ID: mdl-26334921

ABSTRACT

PepExplorer aids in the biological interpretation of de novo sequencing results; this is accomplished by assembling a list of homolog proteins obtained by aligning results from widely adopted de novo sequencing tools against a target-decoy sequence database. Our tool relies on pattern recognition to ensure that the results satisfy a user-given false-discovery rate (FDR). For this, it employs a radial basis function neural network that considers the precursor charge states, de novo sequencing scores, the peptide lengths, and alignment scores. PepExplorer is recommended for studies addressing organisms with no genomic sequence available. PepExplorer is integrated into the PatternLab for proteomics environment, which makes available various tools for downstream data analysis, including the resources for quantitative and differential proteomics.
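The idea of satisfying a user-given FDR against a target-decoy database can be illustrated with a deliberately simplified score cutoff. This is a swapped-in technique for illustration: PepExplorer itself relies on a radial basis function neural network over several features, which is not modeled here. The function name, the `(score, is_decoy)` input format, and the decoys-over-targets FDR estimate are all assumptions of this sketch, and tied scores are not treated specially.

```python
def score_threshold_for_fdr(matches, max_fdr):
    """Return the lowest score cutoff whose estimated FDR stays within max_fdr.

    matches: iterable of (score, is_decoy) pairs.
    The FDR at a cutoff is estimated as decoys / targets among accepted matches.
    Returns None when no cutoff satisfies the bound.
    """
    targets = decoys = 0
    best = None
    # Walk from the best score downward, accepting one more match at each step.
    for score, is_decoy in sorted(matches, key=lambda m: m[0], reverse=True):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets > 0 and decoys / targets <= max_fdr:
            best = score
    return best


# Hypothetical alignment scores; True marks a hit to a decoy sequence.
hits = [(9.1, False), (8.7, False), (8.0, False),
        (7.5, True), (7.2, False), (6.9, True)]
cutoff = score_threshold_for_fdr(hits, max_fdr=0.25)
```

With these toy values, accepting everything scoring at least 7.2 keeps one decoy among four targets, which is the most permissive cutoff within the 25% bound.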


Subject(s)
Algorithms , Peptides/chemistry , Sequence Alignment/methods , Sequence Analysis, Protein/methods , Software , Amino Acid Sequence , Data Mining/methods , Databases, Protein , Molecular Sequence Data , Peptides/genetics
19.
J Proteomics ; 129: 51-55, 2015 Nov 03.
Article in English | MEDLINE | ID: mdl-25638023

ABSTRACT

Chemical cross-linking has emerged as a powerful approach for the structural characterization of proteins and protein complexes. However, the correct identification of covalently linked (cross-linked or XL) peptides analyzed by tandem mass spectrometry is still an open challenge. Here we present SIM-XL, a software tool that can analyze data generated through commonly used cross-linkers (e.g., BS3/DSS). Our software introduces a new paradigm for search-space reduction, which ultimately accounts for its increase in speed and sensitivity. Moreover, our search engine is the first to capitalize on reporter ions for selecting tandem mass spectra derived from cross-linked peptides. It also makes available a 2D interaction map and a spectrum-annotation tool unmatched by any of its kind. We show SIM-XL to be more sensitive and faster than a competing tool when analyzing a data set obtained from the human HSP90. The software is freely available for academic use at http://patternlabforproteomics.org/sim-xl. A video demonstrating the tool is available at http://patternlabforproteomics.org/sim-xl/video. SIM-XL is the first tool to support XL data in the mzIdentML format; all data are thus available from the ProteomeXchange consortium (identifier PXD001677). This article is part of a Special Issue entitled: Computational Proteomics.


Subject(s)
Algorithms , Cross-Linking Reagents/chemistry , Peptides/chemistry , Protein Interaction Mapping/methods , Sequence Analysis, Protein/methods , Software , Amino Acid Sequence , Binding Sites , Molecular Sequence Data , Pattern Recognition, Automated/methods , Protein Binding , Tandem Mass Spectrometry/methods , User-Computer Interface
20.
J Proteomics ; 129: 42-50, 2015 Nov 03.
Article in English | MEDLINE | ID: mdl-25623781

ABSTRACT

The production of structurally significant product ions during the dissociation of phosphopeptides is a key to the successful determination of phosphorylation sites. These diagnostic ions can be generated using the widely adopted MS/MS approach, MS3 (Data Dependent Neutral Loss - DDNL), or by multistage activation (MSA). The main purpose of this work is to introduce a false-localization rate (FLR) probabilistic model to enable unbiased phosphoproteomics studies. Briefly, our algorithm infers a probabilistic function from the distribution of the identified phosphopeptides' XCorr Delta scores (XD-Scores) in the current experiment. Our module infers p-values by relying on Gaussian mixture models and a logistic function. We demonstrate the usefulness of our probabilistic model by revisiting the "to MSA, or not to MSA" dilemma. For this, we used human leukemia-derived cells (K562) as a study model and enriched for phosphopeptides using hydroxyapatite (HAP) chromatography. The aliquots were analyzed with and without MSA on an Orbitrap-XL. Our XD-Scoring analysis revealed that the MS/MS approach provides more identifications because of its faster scan rate, but that for the same given scan rate higher-confidence spectra can be achieved with MSA. Our software is integrated into PatternLab for proteomics, which is freely available to the academic community at http://www.patternlabforproteomics.org. BIOLOGICAL SIGNIFICANCE: Assigning statistical confidence to phosphorylation sites is necessary for proper phosphoproteomic assessment. Here we present a rigorous statistical model, based on Gaussian mixture models and a logistic function, which overcomes shortcomings of previous tools. The algorithm described herein is made readily available to the scientific community by integrating it into the widely adopted PatternLab for proteomics. This article is part of a Special Issue entitled: Computational Proteomics.


Subject(s)
Mass Spectrometry/methods , Models, Statistical , Phosphopeptides/chemistry , Position-Specific Scoring Matrices , Protein Interaction Mapping/methods , Sequence Analysis, Protein/methods , Algorithms , Amino Acid Sequence , Binding Sites , Computer Simulation , Molecular Sequence Data , Phosphorylation , Protein Binding , Proteome/chemistry