1.
Proteomics ; 24(1-2): e2300090, 2024 Jan.
Article in English | MEDLINE | ID: mdl-37496303

ABSTRACT

The coefficient of variation (CV) is often used in proteomics as a proxy to characterize the performance of a quantitation method and/or the related software. In this note, we question the excessive reliance on this metric in quantitative proteomics that may result in erroneous conclusions. We support this note using a ground-truth Human-Yeast-E. coli dataset demonstrating in a number of cases that erroneous data processing methods may lead to a low CV which has nothing to do with these methods' performances in quantitation.
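
The point can be illustrated with a minimal sketch. The replicate intensities below are invented for illustration and are not taken from the paper's dataset; `method_a` and `method_b` are hypothetical processing methods. A method that compresses all measurements toward a common value gives a very low CV while missing the known spike-in ratio entirely.

```python
from statistics import mean, stdev

def cv(values):
    """Coefficient of variation: sample standard deviation divided by the mean."""
    return stdev(values) / mean(values)

# Hypothetical replicate intensities for one protein spiked at a true 2:1 ratio
# between conditions A and B (illustrative numbers only).
true_ratio = 2.0
method_a = {"A": [200.0, 230.0, 170.0], "B": [100.0, 115.0, 85.0]}   # noisy but unbiased
method_b = {"A": [151.0, 150.0, 149.0], "B": [150.0, 151.0, 149.0]}  # precise but biased

for name, data in (("method_a", method_a), ("method_b", method_b)):
    ratio = mean(data["A"]) / mean(data["B"])
    print(f"{name}: CV(A) = {cv(data['A']):.3f}, estimated ratio = {ratio:.2f}")
```

Here the method with the lower CV is the one that reports no change at all, which is why a ground-truth ratio is needed alongside the CV.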


Subject(s)
Escherichia coli , Proteomics , Humans , Mass Spectrometry/methods , Proteomics/methods , Software , Saccharomyces cerevisiae
2.
J Proteome Res ; 22(9): 2827-2835, 2023 09 01.
Article in English | MEDLINE | ID: mdl-37579078

ABSTRACT

One of the key steps in data dependent acquisition (DDA) proteomics is detection of peptide isotopic clusters, also called "features", in MS1 spectra and matching them to MS/MS-based peptide identifications. A number of peptide feature detection tools became available in recent years, each relying on its own matching algorithm. Here, we provide an integrated solution, the intensity-based Quantitative Mix and Match Approach (IQMMA), which integrates a number of untargeted peptide feature detection algorithms and returns the most probable intensity values for the MS/MS-based identifications. IQMMA was tested using available proteomic data acquired for both well-characterized (ground truth) and real-world biological samples, including a mix of Yeast and E. coli digests spiked at different concentrations into the Human K562 digest used as a background, and a set of glioblastoma cell lines. Three open-source feature detection algorithms were integrated: Dinosaur, biosaur2, and OpenMS FeatureFinder. None of them was found optimal when applied individually to all the data sets employed in this work; however, their combined use in IQMMA improved efficiency of subsequent protein quantitation. The software implementing IQMMA is freely available at https://github.com/PostoenkoVI/IQMMA under Apache 2.0 license.


Subject(s)
Proteomics , Tandem Mass Spectrometry , Humans , Escherichia coli , Algorithms , Peptides/chemistry , Software
3.
J Proteome Res ; 22(6): 1695-1711, 2023 06 02.
Article in English | MEDLINE | ID: mdl-37158322

ABSTRACT

The proteogenomic search pipeline developed in this work has been applied for reanalysis of 40 publicly available shotgun proteomic datasets from various human tissues comprising more than 8000 individual LC-MS/MS runs, of which 5442 .raw data files were processed in total. This reanalysis was focused on searching for ADAR-mediated RNA editing events, their clustering across samples of different origins, and classification. In total, 33 recoded protein sites were identified in 21 datasets. Of those, 18 sites were detected in at least two datasets, representing the core human protein editome. In agreement with prior work, neural and cancer tissues were found to be enriched with recoded proteins. Quantitative analysis indicated that the recoding rate of specific sites did not directly depend on the levels of ADAR enzymes or targeted proteins themselves; rather, it was governed by differential and yet undescribed regulation of interaction of enzymes with mRNA. Nine recoding sites conserved between humans and rodents were validated by targeted proteomics using stable isotope standards in the murine brain cortex and cerebellum, and an additional one was validated in human cerebrospinal fluid. In addition to previous data of the same type from cancer proteomes, we provide a comprehensive catalog of recoding events caused by ADAR RNA editing in the human proteome.


Subject(s)
Proteogenomics , Proteomics , Humans , Animals , Mice , RNA/metabolism , RNA Editing , Chromatography, Liquid , Tandem Mass Spectrometry , Proteome/genetics , Proteome/metabolism , Adenosine/metabolism , Inosine/genetics , Inosine/metabolism
4.
Int J Mol Sci ; 24(3)2023 Jan 27.
Article in English | MEDLINE | ID: mdl-36768787

ABSTRACT

Alternative splicing is one of the main regulation pathways in living cells beyond simple changes in the level of protein expression. Most of the approaches proposed in proteomics for the identification of specific splicing isoforms require a preliminary deep transcriptomic analysis of the sample under study, which is not always available, especially in the case of the re-analysis of previously acquired data. Herein, we developed new algorithms for the identification and validation of protein splice isoforms in proteomic data in the absence of RNA sequencing of the samples under study. The bioinformatic approaches were tested on the results of proteome analysis of human melanoma cell lines, obtained earlier by high-resolution liquid chromatography and mass spectrometry (LC-MS). A search for alternative splicing events for each of the cell lines studied was performed against the database generated from all known transcripts (RefSeq) and the one composed of peptide sequences, which included all biologically possible combinations of exons. The identifications were filtered using the prediction of both retention times and relative intensities of fragment ions in the corresponding mass spectra. The fragmentation mass spectra corresponding to the discovered alternative splicing events were additionally examined for artifacts. Selected splicing events were further validated at the mRNA level by quantitative PCR.


Subject(s)
Alternative Splicing , Melanoma , Humans , Alternative Splicing/genetics , Proteome/genetics , Proteome/metabolism , Proteomics/methods , RNA/metabolism , Protein Isoforms/genetics , Protein Isoforms/metabolism , Sequence Analysis, RNA , RNA Splicing , Cell Line , Melanoma/genetics
5.
Proteomics ; 23(5): e2200275, 2023 03.
Article in English | MEDLINE | ID: mdl-36478387

ABSTRACT

Omics technologies focus on uncovering the complex nature of molecular mechanisms in cells and organisms, including biomarkers and drug targets discovery. Aiming at these tasks, we see that information extracted from omics data is still underused. In particular, characteristics of differentially regulated molecules can be combined in a single score to quantify the signaling pathway activity. Such a metric can be useful for comprehensive data interpretation to follow: (1) developing molecular responses in time; (2) potency of a drug on a certain cell culture; (3) ranking the signaling pathway activity in stimulated cells; and (4) integration of the omics data and assay-based measurements of cell viability, cytotoxicity, and proliferation. With recent advances in ultrafast mass spectrometry for quantitative proteomics allowing data collection in a few minutes, proteomics score for cellular response to stimuli can become a fast, accurate, and informative complement to bioassays. Here, we utilized an interquartile-based selection of differentially regulated features and a variety of schemes for quantifying cellular responses to come up with the quantitative metric for total cellular response and pathway activity. Validation was performed using antiproliferative and virus assays and label-free proteomics data collected for cancer cells subjected to drug stimulation.


Subject(s)
Proteomics , Signal Transduction , Proteomics/methods , Biomarkers
6.
Biochemistry (Mosc) ; 87(11): 1301-1309, 2022 Nov.
Article in English | MEDLINE | ID: mdl-36509721

ABSTRACT

RNA editing by adenosine deaminases of the ADAR family can lead to protein recoding, since inosine formed from adenosine in mRNA is complementary to cytosine; the resulting codon editing might introduce amino acid substitutions into translated proteins. Proteome recoding can have functional consequences which have been described in many animals including humans. Using a protein recoding database derived from publicly available transcriptome data, we identified for the first time the recoding sites in zebrafish shotgun proteomes. Out of more than a hundred predicted recoding events, ten substitutions were found in the six datasets used. Seven of them were in the AMPA glutamate receptor subunits, whose recoding has been well described, and are conserved among vertebrates. Three sites were specific for zebrafish proteins and were found in the transmembrane receptors astrotactin 1 and neuregulin 3b (proteins involved in the neuronal adhesion and signaling) and in the rims2b gene product (presynaptic membrane protein participating in the neurotransmitter release), respectively. Further studies are needed to elucidate the role of recoding of the said three proteins in the zebrafish.


Subject(s)
Proteomics , Zebrafish , Animals , Humans , Zebrafish/genetics , Zebrafish/metabolism , Proteomics/methods , Zebrafish Proteins/genetics , Adenosine Deaminase/genetics , Adenosine Deaminase/metabolism , Proteome/metabolism , Adenosine/metabolism , RNA, Messenger/genetics
7.
Anal Chem ; 94(38): 13068-13075, 2022 09 27.
Article in English | MEDLINE | ID: mdl-36094425

ABSTRACT

Recently, we presented the DirectMS1 method of ultrafast proteome-wide analysis based on minute-long LC gradients and MS1-only mass spectra acquisition. Currently, the method provides the depth of human cell proteome coverage of 2500 proteins at a 1% false discovery rate (FDR) when using 5 min LC gradients and 7.3 min runtime in total. While the standard MS/MS approaches provide 4000-5000 protein identifications within a couple of hours of instrumentation time, we argue here that the higher number of identified proteins does not always translate into better quantitation quality of the proteome analysis. To further elaborate on this issue, we performed a one-on-one comparison of quantitation results obtained using DirectMS1 with three popular MS/MS-based quantitation methods: label-free (LFQ) and tandem mass tag quantitation (TMT), both based on data-dependent acquisition (DDA), and data-independent acquisition (DIA). For comparison, we performed a series of proteome-wide analyses of well-characterized (ground truth) and biologically relevant samples, including a mix of UPS1 proteins spiked at different concentrations into an Escherichia coli digest used as a background and a set of glioblastoma cell lines. MS1-only data was analyzed using a novel quantitation workflow called DirectMS1Quant developed in this work. The results obtained in this study demonstrated comparable quantitation efficiency of 5 min DirectMS1 with both TMT and DIA methods, yet the latter two utilized a 10-20-fold longer instrumentation time.


Subject(s)
Proteome , Proteomics , Chromatography, Liquid/methods , Humans , Proteome/analysis , Proteomics/methods , Tandem Mass Spectrometry/methods , Workflow
8.
J Proteome Res ; 21(6): 1566-1574, 2022 06 03.
Article in English | MEDLINE | ID: mdl-35549218

ABSTRACT

Spectrum clustering is a powerful strategy to minimize redundant mass spectra by grouping them based on similarity, with the aim of forming groups of mass spectra from the same repeatedly measured analytes. Each such group of near-identical spectra can be represented by its so-called consensus spectrum for downstream processing. Although several algorithms for spectrum clustering have been adequately benchmarked and tested, the influence of the consensus spectrum generation step is rarely evaluated. Here, we present an implementation and benchmark of common consensus spectrum algorithms, including spectrum averaging, spectrum binning, the most similar spectrum, and the best-identified spectrum. We have analyzed diverse public data sets using two different clustering algorithms (spectra-cluster and MaRaCluster) to evaluate how the consensus spectrum generation procedure influences downstream peptide identification. The BEST and BIN methods were found the most reliable methods for consensus spectrum generation, including for data sets with post-translational modifications (PTM) such as phosphorylation. All source code and data of the present study are freely available on GitHub at https://github.com/statisticalbiotechnology/representative-spectra-benchmark.
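
As a rough illustration of the binning idea, the sketch below builds a consensus spectrum from a cluster of peak lists. It is a simplified stand-in, not the benchmarked implementation from the repository above: the function name `bin_consensus`, the bin width, the minimum-occurrence threshold, and the example peaks are all assumptions made for this example.

```python
from collections import defaultdict

def bin_consensus(spectra, bin_width=0.02, min_fraction=0.5):
    """Build a consensus spectrum from a cluster of (m/z, intensity) peak lists.

    Peaks are assigned to fixed-width m/z bins. A bin is kept when it is
    populated in at least `min_fraction` of the spectra; the consensus peak
    takes the mean m/z and mean intensity of the peaks that fell in the bin.
    """
    bins = defaultdict(list)  # bin index -> list of (spectrum id, m/z, intensity)
    for spec_id, peaks in enumerate(spectra):
        for mz, intensity in peaks:
            bins[int(mz // bin_width)].append((spec_id, mz, intensity))

    consensus = []
    for index in sorted(bins):
        members = bins[index]
        n_spectra = len({spec_id for spec_id, _, _ in members})
        if n_spectra / len(spectra) >= min_fraction:
            mz = sum(m for _, m, _ in members) / len(members)
            intensity = sum(i for _, _, i in members) / len(members)
            consensus.append((mz, intensity))
    return consensus

# Hypothetical cluster of three near-identical spectra; the 412.33 peak is noise
# present in only one of them and is dropped from the consensus.
cluster = [
    [(175.11, 100.0), (300.22, 50.0)],
    [(175.12, 80.0), (300.23, 70.0), (412.33, 5.0)],
    [(175.12, 90.0)],
]
consensus = bin_consensus(cluster, bin_width=0.05, min_fraction=0.5)
```

Fixed-width bins can split a peak that sits on a bin edge, so a production implementation would likely need a more careful peak-matching rule.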


Subject(s)
Proteomics , Tandem Mass Spectrometry , Algorithms , Cluster Analysis , Consensus , Databases, Protein , Proteomics/methods , Software , Tandem Mass Spectrometry/methods
9.
J Proteome Res ; 21(6): 1438-1448, 2022 06 03.
Article in English | MEDLINE | ID: mdl-35536917

ABSTRACT

Mass spectrometry-based proteome analysis implies matching the mass spectra of proteolytic peptides to amino acid sequences predicted from genomic sequences. Reliability of peptide variant identification in proteogenomic studies is often lacking. We propose a way to interpret shotgun proteomics results, specifically in the data-dependent acquisition mode, as protein sequence coverage by multiple reads, as is done in nucleic acid sequencing for calling of single nucleotide variants. Multiple reads for each sequence position could be provided by overlapping distinct peptides, thus confirming the presence of certain amino acid residues in the overlapping stretch with a lower false discovery rate. Overlapping distinct peptides originate from miscleaved tryptic peptides in combination with their properly cleaved counterparts and from peptides generated by multiple proteases after the same specimen is subjected to parallel digestion and analyzed separately. We illustrate this approach using publicly available multiprotease data sets and our own data generated for the HEK-293 cell line digests obtained using trypsin, LysC, and GluC proteases. In total, up to 30% of the whole proteome was covered by tryptic peptides, with up to 7% covered twofold or more. The proteogenomic analysis of the HEK-293 cell line revealed 36 single amino acid variants, seven of which were supported by multiple reads.
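
The "multiple reads" view of sequence coverage can be sketched as a per-residue depth count. This is an illustration only: the protein sequence, the peptides, and the function name `coverage_depth` are hypothetical, and peptides are located by plain substring search rather than by the authors' workflow.

```python
def coverage_depth(protein, peptides):
    """Count, for every residue of `protein`, how many distinct peptides cover it."""
    depth = [0] * len(protein)
    for peptide in set(peptides):             # distinct peptides only
        start = protein.find(peptide)
        while start != -1:                    # a peptide may occur more than once
            for pos in range(start, start + len(peptide)):
                depth[pos] += 1
            start = protein.find(peptide, start + 1)
    return depth

# Hypothetical protein with a fully cleaved tryptic peptide, its miscleaved
# counterpart, and a second non-overlapping peptide.
protein = "MKTAYIAKQRQISFVK"
peptides = ["TAYIAK", "MKTAYIAK", "QISFVK"]

depth = coverage_depth(protein, peptides)
covered_once = sum(d >= 1 for d in depth) / len(protein)
covered_twice = sum(d >= 2 for d in depth) / len(protein)
print(f"covered: {covered_once:.1%}, covered twofold or more: {covered_twice:.1%}")
```

In this toy case the miscleaved peptide is what lifts the TAYIAK stretch to a depth of two, which mirrors the role the abstract assigns to miscleaved and multiprotease peptides.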


Subject(s)
Proteogenomics , Amino Acids , HEK293 Cells , Humans , Peptide Hydrolases , Peptides/analysis , Proteogenomics/methods , Proteome/analysis , Reproducibility of Results
10.
Nat Commun ; 12(1): 5854, 2021 10 06.
Article in English | MEDLINE | ID: mdl-34615866

ABSTRACT

The amount of public proteomics data is rapidly increasing but there is no standardized format to describe the sample metadata and their relationship with the dataset files in a way that fully supports their understanding or reanalysis. Here we propose to develop the transcriptomics data format MAGE-TAB into a standard representation for proteomics sample metadata. We implement MAGE-TAB-Proteomics in a crowdsourcing project to manually curate over 200 public datasets. We also describe tools and libraries to validate and submit sample metadata-related information to the PRIDE repository. We expect that these developments will improve the reproducibility and facilitate the reanalysis and integration of public proteomics datasets.


Subject(s)
Data Analysis , Databases, Protein , Metadata , Proteomics , Big Data , Humans , Reproducibility of Results , Software , Transcriptome
11.
J Proteomics ; 248: 104350, 2021 09 30.
Article in English | MEDLINE | ID: mdl-34389500

ABSTRACT

Characterization of post-translational modifications is among the most challenging tasks in tandem mass spectrometry-based proteomics which has yet to find an efficient solution. The ultra-tolerant (open) database search attempts to meet this challenge. However, interpretation of the mass shifts observed in open search still requires an effective and automated solution. We have previously introduced the AA_stat tool for analysis of amino acid frequencies at different mass shifts and generation of hypotheses on unaccounted in vitro modifications. Here, we report on the new version of AA_stat, which now complements amino acid frequency statistics with a number of new features: (1) MS/MS-based localization of mass shifts and localization scoring, including shifts which are the sum of modifications; (2) inferring fixed modifications to increase method sensitivity; (3) inferring monoisotopic peak assignment errors and variable modifications based on abundant mass shift localizations to increase the yield of closed search; (4) new mass calibration algorithm to account for partial systematic shifts; (5) interactive integration of all results and a rated list of possible mass shift interpretations. With these options, we improve interpretation of open search results and demonstrate the utility of AA_stat for profiling of abundant and rare amino acid modifications. AA_stat is implemented in Python as an open-source tool available at https://github.com/SimpleNumber/aa_stat.

SIGNIFICANCE: Mass spectrometry-based PTM characterization has a long history, yet most of the methods rely on a priori knowledge of modifications of interest and do not provide a whole proteome modification landscape in a blind manner. The open database search is an efficient attempt to address this challenge by identifying peptides with mass shifts corresponding to possible modifications. These mass shifts must then be interpreted. Therefore, development of bioinformatics software for post-processing of the open search results, which is capable of detection and accurate annotation of new or unexpected modifications, from characterization of sample preparation efficiency and quality control to discovery of rare post-translational modifications, is of high importance.


Subject(s)
Proteomics , Tandem Mass Spectrometry , Algorithms , Databases, Protein , Protein Processing, Post-Translational , Software
12.
J Proteomics ; 231: 104022, 2021 01 16.
Article in English | MEDLINE | ID: mdl-33096305

ABSTRACT

In order to optimize sample preparation for shotgun proteomics, we compared four cysteine alkylating agents: iodoacetamide, chloroacetamide, 4-vinylpyridine and methyl methanethiosulfonate, and estimated their effects on the results of proteome analysis. Because alkylation may result in methionine modification in vitro, proteomics data were searched for methionine to isothreonine conversions, which may mimic genomic methionine to threonine substitutions found in proteogenomic analyses. We found that chloroacetamide was superior to the other reagents in terms of the number of identified peptides and undesirable off-site reactions. Among the reagents evaluated, iodoacetamide increased the rate of methionine-to-isothreonine conversion, especially if the sample was prepared in gel. The presence of proline following methionine in a protein sequence increased the modification rate as well. Generally, the methionine-to-isothreonine conversion events were relatively rare, but should be taken into account in proteogenomic studies when searching for single nucleotide polymorphism events at the protein level. Additionally, we have evaluated other methionine modifications, such as oxidation and carbamidomethylation. We found that carbamidomethylation may affect up to 80% of peptides containing methionine under the condition of iodoacetamide alkylation. In this case, carbamidomethylation of methionine is more common than oxidation and should be accounted for as a variable modification during proteomic search.

SIGNIFICANCE: One of the most trending questions in bottom-up proteomics is the depth of proteome profiling, in other words, the coverage of proteins by identified tryptic peptides. In proteogenomics, where the identification of a single peptide, e.g. bearing an amino acid substitution, may be of interest, high sequence coverage is especially important. Chemical modifications during sample preparation may mimic biologically significant coding mutations at the proteome level. A typical example of such modification is methionine to isothreonine conversion during alkylation, which mimics methionine to threonine substitution in protein sequences due to respective genomic mutations. Therefore, the studies on the proper selection of alkylating reagents which balance the cysteine alkylation efficiency and the extent of methionine conversion upon conventional proteomic sample preparation workflow are crucial for the outcome of proteogenomic analyses and should present a general interest for the proteomic community.
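
A short arithmetic check shows why the conversion is indistinguishable from a genomic substitution by mass alone. The sketch assumes standard monoisotopic element masses and residue compositions, and takes isothreonine to be an isomer of threonine, so that both residues share the composition C4H7NO2; the helper `residue_mass` is introduced here for illustration.

```python
# Monoisotopic masses of the elements, in Da.
MASS = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146, "S": 31.9720707}

def residue_mass(composition):
    """Monoisotopic mass of a residue or group given its elemental composition."""
    return sum(MASS[element] * count for element, count in composition.items())

met = residue_mass({"C": 5, "H": 9, "N": 1, "O": 1, "S": 1})  # methionine residue
thr = residue_mass({"C": 4, "H": 7, "N": 1, "O": 2})          # threonine or isothreonine residue
cam = residue_mass({"C": 2, "H": 3, "N": 1, "O": 1})          # carbamidomethyl group

print(f"Met -> Thr or isothreonine: {thr - met:+.4f} Da")
print(f"Carbamidomethylation:       {cam:+.4f} Da")
```

Because the two mass shifts for methionine to threonine and methionine to isothreonine are identical, a search engine cannot separate the sample-preparation artifact from a true coding variant without additional evidence.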


Subject(s)
Cysteine , Proteomics , Alkylation , Iodoacetamide , Methionine
13.
J Proteome Res ; 19(10): 4046-4060, 2020 10 02.
Article in English | MEDLINE | ID: mdl-32866021

ABSTRACT

Adenosine-to-inosine RNA editing is an enzymatic post-transcriptional modification which modulates immunity and neural transmission in multicellular organisms. In particular, it involves editing of mRNA codons with the resulting amino acid substitutions. We identified such sites for developmental proteomes of Drosophila melanogaster at the protein level using available data for 15 stages of fruit fly development from egg to imago and 14 time points of embryogenesis. In total, 40 sites were obtained, each belonging to a unique protein, including four sites related to embryogenesis. The interactome analysis has revealed that the majority of the editing-recoded proteins were associated with synaptic vesicle trafficking and actomyosin organization. Quantitation data analysis suggested the existence of a phase-specific RNA editing regulation with yet unknown mechanisms. These findings supported the transcriptome analysis results, which showed that a burst in the RNA editing occurs during insect metamorphosis from pupa to imago. Finally, targeted proteomic analysis was performed to quantify editing-recoded and genomically encoded versions of five proteins in brains of larvae, pupae, and imago insects, which showed a clear tendency toward an increase in the editing rate for each of them. These results will allow a better understanding of the protein role in physiological effects of RNA editing.


Subject(s)
Drosophila Proteins , RNA Editing , Adenosine Deaminase/genetics , Adenosine Deaminase/metabolism , Animals , Drosophila Proteins/genetics , Drosophila Proteins/metabolism , Drosophila melanogaster/genetics , Drosophila melanogaster/metabolism , Inosine/metabolism , Proteome/genetics , Proteome/metabolism , Proteomics , RNA, Messenger/genetics
14.
Anal Chem ; 92(6): 4326-4333, 2020 03 17.
Article in English | MEDLINE | ID: mdl-32077687

ABSTRACT

Proteome characterization relies heavily on tandem mass spectrometry (MS/MS) and is thus associated with instrumentation complexity, lengthy analysis time, and limited duty cycle. It was always tempting to implement approaches that do not require MS/MS, yet they were constantly failing to achieve a meaningful depth of quantitative proteome coverage within short experimental times, which is particularly important for clinical or biomarker-discovery applications. Here, we report on the first successful attempt to develop a truly MS/MS-free method, DirectMS1, for bottom-up proteomics. The method is compared with the standard MS/MS-based data-dependent acquisition approach for proteome-wide analysis using 5 min LC gradients. Specifically, we demonstrate identification of 1000 protein groups for a standard HeLa cell line digest. The amount of loaded sample was varied in a range from 1 to 500 ng, and the method demonstrated 10-fold higher sensitivity. Combined with the recently introduced Diffacto approach for relative protein quantification, DirectMS1 outperforms most popular MS/MS-based label-free quantitation approaches because of significantly higher protein sequence coverage.


Subject(s)
Neoplasm Proteins/analysis , Proteome/analysis , Proteomics , Saccharomyces cerevisiae Proteins/analysis , HeLa Cells , Humans , Tandem Mass Spectrometry , Time Factors
15.
Proteomics ; 19(23): e1900195, 2019 12.
Article in English | MEDLINE | ID: mdl-31576663

ABSTRACT

Proteogenomics is based on the use of customized genome or RNA sequencing databases for interrogation of shotgun proteomics data in search for proteome-level evidence of genome variations or RNA editing. In this work, the products of adenosine-to-inosine RNA editing in human and murine brain proteomes are identified using publicly available brain proteome LC-MS/MS datasets and an RNA editome database compiled from several sources. After filtering of false-positive results, 20 and 37 sites of editing in proteins belonging to 14 and 32 genes are identified for murine and human brain proteomes, respectively. Eight sites of editing identified with high spectral counts overlapped between human and mouse brain samples. Some of these sites have been previously reported using orthogonal methods, such as α-amino-3-hydroxy-5-methyl-4-isoxazolepropionic acid (AMPA) glutamate receptors, CYFIP2, coatomer alpha. Also, differential editing between neurons and microglia is demonstrated in this work for some of the proteins from primary murine brain cell cultures. Because many edited sites are still not characterized functionally at the protein level, the results provide a necessary background for their further analysis in normal and diseased cells and tissues using targeted proteomic approaches.


Subject(s)
Adenosine/metabolism , Brain/metabolism , Inosine/metabolism , RNA Editing/genetics , Adaptor Proteins, Signal Transducing/metabolism , Animals , Cells, Cultured , Coatomer Protein/metabolism , Humans , Mice , Proteome/metabolism , Proteomics/methods
16.
Proteomics ; 19(3): e1800280, 2019 02.
Article in English | MEDLINE | ID: mdl-30537264

ABSTRACT

Shotgun proteomics workflows for database protein identification typically include a combination of search engines and postsearch validation software based mostly on machine learning algorithms. Here, a new postsearch validation tool called Scavager is presented. It employs CatBoost, an open-source gradient boosting library, and shows improved efficiency compared with other popular algorithms, such as Percolator, PeptideProphet, and Q-ranker. The comparison is done using multiple data sets and search engines, including MSGF+, MSFragger, X!Tandem, Comet, and the recently introduced IdentiPy. Implemented in the Python programming language, Scavager is open-source and freely available at https://bitbucket.org/markmipt/scavager.


Subject(s)
Algorithms , Proteomics/methods , Databases, Protein , HEK293 Cells , HeLa Cells , Humans , Machine Learning , Programming Languages , Search Engine , Software
17.
J Proteome Res ; 18(2): 709-714, 2019 02 01.
Article in English | MEDLINE | ID: mdl-30576148

ABSTRACT

Many of the novel ideas that drive today's proteomic technologies are focused essentially on experimental or data-processing workflows. The latter are implemented and published in a number of ways, from custom scripts and programs, to projects built using general-purpose or specialized workflow engines; a large part of routine data processing is performed manually or with custom scripts that remain unpublished. Facilitating the development of reproducible data-processing workflows becomes essential for increasing the efficiency of proteomic research. To assist in overcoming the bioinformatics challenges in the daily practice of proteomic laboratories, 5 years ago we developed and announced Pyteomics, a freely available open-source library providing Python interfaces to proteomic data. We summarize the new functionality of Pyteomics developed during the time since its introduction.


Subject(s)
Proteomics/methods , Software , User-Computer Interface , Computational Biology , Workflow
18.
Proteomics ; 18(23): e1800117, 2018 12.
Article in English | MEDLINE | ID: mdl-30307114

ABSTRACT

The efficiency of proteome analysis depends strongly on the configuration parameters of the search engine. One of the murkiest and least trivial among them is the list of amino acid modifications included for the search. Here, an approach called AA_stat is presented for uncovering the unexpected modifications of amino acid residues in the protein sequences, as well as possible artifacts of data acquisition or processing, in the results of proteome analyses. The approach is based on comparing the amino acid frequencies of different mass shifts observed using the open search method introduced recently. In this work, the proposed approach is applied to publicly available proteomic data and its feasibility for discovering unaccounted modifications or possible pitfalls of the identification workflow is demonstrated. The algorithm is implemented in Python as an open-source command-line tool available at https://bitbucket.org/J_Bale/aa_stat/.
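
The frequency comparison at the heart of this idea can be sketched in a few lines. This is a toy illustration, not the AA_stat implementation: the peptide lists are invented, the function names are hypothetical, and the real tool works on open-search output with proper mass-shift binning and statistics.

```python
from collections import Counter

def aa_frequencies(peptides):
    """Relative frequency of each amino acid across a list of peptide sequences."""
    counts = Counter("".join(peptides))
    total = sum(counts.values())
    return {aa: n / total for aa, n in counts.items()}

def enrichment(shifted_peptides, reference_peptides):
    """Ratio of amino acid frequencies at a mass shift to those at zero shift."""
    shifted = aa_frequencies(shifted_peptides)
    reference = aa_frequencies(reference_peptides)
    return {aa: shifted.get(aa, 0.0) / freq for aa, freq in reference.items()}

# Hypothetical open-search results grouped by observed mass shift.
reference = ["PEPTIDEK", "SAMPLER", "LGEVTK", "AMDSLK"]   # shift near 0 Da
plus_16 = ["AMMDSLK", "SAMPLER", "MMTEK"]                 # shift near +15.995 Da

scores = enrichment(plus_16, reference)
top = max(scores, key=scores.get)
print(top, round(scores[top], 2))
```

An amino acid that is strongly over-represented at a given shift, such as methionine in this invented +15.995 Da group, suggests which residue carries the modification.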


Subject(s)
Amino Acids/analysis , Peptides/analysis , Proteomics/methods , Algorithms
19.
Anal Bioanal Chem ; 410(16): 3827-3833, 2018 Jun.
Article in English | MEDLINE | ID: mdl-29663059

ABSTRACT

Recent advances in mass spectrometry and separation technologies created the opportunities for deep proteome characterization using shotgun proteomics approaches. The "real world" sample complexity and high concentration range limit the sensitivity of this characterization. The common strategy for increasing the sensitivity is sample fractionation prior to analysis either at the protein or the peptide level. Typically, fractionation at the peptide level is performed using linear gradient high-performance liquid chromatography followed by uniform fraction collection. However, this way of peptide fractionation results in significantly suboptimal operation of the mass spectrometer due to the non-uniform distribution of peptides between the fractions. In this work, we propose an approach based on peptide retention time prediction allowing optimization of chromatographic conditions and fraction collection procedures. Open-source software implementing the approach, called FractionOptimizer, was developed and is available at http://hg.theorchromo.ru/FractionOptimizer. The performance of the developed tool was demonstrated for human embryonic kidney (HEK293) cell line lysate. In these experiments, we improved the uniformity of the peptide distribution between fractions. Moreover, in addition to 13,492 peptides, we found 6787 new peptides not identified in the experiments without fractionation and up to 800 new proteins (or 25%). Graphical abstract: The analysis workflow employing FractionOptimizer software.
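
The contrast between uniform and optimized fraction collection can be sketched as follows. This is an illustration under stated assumptions rather than the FractionOptimizer algorithm: the predicted retention times are invented, and the equal-count (quantile) rule and both function names are choices made for this example.

```python
def equal_count_boundaries(retention_times, n_fractions):
    """Split a gradient into fractions holding roughly equal numbers of peptides.

    Returns the n_fractions - 1 inner boundaries, in the units of the input.
    """
    ordered = sorted(retention_times)
    boundaries = []
    for k in range(1, n_fractions):
        cut = k * len(ordered) // n_fractions
        # Place the boundary midway between the neighbouring peptides.
        boundaries.append((ordered[cut - 1] + ordered[cut]) / 2)
    return boundaries

def counts_per_fraction(retention_times, boundaries):
    """Number of peptides eluting in each fraction defined by `boundaries`."""
    counts = [0] * (len(boundaries) + 1)
    for rt in retention_times:
        counts[sum(rt > b for b in boundaries)] += 1
    return counts

# Hypothetical predicted retention times (min); most peptides elute mid-gradient.
predicted = [3, 9, 14, 16, 17, 18, 19, 20, 21, 22, 24, 27]
uniform = [10.0, 20.0]                      # three equal-width windows over 0-30 min
optimized = equal_count_boundaries(predicted, 3)

print("uniform:  ", counts_per_fraction(predicted, uniform))
print("optimized:", counts_per_fraction(predicted, optimized))
```

With uniform windows the middle fraction is overloaded while the first is nearly empty; the quantile boundaries even out the load, which is the effect the abstract describes as improved uniformity.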


Subject(s)
Chromatography, Reverse-Phase/methods , Peptides/analysis , Proteins/chemistry , Proteomics/methods , Chromatography, High Pressure Liquid/methods , HEK293 Cells , Humans , Proteome/chemistry , Software , Tandem Mass Spectrometry/methods
20.
J Proteome Res ; 17(7): 2249-2255, 2018 07 06.
Article in English | MEDLINE | ID: mdl-29682971

ABSTRACT

We present an open-source, extensible search engine for shotgun proteomics. Implemented in the Python programming language, IdentiPy shows competitive processing speed and sensitivity compared with the state-of-the-art search engines. It is equipped with a user-friendly web interface, IdentiPy Server, enabling the use of a single server installation accessed from multiple workstations. Using a simplified version of the X!Tandem scoring algorithm and its novel "autotune" feature, IdentiPy outperforms the popular alternatives on high-resolution data sets. Autotune adjusts the search parameters for the particular data set, resulting in improved search efficiency and simplifying the user experience. IdentiPy with the autotune feature shows higher sensitivity compared with the evaluated search engines. IdentiPy Server has built-in postprocessing and protein inference procedures and provides graphic visualization of the statistical properties of the data set and the search results. It is open-source and can be freely extended to use third-party scoring functions or processing algorithms and allows customization of the search workflow for specialized applications.


Subject(s)
Proteins/analysis , Proteomics/methods , Search Engine/methods , Algorithms , Programming Languages , Software