
AIRI: Predicting Retention Indices and Their Uncertainties Using Artificial Intelligence.

Geer, Lewis Y; Stein, Stephen E; Mallard, William Gary; Slotta, Douglas J.

J Chem Inf Model ; 64(3): 690-696, 2024 Feb 12.

Article in English | MEDLINE | ID: mdl-38230885

ABSTRACT

The Kováts retention index (RI) is a quantity measured using gas chromatography and is commonly used in the identification of chemical structures. Creating libraries of observed RI values is a laborious task, so we explore the use of a deep neural network for predicting RI values from structure for standard semipolar columns. This network generated predictions with a mean absolute error of 15.1 and, in a quantification of the tail of the error distribution, a 95th percentile absolute error of 46.5. Because of the Artificial Intelligence Retention Indices (AIRI) network's accuracy, it was used to predict RI values for the NIST EI-MS spectral libraries. These RI values are used to improve chemical identification methods and the quality of the library. Estimating uncertainty is an important practical need when using prediction models. To quantify the uncertainty of our network for each individual prediction, we used the outputs of an ensemble of 8 networks to calculate a predicted standard deviation for each RI value prediction. This predicted standard deviation was corrected to follow the error between the observed and predicted RI values. The Z scores using these predicted standard deviations had a standard deviation of 1.52 and a 95th percentile absolute Z score corresponding to a mean RI value of 42.6.

Subject(s)

Artificial Intelligence , Neural Networks, Computer , Uncertainty
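
The ensemble-based uncertainty estimate described in the AIRI abstract can be illustrated with a minimal Python sketch. It assumes the per-prediction standard deviation is the sample standard deviation across ensemble members, and that the correction is a simple linear rescaling; the abstract specifies neither, so `calibrate_sd`, its `scale` and `offset` parameters, and the example RI values are hypothetical.

```python
import statistics


def ensemble_mean_and_sd(member_predictions):
    """Combine the RI predictions of all ensemble members for one compound.

    Returns the ensemble mean and the sample standard deviation of the
    member outputs (an assumed definition of the "predicted" SD).
    """
    mean = statistics.fmean(member_predictions)
    sd = statistics.stdev(member_predictions)
    return mean, sd


def calibrate_sd(raw_sd, scale=1.0, offset=0.0):
    """Hypothetical linear correction of the raw ensemble SD.

    The abstract only says the SD was corrected to follow the observed
    error; the linear form used here is an assumption.
    """
    return scale * raw_sd + offset


def z_score(observed_ri, predicted_ri, predicted_sd):
    """Standardized error of one prediction."""
    return (observed_ri - predicted_ri) / predicted_sd


# Hypothetical outputs of an 8-member ensemble for a single compound.
members = [1200.0, 1210.0, 1190.0, 1205.0, 1195.0, 1202.0, 1198.0, 1200.0]
predicted_ri, raw_sd = ensemble_mean_and_sd(members)
predicted_sd = calibrate_sd(raw_sd, scale=1.5)
print(predicted_ri, predicted_sd, z_score(1215.0, predicted_ri, predicted_sd))
```

If the predicted standard deviations were perfectly calibrated, Z scores collected over a test set would have a standard deviation close to 1; the abstract reports 1.52.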

AIomics: Exploring More of the Proteome Using Mass Spectral Libraries Extended by Artificial Intelligence.

Geer, Lewis Y; Lapin, Joel; Slotta, Douglas J; Mak, Tytus D; Stein, Stephen E.

J Proteome Res ; 22(7): 2246-2255, 2023 07 07.

Article in English | MEDLINE | ID: mdl-37232537

ABSTRACT

The unbounded permutations of biological molecules, including proteins and their constituent peptides, present a dilemma in identifying the components of complex biosamples. Sequence search algorithms used to identify peptide spectra can be expanded to cover larger classes of molecules, including more modifications, isoforms, and atypical cleavage, but at the cost of false positives or false negatives due to the simplified spectra they compute from sequence records. Spectral library searching can help solve this issue by precisely matching experimental spectra to library spectra with excellent sensitivity and specificity. However, compiling spectral libraries that span entire proteomes is pragmatically difficult. Neural networks that predict complete spectra containing a full range of annotated and unannotated ions can be used to replace these simplified spectra with libraries of fully predicted spectra, including modified peptides. Using such a network, we created predicted spectral libraries that were used to rescore matches from a sequence search done over a large search space, including a large number of modifications. Rescoring improved the separation of true and false hits by 82%, yielding an 8% increase in peptide identifications, including a 21% increase in nonspecifically cleaved peptides and a 17% increase in phosphopeptides.

Subject(s)

Peptide Library , Proteome , Proteome/metabolism , Artificial Intelligence , Tandem Mass Spectrometry , Algorithms , Phosphopeptides , Databases, Protein , Software

Validating the AMRFinder Tool and Resistance Gene Database by Using Antimicrobial Resistance Genotype-Phenotype Correlations in a Collection of Isolates.

Feldgarden, Michael; Brover, Vyacheslav; Haft, Daniel H; Prasad, Arjun B; Slotta, Douglas J; Tolstoy, Igor; Tyson, Gregory H; Zhao, Shaohua; Hsu, Chih-Hao; McDermott, Patrick F; Tadesse, Daniel A; Morales, Cesar; Simmons, Mustafa; Tillman, Glenn; Wasilenko, Jamie; Folster, Jason P; Klimke, William.

Antimicrob Agents Chemother ; 63(11), 2019 11.

Article in English | MEDLINE | ID: mdl-31427293

ABSTRACT

Antimicrobial resistance (AMR) is a major public health problem that requires publicly available tools for rapid analysis. To identify AMR genes in whole-genome sequences, the National Center for Biotechnology Information (NCBI) has produced AMRFinder, a tool that identifies AMR genes using a high-quality curated AMR gene reference database. The Bacterial Antimicrobial Resistance Reference Gene Database consists of up-to-date gene nomenclature, a set of hidden Markov models (HMMs), and a curated protein family hierarchy. Currently, it contains 4,579 antimicrobial resistance proteins and more than 560 HMMs. Here, we describe AMRFinder and its associated database. To assess the predictive ability of AMRFinder, we measured the consistency between predicted AMR genotypes from AMRFinder and resistance phenotypes of 6,242 isolates from the National Antimicrobial Resistance Monitoring System (NARMS). This included 5,425 Salmonella enterica, 770 Campylobacter spp., and 47 Escherichia coli isolates phenotypically tested against various antimicrobial agents. Of 87,679 susceptibility tests performed, 98.4% were consistent with predictions. To assess the accuracy of AMRFinder, we compared its gene symbol output with that of a 2017 version of ResFinder, another publicly available resistance gene detection system. Most gene calls were identical, but there were 1,229 gene symbol differences (8.8%) between them, with differences due to both algorithmic differences and database composition. AMRFinder missed 16 loci that ResFinder found, while ResFinder missed 216 loci that AMRFinder identified. Based on these results, AMRFinder appears to be a highly accurate AMR gene detection system.

From Peptidome to PRIDE: public proteomics data migration at a large scale.

Csordas, Attila; Wang, Rui; Ríos, Daniel; Reisinger, Florian; Foster, Joseph M; Slotta, Douglas J; Vizcaíno, Juan Antonio; Hermjakob, Henning.

Proteomics ; 13(10-11): 1692-5, 2013 May.

Article in English | MEDLINE | ID: mdl-23533138

ABSTRACT

The PRIDE database, developed and maintained at the European Bioinformatics Institute (EBI), is one of the most prominent data repositories dedicated to high throughput MS-based proteomics data. Peptidome, developed by the National Center for Biotechnology Information (NCBI) as a sibling resource to PRIDE, was discontinued due to funding constraints in April 2011. A joint effort between the two teams was started soon after the Peptidome closure to ensure that data were not "lost" to the wider proteomics community by exporting it to PRIDE. As a result, data in the low terabyte range have been migrated from Peptidome to PRIDE and made publicly available under experiment accessions 17 900-18 271, representing 54 projects, ~53 million mass spectra, ~10 million peptide identifications, ~650,000 protein identifications, ~1.1 million biologically relevant protein modifications, and 28 species, from more than 30 different labs.

Subject(s)

Databases, Protein , Proteome/chemistry , Information Storage and Retrieval , Molecular Sequence Annotation , Proteomics , Tandem Mass Spectrometry

MassSieve: panning MS/MS peptide data for proteins.

Slotta, Douglas J; McFarland, Melinda A; Markey, Sanford P.

Proteomics ; 10(16): 3035-9, 2010 Aug.

Article in English | MEDLINE | ID: mdl-20564260

ABSTRACT

We present MassSieve, a Java-based platform for visualization and parsimony analysis of single and comparative LC-MS/MS database search engine results. The success of mass spectrometric peptide sequence assignment algorithms has led to the need for a tool to merge and evaluate the increasing data set sizes that result from LC-MS/MS-based shotgun proteomic experiments. MassSieve supports reports from multiple search engines with differing search characteristics, which can increase peptide sequence coverage and/or identify conflicting or ambiguous spectral assignments.

Subject(s)

Computational Biology/methods , Data Mining/methods , Peptide Mapping/methods , Software , Tandem Mass Spectrometry/methods , Algorithms , Peptide Fragments/chemistry , Statistics, Nonparametric , User-Computer Interface
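
As an illustration of what parsimony analysis means in this setting, the sketch below selects a small set of proteins that accounts for every identified peptide, using a greedy set-cover heuristic. This is a generic technique chosen for illustration, not MassSieve's actual algorithm (MassSieve itself is Java-based); the function name and the example protein and peptide identifiers are hypothetical.

```python
def parsimonious_proteins(protein_to_peptides):
    """Greedy set cover: pick proteins until every peptide is explained.

    protein_to_peptides maps a protein identifier to the set of peptide
    sequences assigned to it. Ties are broken alphabetically by protein
    identifier so the result is deterministic.
    """
    uncovered = set().union(*protein_to_peptides.values())
    chosen = []
    while uncovered:
        # Take the protein that explains the most still-unexplained peptides.
        best = max(
            sorted(protein_to_peptides),
            key=lambda protein: len(protein_to_peptides[protein] & uncovered),
        )
        chosen.append(best)
        uncovered -= protein_to_peptides[best]
    return chosen


# Hypothetical search results: P2's peptides are a subset of P1's.
hits = {
    "P1": {"AAK", "BBK", "CCK"},
    "P2": {"AAK", "BBK"},
    "P3": {"DDK"},
}
print(parsimonious_proteins(hits))
```

In this example P2 is dropped because P1 already explains all of its peptides. A greedy cover is not guaranteed to be the smallest possible set, but it is a common and simple approximation.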

NCBI Peptidome: a new repository for mass spectrometry proteomics data.

Ji, Li; Barrett, Tanya; Ayanbule, Oluwabukunmi; Troup, Dennis B; Rudnev, Dmitry; Muertter, Rolf N; Tomashevsky, Maxim; Soboleva, Alexandra; Slotta, Douglas J.

Nucleic Acids Res ; 38(Database issue): D731-5, 2010 Jan.

Article in English | MEDLINE | ID: mdl-19942688

ABSTRACT

Peptidome is a public repository that archives and freely distributes tandem mass spectrometry peptide and protein identification data generated by the scientific community. Data from all stages of a mass spectrometry experiment are captured, including original mass spectra files, experimental metadata and conclusion-level results. The submission process is facilitated through acceptance of data in commonly used open formats, and all submissions undergo syntactic validation and curation in an effort to uphold data integrity and quality. Peptidome is not restricted to specific organisms, instruments or experiment types; data from any tandem mass spectrometry experiment from any species are accepted. In addition to data storage, web-based interfaces are available to help users query, browse and explore individual peptides, proteins or entire Samples and Studies. Results are integrated and linked with other NCBI resources to ensure dissemination of the information beyond the mass spectroscopy proteomics community. Peptidome is freely accessible at http://www.ncbi.nlm.nih.gov/peptidome.

Subject(s)

Computational Biology/methods , Databases, Genetic , Databases, Nucleic Acid , Databases, Protein , Mass Spectrometry/methods , Proteomics/methods , Computational Biology/trends , Gene Expression Profiling , Humans , Information Storage and Retrieval/methods , Internet , National Library of Medicine (U.S.) , Peptides/chemistry , Protein Structure, Tertiary , Software , United States

Composition of the synaptic PSD-95 complex.

Dosemeci, Ayse; Makusky, Anthony J; Jankowska-Stephens, Ewa; Yang, Xiaoyu; Slotta, Douglas J; Markey, Sanford P.

Mol Cell Proteomics ; 6(10): 1749-60, 2007 Oct.

Article in English | MEDLINE | ID: mdl-17623647

ABSTRACT

Postsynaptic density protein 95 (PSD-95), a specialized scaffold protein with multiple protein interaction domains, forms the backbone of an extensive postsynaptic protein complex that organizes receptors and signal transduction molecules at the synaptic contact zone. Large, detergent-insoluble PSD-95-based postsynaptic complexes can be affinity-purified from conventional PSD fractions using magnetic beads coated with a PSD-95 antibody. In the present study purified PSD-95 complexes were analyzed by LC/MS/MS. A semiquantitative measure of the relative abundances of proteins in the purified PSD-95 complexes and the parent PSD fraction was estimated based on the cumulative ion current intensities of corresponding peptides. The affinity-purified preparation was largely depleted of presynaptic proteins, spectrin, intermediate filaments, and other contaminants prominent in the parent PSD fraction. We identified 525 of the proteins previously reported in parent PSD fractions, but only 288 of these were detected after affinity purification. We discuss 26 proteins that are major components in the PSD-95 complex based upon abundance ranking and affinity co-purification with PSD-95. This subset represents a minimal list of constituent proteins of the PSD-95 complex and includes, in addition to the specialized scaffolds and N-methyl-d-aspartate (NMDA) receptors, an abundance of alpha-amino-3-hydroxy-5-methyl-4-isoxazolepropionic acid (AMPA) receptors, small G-protein regulators, cell adhesion molecules, and hypothetical proteins. The identification of two Arf regulators, BRAG1 and BRAG2b, as co-purifying components of the complex implies pivotal functions in spine plasticity such as the reorganization of the actin cytoskeleton and insertion and retrieval of proteins to and from the plasma membrane. Another co-purifying protein (Q8BZM2) with two sterile alpha motif domains may represent a novel structural core element of the PSD.

Subject(s)

Nerve Tissue Proteins/analysis , Synapses/chemistry , Animals , Chromatography, Affinity , Electrophoresis, Polyacrylamide Gel , Nerve Tissue Proteins/isolation & purification , Rats , Rats, Sprague-Dawley

Clustering mass spectrometry data using order statistics.

Slotta, Douglas J; Heath, Lenwood S; Ramakrishnan, Naren; Helm, Rich; Potts, Malcolm.

Proteomics ; 3(9): 1687-91, 2003 Sep.

Article in English | MEDLINE | ID: mdl-12973726

ABSTRACT

Mass spectrometry data is inherently uncertain. Rather than compare peak heights across samples, a comparison can be made of the relative ordering of the peak heights across samples. Order statistics are used to provide a distance metric between each ordered list of peak heights from the samples. A principal component analysis is performed on the set of distance vectors to highlight the important components.

Subject(s)

Mass Spectrometry/statistics & numerical data , Peptides/analysis , Mass Spectrometry/methods , Principal Component Analysis
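
The idea of comparing the ordering of peak heights rather than the heights themselves can be sketched in Python as follows. The abstract does not name the distance metric, so this sketch assumes the Spearman footrule (the sum of absolute rank differences) as a stand-in; the function names and sample values are hypothetical, and the subsequent principal component analysis is not shown.

```python
def ranks(heights):
    """Rank of each peak within its sample (0 = smallest height).

    Ties are broken by position, which is an arbitrary choice.
    """
    order = sorted(range(len(heights)), key=lambda i: heights[i])
    result = [0] * len(heights)
    for rank, index in enumerate(order):
        result[index] = rank
    return result


def footrule_distance(sample_a, sample_b):
    """Sum of absolute rank differences between two samples.

    Both samples must list heights for the same peaks in the same order.
    """
    return sum(abs(x - y) for x, y in zip(ranks(sample_a), ranks(sample_b)))


def distance_matrix(samples):
    """Pairwise footrule distances between all samples."""
    return [[footrule_distance(a, b) for b in samples] for a in samples]


# Hypothetical peak heights for three samples over the same three peaks.
samples = [
    [10.0, 50.0, 30.0],
    [1000.0, 5000.0, 3000.0],  # same ordering, different intensity scale
    [50.0, 10.0, 30.0],        # first two peaks swapped
]
print(distance_matrix(samples))
```

Because only the ordering is used, the first two samples are at distance 0 despite a hundredfold difference in absolute intensity, which is the robustness to uncertain peak heights that the abstract motivates.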
