Search | VHL Regional Portal

MOPED: Model Organism Protein Expression Database.

Kolker, Eugene; Higdon, Roger; Haynes, Winston; Welch, Dean; Broomall, William; Lancet, Doron; Stanberry, Larissa; Kolker, Natali.

Nucleic Acids Res ; 40(Database issue): D1093-9, 2012 Jan.

Article in English | MEDLINE | ID: mdl-22139914

ABSTRACT

Large numbers of mass spectrometry proteomics studies are being conducted to understand all types of biological processes. The size and complexity of proteomics data hinder efforts to easily share, integrate, query and compare the studies. The Model Organism Protein Expression Database (MOPED, http://moped.proteinspire.org) is a new and expanding proteomics resource that enables rapid browsing of protein expression information from publicly available studies on humans and model organisms. MOPED is designed to simplify the comparison and sharing of proteomics data for the greater research community. MOPED uniquely provides protein-level expression data, meta-analysis capabilities and quantitative data from standardized analysis. Data can be queried for specific proteins, browsed by organism, tissue, localization and condition, and sorted by false discovery rate and expression. MOPED empowers users to visualize their own expression data and compare it with existing studies. Further, MOPED links to various protein and pathway databases, including GeneCards, Entrez, UniProt, KEGG and Reactome. The current version of MOPED contains over 43,000 proteins with at least one spectral match and more than 11 million high-certainty spectra.

Subject(s)

Databases, Protein , Proteins/metabolism , Animals , Humans , Mass Spectrometry , Mice , Models, Animal , Proteomics , User-Computer Interface

Classifying proteins into functional groups based on all-versus-all BLAST of 10 million proteins.

Kolker, Natali; Higdon, Roger; Broomall, William; Stanberry, Larissa; Welch, Dean; Lu, Wei; Haynes, Winston; Barga, Roger; Kolker, Eugene.

OMICS ; 15(7-8): 513-21, 2011.

Article in English | MEDLINE | ID: mdl-21809957

ABSTRACT

To address the monumental challenge of assigning function to millions of sequenced proteins, we completed the first-of-a-kind all-versus-all sequence alignment using BLAST for 9.9 million proteins in the UniRef100 database. Microsoft Windows Azure produced over 3 billion filtered records in 6 days using 475 eight-core virtual machines. Protein classification into functional groups was then performed using Hive and custom jars implemented on top of Apache Hadoop utilizing the MapReduce paradigm. First, using the Clusters of Orthologous Genes (COG) database, a length-normalized bit score (LNBS) was determined to be the best similarity measure for classification of proteins. LNBS achieved sensitivity and specificity of 98% each. Second, out of 5.1 million bacterial proteins, about two-thirds were assigned to significantly extended COG groups, encompassing 30 times more assigned proteins. Third, the remaining proteins were classified into protein functional groups using an innovative implementation of a single-linkage algorithm on an in-house Hadoop compute cluster. This implementation significantly reduces the run time for nonindexed queries and optimizes efficient clustering on a large scale. The performance was also verified on Amazon Elastic MapReduce. This clustering assigned nearly 2 million proteins to approximately half a million different functional groups. A similar approach was applied to classify 2.8 million eukaryotic sequences, resulting in over 1 million proteins being assigned to existing KOG groups and the remainder clustered into 100,000 functional groups.
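
As an illustration of the single-linkage step described above, here is a minimal single-machine sketch in Python. It is not the authors' Hive/Hadoop implementation. The abstract does not give the LNBS formula, so `normalized_score` (bit score divided by the shorter sequence length), the `threshold` value, and all function and variable names are assumptions made for illustration only. The sketch relies on the fact that single-linkage groups at a fixed cutoff are the connected components of the graph whose edges are the pairs scoring at or above that cutoff.

```python
from collections import defaultdict


def normalized_score(bit_score, len_a, len_b):
    # ASSUMPTION: normalize by the shorter sequence length. The abstract
    # names a "length-normalized bit score" but does not define it.
    return bit_score / min(len_a, len_b)


class DisjointSet:
    """Union-find structure used to merge linked proteins into groups."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the path at the root.
        while self.parent[x] != root:
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def single_linkage_groups(hits, lengths, threshold):
    """Group proteins by single linkage.

    hits      -- iterable of (protein_a, protein_b, bit_score) tuples
    lengths   -- dict mapping protein id to sequence length
    threshold -- hypothetical cutoff on the normalized score
    """
    ds = DisjointSet()
    for pid in lengths:
        ds.find(pid)  # register every protein, including singletons
    for a, b, bit_score in hits:
        if normalized_score(bit_score, lengths[a], lengths[b]) >= threshold:
            ds.union(a, b)
    groups = defaultdict(set)
    for pid in lengths:
        groups[ds.find(pid)].add(pid)
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])


# Made-up example data (not from the paper).
example_lengths = {"P1": 100, "P2": 120, "P3": 90, "P4": 100}
example_hits = [("P1", "P2", 200.0), ("P2", "P3", 180.0), ("P3", "P4", 20.0)]
example_groups = single_linkage_groups(example_hits, example_lengths, 1.0)
```

With the made-up data above, P1 and P3 end up in the same group even though they have no direct hit, because each is linked to P2; this chaining is the defining property of single linkage.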

Subject(s)

Proteins/classification , Databases, Protein , Proteins/chemistry , Proteins/metabolism

SPIRE: Systematic protein investigative research environment.

Kolker, Eugene; Higdon, Roger; Morgan, Phil; Sedensky, Margaret; Welch, Dean; Bauman, Andrew; Stewart, Elizabeth; Haynes, Winston; Broomall, William; Kolker, Natali.

J Proteomics ; 75(1): 122-6, 2011 Dec 10.

Article in English | MEDLINE | ID: mdl-21609792

ABSTRACT

The SPIRE (Systematic Protein Investigative Research Environment) provides web-based experiment-specific mass spectrometry (MS) proteomics analysis (https://www.proteinspire.org). Its emphasis is on usability and integration of the best analytic tools. SPIRE provides an easy-to-use web interface and generates results in both interactive and simple data formats. In contrast to run-based approaches, SPIRE conducts the analysis based on the experimental design. It employs novel methods to generate false discovery rates and local false discovery rates (FDR, LFDR) and integrates the best and complementary open-source search and data analysis methods. The SPIRE approach of integrating X!Tandem, OMSSA and SpectraST can produce an increase in protein IDs (52-88%) over current combinations of scoring and single search engines while also providing accurate multi-faceted error estimation. One of SPIRE's primary assets is combining the results with data on protein function, pathways and protein expression from model organisms. We demonstrate some of SPIRE's capabilities by analyzing mitochondrial proteins from the wild type and 3 mutants of C. elegans. SPIRE also connects results to publicly available proteomics data through its Model Organism Protein Expression Database (MOPED). SPIRE can also provide analysis and annotation for user-supplied protein ID and expression data.

Subject(s)

Databases, Protein , Models, Biological , Proteomics/methods , Systems Biology/methods , Animals , Caenorhabditis elegans/metabolism , Caenorhabditis elegans Proteins/analysis , Caenorhabditis elegans Proteins/chemistry , Caenorhabditis elegans Proteins/genetics , Mass Spectrometry/methods , Mitochondria/metabolism , Mitochondrial Proteins/analysis , Mitochondrial Proteins/chemistry , Mitochondrial Proteins/genetics , User-Computer Interface

Bioinformatics and data-intensive scientific discovery in the beginning of the 21st century.

Barga, Roger; Howe, Bill; Beck, David; Bowers, Stuart; Dobyns, William; Haynes, Winston; Higdon, Roger; Howard, Chris; Roth, Christian; Stewart, Elizabeth; Welch, Dean; Kolker, Eugene.

OMICS ; 15(4): 199-201, 2011 Apr.

Article in English | MEDLINE | ID: mdl-21476840

ABSTRACT

This article is a summary of the bioinformatics issues and challenges of data-intensive science as discussed in the NSF-funded Data-Intensive Science (DIS) workshop in Seattle, September 19-20, 2010.

Subject(s)

Biological Science Disciplines/methods , Computational Biology/methods , Computational Biology/trends

Technology and data-intensive science in the beginning of the 21st century.

Bernstein, Philip A; Wecker, Dave; Krishnamurthy, Ashok; Manocha, Dinesh; Gardner, Jeffrey; Kolker, Natali; Reschke, Chance; Stombaugh, Jesse; Vagata, Pamela; Stewart, Elizabeth; Welch, Dean; Kolker, Eugene.

OMICS ; 15(4): 203-7, 2011 Apr.

Article in English | MEDLINE | ID: mdl-21476841

ABSTRACT

This article is a summary of the technology issues and challenges of data-intensive science and cloud computing as discussed in the Data-Intensive Science (DIS) workshop in Seattle, September 19-20, 2010.

Subject(s)

Biological Science Disciplines/methods , Technology/methods
