Search | BVS Regional Portal (test)

Protannotator: a semiautomated pipeline for chromosome-wise functional annotation of the "missing" human proteome.

Islam, Mohammad T; Garg, Gagan; Hancock, William S; Risk, Brian A; Baker, Mark S; Ranganathan, Shoba.

J Proteome Res ; 13(1): 76-83, 2014 Jan 03.

Article in English | MEDLINE | ID: mdl-24313344

ABSTRACT

The chromosome-centric human proteome project (C-HPP) aims to define the complete set of proteins encoded in each human chromosome. The neXtProt database (September 2013) lists 20,128 proteins for the human proteome, of which 3831 human proteins (~19%) are considered "missing" according to the standard metrics table (released September 27, 2013). In support of the C-HPP initiative, we have extended the annotation strategy developed for human chromosome 7 "missing" proteins into a semiautomated pipeline to functionally annotate the "missing" human proteome. This pipeline integrates a suite of bioinformatics analysis and annotation software tools to identify homologues and map putative functional signatures, gene ontology, and biochemical pathways. From sequential BLAST searches, we have primarily identified homologues from reviewed nonhuman mammalian proteins with protein evidence for 1271 (33.2%) "missing" proteins, followed by 703 (18.4%) homologues from reviewed nonhuman mammalian proteins and subsequently 564 (14.7%) homologues from reviewed human proteins. Functional annotations for 1945 (50.8%) "missing" proteins were also determined. To accelerate the identification of "missing" proteins from proteomics studies, we generated proteotypic peptides in silico. Matching these proteotypic peptides to ENCODE proteogenomic data resulted in proteomic evidence for 107 (2.8%) of the 3831 "missing" proteins, while evidence from a recent membrane proteomic study supported the existence of another 15 "missing" proteins. The chromosome-wise functional annotation of all "missing" proteins is freely available to the scientific community through our web server (http://biolinfo.org/protannotator).
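The abstract does not say how the proteotypic peptides were generated. As a rough illustration only, the sketch below assumes a plain tryptic rule (cleave after K or R unless the next residue is P, no missed cleavages) and treats a peptide as a proteotypic candidate when it occurs in exactly one protein of the input set. The function names, the length limits, and the uniqueness criterion are all hypothetical and are not taken from Protannotator.

```python
from collections import defaultdict


def tryptic_peptides(sequence, min_len=7, max_len=30):
    """In silico tryptic digest: cleave after K/R unless followed by P.

    Assumption: no missed cleavages; the length window is illustrative.
    """
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        is_last = i == len(sequence) - 1
        cleave = residue in "KR" and not is_last and sequence[i + 1] != "P"
        if cleave or is_last:
            peptides.append(sequence[start:i + 1])
            start = i + 1
    return [p for p in peptides if min_len <= len(p) <= max_len]


def unique_peptides(proteins, min_len=7, max_len=30):
    """Map each peptide found in exactly one protein to that protein's name.

    proteins: dict of {protein_name: amino_acid_sequence}.
    """
    owners = defaultdict(set)
    for name, sequence in proteins.items():
        for peptide in tryptic_peptides(sequence, min_len, max_len):
            owners[peptide].add(name)
    return {pep: next(iter(names)) for pep, names in owners.items()
            if len(names) == 1}
```

A real proteotypic predictor would also weigh detectability in the mass spectrometer; this sketch covers only the digestion and uniqueness steps.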

Subjects

Automation , Chromosomes, Human , Proteome , Databases, Protein , Humans , Software

A peptide-spectrum scoring system based on ion alignment, intensity, and pair probabilities.

Risk, Brian A; Edwards, Nathan J; Giddings, Morgan C.

J Proteome Res ; 12(9): 4240-7, 2013 Sep 06.

Article in English | MEDLINE | ID: mdl-23875887

ABSTRACT

Peppy, the proteogenomic/proteomic search software, employs a novel method for assessing the match quality between an MS/MS spectrum and a theorized peptide sequence. The scoring system uses three score factors calculated with binomial probabilities: the probability that a fragment ion will randomly align with a peptide ion, the probability that the aligning ions will be selected from subsets of the most intense peaks, and the probability that the intensities of fragment ions identified as y-ions are greater than those of their counterpart b-ions. The scores produced by the method act as global confidence scores, which facilitate the accurate comparison of results and the estimation of false discovery rates. Peppy has been integrated into the meta-search engine PepArML to produce meaningful comparisons with Mascot, MSGF+, OMSSA, X!Tandem, k-Score and s-Score. For two of the four data sets examined with the PepArML analysis, Peppy exceeded the accuracy performance of the other scoring systems. Peppy is available for download at http://geneffects.com/peppy .
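To make the first of the three score factors concrete, the sketch below computes a binomial tail probability: the chance that at least a given number of theoretical fragment ions align with spectrum peaks purely at random. The per-ion random-match probability `p_random`, the function names, and the use of a negative log10 as the score are assumptions made for illustration. The abstract does not give Peppy's actual formula or how the three factors are combined.

```python
from math import comb, log10


def binomial_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))


def alignment_score(n_theoretical, n_matched, p_random):
    """Illustrative score: -log10 of the chance of n_matched or more
    random alignments among n_theoretical fragment ions.

    Assumption: 0 < p_random < 1, so the tail probability is never zero.
    """
    if not 0 < p_random < 1:
        raise ValueError("p_random must lie strictly between 0 and 1")
    return -log10(binomial_tail(n_theoretical, n_matched, p_random))
```

Under this sketch, a spectrum that matches 12 of 20 theoretical ions scores higher than one that matches 3 of 20 at the same `p_random`, because the first outcome is far less likely to arise by chance.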

Subjects

Peptide Mapping , Software , Algorithms , Amino Acid Sequence , Blood Proteins/chemistry , Humans , Molecular Sequence Data , Peptide Fragments/chemistry , Sequence Analysis, Protein , Tandem Mass Spectrometry

Peppy: proteogenomic search software.

Risk, Brian A; Spitzer, Wendy J; Giddings, Morgan C.

J Proteome Res ; 12(6): 3019-25, 2013 Jun 07.

Article in English | MEDLINE | ID: mdl-23614390

ABSTRACT

Proteogenomic searching is a useful method for identifying novel proteins, annotating genes and detecting peptides unique to an individual genome. The approach, however, can be laborious, as it often requires search segmentation and the use of several unintegrated tools. Furthermore, many proteogenomic efforts have been limited to small genomes, as large genomes can prove impractical due to the required amount of computer memory and computation time. We present Peppy, a software tool designed to perform every necessary task of proteogenomic searches quickly, accurately and automatically. The software generates a peptide database from a genome, tracks peptide loci, matches peptides to MS/MS spectra and assigns confidence values to those matches. Peppy automatically performs a decoy database generation, search and analysis to return identifications at the desired false discovery rate threshold. Written in Java for cross-platform execution, the software is fully multithreaded for enhanced speed. The program can run on regular desktop computers, opening the doors of proteogenomic searching to a wider audience of proteomics and genomics researchers. Peppy is available at http://geneffects.com/peppy .
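The decoy step described above can be sketched in general terms. The code below assumes two common conventions that the abstract does not confirm for Peppy: decoy sequences are made by reversing target sequences, and the false discovery rate at a score cutoff is estimated as decoy matches divided by target matches above that cutoff. All names are hypothetical.

```python
def reverse_decoy(sequence):
    """Build a decoy sequence by reversal (an assumed convention)."""
    return sequence[::-1]


def accept_at_fdr(matches, max_fdr=0.01):
    """Return target scores kept at the largest cutoff meeting max_fdr.

    matches: iterable of (score, is_decoy) pairs, higher score = better.
    FDR is estimated as decoys / targets among the top-ranked matches.
    """
    ranked = sorted(matches, key=lambda m: m[0], reverse=True)
    targets = decoys = 0
    keep = 0  # number of top-ranked matches to retain
    for index, (_, is_decoy) in enumerate(ranked, start=1):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets and decoys / targets <= max_fdr:
            keep = index
    return [score for score, is_decoy in ranked[:keep] if not is_decoy]
```

Scanning the whole ranked list, rather than stopping at the first violation, keeps the deepest cutoff at which the estimated rate still meets the threshold.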

Subjects

Molecular Sequence Annotation , Peptide Fragments/isolation & purification , Proteins/isolation & purification , Proteomics , Software , Algorithms , Amino Acid Sequence , Base Sequence , Cell Line , Databases, Protein , Humans , Molecular Sequence Data , Tandem Mass Spectrometry

Whole human genome proteogenomic mapping for ENCODE cell line data: identifying protein-coding regions.

Khatun, Jainab; Yu, Yanbao; Wrobel, John A; Risk, Brian A; Gunawardena, Harsha P; Secrest, Ashley; Spitzer, Wendy J; Xie, Ling; Wang, Li; Chen, Xian; Giddings, Morgan C.

BMC Genomics ; 14: 141, 2013 Feb 28.

Article in English | MEDLINE | ID: mdl-23448259

ABSTRACT

BACKGROUND: Proteogenomic mapping is an approach that uses mass spectrometry data from proteins to directly map protein-coding genes and could aid in locating translational regions in the human genome. In concert with the ENcyclopedia of DNA Elements (ENCODE) project, we applied proteogenomic mapping to produce proteogenomic tracks for the UCSC Genome Browser, to explore which putative translational regions may be missing from the human genome. RESULTS: We generated ~1 million high-resolution tandem mass (MS/MS) spectra for Tier 1 ENCODE cell lines K562 and GM12878 and mapped them against the UCSC hg19 human genome, and the GENCODE V7 annotated protein and transcript sets. We then compared the results from the three searches to identify the best-matching peptide for each MS/MS spectrum, thereby increasing the confidence of the putative new protein-coding regions found via the whole genome search. At a 1% false discovery rate, we identified 26,472, 24,406, and 13,128 peptides from the protein, transcript, and whole genome searches, respectively; of these, 481 were found solely via the whole genome search. The proteogenomic mapping data are available on the UCSC Genome Browser at http://genome.ucsc.edu/cgi-bin/hgTrackUi?db=hg19&g=wgEncodeUncBsuProt. CONCLUSIONS: The whole genome search revealed that ~4% of the uniquely mapping identified peptides were located outside GENCODE V7 annotated exons. The comparison of the results from the disparate searches also identified 15% more spectra than would have been found solely from a protein database search. Therefore, whole genome proteogenomic mapping is a complementary method for genome annotation when performed in conjunction with other searches.
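The comparison across the three searches can be pictured as picking, for each spectrum, the highest-scoring peptide from any search. The sketch below assumes that scores are directly comparable across searches and that each search reports at most one match per spectrum. The data layout and function names are hypothetical and do not reproduce the authors' procedure.

```python
def best_match_per_spectrum(searches):
    """Pick the top-scoring match for each spectrum across searches.

    searches: {search_name: {spectrum_id: (peptide, score)}}
    Returns {spectrum_id: (search_name, peptide, score)}.
    Ties keep the match from the search seen first.
    """
    best = {}
    for name, hits in searches.items():
        for spectrum_id, (peptide, score) in hits.items():
            current = best.get(spectrum_id)
            if current is None or score > current[2]:
                best[spectrum_id] = (name, peptide, score)
    return best


def peptides_won_by(best, search_name):
    """Peptides whose best match came from the named search."""
    return {peptide for name, peptide, _ in best.values()
            if name == search_name}
```

With search names such as "protein", "transcript" and "genome", `peptides_won_by(best, "genome")` would list the peptides whose best match came from the whole genome search.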

Subjects

Databases, Genetic , Genome, Human , Molecular Sequence Annotation , Open Reading Frames/genetics , Cell Line , Chromosome Mapping , Computational Biology , Humans , Mass Spectrometry , Sequence Analysis, DNA
