1.
Nat Methods ; 13(5): 425-30, 2016 May.
Article in English | MEDLINE | ID: mdl-27043882

ABSTRACT

Achieving high accuracy in orthology inference is essential for many comparative, evolutionary and functional genomic analyses, yet the true evolutionary history of genes is generally unknown and orthologs are used for very different applications across phyla, requiring different precision-recall trade-offs. As a result, it is difficult to assess the performance of orthology inference methods. Here, we present a community effort to establish standards and an automated web-based service to facilitate orthology benchmarking. Using this service, we characterize 15 well-established inference methods and resources on a battery of 20 different benchmarks. Standardized benchmarking provides a way for users to identify the most effective methods for the problem at hand, sets a minimum requirement for new tools and resources, and guides the development of more accurate orthology inference methods.


Subjects
Computational Biology/standards; Genomics/standards; Phylogeny; Proteomics/standards; Archaea/classification; Archaea/genetics; Bacteria/classification; Bacteria/genetics; Computational Biology/methods; Databases, Genetic; Eukaryota/classification; Eukaryota/genetics; Gene Ontology; Genomics/methods; Models, Genetic; Proteomics/methods; Sequence Analysis, Protein; Sequence Homology; Species Specificity
2.
Nucleic Acids Res ; 41(Web Server issue): W242-8, 2013 Jul.
Article in English | MEDLINE | ID: mdl-23685612

ABSTRACT

The PhyloFacts 'Fast Approximate Tree Classification' (FAT-CAT) web server provides a novel approach to ortholog identification using subtree hidden Markov model-based placement of protein sequences to phylogenomic orthology groups in the PhyloFacts database. Results on a data set of microbial, plant and animal proteins demonstrate FAT-CAT's high precision at separating orthologs and paralogs and robustness to promiscuous domains. We also present results documenting the precision of ortholog identification based on subtree hidden Markov model scoring. The FAT-CAT phylogenetic placement is used to derive a functional annotation for the query, including confidence scores and drill-down capabilities. PhyloFacts' broad taxonomic and functional coverage, with >7.3 M proteins from across the Tree of Life, enables FAT-CAT to predict orthologs and assign function for most sequence inputs. Four pipeline parameter presets are provided to handle different sequence types, including partial sequences and proteins containing promiscuous domains; users can also modify individual parameters. PhyloFacts trees matching the query can be viewed interactively online using the PhyloScope Javascript tree viewer and are hyperlinked to various external databases. The FAT-CAT web server is available at http://phylogenomics.berkeley.edu/phylofacts/fatcat/.


Subjects
Phylogeny; Proteins/classification; Software; Animals; Classification/methods; Internet; Markov Chains; Molecular Sequence Annotation; Proteins/genetics; Proteins/physiology; Sequence Analysis, Protein
3.
Protein Sci ; 21(6): 769-85, 2012 Jun.
Article in English | MEDLINE | ID: mdl-22528593

ABSTRACT

The interface of protein structural biology, protein biophysics, molecular evolution, and molecular population genetics forms the foundations for a mechanistic understanding of many aspects of protein biochemistry. Current efforts in interdisciplinary protein modeling are in their infancy and the state of the art of such models is described. Beyond the relationship between amino acid substitution and static protein structure, protein function, and corresponding organismal fitness, other considerations are also discussed. More complex mutational processes such as insertion and deletion and domain rearrangements and even circular permutations should be evaluated. The role of intrinsically disordered proteins is still controversial, but may be increasingly important to consider. Protein geometry and protein dynamics as a deviation from static considerations of protein structure are also important. Protein expression level is known to be a major determinant of evolutionary rate and several considerations including selection at the mRNA level and the role of interaction specificity are discussed. Lastly, the relationship between modeling and needed high-throughput experimental data as well as experimental examination of protein evolution using ancestral sequence resurrection and in vitro biochemistry are presented, towards an aim of ultimately generating better models for biological inference and prediction.


Subjects
Evolution, Molecular; Proteins/chemistry; Proteins/genetics; Amino Acid Sequence; Animals; Humans; Models, Molecular; Molecular Sequence Data; Protein Conformation; Protein Folding; RNA, Messenger/genetics; Sequence Alignment
4.
Biochemistry ; 51(11): 2265-75, 2012 Mar 20.
Article in English | MEDLINE | ID: mdl-22324760

ABSTRACT

Pyrroloquinoline quinone (PQQ) is a small, redox active molecule that serves as a cofactor for several bacterial dehydrogenases, introducing pathways for carbon utilization that confer a growth advantage. Early studies had implicated a ribosomally translated peptide as the substrate for PQQ production. This study presents a sequence- and structure-based analysis of the components of the pqq operon. We find the necessary components for PQQ production are present in 126 prokaryotes, most of which are Gram-negative and a number of which are pathogens. A total of five gene products, PqqA, PqqB, PqqC, PqqD, and PqqE, are identified as being obligatory for PQQ production. Three of the gene products in the pqq operon, PqqB, PqqC, and PqqE, are members of large protein superfamilies. By combining evolutionary conservation patterns with information from three-dimensional structures, we are able to differentiate the gene products involved in PQQ biosynthesis from those with divergent functions. The observed persistence of a conserved gene order within analyzed operons strongly suggests a role for protein-protein interactions in the course of cofactor biosynthesis. These studies propose previously unidentified roles for several of the gene products, as well as identifying possible new targets for antibiotic design and application.


Subjects
Bacterial Proteins/genetics; Genes, Bacterial; Klebsiella pneumoniae/metabolism; PQQ Cofactor/biosynthesis; PQQ Cofactor/genetics; Amino Acid Sequence; Bacterial Proteins/metabolism; Models, Molecular; Molecular Sequence Data; Operon; Phylogeny; Protein Conformation
5.
Brief Bioinform ; 12(5): 413-22, 2011 Sep.
Article in English | MEDLINE | ID: mdl-21712343

ABSTRACT

Ortholog identification is used in gene functional annotation, species phylogeny estimation, phylogenetic profile construction and many other analyses. Bioinformatics methods for ortholog identification are commonly based on pairwise protein sequence comparisons between whole genomes. Phylogenetic methods of ortholog identification have also been developed; these methods can be applied to protein data sets sharing a common domain architecture or which share a single functional domain but differ outside this region of homology. While promiscuous domains represent a challenge to all orthology prediction methods, overall structural similarity is highly correlated with proximity in a phylogenetic tree, conferring a degree of robustness to phylogenetic methods. In this article, we review the issues involved in orthology prediction when data sets include sequences with structurally heterogeneous domain architectures, with particular attention to automated methods designed for high-throughput application, and present a case study to illustrate the challenges in this area.


Subjects
Computational Biology/methods; Genome; Phylogeny; Animals; Databases, Factual; Evolution, Molecular; Humans; Proteins/chemistry
6.
Nucleic Acids Res ; 39(Database issue): D465-74, 2011 Jan.
Article in English | MEDLINE | ID: mdl-21097780

ABSTRACT

ModBase (http://salilab.org/modbase) is a database of annotated comparative protein structure models. The models are calculated by ModPipe, an automated modeling pipeline that relies primarily on Modeller for fold assignment, sequence-structure alignment, model building and model assessment (http://salilab.org/modeller/). ModBase currently contains 10,355,444 reliable models for domains in 2,421,920 unique protein sequences. ModBase allows users to update comparative models on demand, and request modeling of additional sequences through an interface to the ModWeb modeling server (http://salilab.org/modweb). ModBase models are available through the ModBase interface as well as the Protein Model Portal (http://www.proteinmodelportal.org/). Recently developed associated resources include the SALIGN server for multiple sequence and structure alignment (http://salilab.org/salign), the ModEval server for predicting the accuracy of protein structure models (http://salilab.org/modeval), the PCSS server for predicting which peptides bind to a given protein (http://salilab.org/pcss) and the FoXS server for calculating and fitting Small Angle X-ray Scattering profiles (http://salilab.org/foxs).


Subjects
Databases, Protein; Models, Molecular; Protein Structure, Tertiary; Bacterial Proteins/chemistry; Computer Graphics; Peptides/chemistry; Protein Interaction Mapping; Proteins/chemistry; Scattering, Small Angle; Sequence Alignment; Software; Structural Homology, Protein; User-Computer Interface; X-Ray Diffraction
7.
PLoS One ; 5(7): e11688, 2010 Jul 21.
Article in English | MEDLINE | ID: mdl-20657737

ABSTRACT

A significant fraction of a plant's nuclear genome encodes chloroplast-targeted proteins, many of which are devoted to the assembly and function of the photosynthetic apparatus. Using digital video imaging of chlorophyll fluorescence, we isolated proton gradient regulation 7 (pgr7) as an Arabidopsis thaliana mutant with low nonphotochemical quenching of chlorophyll fluorescence (NPQ). In pgr7, the xanthophyll cycle and the PSBS gene product, previously identified NPQ factors, were still functional, but the efficiency of photosynthetic electron transport was lower than in the wild type. The pgr7 mutant was also smaller in size and had lower chlorophyll content than the wild type in optimal growth conditions. Positional cloning located the pgr7 mutation in the At3g21200 (PGR7) gene, which was predicted to encode a chloroplast protein of unknown function. Chloroplast targeting of PGR7 was confirmed by transient expression of a GFP fusion protein and by stable expression and subcellular localization of an epitope-tagged version of PGR7. Bioinformatic analyses revealed that the PGR7 protein has two domains that are conserved in plants, algae, and bacteria, and the N-terminal domain is predicted to bind a cofactor such as FMN. Thus, we identified PGR7 as a novel, conserved nuclear gene that is necessary for efficient photosynthetic electron transport in chloroplasts of Arabidopsis.


Subjects
Arabidopsis Proteins/metabolism; Arabidopsis/metabolism; Electron Transport/physiology; Green Fluorescent Proteins/metabolism; Photosynthesis/physiology; Arabidopsis/genetics; Arabidopsis Proteins/genetics; Chlorophyll/metabolism; Computational Biology; Electron Transport/genetics; Green Fluorescent Proteins/genetics; Immunoblotting; Phenotype; Photosynthesis/genetics; Phylogeny; Plants, Genetically Modified/genetics; Plants, Genetically Modified/metabolism
8.
Nucleic Acids Res ; 38(Web Server issue): W29-34, 2010 Jul.
Article in English | MEDLINE | ID: mdl-20430824

ABSTRACT

We present the jump-start simultaneous alignment and tree construction using hidden Markov models (SATCHMO-JS) web server for simultaneous estimation of protein multiple sequence alignments (MSAs) and phylogenetic trees. The server takes as input a set of sequences in FASTA format, and outputs a phylogenetic tree and MSA; these can be viewed online or downloaded from the website. SATCHMO-JS is an extension of the SATCHMO algorithm, and employs a divide-and-conquer strategy to jump-start SATCHMO at a higher point in the phylogenetic tree, reducing the computational complexity of the progressive all-versus-all HMM-HMM scoring and alignment. Results on a benchmark dataset of 983 structurally aligned pairs from the PREFAB benchmark dataset show that SATCHMO-JS provides a statistically significant improvement in alignment accuracy over MUSCLE, Multiple Alignment using Fast Fourier Transform (MAFFT), ClustalW and the original SATCHMO algorithm. The SATCHMO-JS webserver is available at http://phylogenomics.berkeley.edu/satchmo-js. The datasets used in these experiments are available for download at http://phylogenomics.berkeley.edu/satchmo-js/supplementary/.


Subjects
Phylogeny; Sequence Alignment/methods; Sequence Analysis, Protein; Software; Algorithms; Internet; Markov Chains; Protein Structure, Tertiary
9.
PLoS Comput Biol ; 6(1): e1000621, 2010 Jan 29.
Article in English | MEDLINE | ID: mdl-20126522

Subjects
Genomics; Phylogeny
10.
Bioinformatics ; 26(5): 617-24, 2010 Mar 01.
Article in English | MEDLINE | ID: mdl-20080507

ABSTRACT

MOTIVATION: The identification of catalytic residues is a key step in understanding the function of enzymes. While a variety of computational methods have been developed for this task, accuracies have remained fairly low. The best existing method exploits information from sequence and structure to achieve a precision (the fraction of predicted catalytic residues that are catalytic) of 18.5% at a corresponding recall (the fraction of catalytic residues identified) of 57% on a standard benchmark. Here we present a new method, Discern, which provides a significant improvement over the state-of-the-art through the use of statistical techniques to derive a model with a small set of features that are jointly predictive of enzyme active sites. RESULTS: In cross-validation experiments on two benchmark datasets from the Catalytic Site Atlas and CATRES resources containing a total of 437 manually curated enzymes spanning 487 SCOP families, Discern increases catalytic site recall between 12% and 20% over methods that combine information from both sequence and structure, and by ≥50% over methods that make use of sequence conservation signal only. Controlled experiments show that Discern's improvement in catalytic residue prediction is derived from the combination of three ingredients: the use of the INTREPID phylogenomic method to extract conservation information; the use of 3D structure data, including features computed for residues that are proximal in the structure; and a statistical regularization procedure to prevent overfitting.


Subjects
Catalytic Domain/genetics; Evolution, Molecular; Protein Conformation; Proteins/chemistry; Proteomics/methods; Binding Sites; Catalysis; Databases, Protein; Models, Molecular; Protein Folding; Sequence Analysis, Protein
11.
BMC Bioinformatics ; 10: 197, 2009 Jun 27.
Article in English | MEDLINE | ID: mdl-19558703

ABSTRACT

BACKGROUND: Identifying the catalytic residues in enzymes can aid in understanding the molecular basis of an enzyme's function and has significant implications for designing new drugs, identifying genetic disorders, and engineering proteins with novel functions. Since experimentally determining catalytic sites is expensive, better computational methods for identifying catalytic residues are needed. RESULTS: We propose ResBoost, a new computational method to learn characteristics of catalytic residues. The method effectively selects and combines rules of thumb into a simple, easily interpretable logical expression that can be used for prediction. We formally define the rules of thumb that are often used to narrow the list of candidate residues, including residue evolutionary conservation, 3D clustering, solvent accessibility, and hydrophilicity. ResBoost builds on two methods from machine learning, the AdaBoost algorithm and Alternating Decision Trees, and provides precise control over the inherent trade-off between sensitivity and specificity. We evaluated ResBoost using cross-validation on a dataset of 100 enzymes from the hand-curated Catalytic Site Atlas (CSA). CONCLUSION: ResBoost achieved 85% sensitivity for a 9.8% false positive rate and 73% sensitivity for a 5.7% false positive rate. ResBoost reduces the number of false positives by up to 56% compared to the use of evolutionary conservation scoring alone. We also illustrate the ability of ResBoost to identify recently validated catalytic residues not listed in the CSA.
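
The trade-off between sensitivity and false positive rate reported above can be illustrated with a small sketch. It assumes each residue has a numeric score and a known catalytic/non-catalytic label, and that residues scoring at or above a threshold are predicted catalytic. The function name, scores, and labels are hypothetical toy values, not ResBoost output or its actual decision rule.

```python
def sensitivity_and_fpr(scores, labels, threshold):
    """Return (sensitivity, false_positive_rate) when residues with
    score >= threshold are predicted to be catalytic."""
    tp = fp = fn = tn = 0
    for score, is_catalytic in zip(scores, labels):
        predicted = score >= threshold
        if predicted and is_catalytic:
            tp += 1
        elif predicted:
            fp += 1
        elif is_catalytic:
            fn += 1
        else:
            tn += 1
    # Sensitivity = TP / (TP + FN); false positive rate = FP / (FP + TN).
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return sensitivity, fpr


# Hypothetical per-residue scores and labels (True = catalytic).
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [True, True, False, True, False, False, False, False]

strict = sensitivity_and_fpr(scores, labels, 0.75)   # fewer predictions
lenient = sensitivity_and_fpr(scores, labels, 0.5)   # more predictions
```

Lowering the threshold in this toy data recovers the remaining catalytic residue at the cost of one false positive, which is the same kind of trade-off the two reported operating points describe.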


Subjects
Computational Biology/methods; Enzymes/chemistry; Software; Binding Sites; Catalysis; Databases, Protein
12.
Nucleic Acids Res ; 37(Web Server issue): W84-9, 2009 Jul.
Article in English | MEDLINE | ID: mdl-19435885

ABSTRACT

Ortholog detection is essential in functional annotation of genomes, with applications to phylogenetic tree construction, prediction of protein-protein interaction and other bioinformatics tasks. We present here the PHOG web server employing a novel algorithm to identify orthologs based on phylogenetic analysis. Results on a benchmark dataset from the TreeFam-A manually curated orthology database show that PHOG provides a combination of high recall and precision competitive with both InParanoid and OrthoMCL, and allows users to target different taxonomic distances and precision levels through the use of tree-distance thresholds. For instance, OrthoMCL-DB achieved 76% recall and 66% precision on this dataset; at a slightly higher precision (68%) PHOG achieves 10% higher recall (86%). InParanoid achieved 87% recall at 24% precision on this dataset, while a PHOG variant designed for high recall achieves 88% recall at 61% precision, increasing precision by 37% over InParanoid. PHOG is based on pre-computed trees in the PhyloFacts resource, and contains over 366 K orthology groups with a minimum of three species. Predicted orthologs are linked to GO annotations, pathway information and biological literature. The PHOG web server is available at http://phylofacts.berkeley.edu/orthologs/.
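
As a rough illustration of the recall and precision figures quoted above, the following sketch scores a set of predicted ortholog pairs against a reference set. It assumes the benchmark is evaluated over unordered gene pairs, which the abstract does not state; the function name and gene identifiers are hypothetical and are not part of PHOG, OrthoMCL, InParanoid, or TreeFam-A.

```python
def pairwise_precision_recall(predicted, reference):
    """Return (precision, recall) for predicted vs. reference ortholog pairs."""
    # Treat pairs as unordered so (a, b) and (b, a) count as the same pair.
    pred = {frozenset(pair) for pair in predicted}
    ref = {frozenset(pair) for pair in reference}
    true_positives = len(pred & ref)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(ref) if ref else 0.0
    return precision, recall


# Hypothetical toy data: four reference pairs, three predictions.
reference = [("humA", "musA"), ("humB", "musB"),
             ("humC", "musC"), ("humD", "musD")]
predicted = [("musA", "humA"), ("humB", "musB"), ("humC", "musX")]

precision, recall = pairwise_precision_recall(predicted, reference)
```

In this toy case two of three predictions are correct and two of four reference pairs are recovered. Note that the "37%" precision gain quoted in the abstract is the absolute difference between 61% and 24%, that is, 37 percentage points.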


Subjects
Phylogeny; Software; Algorithms; Animals; Humans; Internet; Mice; Reproducibility of Results; Sequence Analysis, Protein; User-Computer Interface
13.
Nucleic Acids Res ; 37(Web Server issue): W390-5, 2009 Jul.
Article in English | MEDLINE | ID: mdl-19443452

ABSTRACT

We present the INTREPID web server for predicting functionally important residues in proteins. INTREPID has been shown to boost the recall and precision of catalytic residue prediction over other sequence-based methods and can be used to identify other types of functional residues. The web server takes an input protein sequence, gathers homologs, constructs a multiple sequence alignment and phylogenetic tree and finally runs the INTREPID method to assign a score to each position. Residues predicted to be functionally important are displayed on homologous 3D structures (where available), highlighting spatial patterns of conservation at various significance thresholds. The INTREPID web server is available at http://phylogenomics.berkeley.edu/intrepid.


Subjects
Proteins/chemistry; Software; Amino Acids/chemistry; Catalytic Domain; Internet; Models, Molecular; Phylogeny; Protein Conformation; Proteins/classification; Proteins/genetics; Sequence Analysis, Protein; Sequence Homology, Amino Acid; User-Computer Interface
14.
Bioinformatics ; 24(21): 2445-52, 2008 Nov 01.
Article in English | MEDLINE | ID: mdl-18776193

ABSTRACT

MOTIVATION: Identification of functionally important residues in proteins plays a significant role in biological discovery. Here, we present INTREPID--an information-theoretic approach for functional site identification that exploits the information in large diverse multiple sequence alignments (MSAs). INTREPID uses a traversal of the phylogeny in combination with a positional conservation score, based on Jensen-Shannon divergence, to rank positions in an MSA. While knowledge of protein 3D structure can significantly improve the accuracy of functional site identification, since structural information is not available for a majority of proteins, INTREPID relies solely on sequence information. We evaluated INTREPID on two tasks: predicting catalytic residues and predicting specificity determinants. RESULTS: In catalytic residue prediction, INTREPID provides significant improvements over Evolutionary Trace, ConSurf as well as over a baseline global conservation method on a set of 100 manually curated enzymes from the Catalytic Site Atlas. In particular, INTREPID is able to better predict catalytic positions that are not globally conserved and hence, attains improved sensitivity at high values of specificity. We also investigated the performance of INTREPID as a function of the evolutionary divergence of the protein family. We found that INTREPID is better able to exploit the diversity in such families and that accuracy improves when homologs with very low sequence identity are included in an alignment. In specificity determinant prediction, when subtype information is known, INTREPID-SPEC, a variant of INTREPID, attains accuracies that are competitive with other approaches for this task. AVAILABILITY: INTREPID is available for 16919 families in the PhyloFacts resource (http://phylogenomics.berkeley.edu/phylofacts).
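
The positional conservation score mentioned above is based on Jensen-Shannon divergence. The sketch below shows only that ingredient: the divergence between the amino acid distribution of a single alignment column and a background distribution. The uniform background, the gap handling, and the function names are assumptions made for illustration; the abstract does not specify them, and the phylogeny traversal that INTREPID combines with this score is not shown.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Assumed background: uniform over the 20 standard amino acids.
UNIFORM_BACKGROUND = {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}


def column_distribution(column):
    """Relative amino acid frequencies in one MSA column, ignoring gaps."""
    counts = Counter(c for c in column if c != "-")
    total = sum(counts.values())
    if total == 0:
        return {}
    return {aa: n / total for aa, n in counts.items()}


def js_divergence(p, q):
    """Jensen-Shannon divergence (log base 2) between two distributions."""
    keys = set(p) | set(q)
    # Mixture distribution M = (P + Q) / 2.
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}

    def kl_to_mixture(a):
        # KL(A || M); terms with zero probability contribute nothing.
        return sum(a[k] * math.log2(a[k] / m[k]) for k in a if a[k] > 0)

    return 0.5 * kl_to_mixture(p) + 0.5 * kl_to_mixture(q)


# Hypothetical columns: one fully conserved, one variable.
conserved = js_divergence(column_distribution("HHHH"), UNIFORM_BACKGROUND)
variable = js_divergence(column_distribution("ACDE"), UNIFORM_BACKGROUND)
```

Under these assumptions a fully conserved column diverges more from the background than a variable one, so it would rank higher.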


Subjects
Algorithms; Proteins/chemistry; Binding Sites; Databases, Protein; Protein Conformation; Proteins/genetics; Sequence Alignment; Sequence Analysis, Protein; Software
15.
Nucleic Acids Res ; 36(Database issue): D943-6, 2008 Jan.
Article in English | MEDLINE | ID: mdl-17933772

ABSTRACT

The Generation Challenge Programme (GCP; www.generationcp.org) has developed an online resource documenting stress-responsive genes comparatively across plant species. This public resource is a compendium of protein families, phylogenetic trees, multiple sequence alignments (MSA) and associated experimental evidence. The central objective of this resource is to elucidate orthologous and paralogous relationships between plant genes that may be involved in response to environmental stress, mainly abiotic stresses such as water deficit ('drought'). The web-based graphical user interface (GUI) of the resource includes query and visualization tools that allow diverse searches and browsing of the underlying project database. The web interface can be accessed at http://dayhoff.generationcp.org.


Subjects
Crops, Agricultural/genetics; Databases, Genetic; Genes, Plant; Crops, Agricultural/metabolism; Dehydration; Environment; Gene Expression Profiling; Internet; Phylogeny; Plant Proteins/chemistry; Plant Proteins/classification; Sequence Alignment; User-Computer Interface
16.
PLoS Comput Biol ; 3(8): e160, 2007 Aug.
Article in English | MEDLINE | ID: mdl-17708678

ABSTRACT

Function prediction by homology is widely used to provide preliminary functional annotations for genes for which experimental evidence of function is unavailable or limited. This approach has been shown to be prone to systematic error, including percolation of annotation errors through sequence databases. Phylogenomic analysis avoids these errors in function prediction but has been difficult to automate for high-throughput application. To address this limitation, we present a computationally efficient pipeline for phylogenomic classification of proteins. This pipeline uses the SCI-PHY (Subfamily Classification in Phylogenomics) algorithm for automatic subfamily identification, followed by subfamily hidden Markov model (HMM) construction. A simple and computationally efficient scoring scheme using family and subfamily HMMs enables classification of novel sequences to protein families and subfamilies. Sequences representing entirely novel subfamilies are differentiated from those that can be classified to subfamilies in the input training set using logistic regression. Subfamily HMM parameters are estimated using an information-sharing protocol, enabling subfamilies containing even a single sequence to benefit from conservation patterns defining the family as a whole or in related subfamilies. SCI-PHY subfamilies correspond closely to functional subtypes defined by experts and to conserved clades found by phylogenetic analysis. Extensive comparisons of subfamily and family HMM performances show that subfamily HMMs dramatically improve the separation between homologous and non-homologous proteins in sequence database searches. Subfamily HMMs also provide extremely high specificity of classification and can be used to predict entirely novel subtypes. The SCI-PHY Web server at http://phylogenomics.berkeley.edu/SCI-PHY/ allows users to upload a multiple sequence alignment for subfamily identification and subfamily HMM construction. Biologists wishing to provide their own subfamily definitions can do so. Source code is available on the Web page. The Berkeley Phylogenomics Group PhyloFacts resource contains pre-calculated subfamily predictions and subfamily HMMs for more than 40,000 protein families and domains at http://phylogenomics.berkeley.edu/phylofacts/.


Subjects
Algorithms; Artificial Intelligence; Pattern Recognition, Automated/methods; Proteins/chemistry; Proteins/classification; Sequence Alignment/methods; Sequence Analysis, Protein/methods; Amino Acid Sequence; Markov Chains; Molecular Sequence Data; Reproducibility of Results; Sensitivity and Specificity
17.
Nucleic Acids Res ; 35(Web Server issue): W27-32, 2007 Jul.
Article in English | MEDLINE | ID: mdl-17488835

ABSTRACT

Phylogenomic analysis addresses the limitations of function prediction based on annotation transfer, and has been shown to enable the highest accuracy in prediction of protein molecular function. The Berkeley Phylogenomics Group provides a series of web servers for phylogenomic analysis: classification of sequences to pre-computed families and subfamilies using the PhyloFacts Phylogenomic Encyclopedia, FlowerPower clustering of proteins sharing the same domain architecture, MUSCLE multiple sequence alignment, SATCHMO simultaneous alignment and tree construction and SCI-PHY subfamily identification. The PhyloBuilder web server provides an integrated phylogenomic pipeline starting with a user-supplied protein sequence, proceeding to homolog identification, multiple alignment, phylogenetic tree construction, subfamily identification and structure prediction. The Berkeley Phylogenomics Group resources are available at http://phylogenomics.berkeley.edu.


Subjects
Computational Biology/methods; Phylogeny; Algorithms; Animals; Computers; Databases, Genetic; Databases, Protein; Humans; Internet; Models, Genetic; Protein Conformation; Sequence Alignment; Sequence Analysis, Protein; Software; User-Computer Interface
18.
BMC Evol Biol ; 7 Suppl 1: S12, 2007 Feb 08.
Article in English | MEDLINE | ID: mdl-17288570

ABSTRACT

BACKGROUND: Function prediction by transfer of annotation from the top database hit in a homology search has been shown to be prone to systematic error. Phylogenomic analysis reduces these errors by inferring protein function within the evolutionary context of the entire family. However, accuracy of function prediction for multi-domain proteins depends on all members having the same overall domain structure. By contrast, most common homolog detection methods are optimized for retrieving local homologs, and do not address this requirement. RESULTS: We present FlowerPower, a novel clustering algorithm designed for the identification of global homologs as a precursor to structural phylogenomic analysis. Similar to methods such as PSIBLAST, FlowerPower employs an iterative approach to clustering sequences. However, rather than using a single HMM or profile to expand the cluster, FlowerPower identifies subfamilies using the SCI-PHY algorithm and then selects and aligns new homologs using subfamily hidden Markov models. FlowerPower is shown to outperform BLAST, PSI-BLAST and the UCSC SAM-Target 2K methods at discrimination between proteins in the same domain architecture class and those having different overall domain structures. CONCLUSION: Structural phylogenomic analysis enables biologists to avoid the systematic errors associated with annotation transfer; clustering sequences based on sharing the same domain architecture is a critical first step in this process. FlowerPower is shown to consistently identify homologous sequences having the same domain architecture as the query. AVAILABILITY: FlowerPower is available as a webserver at http://phylogenomics.berkeley.edu/flowerpower/.


Subjects
Algorithms; Phylogeny; Protein Structure, Tertiary; Proteins/physiology; Sequence Analysis, Protein/methods; Animals; Cluster Analysis; Databases, Genetic; Humans; Proteins/classification; Research Design; Sequence Alignment
19.
Genome Biol ; 7(9): R83, 2006.
Article in English | MEDLINE | ID: mdl-16973001

ABSTRACT

The Berkeley Phylogenomics Group presents PhyloFacts, a structural phylogenomic encyclopedia containing almost 10,000 'books' for protein families and domains, with pre-calculated structural, functional and evolutionary analyses. PhyloFacts enables biologists to avoid the systematic errors associated with function prediction by homology through the integration of a variety of experimental data and bioinformatics methods in an evolutionary framework. Users can submit sequences for classification to families and functional subfamilies. PhyloFacts is available as a worldwide web resource from http://phylogenomics.berkeley.edu/phylofacts.


Subjects
Databases, Protein; Proteins; Animals; Evolution, Molecular; Humans; Phylogeny; Protein Structure, Tertiary; Proteins/chemistry; Proteins/classification; Proteins/genetics; Structure-Activity Relationship
20.
OMICS ; 10(2): 231-7, 2006.
Article in English | MEDLINE | ID: mdl-16901231

ABSTRACT

In the eight years since phylogenomics was introduced as the intersection of genomics and phylogenetics, the field has provided fundamental insights into gene function, genome history and organismal relationships. The utility of phylogenomics is growing with the increase in the number and diversity of taxa for which whole genome and large transcriptome sequence sets are being generated. We assert that the synergy between genomic and phylogenetic perspectives in comparative biology would be enhanced by the development and refinement of minimal reporting standards for phylogenetic analyses. Encouraged by the development of the Minimum Information About a Microarray Experiment (MIAME) standard, we propose a similar roadmap for the development of a Minimal Information About a Phylogenetic Analysis (MIAPA) standard. Key in the successful development and implementation of such a standard will be broad participation by developers of phylogenetic analysis software, phylogenetic database developers, practitioners of phylogenomics, and journal editors.


Subjects
Phylogeny; Reference Standards; Genomics/standards