Search | VHL Regional Portal (test)

Partitioning protein structures into domains: why is it so difficult?

Holland, Timothy A; Veretnik, Stella; Shindyalov, Ilya N; Bourne, Philip E.

J Mol Biol ; 361(3): 562-90, 2006 Aug 18.

Article in English | MEDLINE | ID: mdl-16863650

ABSTRACT

This analysis takes an in-depth look into the difficulties encountered by automatic methods for domain decomposition from three-dimensional structure. The analysis involves a multi-faceted set of criteria including the integrity of secondary structure elements, the tendency toward fragmentation of domains, domain boundary consistency and topology. The strength of the analysis comes from the use of a new comprehensive benchmark dataset, which is based on consensus among experts (CATH, SCOP and AUTHORS of the 3D structures) and covers 30 distinct architectures and 211 distinct topologies as defined by CATH. Furthermore, over 66% of the structures are multi-domain proteins; each domain combination occurring once per dataset. The performance of four automatic domain assignment methods, DomainParser, NCBI, PDP and PUU, is carefully analyzed using this broad spectrum of topology combinations and knowledge of rules and assumptions built into each algorithm. We conclude that it is practically impossible for an automatic method to achieve the level of performance of human experts. However, we propose specific improvements to automatic methods as well as broadening the concept of a structural domain. Such work is prerequisite for establishing improved approaches to domain recognition. (The benchmark dataset is available from http://pdomains.sdsc.edu).

Subjects

Computer Simulation , Molecular Models , Protein Secondary Structure , Protein Tertiary Structure , Computational Biology
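
The consensus idea behind the benchmark described in the abstract above can be illustrated with a minimal sketch. It assumes that "consensus" means the three expert sources (CATH, SCOP and AUTHORS) report the same number of domains for a chain, which is a simplification of whatever criteria the authors actually applied. The chain names and counts are hypothetical and are not taken from the benchmark.

```python
# Expert sources named in the abstract.
EXPERTS = ("CATH", "SCOP", "AUTHORS")


def consensus_chains(domain_counts):
    """Return chains for which every expert source reports the same domain count.

    domain_counts maps a chain ID to {source name: number of domains}.
    Chains that lack an entry for any expert source are skipped.
    """
    agreed = {}
    for chain, counts in domain_counts.items():
        if not all(source in counts for source in EXPERTS):
            continue  # incomplete expert coverage, cannot judge consensus
        values = {counts[source] for source in EXPERTS}
        if len(values) == 1:
            agreed[chain] = values.pop()
    return agreed


# Hypothetical input: chain IDs and domain counts are invented for illustration.
example = {
    "chainA": {"CATH": 2, "SCOP": 2, "AUTHORS": 2},
    "chainB": {"CATH": 3, "SCOP": 2, "AUTHORS": 2},
    "chainC": {"CATH": 1, "SCOP": 1},
}
print(consensus_chains(example))
```

Only chainA survives here: chainB has a disagreement and chainC has no AUTHORS entry.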

DMAPS: a database of multiple alignments for protein structures.

Guda, Chittibabu; Pal, Lipika R; Shindyalov, Ilya N.

Nucleic Acids Res ; 34(Database issue): D273-6, 2006 Jan 01.

Article in English | MEDLINE | ID: mdl-16381863

ABSTRACT

The database of multiple alignments for protein structures (DMAPS) provides instant access to pre-computed multiple structure alignments for all protein structure families in the Protein Data Bank (PDB). Protein structure families have been obtained from four distinct classification methods including SCOP, CATH, ENZYME and CE, and multiple structure alignments have been built for all families containing at least three members, using CE-MC software. Currently, multiple structure alignments are available for 3050 SCOP-, 3087 CATH-, 664 ENZYME- and 1707 CE-based families. A web-based query system has been developed to retrieve multiple alignments for these families using the PDB chain ID of any member of a family. Multiple alignments can be viewed or downloaded in six different formats, including JOY/html, TEXT, FASTA, PDB (superimposed coordinates), JOY/postscript and JOY/rtf. DMAPS is accessible online at http://bioinformatics.albany.edu/~dmaps.

Subjects

Protein Databases , Protein Structural Homology , Internet , Protein Tertiary Structure , Proteins/chemistry , User-Computer Interface
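
The DMAPS abstract above mentions two rules: alignments are built only for families with at least three members, and a family can be retrieved through the PDB chain ID of any member. A minimal sketch of such a lookup follows. The function name, family names and chain IDs are hypothetical; this is not the DMAPS implementation or its query interface.

```python
from collections import defaultdict

MIN_MEMBERS = 3  # per the abstract: families need at least three members


def build_chain_index(family_members):
    """Map each chain ID to the qualifying families that contain it.

    family_members maps a family name to a list of chain IDs.
    Families with fewer than MIN_MEMBERS distinct chains are ignored.
    """
    index = defaultdict(list)
    for family, chains in family_members.items():
        distinct = sorted(set(chains))
        if len(distinct) < MIN_MEMBERS:
            continue
        for chain in distinct:
            index[chain].append(family)
    return dict(index)


# Hypothetical families and chain IDs, for illustration only.
families = {
    "fam1": ["1abc:A", "2def:B", "3ghi:A"],
    "fam2": ["1abc:A", "4jkl:A"],
}
index = build_chain_index(families)
print(index.get("1abc:A", []))  # families retrievable through this chain
```

Because fam2 has only two members, a query for 4jkl:A finds nothing, while 1abc:A resolves to fam1 alone.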

Assigning new GO annotations to protein data bank sequences by combining structure and sequence homology.

Ponomarenko, Julia V; Bourne, Philip E; Shindyalov, Ilya N.

Proteins ; 58(4): 855-65, 2005 Mar 01.

Article in English | MEDLINE | ID: mdl-15645518

ABSTRACT

Accompanying the discovery of an increasing number of proteins, there is the need to provide functional annotation that is both highly accurate and consistent. The Gene Ontology (GO) provides consistent annotation in a computer readable and usable form; hence, GO annotation (GOA) has been assigned to a large number of protein sequences based on direct experimental evidence and through inference determined by sequence homology. Here we show that this annotation can be extended and corrected for cases where protein structures are available. Specifically, using the Combinatorial Extension (CE) algorithm for structure comparison, we extend the protein annotation currently provided by GOA at the European Bioinformatics Institute (EBI) to further describe the contents of the Protein Data Bank (PDB). Specific cases of biologically interesting annotations derived by this method are given. Given that the relationship between sequence, structure, and function is complicated, we explore the impact of this relationship on assigning GOA. The effect of superfolds (folds with many functions) is considered and, by comparison to the Structural Classification of Proteins (SCOP), the individual effects of family, superfamily, and fold.

Subjects

Computational Biology/methods , Proteins/chemistry , Proteomics/methods , Algorithms , Antigens/chemistry , Cluster Analysis , Databases as Topic , Factual Databases , Protein Databases , Three-Dimensional Imaging , Information Storage and Retrieval , Biological Models , Statistical Models , Peptides/chemistry , Protein Binding , Protein Conformation , Protein Folding , Protein Tertiary Structure , Reproducibility of Results , Protein Sequence Analysis , Sequence Homology , Software , Structure-Activity Relationship , Terminology as Topic

CE-MC: a multiple protein structure alignment server.

Guda, Chittibabu; Lu, Sifang; Scheeff, Eric D; Bourne, Philip E; Shindyalov, Ilya N.

Nucleic Acids Res ; 32(Web Server issue): W100-3, 2004 Jul 01.

Article in English | MEDLINE | ID: mdl-15215359

ABSTRACT

CE-MC server (http://cemc.sdsc.edu) provides a web-based facility for the alignment of multiple protein structures based on C-alpha coordinate distances, using combinatorial extension (CE) and Monte Carlo (MC) optimization methods. Alignments are possible for user-selected PDB (Protein Data Bank) chains as well as for user-uploaded structures or the combination of the two. The whole process of generating multiple structure alignments involves three distinct steps, i.e. all-to-all pairwise alignment using the CE algorithm, iterative global optimization of a multiple alignment using the MC algorithm and formatting MC results using the JOY program. The server can be used to get multiple alignments for up to 25 protein structural chains with the flexibility of uploading multiple coordinate files and performing multiple structure alignment for user-selected PDB chains. For large-scale jobs and local installation of the CE-MC program, users can download the source code and precompiled binaries from the web server.

Subjects

Protein Secondary Structure , Software , Algorithms , Internet , Software Design
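
The first step of the CE-MC process described above, all-to-all pairwise alignment, grows quadratically with the number of chains. The sketch below only enumerates the pairs that would need aligning and enforces the 25-chain limit stated in the abstract. It does not run CE or the Monte Carlo optimization; the function name and chain labels are hypothetical.

```python
from itertools import combinations

MAX_CHAINS = 25  # server limit stated in the abstract


def pairwise_jobs(chains):
    """List every unordered pair of distinct chains for all-to-all alignment."""
    unique = list(dict.fromkeys(chains))  # drop duplicates, keep input order
    if len(unique) < 2:
        # Assumption: an alignment needs at least two chains.
        raise ValueError("need at least two distinct chains")
    if len(unique) > MAX_CHAINS:
        raise ValueError(f"at most {MAX_CHAINS} chains are supported")
    return list(combinations(unique, 2))


# Hypothetical chain labels c0..c24.
jobs = pairwise_jobs([f"c{i}" for i in range(25)])
print(len(jobs))  # n * (n - 1) / 2 pairs for n chains
```

At the 25-chain limit this amounts to 25 × 24 / 2 = 300 pairwise alignments.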

Toward consistent assignment of structural domains in proteins.

Veretnik, Stella; Bourne, Philip E; Alexandrov, Nickolai N; Shindyalov, Ilya N.

J Mol Biol ; 339(3): 647-78, 2004 Jun 04.

Article in English | MEDLINE | ID: mdl-15147847

ABSTRACT

The assignment of protein domains from three-dimensional structure is critically important in understanding protein evolution and function, yet little quality assurance has been performed. Here, the differences in the assignment of structural domains are evaluated using six common assignment methods. Three human expert methods (AUTHORS (authors' annotation), CATH and SCOP) and three fully automated methods (DALI, DomainParser and PDP) are investigated by analysis of individual methods against the author's assignment as well as analysis based on the consensus among groups of methods (only expert, only automatic, combined). The results demonstrate that caution is recommended in using current domain assignments, and indicate where additional work is needed. Specifically, the major factors responsible for conflicting domain assignments between methods, both experts and automatic, are: (1) the definition of very small domains; (2) splitting secondary structures between domains; (3) the size and number of discontinuous domains; (4) closely packed or convoluted domain-domain interfaces; (5) structures with large and complex architectures; and (6) the level of significance placed upon structural, functional and evolutionary concepts in considering structural domain definitions. A web-based resource that focuses on the results of benchmarking and the analysis of domain assignments is available at

Subjects

Proteins/chemistry , Algorithms , Molecular Models , Protein Conformation

A new scoring function and associated statistical significance for structure alignment by CE.

Jia, Yuting; Dewey, T Gregory; Shindyalov, Ilya N; Bourne, Philip E.

J Comput Biol ; 11(5): 787-99, 2004.

Article in English | MEDLINE | ID: mdl-15700402

ABSTRACT

A new scoring function for assessing the statistical significance of protein structure alignment has been developed. The new scores were tested empirically using the combinatorial extension (CE) algorithm. The significance of a given score was given a p-value by curve-fitting the distribution of the scores generated by a random comparison of proteins taken from the PDB_SELECT database and the structural classification of proteins (SCOP) database. Although the scoring function was developed based on the CE algorithm, it is portable to any other protein structure alignment algorithm. The new scoring function is examined by sensitivity, specificity, and ROC curves.

Subjects

Computational Biology , Protein Tertiary Structure , Sequence Alignment , Protein Sequence Analysis , Algorithms , Statistical Data Interpretation , Software
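
The abstract above evaluates the scoring function by sensitivity, specificity and ROC curves. As a reminder of how those quantities relate, here is a minimal sketch. It assumes that a higher score means a more significant alignment and that each alignment carries a true/false label for genuine structural similarity. The scores and labels are hypothetical, and the paper's curve-fitted p-value is not reproduced.

```python
def confusion_rates(scores, labels, threshold):
    """Sensitivity and specificity when scores >= threshold are called positive.

    Requires at least one positive and one negative label.
    """
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if y and s >= threshold)
    fn = sum(1 for s, y in pairs if y and s < threshold)
    tn = sum(1 for s, y in pairs if not y and s < threshold)
    fp = sum(1 for s, y in pairs if not y and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def roc_points(scores, labels):
    """(false positive rate, true positive rate) per distinct threshold, highest first."""
    points = []
    for threshold in sorted(set(scores), reverse=True):
        sensitivity, specificity = confusion_rates(scores, labels, threshold)
        points.append((1.0 - specificity, sensitivity))
    return points


# Hypothetical alignment scores and ground-truth labels.
scores = [0.9, 0.8, 0.4, 0.3]
labels = [True, True, False, True]
print(confusion_rates(scores, labels, 0.5))
print(roc_points(scores, labels))
```

Lowering the threshold moves along the ROC curve, trading specificity for sensitivity.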

A comparative proteomics resource: proteins of Arabidopsis thaliana.

Li, Wilfred W; Quinn, Greg B; Alexandrov, Nickolai N; Bourne, Philip E; Shindyalov, Ilya N.

Genome Biol ; 4(8): R51, 2003.

Article in English | MEDLINE | ID: mdl-12914659

ABSTRACT

Using an integrative genome annotation pipeline (iGAP) for proteome-wide protein structure and functional domain assignment, we analyzed all the proteins of Arabidopsis thaliana. Three-dimensional structures at the level of the domain are assigned by fold recognition and threading based on a novel fold library that extends common domain classifications. iGAP is being applied to proteins from all available proteomes as part of a comparative proteomics resource. The database is accessible from the web.

Subjects

Arabidopsis Proteins/genetics , Arabidopsis/genetics , Plant Genome , Proteomics/methods , Arabidopsis Proteins/classification , Proteome/genetics , Proteomics/classification , Software

Building an automated classification of DNA-binding protein domains.

Ponomarenko, Julia V; Bourne, Philip E; Shindyalov, Ilya N.

Bioinformatics ; 18 Suppl 2: S192-201, 2002.

Article in English | MEDLINE | ID: mdl-12386003

ABSTRACT

Intensive growth in 3D structure data on DNA-protein complexes as reflected in the Protein Data Bank (PDB) demands new approaches to the annotation and characterization of these data and will lead to a new understanding of critical biological processes involving these data. These data and those from other protein structure classifications will become increasingly important for the modeling of complete proteomes. We propose a fully automated classification of DNA-binding protein domains based on existing 3D-structures from the PDB. The classification, by domain, relies on the Protein Domain Parser (PDP) and the Combinatorial Extension (CE) algorithm for structural alignment. The approach involves the analysis of 3D-interaction patterns in DNA-protein interfaces, assignment of structural domains interacting with DNA, clustering of domains based on structural similarity and DNA-interacting patterns. Comparison with existing resources on describing structural and functional classifications of DNA-binding proteins was used to validate and improve the approach proposed here. In the course of our study we defined a set of criteria and heuristics allowing us to automatically build a biologically meaningful classification and define classes of functionally related protein domains. It was shown that taking into consideration interactions between protein domains and DNA considerably improves the classification accuracy. Our approach provides a high-throughput and up-to-date annotation of DNA-binding protein families which can be found at http://spdc.sdsc.edu.

Subjects

Artificial Intelligence , DNA-Binding Proteins/chemistry , DNA-Binding Proteins/classification , DNA/chemistry , Chemical Models , Protein Sequence Analysis/methods , Sequence Analysis/methods , Binding Sites , Computer Simulation , DNA/analysis , DNA/classification , DNA-Binding Proteins/analysis , Protein Databases , Molecular Models , Protein Binding , Protein Tertiary Structure
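
The abstract above describes clustering domains by structural similarity but does not name the clustering method. The sketch below uses single-linkage clustering with a similarity threshold purely as a stand-in, to show what grouping domains by pairwise similarity can look like. The domain names, scores and threshold are hypothetical, and DNA-interaction patterns are not modeled.

```python
def cluster_domains(domains, similarity, threshold):
    """Single-linkage clustering: link any pair whose similarity >= threshold.

    similarity maps (domain_a, domain_b) to a score.
    Returns the clusters as sorted lists, themselves sorted.
    """
    parent = {d: d for d in domains}

    def find(d):
        # Follow parent links to the cluster representative (with path halving).
        while parent[d] != d:
            parent[d] = parent[parent[d]]
            d = parent[d]
        return d

    for (a, b), score in similarity.items():
        if score >= threshold:
            parent[find(a)] = find(b)  # merge the two clusters

    clusters = {}
    for d in domains:
        clusters.setdefault(find(d), []).append(d)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical domains and pairwise similarity scores.
domains = ["d1", "d2", "d3", "d4"]
similarity = {("d1", "d2"): 0.9, ("d2", "d3"): 0.8, ("d3", "d4"): 0.2}
print(cluster_domains(domains, similarity, threshold=0.5))
```

Single linkage chains d1, d2 and d3 together through their shared neighbor even though d1 and d3 were never compared directly; a stricter linkage rule would behave differently.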

CKAAPs DB: a Conserved Key Amino Acid Positions DataBase.

Li, Wilfred W; Reddy, Boojala V B; Tate, John G; Shindyalov, Ilya N; Bourne, Philip E.

Nucleic Acids Res ; 30(1): 409-11, 2002 Jan 01.

Article in English | MEDLINE | ID: mdl-11752351

ABSTRACT

The Conserved Key Amino Acid Positions DataBase (CKAAPs DB) provides access to an analysis of structurally similar proteins with dissimilar sequences where key residues within a common fold are identified. CKAAPs may be important in protein folding and structural stability and function, and hence useful for protein engineering studies. This paper provides an update to the initial report of CKAAPs DB [Li et al. (2001) Nucleic Acids Res., 29, 329-331]. CKAAPs DB contains CKAAPs for the representative set of polypeptide chains derived from the CE and FSSP databases, as well as subdomains (conserved regions of the order of 100 residues within a domain) identified by CE. The new version now offers different perspectives on the CKAAPs. First, CKAAPs are mapped onto their respective Protein Data Bank (PDB) structures rendered by Molscript, providing a spatial context for the CKAAPs. Secondly, CKAAPs may be highlighted within a structure-based sequence alignment, as well as secondary structure alignment. Thirdly, the resulting sequence homologs from the structure alignment may be viewed in alignments colorized based on identities and property groups using Mview. New search capabilities have also been provided for searching by keyword combinations, PDB IDs, EC numbers, GI numbers, LocusLink ID, taxonomy, gene ontology and pathways. A new custom CKAAPs analysis interface has been implemented where a user may change the criteria for inclusion of chains, initiate CKAAPs analysis and retrieve results. CKAAPs DB is accessible through the web at http://ckaaps.sdsc.edu/. Plain text analysis results are available by FTP at ftp://ftp.sdsc.edu/pub/sdsc/biology/ckaap.

Subjects

Conserved Sequence , Protein Databases , Amino Acid Sequence , Animals , Information Storage and Retrieval , Internet , Peptides/chemistry , Protein Folding , Protein Secondary Structure , Protein Tertiary Structure , Proteins/chemistry , Sequence Alignment , User-Computer Interface
