1.
Proteins ; 40(1): 71-85, 2000 Jul 01.
Article in English | MEDLINE | ID: mdl-10813832

ABSTRACT

Pairwise interaction models to recognize native folds are designed and analyzed. Different sets of parameters are considered, but the focus is on 20 x 20 contact matrices. Simultaneous solution of inequalities and minimization of the variance of the energy find matrices that recognize exactly the native folds of 572 sequences and structures from the protein data bank (PDB). The set includes many homologous pairs, which present a difficult recognition problem. Significant recognition ability is recovered with a small number of parameters (e.g., the H/P model). However, full recognition requires a complete set of amino acids. In addition to structures from the PDB, a folding program (MONSSTER) was used to generate decoy structures for 75 proteins. It is impossible to recognize all the native structures of the extended set by contact potentials. We therefore searched for a new functional form. An energy function U, which is based on a sum of general pairwise interactions limited to a resolution of 1 angstrom, is considered. This set was infeasible too. We therefore conjecture that it is not possible to find a folding potential, resolved to 1 angstrom, which is a sum of pair interactions.
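
The recognition criterion described in this abstract can be illustrated with a minimal sketch. It assumes a structure is given as a list of residue-index pairs in contact and that the contact matrix is a dictionary keyed by sorted residue-type pairs; both input formats, the function names, and the toy H/P values are hypothetical and are not taken from the paper. A native fold counts as recognized when every decoy has strictly higher energy, which is one linear inequality per decoy in the matrix entries.

```python
def contact_counts(sequence, contacts):
    """Count contacts per unordered residue-type pair.

    `contacts` is an iterable of (i, j) residue-index pairs (assumed format).
    """
    counts = {}
    for i, j in contacts:
        pair = tuple(sorted((sequence[i], sequence[j])))
        counts[pair] = counts.get(pair, 0) + 1
    return counts


def contact_energy(sequence, contacts, matrix):
    """Sum the matrix entry of every contact in the structure."""
    counts = contact_counts(sequence, contacts)
    return sum(matrix[pair] * n for pair, n in counts.items())


def recognizes_native(sequence, native, decoys, matrix):
    """True if every decoy has strictly higher energy than the native fold."""
    e_native = contact_energy(sequence, native, matrix)
    return all(contact_energy(sequence, d, matrix) > e_native for d in decoys)


# Toy two-letter (H/P) matrix; the values are illustrative only.
hp_matrix = {("H", "H"): -1.0, ("H", "P"): 0.0, ("P", "P"): 0.0}
sequence = "HPHH"
native = [(0, 2), (0, 3)]   # two H-H contacts
decoy = [(0, 1), (1, 3)]    # two H-P contacts

print(recognizes_native(sequence, native, [decoy], hp_matrix))
```

Because the energy is linear in the matrix entries, each native/decoy pair contributes one inequality on those entries, which is why the abstract can speak of a set of inequalities being feasible or infeasible.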


Subjects
Protein Folding; Proteins/chemistry; Algorithms; Amino Acids/chemistry; Databases, Factual; Models, Molecular; Proteins/metabolism; Software; Thermodynamics
2.
Nucleic Acids Res ; 28(1): 49-55, 2000 Jan 01.
Article in English | MEDLINE | ID: mdl-10592179

ABSTRACT

The ProtoMap site offers an exhaustive classification of all proteins in the SWISS-PROT database, into groups of related proteins. The classification is based on analysis of all pairwise similarities among protein sequences. The analysis makes essential use of transitivity to identify homologies among proteins. Within each group of the classification, every two members are either directly or transitively related. However, transitivity is applied restrictively in order to prevent unrelated proteins from clustering together. The classification is done at different levels of confidence, and yields a hierarchical organization of all proteins. The resulting classification splits the protein space into well-defined groups of proteins, which are closely correlated with natural biological families and superfamilies. Many clusters contain protein sequences that are not classified by other databases. The hierarchical organization suggested by our analysis may help in detecting finer subfamilies in families of known proteins. In addition it brings forth interesting relationships between protein families, upon which local maps for the neighborhood of protein families can be sketched. The ProtoMap web server can be accessed at http://www.protomap.cs.huji.ac.il


Subjects
Databases, Factual; Proteins/genetics; Computer Graphics; Information Storage and Retrieval; Internet; Proteins/chemistry; User-Computer Interface
3.
Proteins ; 37(3): 360-78, 1999 Nov 15.
Article in English | MEDLINE | ID: mdl-10591097

ABSTRACT

We investigate the space of all protein sequences in search of clusters of related proteins. Our aim is to automatically detect these sets, and thus obtain a classification of all protein sequences. Our analysis, which uses standard measures of sequence similarity as applied to an all-vs.-all comparison of SWISSPROT, gives a very conservative initial classification based on the highest scoring pairs. The many classes in this classification correspond to protein subfamilies. Subsequently we merge the subclasses using the weaker pairs in a two-phase clustering algorithm. The algorithm makes use of transitivity to identify homologous proteins; however, transitivity is applied restrictively in an attempt to prevent unrelated proteins from clustering together. This process is repeated at varying levels of statistical significance. Consequently, a hierarchical organization of all proteins is obtained. The resulting classification splits the protein space into well-defined groups of proteins, which are closely correlated with natural biological families and superfamilies. Different indices of validity were applied to assess the quality of our classification and compare it with the protein families in the PROSITE and Pfam databases. Our classification agrees with these domain-based classifications for between 64.8% and 88.5% of the proteins. It also finds many new clusters of protein sequences which were not classified by these databases. The hierarchical organization suggested by our analysis reveals finer subfamilies in families of known proteins as well as many novel relations between protein families.


Subjects
Proteins/classification; Sequence Homology, Amino Acid; Algorithms; Databases, Factual; Evolution, Molecular; Proteins/chemistry
4.
Article in English | MEDLINE | ID: mdl-9783227

ABSTRACT

We investigate the space of all protein sequences. We combine the standard measures of similarity (SW, FASTA, BLAST) to associate with each sequence an exhaustive list of neighboring sequences. These lists induce a (weighted directed) graph whose vertices are the sequences. The weight of an edge connecting two sequences represents their degree of similarity. This graph encodes many of the fundamental properties of the sequence space. We look for clusters of related proteins in this graph. These clusters correspond to strongly connected sets of vertices. Two main ideas underlie our work: i) Interesting homologies among proteins can be deduced by transitivity. ii) Transitivity should be applied restrictively in order to prevent unrelated proteins from clustering together. Our analysis starts from a very conservative classification, based on very significant similarities, that has many classes. Subsequently, classes are merged to include less significant similarities. Merging is performed via a novel two-phase algorithm. First, the algorithm identifies groups of possibly related clusters (based on transitivity and strong connectivity) using local considerations, and merges them. Then, a global test is applied to identify nuclei of strong relationships within these groups of clusters, and the classification is refined accordingly. This process takes place at varying thresholds of statistical significance, where at each step the algorithm is applied to the classes of the previous classification to obtain the next one, at the more permissive threshold. Consequently, a hierarchical organization of all proteins is obtained. The resulting classification splits the space of all protein sequences into well-defined groups of proteins. The results show that the automatically induced sets of proteins are closely correlated with natural biological families and superfamilies.
The hierarchical organization reveals finer subfamilies that make up known families of proteins as well as many interesting relations between protein families. The hierarchical organization proposed may be considered as the first map of the space of all protein sequences. An interactive web site including the results of our analysis has been constructed, and is now accessible through http://www.protomap.cs.huji.ac.il
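
The idea of reading clusters off a weighted directed similarity graph at successively more permissive thresholds can be sketched as follows. This is a deliberately simplified illustration, not the two-phase merging algorithm of the paper: it keeps only edges whose weight meets a threshold and reports the strongly connected components (Kosaraju's algorithm). The function name, the edge format `(source, target, weight)`, the convention that larger weights mean stronger similarity, and the toy scores are all assumptions.

```python
def strong_clusters(nodes, weighted_edges, threshold):
    """Strongly connected components of the graph restricted to edges
    whose weight is at least `threshold` (Kosaraju's algorithm)."""
    forward = {n: [] for n in nodes}
    backward = {n: [] for n in nodes}
    for src, dst, weight in weighted_edges:
        if weight >= threshold:
            forward[src].append(dst)
            backward[dst].append(src)

    # Pass 1: record nodes in order of DFS completion on the forward graph.
    order, seen = [], set()
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        stack = [(start, iter(forward[start]))]
        while stack:
            node, neighbours = stack[-1]
            for nxt in neighbours:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append((nxt, iter(forward[nxt])))
                    break
            else:
                # All neighbours explored: the node is finished.
                stack.pop()
                order.append(node)

    # Pass 2: sweep the reversed graph in decreasing completion order.
    clusters, assigned = [], set()
    for start in reversed(order):
        if start in assigned:
            continue
        assigned.add(start)
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            component.add(node)
            for nxt in backward[node]:
                if nxt not in assigned:
                    assigned.add(nxt)
                    stack.append(nxt)
        clusters.append(frozenset(component))
    return set(clusters)


# Toy graph: A and B are mutually similar, as are C and D; B and C are
# linked only weakly, and A -> D is a strong but one-directional hit.
nodes = ["A", "B", "C", "D"]
edges = [
    ("A", "B", 0.9), ("B", "A", 0.9),
    ("C", "D", 0.9), ("D", "C", 0.9),
    ("B", "C", 0.5), ("C", "B", 0.5),
    ("A", "D", 0.9),
]

for threshold in (0.8, 0.4):  # strict first, then more permissive
    print(threshold, sorted(sorted(c) for c in strong_clusters(nodes, edges, threshold)))
```

At the strict threshold the one-directional A -> D edge does not merge the two pairs, which reflects the restrictive use of transitivity; lowering the threshold admits the weak B/C edges and the clusters merge, giving two levels of a hierarchy.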


Subjects
Proteins/classification; Proteins/genetics; Algorithms; Artificial Intelligence; Cluster Analysis; Databases, Factual; Proteins/chemistry; Sequence Alignment; Sequence Homology, Amino Acid
5.
J Mol Biol ; 268(2): 539-56, 1997 May 02.
Article in English | MEDLINE | ID: mdl-9159489

ABSTRACT

A global classification of all currently known protein sequences is performed. Every protein sequence is partitioned into segments of 50 amino acid residues and a dynamic programming distance is calculated between each pair of segments. This space of segments is initially embedded into Euclidean space. The algorithm that we apply embeds every finite metric space into Euclidean space so that (1) the dimension of the host space is small and (2) the metric distortion is small. A novel self-organized, cross-validated clustering algorithm is then applied to the embedded space with Euclidean distances. We monitor the validity of our clustering by randomly splitting the data into two parts and performing a hierarchical clustering algorithm independently on each part. At every level of the hierarchy we cross-validate the clusters in one part with the clusters in the other. The resulting hierarchical tree of clusters offers a new representation of protein sequences and families, which compares favorably with the most up-to-date classifications based on functional and structural data about proteins. Some of the known families clustered into well-separated clusters. Motifs and domains such as the zinc finger, EF hand, homeobox, EGF-like and others are automatically correctly identified, and relations between protein families are revealed by examining the splits along the tree. This clustering leads to a novel representation of protein families, from which functional biological kinship of protein families can be deduced, as demonstrated for the transporter family. Finally, we introduce a new concise representation for complete proteins that is very useful in presenting multiple alignments, and in searching for close relatives in the database. The self-organization method presented is very general and applies to any data with a consistent and computable measure of similarity between data items.
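
The first step described in this abstract, cutting each sequence into 50-residue segments and computing a dynamic programming distance between segments, can be sketched as below. The abstract does not specify the distance or how a trailing remainder shorter than 50 residues is handled, so this sketch makes two labeled assumptions: the remainder is kept as a shorter final segment, and plain unit-cost edit distance stands in for the authors' dynamic programming distance. Function names are hypothetical.

```python
def partition_into_segments(sequence, length=50):
    """Split a sequence into consecutive, non-overlapping segments.

    Assumption: a trailing remainder shorter than `length` is kept as is.
    """
    return [sequence[i:i + length] for i in range(0, len(sequence), length)]


def edit_distance(a, b):
    """Unit-cost edit distance by dynamic programming (a stand-in for the
    distance used in the paper, which the abstract does not define)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # match or substitute
            ))
        previous = current
    return previous[-1]


segments = partition_into_segments("A" * 120)
print([len(s) for s in segments])
print(edit_distance("kitten", "sitting"))
```

A matrix of such pairwise distances is the kind of finite metric space that the abstract then embeds into Euclidean space before clustering.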


Assuntos
Análise por Conglomerados , Proteínas/classificação , Análise de Sequência/métodos , Sequência de Aminoácidos , Animais , Proteínas de Ligação a DNA/classificação , Hemeproteínas/classificação , Proteínas de Homeodomínio/classificação , Humanos , Metaloproteínas/classificação , Dados de Sequência Molecular , Dedos de Zinco
6.
Science ; 268(5210): 481; author reply 483-4, 1995 Apr 28.
Article in English | MEDLINE | ID: mdl-7725085