Search | VHL Regional Portal

1.

Using traveling salesman problem algorithms for evolutionary tree construction.

Korostensky, C; Gonnet, G H.

Bioinformatics ; 16(7): 619-27, 2000 Jul.

Article in English | MEDLINE | ID: mdl-11038332

ABSTRACT

MOTIVATION: The construction of evolutionary trees is one of the major problems in computational biology, mainly due to its complexity. RESULTS: We present a new tree construction method that constructs a tree with minimum score for a given set of sequences, where the score is the amount of evolution measured in PAM distances. To do this, the problem of tree construction is reduced to the Traveling Salesman Problem (TSP). The inputs to the TSP algorithm are the pairwise distances of the sequences, and the output is a circular tour through the optimal, unknown tree plus the minimum score of the tree. The circular order and the score can be used to construct the topology of the optimal tree. Our method can be used for any scoring function that correlates to the amount of change along the branches of an evolutionary tree; for instance, it could also be used for parsimony scores, but it cannot be used for least squares fit of distances. A TSP solution reduces the space of all possible trees to 2n. Using this order, we can guarantee that we reconstruct a correct evolutionary tree if the absolute value of the error for each distance measurement is smaller than a bound determined by the length of the shortest edge in the tree. For data sets with large errors, a dynamic programming approach is used to reconstruct the tree. Finally, simulations and experiments with real data are shown.
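The link between a circular tour and the tree score can be illustrated with a small sketch. It is not the authors' TSP algorithm: it enumerates all tours by brute force, and the distance matrix `DIST` is a hypothetical additive example (not PAM data) built from a 4-leaf tree ((A,B),(C,D)) with leaf edges 1, 2, 4, 5 and an internal edge of 3. On such a tree the shortest circular tour crosses every edge exactly twice, so half the tour length equals the total tree length.

```python
from itertools import permutations

# Hypothetical additive distances for a 4-leaf tree ((A,B),(C,D)) with
# leaf edges A=1, B=2, C=4, D=5 and internal edge 3 (total tree length 15).
DIST = [
    [0, 3, 8, 9],
    [3, 0, 9, 10],
    [8, 9, 0, 9],
    [9, 10, 9, 0],
]

def min_circular_tour(dist):
    """Return (length, order) of a shortest tour visiting every leaf once."""
    n = len(dist)
    best_len, best_order = None, None
    # Fix leaf 0 as the start so rotations of one tour are not recounted.
    for perm in permutations(range(1, n)):
        order = (0,) + perm
        length = sum(dist[order[i]][order[(i + 1) % n]] for i in range(n))
        if best_len is None or length < best_len:
            best_len, best_order = length, order
    return best_len, best_order

tour_len, tour_order = min_circular_tour(DIST)
tree_score = tour_len / 2  # each tree edge is crossed exactly twice
```

Brute force is only practical for a handful of sequences; a real data set would need a dedicated TSP solver.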

Subjects

Algorithms , Cytochrome P-450 CYP1A1/classification , Cytochrome P-450 CYP1A2/classification , Molecular Evolution , Glycated Hemoglobins/classification , Phylogeny , Amino Acid Sequence , Animals , Cytochrome P-450 CYP1A1/genetics , Cytochrome P-450 CYP1A2/genetics , Glycated Hemoglobins/genetics , Molecular Sequence Data , Sequence Analysis/methods

2.

An analysis of the helix-to-strand transition between peptides with identical sequence.

Zhou, X; Alber, F; Folkers, G; Gonnet, G H; Chelvanayagam, G.

Proteins ; 41(2): 248-56, 2000 Nov 01.

Article in English | MEDLINE | ID: mdl-10966577

ABSTRACT

An analysis of peptide segments with identical sequence but that differ significantly in structure was performed over non-redundant databases of protein structures. We focus on those peptides that fold into an alpha-helix in one protein but a beta-strand in another. While the study shows that many such structurally ambivalent peptides contain amino acids with a strong helical preference collocated with amino acids with a strong strand preference, the results overwhelmingly indicate that the peptide's environment ultimately dictates its structure. Furthermore, the first naturally occurring structurally ambivalent nonapeptide from evolutionarily unrelated proteins is described, highlighting the intrinsic plasticity of peptide sequences. We even find seven proteins that show structural ambivalence under different conditions. Finally, a computer algorithm has been implemented to identify regions in a given sequence where secondary structure prediction programs are likely to make serious mispredictions.

Subjects

Peptides/chemistry , Secondary Protein Structure , Algorithms , Amino Acid Sequence , Factual Databases , Molecular Models , Sequence Alignment , Protein Sequence Analysis

3.

Evaluation measures of multiple sequence alignments.

Gonnet, G H; Korostensky, C; Benner, S.

J Comput Biol ; 7(1-2): 261-76, 2000.

Article in English | MEDLINE | ID: mdl-10890401

ABSTRACT

Multiple sequence alignments (MSAs) are frequently used in the study of families of protein sequences or DNA/RNA sequences. They are a fundamental tool for the understanding of the structure, functionality and, ultimately, the evolution of proteins. A new algorithm, the Circular Sum (CS) method, is presented for formally evaluating the quality of an MSA. It is based on the use of a solution to the Traveling Salesman Problem, which identifies a circular tour through an evolutionary tree connecting the sequences in a protein family. With this approach, the calculation of an evolutionary tree and the errors that it would introduce can be avoided altogether. The algorithm gives an upper bound, the best score that can possibly be achieved by any MSA for a given set of protein sequences. Alternatively, if presented with a specific MSA, the algorithm provides a formal score for the MSA, which serves as an absolute measure of the quality of the MSA. The CS measure yields a direct connection between an MSA and the associated evolutionary tree. The measure can be used as a tool for evaluating different methods for producing MSAs. A brief example of the last application is provided. Because it weights all evolutionary events on a tree identically, but does not require the reconstruction of a tree, the CS algorithm has advantages over the frequently used sum-of-pairs measures for scoring MSAs, which weight some evolutionary events more strongly than others. Compared to other weighted sum-of-pairs measures, it has the advantage that no evolutionary tree must be constructed, because we can find a circular tour without knowing the tree.

Subjects

Algorithms , Sequence Alignment/statistics & numerical data , Amino Acid Sequence , Biometry , Computer Simulation , Molecular Evolution , Markov Chains , Molecular Sequence Data , Phylogeny , Proteins/genetics , Amino Acid Sequence Homology

4.

Darwin v. 2.0: an interpreted computer language for the biosciences.

Gonnet, G H; Hallett, M T; Korostensky, C; Bernardin, L.

Bioinformatics ; 16(2): 101-3, 2000 Feb.

Article in English | MEDLINE | ID: mdl-10842729

ABSTRACT

MOTIVATION: We announce the availability of the second release of Darwin v. 2.0, an interpreted computer language especially tailored to researchers in the biosciences. The system is a general tool applicable to a wide range of problems. RESULTS: This second release improves Darwin version 1.6 in several ways: it now contains (1) a larger set of libraries touching most of the classical problems from computational biology (pairwise alignment, all versus all alignments, tree construction, multiple sequence alignment), (2) an expanded set of general purpose algorithms (search algorithms for discrete problems, matrix decomposition routines, complex/long integer arithmetic operations), (3) an improved language with a cleaner syntax, (4) better on-line help, and (5) a number of fixes to user-reported bugs. AVAILABILITY: Darwin is made available for most operating systems free of charge from the Computational Biochemistry Research Group (CBRG), reachable at http://chrg.inf.ethz.ch. CONTACT: darwin@inf.ethz.ch

Subjects

Programming Languages , Algorithms , Factual Databases , Mathematical Computing , Peptides , Phylogeny , Sequence Alignment

5.

A combinatorial distance-constraint approach to predicting protein tertiary models from known secondary structure.

Chelvanayagam, G; Knecht, L; Jenny, T; Benner, S A; Gonnet, G H.

Fold Des ; 3(3): 149-60, 1998.

Article in English | MEDLINE | ID: mdl-9562545

ABSTRACT

BACKGROUND: Distance geometry methods allow protein structures to be constructed using a large number of distance constraints, which can be elucidated by experimental techniques such as NMR. New methods for gleaning tertiary structural information from multiple sequence alignments make it possible for distance constraints to be predicted from sequence information alone. The basic distance geometry method can thus be applied using these empirically derived distance constraints. Such an approach, which incorporates a novel combinatoric procedure, is reported here. RESULTS: Given the correct sheet topology and disulfide formations, the fully automated procedure is generally able to construct native-like Calpha models for eight small beta-protein structures. When the sheet topology was unknown but disulfide connectivities were included, all sheet topologies were explored by the combinatorial procedure. Using a simple geometric evaluation scheme, models with the correct sheet topology were ranked first in four of the eight example cases, second in three examples and third in one example. If neither the sheet topology nor the disulfide connectivities were given a priori, all combinations of sheet topologies and disulfides were explored by the combinatorial procedure. The evaluation scheme ranked the correct topology within the top five folds for half the example cases. CONCLUSIONS: The combinatorial procedure is a useful technique for identifying a limited number of low-resolution candidate folds for small, disulfide-rich, beta-protein structures. Better results are obtained, however, if correct disulfide connectivities are known in advance. Combinatorial distance constraints can be applied whenever there are a sufficiently small number of finite connectivities.

Subjects

Protein Folding , Secondary Protein Structure , Tertiary Protein Structure , Amino Acid Sequence , Binding Sites , Computer Simulation , Disulfides , Evaluation Studies as Topic , Forecasting , Molecular Models , Molecular Sequence Data , Surface Properties

6.

An analysis of simultaneous variation in protein structures.

Chelvanayagam, G; Eggenschwiler, A; Knecht, L; Gonnet, G H; Benner, S A.

Protein Eng ; 10(4): 307-16, 1997 Apr.

Article in English | MEDLINE | ID: mdl-9194155

ABSTRACT

The simultaneous substitution of pairs of buried amino acid side chains during divergent evolution has been examined in a set of protein families with known crystal structures. A weak signal is found that shows that amino acid pairs near in space in the folded structure preferentially undergo substitution in a compensatory way. Three different physicochemical types of covariation 'signals' were then examined separately, with consideration given to the evolutionary distance at which different types of compensation occur. Where the compensatory covariation tends towards retaining the combined residue volumes, the signal is significant only at very low evolutionary distances. Where the covariation compensates for changes in the hydrogen bonding, the signal is strongest at intermediate evolutionary distances. Covariations that compensate for charge variations appeared with equal strength at all the evolutionary distances examined. A recipe is suggested for using the weak covariation signal to assemble the predicted secondary structural elements, where the evolutionary distance, covariation type and weighting are considered together with the tertiary structural context (interior or surface) of the residues being examined.

Subjects

Molecular Evolution , Protein Conformation , Computer Simulation , Chemical Models , Peptide Mapping , Sequence Alignment

7.

A predicted consensus structure for the N-terminal fragment of the heat shock protein HSP90 family.

Gerloff, D L; Cohen, F E; Korostensky, C; Turcotte, M; Gonnet, G H; Benner, S A.

Proteins ; 27(3): 450-8, 1997 Mar.

Article in English | MEDLINE | ID: mdl-9094746

ABSTRACT

A secondary structure has been predicted for the heat shock protein HSP90 family from an aligned set of homologous protein sequences by using a transparent method, in both manual and automated implementations, that extracts conformational information from patterns of variation and conservation within the family. No statistically significant sequence similarity relates this family to any protein with known crystal structure. However, the secondary structure prediction, together with the assignment of active site positions and possible biochemical properties, suggests that the fold is similar to that seen in the N-terminal domain of DNA gyrase B (the ATPase fragment).

Subjects

Algorithms , HSP90 Heat-Shock Proteins/chemistry , Molecular Models , Binding Sites , DNA Gyrase , Type II DNA Topoisomerases/chemistry , Type II DNA Topoisomerases/metabolism , HSP90 Heat-Shock Proteins/metabolism , Molecular Sequence Data , Protein Conformation , Secondary Protein Structure , Tertiary Protein Structure

8.

Amino acid substitution during functionally constrained divergent evolution of protein sequences.

Benner, S A; Cohen, M A; Gonnet, G H.

Protein Eng ; 7(11): 1323-32, 1994 Nov.

Article in English | MEDLINE | ID: mdl-7700864

ABSTRACT

In aligning homologous protein sequences, it is generally assumed that amino acid substitutions subsequent in time occur independently of amino acid substitutions previous in time, i.e. that patterns of mutation are similar at low and high sequence divergence. This assumption is examined here and shown to be incorrect in an interesting way. Separate mutation matrices were constructed for aligned protein sequence pairs at divergences ranging from 5 to 100 PAM units (point accepted mutations per 100 aligned positions). From these, the corresponding log-odds (Dayhoff) matrices, normalized to 250 PAM units, were constructed. The matrices show that the genetic code influences accepted point mutations strongly at early stages of divergence, while the chemical properties of the side chains dominate at more advanced stages.
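As a rough illustration of what "log-odds matrices normalized to 250 PAM units" involves, the sketch below uses the standard Dayhoff-style construction: a mutation matrix for a short distance is raised to a power to reach 250 PAM, and each entry is then compared with the background frequency on a 10·log10 scale. The two-letter alphabet, the matrix `PAM1`, and the frequencies `FREQS` are hypothetical toy values, not data from the paper, and the paper's own normalization procedure may differ in detail.

```python
import math

def mat_mul(a, b):
    """Multiply two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_pow(m, k):
    """Raise a square matrix to a non-negative integer power."""
    n = len(m)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(k):
        result = mat_mul(result, m)
    return result

def log_odds(m, freqs):
    """Score 10*log10(m[i][j] / freqs[i]); m[i][j] = P(j is replaced by i)."""
    n = len(m)
    return [[10 * math.log10(m[i][j] / freqs[i]) for j in range(n)]
            for i in range(n)]

# Toy 2-letter alphabet (assumption): 1% expected change per step = 1 PAM.
FREQS = [0.5, 0.5]
PAM1 = [[0.99, 0.01],
        [0.01, 0.99]]

PAM250 = mat_pow(PAM1, 250)       # extrapolate to 250 PAM units
SCORES = log_odds(PAM250, FREQS)  # positive = seen more often than chance
```

Only integer powers are handled here; bringing a matrix estimated at, say, 100 PAM to 250 PAM would need a fractional power, for example via an eigendecomposition.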

Subjects

Amino Acid Sequence , Biological Evolution , Sequence Alignment/methods , Genetic Code , Mathematical Computing , Molecular Sequence Data , Point Mutation , Probability , Amino Acid Sequence Homology

9.

Analysis of amino acid substitution during divergent evolution: the 400 by 400 dipeptide substitution matrix.

Gonnet, G H; Cohen, M A; Benner, S A.

Biochem Biophys Res Commun ; 199(2): 489-96, 1994 Mar 15.

Article in English | MEDLINE | ID: mdl-8135790

ABSTRACT

Most formal methods for analyzing the divergent evolution of protein sequences assume a Markov model where position i in a polypeptide chain undergoes amino acid substitution independently from position i + 1. The large number of aligned homologous sequence pairs available from the exhaustive matching of the protein sequence database makes it possible to examine this assumption empirically. We have constructed a 400 by 400 matrix that reports empirical probabilities for the interconversion of all pairs of dipeptides in proteins undergoing divergent evolution. Comparison of these probabilities with those expected if substitution at adjacent positions in a protein sequence were independent reveals interesting patterns that arise through the breakdown of this assumption. Several of these are useful in extracting conformational information from patterns of conservation and variation in homologous protein sequences.

Subjects

Biological Evolution , Dipeptides , Genetic Variation , Genetic Models , Point Mutation , Proteins/genetics , Amino Acid Sequence , Conserved Sequence , Information Systems , Markov Chains

10.

Predicting the conformation of proteins from sequences. Progress and future progress.

Benner, S A; Jenny, T F; Cohen, M A; Gonnet, G H.

Adv Enzyme Regul ; 34: 269-353, 1994.

Article in English | MEDLINE | ID: mdl-7942279

ABSTRACT

A new paradigm for predicting the secondary and tertiary structure of functional proteins from sequence data has emerged from detailed models of how natural selection, conservation, and neutral drift, the three fundamental factors in molecular evolution, leave their mark upon protein sequences. Structural information is extracted from a set of aligned homologous sequences via an analysis of patterns of conservation and variation between proteins with quantitatively defined evolutionary relationships. Tertiary structural information is obtained prior to the assignment of secondary structure, where it plays an important role. Throughout, structural predictions are made with the active involvement of a biochemist whose expertise and insight are critical both for making the prediction and for analyzing its successful and unsuccessful parts. Secondary structure predictions are evaluated based on their ability to sustain an effort to model tertiary structure. Several predictions made using the new paradigm can now be compared with those made under the classical paradigm, including a neural network. The results obtained from the new paradigm are clearly superior to those obtained with the classical paradigm, at least within the protein families that were examined.

Subjects

Chemical Models , Secondary Protein Structure , Tertiary Protein Structure , Amino Acid Sequence , Biochemistry/trends , Biological Evolution , Molecular Sequence Data , Proteins/genetics , Sequence Alignment/methods , Amino Acid Sequence Homology

11.

The nitrogenase MoFe protein. A secondary structure prediction.

Gerloff, D L; Jenny, T F; Knecht, L J; Gonnet, G H; Benner, S A.

FEBS Lett ; 318(2): 118-24, 1993 Mar 01.

Article in English | MEDLINE | ID: mdl-8440368

ABSTRACT

Surface residues, interior residues, and parsing residues, together with a secondary structure derived from these, are predicted for the MoFe nitrogenase protein in advance of a crystal structure of the protein, scheduled shortly to appear in Nature. By publishing this prediction, we test our method for predicting the conformation of proteins from patterns in the divergent evolution of homologous protein sequences in a way that places the method 'at risk'.

Subjects

Nitrogenase/chemistry , Amino Acid Sequence , Azotobacter vinelandii/enzymology , Bacterial Proteins/chemistry , Bacterial Proteins/ultrastructure , Crystallography , Iron , Metalloproteins/chemistry , Metalloproteins/ultrastructure , Theoretical Models , Molecular Sequence Data , Molybdenum , Nitrogenase/ultrastructure , Secondary Protein Structure , Sequence Alignment

12.

Empirical and structural models for insertions and deletions in the divergent evolution of proteins.

Benner, S A; Cohen, M A; Gonnet, G H.

J Mol Biol ; 229(4): 1065-82, 1993 Feb 20.

Article in English | MEDLINE | ID: mdl-8445636

ABSTRACT

The exhaustive matching of the protein sequence database makes possible a broadly based study of insertions and deletions (indels) during divergent evolution. In this study, the probability of a gap in an alignment of a pair of homologous protein sequences was found to increase with the evolutionary distance measured in PAM units (number of accepted point mutations per 100 amino acid residues). A relationship between the average number of amino acid residues between indels and evolutionary distance suggests that a unit 30 to 40 amino acid residues in length remains, on average, undisrupted by indels during divergent evolution. Further, the probability of a gap was found to be inversely proportional to gap length raised to the 1.7 power. This empirical law fits closely over the entire range of gap lengths examined. Gap length distribution is largely independent of evolutionary distance. These results rule out the widely used linear gap penalty as a satisfactory formula for scoring gaps when constructing alignments. Further, the observed gap length distribution can be explained by a simple model of selective pressures governing the acceptance of indels during divergent evolution. Finally, this model provides theoretical support for using indels as part of "parsing algorithms", important in the de novo prediction of the folded structure of proteins from the sequence data.
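The power-law result can be made concrete with a short sketch. It assumes only what the abstract states, namely that gap probability is inversely proportional to gap length raised to the 1.7 power; the truncation at `max_len`, the function names, and the use of a negative log-probability as a "penalty" are illustrative assumptions, not the paper's scoring formula.

```python
import math

def gap_length_probs(max_len, exponent=1.7):
    """Normalized P(k) proportional to k**(-exponent) for k = 1..max_len."""
    weights = [k ** -exponent for k in range(1, max_len + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def gap_penalty(k, exponent=1.7):
    """Length-dependent part of -ln P(k): grows like ln(k), not like k."""
    return exponent * math.log(k)

probs = gap_length_probs(50)

# Doubling the gap length always adds the same penalty increment, whereas a
# linear penalty would add an increment proportional to the added length.
step_1_to_2 = gap_penalty(2) - gap_penalty(1)
step_2_to_4 = gap_penalty(4) - gap_penalty(2)
```

Under this reading, long gaps are penalized far less than a linear penalty would predict, which is consistent with the abstract's conclusion that the linear form is unsatisfactory.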

Subjects

Biological Evolution , Genetic Variation , Proteins/chemistry , Amino Acid Sequence Homology , Amino Acid Sequence , Amino Acids/chemistry , Factual Databases , Chemical Models , Genetic Models , Molecular Sequence Data , Probability , Proteins/genetics , Sequence Deletion

13.

A word in your protein.

Gonnet, G H; Benner, S A.

Nature ; 361(6408): 121, 1993 Jan 14.

Article in English | MEDLINE | ID: mdl-8421517

Subjects

Amino Acid Sequence , Factual Databases , Proteins/genetics , Human Genome , Humans , Language

14.

Exhaustive matching of the entire protein sequence database.

Gonnet, G H; Cohen, M A; Benner, S A.

Science ; 256(5062): 1443-5, 1992 Jun 05.

Article in English | MEDLINE | ID: mdl-1604319

ABSTRACT

The entire protein sequence database has been exhaustively matched. Definitive mutation matrices and models for scoring gaps were obtained from the matching and used to organize the sequence database as sets of evolutionarily connected components. The methods developed are general and can be used to manage sequence data generated by major genome sequencing projects. The alignments made possible by the exhaustive matching are the starting point for successful de novo prediction of the folded structures of proteins, for reconstructing sequences of ancient proteins and metabolisms in ancient organisms, and for obtaining new perspectives in structural biochemistry.

Subjects

Amino Acid Sequence , Factual Databases , Proteins/genetics , Mathematics , Nucleic Acid Sequence Homology
