Search | BVS Regional Portal

1.

Cell fingerprinting: an approach to classifying cells according to mass profiles of digests of protein extracts.

Zhou, X; Gonnet, G; Hallett, M; Münchbach, M; Folkers, G; James, P.

Proteomics ; 1(5): 683-90, 2001 May.

Article in English | MEDLINE | ID: mdl-11678037

ABSTRACT

We present a statistical framework for classifying cells according to the set of peptide masses obtained by mass spectrometric analysis of digestions of whole cell protein extracts. The digest is separated by high performance liquid chromatography (HPLC) coupled directly to a mass spectrometer either by an electrospray interface or by collection to a matrix-assisted laser desorption/ionization target plate. Here, the mass to charge ratio, intensity, and HPLC retention time of the peptides are measured. We have used defined bacterial strains to test this approach. For each bacterium, this process is repeated for extracts obtained at different points in the growth curve in order to try and define an invariant set of signals that uniquely identify the bacterium. This paper presents algorithms for the creation of this cell fingerprint database and develops a Bayesian classification scheme for deciding whether or not an unknown bacterium has a match in the database. Our initial testing based on a limited data set of three bacteria indicates that our approach is feasible. Via a jack-knife test, our Bayesian classification scheme correctly identified the bacterium in 67.8% of the cases.

Subjects

Bacteria/chemistry, Bacteria/classification, Bacterial Proteins/analysis, Mass Spectrometry/methods, Bayes Theorem, Klebsiella pneumoniae/chemistry, Klebsiella pneumoniae/classification, Proteome, Staphylococcus aureus/chemistry, Staphylococcus aureus/classification, Stenotrophomonas maltophilia/chemistry, Stenotrophomonas maltophilia/classification

2.

Using traveling salesman problem algorithms for evolutionary tree construction.

Korostensky, C; Gonnet, G H.

Bioinformatics ; 16(7): 619-27, 2000 Jul.

Article in English | MEDLINE | ID: mdl-11038332

ABSTRACT

MOTIVATION: The construction of evolutionary trees is one of the major problems in computational biology, mainly due to its complexity. RESULTS: We present a new tree construction method that constructs a tree with minimum score for a given set of sequences, where the score is the amount of evolution measured in PAM distances. To do this, the problem of tree construction is reduced to the Traveling Salesman Problem (TSP). The input for the TSP algorithm are the pairwise distances of the sequences and the output is a circular tour through the optimal, unknown tree plus the minimum score of the tree. The circular order and the score can be used to construct the topology of the optimal tree. Our method can be used for any scoring function that correlates to the amount of changes along the branches of an evolutionary tree, for instance it could also be used for parsimony scores, but it cannot be used for least squares fit of distances. A TSP solution reduces the space of all possible trees to 2n. Using this order, we can guarantee that we reconstruct a correct evolutionary tree if the absolute value of the error for each distance measurement is smaller than [formula image missing], where [symbol image missing] is the length of the shortest edge in the tree. For data sets with large errors, a dynamic programming approach is used to reconstruct the tree. Finally simulations and experiments with real data are shown.

Subjects

Algorithms, Cytochrome P-450 CYP1A1/classification, Cytochrome P-450 CYP1A2/classification, Molecular Evolution, Glycated Hemoglobin/classification, Phylogeny, Amino Acid Sequence, Animals, Cytochrome P-450 CYP1A1/genetics, Cytochrome P-450 CYP1A2/genetics, Glycated Hemoglobin/genetics, Molecular Sequence Data, Sequence Analysis/methods
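
The circular-tour idea in the abstract above can be illustrated with a small sketch. It assumes an additive (error-free) tree metric and uses a brute-force search over leaf orders, which is only a stand-in for a real TSP solver and not the authors' implementation. The four-leaf tree, its edge lengths, and all function names are hypothetical.

```python
from itertools import permutations

# Hypothetical tree: leaves A and B attach to internal node u, leaves C and D
# attach to internal node v. Edge lengths are invented PAM distances.
EDGES = {("A", "u"): 10, ("B", "u"): 20, ("u", "v"): 5,
         ("C", "v"): 15, ("D", "v"): 25}
LEAVES = ("A", "B", "C", "D")


def all_pairs(edges):
    """Path lengths between all nodes of the tree (Floyd-Warshall)."""
    nodes = sorted({n for edge in edges for n in edge})
    inf = float("inf")
    dist = {(a, b): (0 if a == b else inf) for a in nodes for b in nodes}
    for (a, b), w in edges.items():
        dist[a, b] = dist[b, a] = w
    for k in nodes:
        for i in nodes:
            for j in nodes:
                if dist[i, k] + dist[k, j] < dist[i, j]:
                    dist[i, j] = dist[i, k] + dist[k, j]
    return dist


def tour_length(order, dist):
    """Sum of pairwise distances along a circular order of the leaves."""
    n = len(order)
    return sum(dist[order[i], order[(i + 1) % n]] for i in range(n))


def shortest_tour(leaves, dist):
    """Brute-force TSP over leaf orders (fine for a handful of leaves only)."""
    first, rest = leaves[0], leaves[1:]
    tours = ((first,) + p for p in permutations(rest))
    return min(tours, key=lambda t: tour_length(t, dist))


DIST = all_pairs(EDGES)
BEST = shortest_tour(LEAVES, DIST)
TREE_LENGTH = sum(EDGES.values())
print(BEST, tour_length(BEST, DIST), 2 * TREE_LENGTH)
```

In this toy case the shortest tour traverses every edge of the tree exactly twice, so its length is twice the total tree length, while an order that interleaves the two cherries (A, C, B, D) is longer. That is the sense in which a circular order carries information about the tree without the tree being known.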

3.

An analysis of the helix-to-strand transition between peptides with identical sequence.

Zhou, X; Alber, F; Folkers, G; Gonnet, G H; Chelvanayagam, G.

Proteins ; 41(2): 248-56, 2000 Nov 01.

Article in English | MEDLINE | ID: mdl-10966577

ABSTRACT

An analysis of peptide segments with identical sequence but that differ significantly in structure was performed over non-redundant databases of protein structures. We focus on those peptides, which fold into an alpha-helix in one protein but a beta-strand in another. While the study shows that many such structurally ambivalent peptides contain amino acids with a strong helical preference collocated with amino acids with a strong strand preference, the results overwhelmingly indicate that the peptide's environment ultimately dictates its structure. Furthermore, the first naturally occurring structurally ambivalent nonapeptide from evolutionary unrelated proteins is described, highlighting the intrinsic plasticity of peptide sequences. We even find seven proteins that show structural ambivalence under different conditions. Finally, a computer algorithm has been implemented to identify regions in a given sequence where secondary structure prediction programs are likely to make serious mispredictions.

Subjects

Peptides/chemistry, Protein Secondary Structure, Algorithms, Amino Acid Sequence, Factual Databases, Molecular Models, Sequence Alignment, Protein Sequence Analysis

4.

Evaluation measures of multiple sequence alignments.

Gonnet, G H; Korostensky, C; Benner, S.

J Comput Biol ; 7(1-2): 261-76, 2000.

Article in English | MEDLINE | ID: mdl-10890401

ABSTRACT

Multiple sequence alignments (MSAs) are frequently used in the study of families of protein sequences or DNA/RNA sequences. They are a fundamental tool for the understanding of the structure, functionality and, ultimately, the evolution of proteins. A new algorithm, the Circular Sum (CS) method, is presented for formally evaluating the quality of an MSA. It is based on the use of a solution to the Traveling Salesman Problem, which identifies a circular tour through an evolutionary tree connecting the sequences in a protein family. With this approach, the calculation of an evolutionary tree and the errors that it would introduce can be avoided altogether. The algorithm gives an upper bound, the best score that can possibly be achieved by any MSA for a given set of protein sequences. Alternatively, if presented with a specific MSA, the algorithm provides a formal score for the MSA, which serves as an absolute measure of the quality of the MSA. The CS measure yields a direct connection between an MSA and the associated evolutionary tree. The measure can be used as a tool for evaluating different methods for producing MSAs. A brief example of the last application is provided. Because it weights all evolutionary events on a tree identically, but does not require the reconstruction of a tree, the CS algorithm has advantages over the frequently used sum-of-pairs measures for scoring MSAs, which weight some evolutionary events more strongly than others. Compared to other weighted sum-of-pairs measures, it has the advantage that no evolutionary tree must be constructed, because we can find a circular tour without knowing the tree.

Subjects

Algorithms, Sequence Alignment/statistics & numerical data, Amino Acid Sequence, Biometry, Computer Simulation, Molecular Evolution, Markov Chains, Molecular Sequence Data, Phylogeny, Proteins/genetics, Amino Acid Sequence Homology

5.

Darwin v. 2.0: an interpreted computer language for the biosciences.

Gonnet, G H; Hallett, M T; Korostensky, C; Bernardin, L.

Bioinformatics ; 16(2): 101-3, 2000 Feb.

Article in English | MEDLINE | ID: mdl-10842729

ABSTRACT

MOTIVATION: We announce the availability of the second release of Darwin v. 2.0, an interpreted computer language especially tailored to researchers in the biosciences. The system is a general tool applicable to a wide range of problems. RESULTS: This second release improves Darwin version 1.6 in several ways: it now contains (1) a larger set of libraries touching most of the classical problems from computational biology (pairwise alignment, all versus all alignments, tree construction, multiple sequence alignment), (2) an expanded set of general purpose algorithms (search algorithms for discrete problems, matrix decomposition routines, complex/long integer arithmetic operations), (3) an improved language with a cleaner syntax, (4) better on-line help, and (5) a number of fixes to user-reported bugs. AVAILABILITY: Darwin is made available for most operating systems free of charge from the Computational Biochemistry Research Group (CBRG), reachable at http://chrg.inf.ethz.ch. CONTACT: darwin@inf.ethz.ch

Subjects

Programming Languages, Algorithms, Factual Databases, Mathematical Computing, Peptides, Phylogeny, Sequence Alignment

6.

An algorithm for the identification of proteins using peptides with ragged N- or C-termini generated by sequential endo- and exopeptidase digestions.

Korostensky, C; Staudenmann, W; Dainese, P; Hoving, S; Gonnet, G; James, P.

Electrophoresis ; 19(11): 1933-40, 1998 Aug.

Article in English | MEDLINE | ID: mdl-9740053

ABSTRACT

We have developed an algorithm (MassDynSearch) for identifying proteins using a combination of peptide masses with small associated sequences (tags). Unlike the approach developed by Matthias Mann, 'Tag searching', in which the sequence tags are generated by gas phase fragmentation of peptides in a mass spectrometer, 'Rag Tag' searching uses peptide tags which are generated enzymatically or chemically. The protein is digested either chemically or with an endopeptidase and the resultant mixture is then subjected to partial exopeptidase degradation. The mixture is analyzed by matrix assisted laser desorption and ionization time of flight mass spectrometry and a list of intact peptide masses is generated, each associated with a set of degradation product masses which serve as unique tags. These 'tagged masses' are used as the input to an algorithm we have written, MassDynSearch, which searches protein and DNA databases for proteins which contain similar tagged motifs. The method is simple, rapid and can be fully automated. The main advantage of this approach is that the specificity of the initial digestion is unimportant since multiple peptides with tags are used to search the database. This is especially useful for proteins like membrane, cytoskeletal, and other proteins where specific endopeptidases are less efficient and lower specificity proteases such as chymotrypsin, pepsin, and elastase must be used.

Subjects

Algorithms, Proteins/analysis, Amino Acid Sequence, Endopeptidases, Exopeptidases, Molecular Sequence Data, Peptide Hydrolases, Peptides

7.

A combinatorial distance-constraint approach to predicting protein tertiary models from known secondary structure.

Chelvanayagam, G; Knecht, L; Jenny, T; Benner, S A; Gonnet, G H.

Fold Des ; 3(3): 149-60, 1998.

Article in English | MEDLINE | ID: mdl-9562545

ABSTRACT

BACKGROUND: Distance geometry methods allow protein structures to be constructed using a large number of distance constraints, which can be elucidated by experimental techniques such as NMR. New methods for gleaning tertiary structural information from multiple sequence alignments make it possible for distance constraints to be predicted from sequence information alone. The basic distance geometry method can thus be applied using these empirically derived distance constraints. Such an approach, which incorporates a novel combinatoric procedure, is reported here. RESULTS: Given the correct sheet topology and disulfide formations, the fully automated procedure is generally able to construct native-like Calpha models for eight small beta-protein structures. When the sheet topology was unknown but disulfide connectivities were included, all sheet topologies were explored by the combinatorial procedure. Using a simple geometric evaluation scheme, models with the correct sheet topology were ranked first in four of the eight example cases, second in three examples and third in one example. If neither the sheet topology nor the disulfide connectivities were given a priori, all combinations of sheet topologies and disulfides were explored by the combinatorial procedure. The evaluation scheme ranked the correct topology within the top five folds for half the example cases. CONCLUSIONS: The combinatorial procedure is a useful technique for identifying a limited number of low-resolution candidate folds for small, disulfide-rich, beta-protein structures. Better results are obtained, however, if correct disulfide connectivities are known in advance. Combinatorial distance constraints can be applied whenever there are a sufficiently small number of finite connectivities.

Subjects

Protein Folding, Protein Secondary Structure, Protein Tertiary Structure, Amino Acid Sequence, Binding Sites, Computer Simulation, Disulfides, Evaluation Studies as Topic, Forecasting, Molecular Models, Molecular Sequence Data, Surface Properties

8.

An analysis of simultaneous variation in protein structures.

Chelvanayagam, G; Eggenschwiler, A; Knecht, L; Gonnet, G H; Benner, S A.

Protein Eng ; 10(4): 307-16, 1997 Apr.

Article in English | MEDLINE | ID: mdl-9194155

ABSTRACT

The simultaneous substitution of pairs of buried amino acid side chains during divergent evolution has been examined in a set of protein families with known crystal structures. A weak signal is found that shows that amino acid pairs near in space in the folded structure preferentially undergo substitution in a compensatory way. Three different physicochemical types of covariation 'signals' were then examined separately, with consideration given to the evolutionary distance at which different types of compensation occur. Where the compensatory covariation tends towards retaining the combined residue volumes, the signal is significant only at very low evolutionary distances. Where the covariation compensates for changes in the hydrogen bonding, the signal is strongest at intermediate evolutionary distances. Covariations that compensate for charge variations appeared with equal strength at all the evolutionary distances examined. A recipe is suggested for using the weak covariation signal to assemble the predicted secondary structural elements, where the evolutionary distance, covariation type and weighting are considered together with the tertiary structural context (interior or surface) of the residues being examined.

Subjects

Molecular Evolution, Protein Conformation, Computer Simulation, Chemical Models, Peptide Mapping, Sequence Alignment

9.

A predicted consensus structure for the N-terminal fragment of the heat shock protein HSP90 family.

Gerloff, D L; Cohen, F E; Korostensky, C; Turcotte, M; Gonnet, G H; Benner, S A.

Proteins ; 27(3): 450-8, 1997 Mar.

Article in English | MEDLINE | ID: mdl-9094746

ABSTRACT

A secondary structure has been predicted for the heat shock protein HSP90 family from an aligned set of homologous protein sequences by using a transparent method in both manual and automated implementation that extracts conformational information from patterns of variation and conservation within the family. No statistically significant sequence similarity relates this family to any protein with known crystal structure. However, the secondary structure prediction, together with the assignment of active site positions and possible biochemical properties, suggest that the fold is similar to that seen in N-terminal domain of DNA gyrase B (the ATPase fragment).

Subjects

Algorithms, HSP90 Heat-Shock Proteins/chemistry, Molecular Models, Binding Sites, DNA Gyrase, Type II DNA Topoisomerases/chemistry, Type II DNA Topoisomerases/metabolism, HSP90 Heat-Shock Proteins/metabolism, Molecular Sequence Data, Protein Conformation, Protein Secondary Structure, Protein Tertiary Structure

10.

Probing protein function using a combination of gene knockout and proteome analysis by mass spectrometry.

Dainese, P; Staudenmann, W; Quadroni, M; Korostensky, C; Gonnet, G; Kertesz, M; James, P.

Electrophoresis ; 18(3-4): 432-42, 1997.

Article in English | MEDLINE | ID: mdl-9150922

ABSTRACT

Recently the determination of the genome sequences of three procaryotes (Haemophilus influenzae, Methanococcus jannaschii and Mycoplasma genitalium) as well as the first eucaryotic genome (Saccharomyces cerevisiae) were completed. Between 40 and 60% of the genes were found to code for proteins to which no function could be assigned. We describe an approach which combines proteome analysis (mapping of expressed proteins isolated by two-dimensional polyacrylamide gel electrophoresis to the genome) with genetic manipulations to study the complex pattern of protein regulation occurring in Escherichia coli in response to sulfate starvation. We have previously described the upregulation of eight spots on two-dimensional (2-D) gels in response to sulfate starvation and the assignment of six of these to entries in the E. coli genome sequence (Quadroni et al., Eur. J. Biochem. 1996, 239, 773-781). Here we describe the identification of the remaining two proteins which are encoded in a sulfate-controlled operon in the 21.5' region of the E. coli genome. Upregulated protein spots were cut from multiple 2-D gels collected and run on a modified funnel gel to concentrate the proteins and remove the sodium dodecyl sulfate before digestion. The peptide masses obtained from the digests were used to search the SwissProt database or a six-frame translation of the EMBL DNA database using a peptide mass fingerprinting algorithm. A digest can be reanalyzed after deuterium exchange to obtain a second, orthogonal data set to increase the confidence level of protein identification. The digests of the remaining unidentified proteins were used for peptide fragment generation using either post-source decay in a matrix-assisted laser desorption ionization (MALDI) time-of-flight mass spectrometer or collision-induced dissociation (CID) coupled mass spectrometry (MS/MS) with triple stage quadrupole or ion trap mass spectrometers. The spectra were used as peptide fragment fingerprints to search the SwissProt and EMBL databases.

Subjects

Bacterial Proteins/analysis, Two-Dimensional Gel Electrophoresis, Escherichia coli/chemistry, Peptide Mapping, Matrix-Assisted Laser Desorption-Ionization Mass Spectrometry, Amino Acid Sequence, Bacterial Proteins/genetics, Escherichia coli/genetics, Gene Deletion, Molecular Sequence Data

11.

Amino acid substitution during functionally constrained divergent evolution of protein sequences.

Benner, S A; Cohen, M A; Gonnet, G H.

Protein Eng ; 7(11): 1323-32, 1994 Nov.

Article in English | MEDLINE | ID: mdl-7700864

ABSTRACT

In aligning homologous protein sequences, it is generally assumed that amino acid substitutions subsequent in time occur independently of amino acid substitutions previous in time, i.e. that patterns of mutation are similar at low and high sequence divergence. This assumption is examined here and shown to be incorrect in an interesting way. Separate mutation matrices were constructed for aligned protein sequence pairs at divergences ranging from 5 to 100 PAM units (point accepted mutations per 100 aligned positions). From these, the corresponding log-odds (Dayhoff) matrices, normalized to 250 PAM units, were constructed. The matrices show that the genetic code influences accepted point mutations strongly at early stages of divergence, while the chemical properties of the side chains dominate at more advanced stages.

Subjects

Amino Acid Sequence, Biological Evolution, Sequence Alignment/methods, Genetic Code, Mathematical Computing, Molecular Sequence Data, Point Mutation, Probability, Amino Acid Sequence Homology
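
To make the step from a mutation matrix to a log-odds (Dayhoff) matrix at 250 PAM in the abstract above more concrete, here is a minimal sketch. It assumes the common convention that a 1-PAM matrix is raised to the 250th power and scored as 10·log10 of the ratio between the mutation probability and the background frequency. The two-letter alphabet, the frequencies, and all names are hypothetical, and the paper's own matrices were estimated from aligned sequence pairs rather than constructed this way.

```python
import math

# Hypothetical two-letter alphabet with background frequencies 0.6 and 0.4.
FREQ = [0.6, 0.4]

# 1-PAM mutation matrix, PAM1[i][j] = P(letter j is replaced by letter i).
# The off-diagonal rates are chosen so that 1% of positions change on average
# and detailed balance holds: FREQ[0] * a == FREQ[1] * b.
a = 0.005 / FREQ[0]
b = 0.005 / FREQ[1]
PAM1 = [[1 - a, b],
        [a, 1 - b]]


def matmul(x, y):
    """Plain matrix product of two square matrices given as nested lists."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def matpow(m, k):
    """k-th power of a square matrix by repeated multiplication."""
    n = len(m)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(k):
        result = matmul(result, m)
    return result


def log_odds(m, freq):
    """Score 10*log10(P(j -> i) / freq[i]) for every pair of letters."""
    n = len(freq)
    return [[10 * math.log10(m[i][j] / freq[i]) for j in range(n)]
            for i in range(n)]


PAM250 = matpow(PAM1, 250)
DAYHOFF = log_odds(PAM250, FREQ)
for row in DAYHOFF:
    print([round(score, 4) for score in row])
```

Under these assumptions identities score positive and substitutions score negative, and the score matrix is symmetric because the toy mutation matrix is reversible with respect to the chosen frequencies.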

12.

Protein identification in DNA databases by peptide mass fingerprinting.

James, P; Quadroni, M; Carafoli, E; Gonnet, G.

Protein Sci ; 3(8): 1347-50, 1994 Aug.

Article in English | MEDLINE | ID: mdl-7987229

ABSTRACT

Proteins can be identified using a set of peptide fragment weights produced by a specific digestion to search a protein database in which sequences have been replaced by fragment weights calculated for various cleavage methods. We present a method using multidimensional searches that greatly increases the confidence level for identification, allowing DNA sequence databases to be examined. This method provides a link between 2-dimensional gel electrophoresis protein databases and genome sequencing projects. Moreover, the increased confidence level allows unknown proteins to be matched to expressed sequence tags, potentially eliminating the need to obtain sequence information for cloning. Database searching from a mass profile is offered as a free service by an automatic server at the ETH, Zürich. For information, send an electronic message to the address cbrg/inf.ethz.ch with the line: help mass search, or help all.

Subjects

DNA/chemistry, Factual Databases, Information Storage and Retrieval, Peptide Fragments/chemistry, Peptide Fragments/genetics, Animals, Creatine Kinase/chemistry, Creatine Kinase/genetics, Complementary DNA/chemistry, Deuterium, Endopeptidases/metabolism, Gene Expression, Humans, Isoenzymes, Mass Spectrometry, Molecular Sequence Data, Molecular Weight, Peptide Fragments/metabolism

13.

Analysis of amino acid substitution during divergent evolution: the 400 by 400 dipeptide substitution matrix.

Gonnet, G H; Cohen, M A; Benner, S A.

Biochem Biophys Res Commun ; 199(2): 489-96, 1994 Mar 15.

Article in English | MEDLINE | ID: mdl-8135790

ABSTRACT

Most formal methods for analyzing the divergent evolution of protein sequences assume a Markov model where position i in a polypeptide chain undergoes amino acid substitution independently from position i + 1. The large number of aligned homologous sequence pairs available from the exhaustive matching of the protein sequence database makes it possible to examine this assumption empirically. We have constructed a 400 by 400 matrix that reports empirical probabilities for the interconversion of all pairs of dipeptides in proteins undergoing divergent evolution. Comparison of these probabilities with those expected if substitution at adjacent positions in a protein sequence were independent reveals interesting patterns that arise through the breakdown of this assumption. Several of these are useful in extracting conformational information from patterns of conservation and variation in homologous protein sequences.

Subjects

Biological Evolution, Dipeptides, Genetic Variation, Genetic Models, Point Mutation, Proteins/genetics, Amino Acid Sequence, Conserved Sequence, Information Systems, Markov Chains

14.

Predicting the conformation of proteins from sequences. Progress and future progress.

Benner, S A; Jenny, T F; Cohen, M A; Gonnet, G H.

Adv Enzyme Regul ; 34: 269-353, 1994.

Article in English | MEDLINE | ID: mdl-7942279

ABSTRACT

A new paradigm for predicting the secondary and tertiary structure of functional proteins from sequence data has emerged from detailed models of how natural selection, conservation, and neutral drift, the three fundamental factors in molecular evolution, leave their mark upon protein sequences. Structural information is extracted from a set of aligned homologous sequences via an analysis of patterns of conservation and variation between proteins with quantitatively defined evolutionary relationships. Tertiary structural information is obtained prior to the assignment of secondary structure, where it plays an important role. Throughout, structural predictions are made with the active involvement of a biochemist whose expertise and insight is critical both for making the prediction and in analyzing its successful and unsuccessful parts. Secondary structure predictions are evaluated based on their ability to sustain an effort to model tertiary structure. Several predictions made using the new paradigm can now be compared with those made under the classical paradigm, including a neural network. The results obtained from the new paradigm are clearly superior to those obtained with the classical paradigm, at least within the protein families that were examined.

Subjects

Chemical Models, Protein Secondary Structure, Protein Tertiary Structure, Amino Acid Sequence, Biochemistry/trends, Biological Evolution, Molecular Sequence Data, Proteins/genetics, Sequence Alignment/methods, Amino Acid Sequence Homology

15.

Protein identification by mass profile fingerprinting.

James, P; Quadroni, M; Carafoli, E; Gonnet, G.

Biochem Biophys Res Commun ; 195(1): 58-64, 1993 Aug 31.

Article in English | MEDLINE | ID: mdl-8363627

ABSTRACT

We have developed an algorithm for identifying proteins at the sub-microgram level without sequence determination by chemical degradation. The protein, usually isolated by one- or two-dimensional gel electrophoresis, is digested by enzymatic or chemical means and the masses of the resulting peptides are determined by mass spectrometry. The resulting mass profile, i.e., the list of the molecular masses of peptides produced by the digestion, serves as a fingerprint which uniquely defines a particular protein. This fingerprint may be used to search the database of known sequences to find proteins with a similar profile. If the protein is not yet sequenced the profile can serve as a unique marker. This provides a rapid and sensitive link between genomic sequences and 2D gel electrophoresis mapping of cellular proteins.

Subjects

Amino Acid Sequence, Proteins/chemistry, Algorithms, Animals, Calmodulin/chemistry, Enzymes/chemistry, Humans, Mass Spectrometry/methods, Microchemistry, Molecular Sequence Data, Molecular Weight, Proteins/isolation & purification
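
The mass-profile search described in the abstract above can be sketched roughly as follows. This is an illustration under stated assumptions, not the published algorithm: it assumes a tryptic digest (cleavage after K or R unless the next residue is P), monoisotopic residue masses, and a simple count of observed masses that fall within a tolerance of a theoretical peptide mass. The two database sequences, the tolerance, and all function names are hypothetical.

```python
# Monoisotopic residue masses in daltons for the 20 standard amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per peptide for the free termini


def tryptic_peptides(seq):
    """Cleave after K or R, except when the next residue is P."""
    peptides, start = [], 0
    for i, aa in enumerate(seq):
        nxt = seq[i + 1] if i + 1 < len(seq) else ""
        if aa in "KR" and nxt != "P":
            peptides.append(seq[start:i + 1])
            start = i + 1
    if start < len(seq):
        peptides.append(seq[start:])
    return peptides


def peptide_mass(peptide):
    """Monoisotopic mass of an unmodified, uncharged peptide."""
    return sum(MONO[aa] for aa in peptide) + WATER


def mass_profile(seq):
    """Sorted list of theoretical peptide masses for one protein."""
    return sorted(peptide_mass(p) for p in tryptic_peptides(seq))


def count_matches(observed, profile, tol=0.5):
    """Number of observed masses within tol Da of any theoretical mass."""
    return sum(any(abs(m - t) <= tol for t in profile) for m in observed)


def rank(observed, database, tol=0.5):
    """Database entries ordered by decreasing number of matched masses."""
    scores = {name: count_matches(observed, mass_profile(seq), tol)
              for name, seq in database.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical two-entry database and a noisy "measured" profile of protA.
DATABASE = {"protA": "GASPVKTCLRNDQEK", "protB": "MHFRYWKAAGG"}
OBSERVED = [m + 0.1 for m in mass_profile(DATABASE["protA"])]
print(rank(OBSERVED, DATABASE))
```

A real search would also have to handle missed cleavages, modifications, and a statistically motivated score. The match count is used here only because it is the simplest way to show how a list of masses can single out a database entry.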

16.

The nitrogenase MoFe protein. A secondary structure prediction.

Gerloff, D L; Jenny, T F; Knecht, L J; Gonnet, G H; Benner, S A.

FEBS Lett ; 318(2): 118-24, 1993 Mar 01.

Article in English | MEDLINE | ID: mdl-8440368

ABSTRACT

Surface residues, interior residues, and parsing residues, together with a secondary structure derived from these, are predicted for the MoFe nitrogenase protein in advance of a crystal structure of the protein, scheduled shortly to appear in Nature. By publishing this prediction, we test our method for predicting the conformation of proteins from patterns in the divergent evolution of homologous protein sequences in a way that places the method 'at risk'.

Subjects

Nitrogenase/chemistry, Amino Acid Sequence, Azotobacter vinelandii/enzymology, Bacterial Proteins/chemistry, Bacterial Proteins/ultrastructure, Crystallography, Iron, Metalloproteins/chemistry, Metalloproteins/ultrastructure, Theoretical Models, Molecular Sequence Data, Molybdenum, Nitrogenase/ultrastructure, Protein Secondary Structure, Sequence Alignment

17.

Empirical and structural models for insertions and deletions in the divergent evolution of proteins.

Benner, S A; Cohen, M A; Gonnet, G H.

J Mol Biol ; 229(4): 1065-82, 1993 Feb 20.

Article in English | MEDLINE | ID: mdl-8445636

ABSTRACT

The exhaustive matching of the protein sequence database makes possible a broadly based study of insertions and deletions (indels) during divergent evolution. In this study, the probability of a gap in an alignment of a pair of homologous protein sequences was found to increase with the evolutionary distance measured in PAM units (number of accepted point mutations per 100 amino acid residues). A relationship between the average number of amino acid residues between indels and evolutionary distance suggests that a unit 30 to 40 amino acid residues in length remains, on average, undisrupted by indels during divergent evolution. Further, the probability of a gap was found to be inversely proportional to gap length raised to the 1.7 power. This empirical law fits closely over the entire range of gap lengths examined. Gap length distribution is largely independent of evolutionary distance. These results rule out the widely used linear gap penalty as a satisfactory formula for scoring gaps when constructing alignments. Further, the observed gap length distribution can be explained by a simple model of selective pressures governing the acceptance of indels during divergent evolution. Finally, this model provides theoretical support for using indels as part of "parsing algorithms", important in the de novo prediction of the folded structure of proteins from the sequence data.

Subjects

Biological Evolution, Genetic Variation, Proteins/chemistry, Amino Acid Sequence Homology, Amino Acid Sequence, Amino Acids/chemistry, Factual Databases, Chemical Models, Genetic Models, Molecular Sequence Data, Probability, Proteins/genetics, Sequence Deletion
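
The power law reported in the abstract above (gap probability inversely proportional to gap length raised to the 1.7 power) can be turned into a small numerical sketch. Only the exponent 1.7 comes from the abstract. The normalisation over a finite maximum length, the log-odds style penalty in 10·log10 units, the `opening` term, and all function names are assumptions made for illustration.

```python
import math

EXPONENT = 1.7  # from the abstract: P(gap of length k) is proportional to k**-1.7


def gap_length_distribution(max_len, exponent=EXPONENT):
    """Probabilities of gap lengths 1..max_len under the power law."""
    weights = [k ** -exponent for k in range(1, max_len + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def power_law_penalty(k, opening=0.0, exponent=EXPONENT):
    """Length-dependent cost, -10*log10 of the relative gap probability.

    `opening` is a hypothetical constant standing in for the cost of
    starting a gap; it is not taken from the paper.
    """
    return opening + 10 * exponent * math.log10(k)


def linear_penalty(k, opening=0.0, extension=1.0):
    """The linear form that the abstract argues against, for comparison."""
    return opening + extension * (k - 1)


dist = gap_length_distribution(50)
print(round(dist[0] / dist[1], 3))  # ratio of P(1) to P(2), equal to 2**1.7
for k in (1, 2, 10, 11):
    print(k, round(power_law_penalty(k), 2), linear_penalty(k))
```

Under the power law the extra cost of extending a gap from 10 to 11 residues is much smaller than the cost of extending it from 1 to 2, whereas a linear penalty charges the same amount for every added residue. This is one way to read the abstract's statement that a linear gap penalty is not a satisfactory fit.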

18.

A word in your protein.

Gonnet, G H; Benner, S A.

Nature ; 361(6408): 121, 1993 Jan 14.

Article in English | MEDLINE | ID: mdl-8421517

Subjects

Amino Acid Sequence, Factual Databases, Proteins/genetics, Human Genome, Humans, Language

19.

Exhaustive matching of the entire protein sequence database.

Gonnet, G H; Cohen, M A; Benner, S A.

Science ; 256(5062): 1443-5, 1992 Jun 05.

Article in English | MEDLINE | ID: mdl-1604319

ABSTRACT

The entire protein sequence database has been exhaustively matched. Definitive mutation matrices and models for scoring gaps were obtained from the matching and used to organize the sequence database as sets of evolutionarily connected components. The methods developed are general and can be used to manage sequence data generated by major genome sequencing projects. The alignments made possible by the exhaustive matching are the starting point for successful de novo prediction of the folded structures of proteins, for reconstructing sequences of ancient proteins and metabolisms in ancient organisms, and for obtaining new perspectives in structural biochemistry.

Subjects

Amino Acid Sequence, Factual Databases, Proteins/genetics, Mathematics, Nucleic Acid Sequence Homology
