1.
PLoS One ; 13(4): e0196135, 2018.
Article in English | MEDLINE | ID: mdl-29698417

ABSTRACT

The Glycoside Hydrolase Family 13 (GH13) is both evolutionarily diverse and relevant to many industrial applications. Its members hydrolyze starch into smaller carbohydrates, and members of the family have been bioengineered to improve catalytic function in industrial environments. We introduce a framework to analyze the response to selection of GH13 protein structures given phylogenetic and simulated dynamic information. We find that the TIM-barrel (a conserved protein fold consisting of eight α-helices and eight parallel β-strands that alternate along the peptide backbone, common to all amylases) is not selectable, since it is under purifying selection. We also show a method to rank residues by their inferred response to selection. These residues can be altered to change protein properties. In this work, we define fitness as inferred thermodynamic stability. We show that, under the developed framework, residues 112Y, 122K, 124D, 125W, and 126P are good candidates to increase the stability of the truncated α-amylase protein from Geobacillus thermoleovorans (PDB code: 4E2O; α-1,4-glucan-4-glucanohydrolase; EC 3.2.1.1). Overall, this paper demonstrates the feasibility of the framework, which can be applied to the analysis of protein structures under other fitness landscapes.


Subjects
Glycoside Hydrolases/chemistry; Databases, Protein; Geobacillus/enzymology; Glycoside Hydrolases/classification; Glycoside Hydrolases/metabolism; Molecular Dynamics Simulation; Phylogeny; Protein Conformation; Thermodynamics; alpha-Amylases/chemistry; alpha-Amylases/metabolism
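One standard way to express a response to selection is Lande's multivariate breeder's equation, Δz̄ = Gβ, where G is a genetic variance-covariance matrix of traits and β is a vector of selection gradients. The Python sketch below assumes that formulation (the abstract does not state which estimator the framework uses) and ranks hypothetical traits by the magnitude of their predicted response; the function names, matrix, gradients, and labels are illustrative only. It also shows why a trait with near-zero variance, as expected under purifying selection, barely responds regardless of β.

```python
from typing import List, Sequence, Tuple


def response_to_selection(G: Sequence[Sequence[float]], beta: Sequence[float]) -> List[float]:
    """Multivariate breeder's equation (assumed formulation): delta_z = G @ beta."""
    n = len(beta)
    if len(G) != n or any(len(row) != n for row in G):
        raise ValueError("G must be an n x n matrix matching the length of beta")
    return [sum(G[i][j] * beta[j] for j in range(n)) for i in range(n)]


def rank_by_response(labels: Sequence[str], G: Sequence[Sequence[float]],
                     beta: Sequence[float]) -> List[Tuple[str, float]]:
    """Rank traits (e.g. residues) by the absolute size of their predicted response."""
    dz = response_to_selection(G, beta)
    return sorted(zip(labels, dz), key=lambda item: abs(item[1]), reverse=True)


if __name__ == "__main__":
    # Hypothetical 3-trait example; "r3" has almost no variance (a purifying-selection analogue).
    G = [[1.0, 0.5, 0.0], [0.5, 2.0, 0.0], [0.0, 0.0, 0.01]]
    beta = [1.0, 1.0, 1.0]
    print(rank_by_response(["r1", "r2", "r3"], G, beta))
```

In this toy example the low-variance trait "r3" ends up last in the ranking even though it receives the same selection gradient as the others.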
2.
RNA Biol ; 13(4): 391-9, 2016.
Article in English | MEDLINE | ID: mdl-26488198

ABSTRACT

The 5S rDNA gene encodes a non-coding RNA and can be found in two copies (type I and type II) in bony and cartilaginous fish. Previous studies have pointed out that the type II gene is a paralog derived from type I. We analyzed the molecular organization of 5S rDNA type II in elasmobranchs. Although the structure of the 5S rDNA is supposed to be highly conserved, our results show that the secondary structure in this group possesses some variability and differs from the consensus secondary structure. One of these differences in Selachii is an internal loop at nucleotides 7 and 112. These mutations observed in the transcribed region suggest an independent origin of the gene among Batoids and Selachii. All promoters were highly conserved with the exception of BoxA, possibly due to its affinity for polymerase III. This enzyme recognizes a dT4 sequence as a stop signal; however, in Rajiformes this signal was doubled in length to dT8, which could be an adaptation toward higher efficiency in the termination process. Our results suggest that there is no TATA box in the NTS region of elasmobranchs. We also provide some evidence that the complexity of the microsatellites present in the NTS region plays an important role in the 5S rRNA gene, since it is significantly correlated with the length of the NTS.


Subjects
Elasmobranchii/genetics; RNA, Ribosomal, 5S/genetics; Animals; Mutation; Nucleic Acid Conformation; RNA, Ribosomal, 5S/chemistry; Species Specificity; Terminator Regions, Genetic; Transcription, Genetic
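The abstract notes that RNA polymerase III terminates at a run of thymidines (dT4), lengthened to dT8 in Rajiformes. A minimal Python sketch for locating such runs in a sense-strand sequence might look like this; the function names, the category labels, and the example sequence are hypothetical, and a real analysis would also check where each run falls relative to the 3' end of the transcribed region.

```python
import re
from typing import List, Tuple


def find_poly_t(seq: str, min_len: int = 4) -> List[Tuple[int, int]]:
    """Return (0-based start, length) for every maximal run of at least min_len T's."""
    if min_len < 1:
        raise ValueError("min_len must be >= 1")
    # Greedy T{n,} matches consume a whole run, so each run is reported once.
    pattern = re.compile("T{%d,}" % min_len)
    return [(m.start(), m.end() - m.start()) for m in pattern.finditer(seq.upper())]


def classify_terminators(seq: str) -> List[str]:
    """Label each run as 'dT8+' or 'dT4-dT7' (illustrative labels only)."""
    return ["dT8+" if length >= 8 else "dT4-dT7" for _, length in find_poly_t(seq)]


if __name__ == "__main__":
    example = "GC" + "T" * 4 + "AG" + "T" * 8 + "C"  # made-up sequence
    print(find_poly_t(example), classify_terminators(example))
```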
3.
IEEE J Biomed Health Inform ; 20(1): 424-31, 2016 Jan.
Article in English | MEDLINE | ID: mdl-25494516

ABSTRACT

Non-small cell lung cancer (NSCLC) constitutes the most common type of lung cancer and is frequently diagnosed at advanced stages. Clinical studies have shown that molecular targeted therapies increase survival and improve quality of life in patients. Nevertheless, the realization of personalized therapies for NSCLC faces a number of challenges, including the integration of clinical and genetic data and a lack of clinical decision support tools to assist physicians with patient selection. To address this problem, we used frequent pattern mining to establish the relationships between patient characteristics and tumor response in advanced NSCLC. Univariate analysis determined that smoking status, histology, epidermal growth factor receptor (EGFR) mutation, and targeted drug were significantly associated with response to targeted therapy. We applied four classifiers to predict treatment outcome from EGFR tyrosine kinase inhibitors. Overall, the highest classification accuracy was 76.56% and the area under the curve was 0.76. The decision tree used a combination of EGFR mutations, histology, and smoking status to predict tumor response, and its output was both easily understandable and in keeping with current knowledge. Our findings suggest that support vector machines and decision trees are a promising approach for clinical decision support in patient selection for targeted therapy in advanced NSCLC.


Subjects
Antineoplastic Agents/therapeutic use; Carcinoma, Non-Small-Cell Lung/drug therapy; Decision Trees; Lung Neoplasms/drug therapy; Models, Biological; Precision Medicine; Aged; Carcinoma, Non-Small-Cell Lung/classification; Carcinoma, Non-Small-Cell Lung/genetics; Data Mining; Databases, Factual; ErbB Receptors/genetics; Female; Humans; Lung Neoplasms/classification; Lung Neoplasms/genetics; Male; Middle Aged; Mutation/genetics; Pattern Recognition, Automated; Support Vector Machine
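The area under the ROC curve reported above (0.76) can be read as the probability that a randomly chosen responder receives a higher classifier score than a randomly chosen non-responder. The Python sketch below computes that quantity directly by pairwise comparison (the Mann-Whitney formulation), counting ties as one half; the function name and the example scores and labels are hypothetical, not data from the study.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC = P(score of a random positive > score of a random negative); ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical classifier scores (1 = responder, 0 = non-responder).
    print(roc_auc([0.9, 0.2, 0.6, 0.1], [1, 1, 0, 0]))
```

The quadratic pairwise loop keeps the definition explicit; for large cohorts a rank-based computation gives the same value more efficiently.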
4.
Curr Protein Pept Sci ; 17(1): 62-71, 2016.
Article in English | MEDLINE | ID: mdl-26412786

ABSTRACT

Protein structures can be conceptualized as context-aware self-organizing systems. One of their emergent properties is a modular architecture. Such modular architecture has been identified as domains, defined as their units of evolution and function. However, this modular architecture is not exclusively defined by domains, and the definition of a domain is itself an ongoing debate. Here we propose differentiating structural, evolutionary, and functional domains as distinct concepts. Defining domains or modules is confounded by the diverse definitions of the concept and also by other elements inherent to protein structures. An apparent hierarchy in protein structure architecture is one of these elements, where lower-level interactions may create noise for the definition of higher levels. Diverse modularity-molding factors, such as folding, function, and selection, can have a misleading effect when trying to define a given type of module. It is thus important to keep this complexity in mind when defining modularity in protein structures and interpreting the outcome of modularity inference approaches.


Subjects
Models, Molecular; Protein Conformation; Proteins/chemistry; Semantics; Animals; Binding Sites; Biological Evolution; Protein Binding; Protein Interaction Domains and Motifs; Structure-Activity Relationship
5.
PLoS One ; 9(11): e113438, 2014.
Article in English | MEDLINE | ID: mdl-25409022

ABSTRACT

Community structure detection is an important tool in graph analysis. This can be done, among other ways, by solving for the partition set that optimizes the modularity score (Q). Here it is shown that topological constraints in correlation graphs induce over-fragmentation of community structures. A refinement step to this optimization, based on Linear Discriminant Analysis (LDA) and a statistical test for significance, is proposed. In structured simulations constrained by topology, this novel approach performs better than the optimization of modularity alone. This method was also tested with two empirical datasets: roll-call voting in the 110th US Senate constrained by geographic adjacency, and a biological dataset of 135 protein structures constrained by inter-residue contacts. The former dataset showed sub-structures in the communities that revealed a regional bias in the votes that transcends party affiliations. This is an interesting pattern given that the 110th Legislature was assumed to be a highly polarized government. The α-amylase catalytic domain dataset (biological dataset) was analyzed with and without topological constraints (inter-residue contacts). The results without topological constraints differed from the topology-constrained ones, but the LDA filtering did not change the outcome of the latter. This suggests that LDA filtering is a robust way to resolve over-fragmentation when it is present, and that the method will not affect results where there is no evidence of over-fragmentation.


Subjects
Algorithms; Catalytic Domain; Databases, Factual; Discriminant Analysis; alpha-Amylases/chemistry; alpha-Amylases/metabolism
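The modularity score Q referred to above can be made concrete. For a weighted undirected graph with total edge weight W, one common form is Q = Σ_c [W_c/W - (S_c/2W)²], where W_c is the weight of edges inside community c and S_c is the summed strength (weighted degree) of its nodes. The Python sketch below only evaluates Q for a given partition; it assumes non-negative weights and no self-loops, uses a hypothetical function name, and does not implement the optimization, the LDA refinement, or the significance test described in the abstract.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Tuple

Edge = Tuple[Hashable, Hashable, float]


def modularity(edges: Iterable[Edge], community: Dict[Hashable, Hashable]) -> float:
    """Q = sum over communities of (internal weight / W) - (community strength / 2W)^2."""
    total = 0.0
    internal: Dict[Hashable, float] = defaultdict(float)
    strength: Dict[Hashable, float] = defaultdict(float)
    for u, v, w in edges:
        if u == v:
            raise ValueError("self-loops are not handled in this sketch")
        if w < 0:
            raise ValueError("negative weights are not handled in this sketch")
        total += w
        cu, cv = community[u], community[v]
        strength[cu] += w  # each edge adds its weight to the strength of both endpoints
        strength[cv] += w
        if cu == cv:
            internal[cu] += w
    if total == 0.0:
        raise ValueError("graph has no edge weight")
    return sum(internal[c] / total - (strength[c] / (2.0 * total)) ** 2 for c in strength)


if __name__ == "__main__":
    # Two triangles joined by one bridge edge (toy graph).
    edges = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0), (3, 4, 1.0), (4, 5, 1.0), (3, 5, 1.0), (2, 3, 1.0)]
    print(modularity(edges, {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}))
```

For this toy graph, splitting it into the two triangles gives Q = 5/14 ≈ 0.357, while putting every node in one community gives Q = 0.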
6.
BMC Struct Biol ; 13: 20, 2013 Oct 16.
Article in English | MEDLINE | ID: mdl-24131821

ABSTRACT

BACKGROUND: Assessing protein modularity is important to understand protein evolution. Still, the question of whether a sub-domain modular architecture exists remains open. We propose a graph-theory approach with significance and power testing to identify modules in protein structures. In the first step, clusters are determined by optimizing the partition that maximizes the modularity score. Second, each cluster is tested for significance; significant clusters are referred to as modules. Evolutionary modules are identified by analyzing homologous structures. Dynamic modules are inferred from sets of snapshots of molecular simulations. We present here a methodology to identify sub-domain architecture that is robust, biologically meaningful, and statistically supported. RESULTS: The robustness of this new method is tested using simulated data with known modularity. Modules are correctly identified even when there is a low correlation between landmarks within a module. We also analyzed the evolutionary modularity of a data set of α-amylase catalytic domain homologs, and the dynamic modularity of the N-terminal domain of the Niemann-Pick C1 (NPC1) protein. The α-amylase contains an (α/β)8 barrel (TIM barrel) with the polysaccharide cleavage site and a calcium-binding domain. In this data set we identified four robust evolutionary modules, one of which forms the minimal functional TIM barrel topology. The NPC1 protein is involved in intracellular lipid metabolism, coordinating sterol trafficking. The NPC1 N-terminus is the first luminal domain, which binds cholesterol and its oxygenated derivatives. Our inferred dynamic modules in NPC1 are also shown to match functional components of the protein related to NPC1 disease. CONCLUSIONS: A domain compartmentalization can be found and described in correlation space. To our knowledge, no other method attempts to identify sub-domain architecture from the correlation among residues; most attempts focus on sequence motifs of protein-protein interactions, binding sites, or sequence conservation. We were able to describe functional/structural sub-domain architecture related to key residues for starch cleavage, calcium, and chloride binding sites in the α-amylase, and sterol opening-defining modules and disease-related residues in NPC1. We also described the evolutionary sub-domain architecture of the α-amylase catalytic domain, identifying the already reported minimum functional TIM barrel.


Subjects
Protein Structure, Tertiary; Proteins/chemistry; Amino Acid Sequence; Animals; Binding Sites; Carrier Proteins/chemistry; Carrier Proteins/metabolism; Catalytic Domain; Cholesterol/metabolism; Evolution, Molecular; Humans; Models, Chemical; Models, Molecular; Molecular Dynamics Simulation; Protein Binding; Proteins/metabolism; Sequence Homology, Amino Acid; alpha-Amylases/chemistry; alpha-Amylases/metabolism
7.
J Chem Inf Model ; 50(12): 2213-20, 2010 Dec 27.
Article in English | MEDLINE | ID: mdl-21090591

ABSTRACT

Although the α-helical secondary structure of proteins is well defined, the exact causes and structures of helical kinks are not. This is especially important for transmembrane (TM) helices of integral membrane proteins, many of which contain kinks that provide functional diversity despite a predominantly helical structure. We have developed a Monte Carlo-based algorithm, MC-HELAN, to determine helical axes along with the positions and angles of helical kinks. Analysis of all nonredundant high-resolution α-helical membrane protein structures (842 TM helices from 205 polypeptide chains) revealed kinks in 64% of TM helices, a significantly greater proportion than indicated by previous analyses. Proline is over-represented by a factor of >5 when it is two or three residues C-terminal to a bend. Prolines also cause kinks with larger kink angles than other residues. However, only 33% of TM kinks are in proximity to a proline. Machine learning techniques were used to test for sequence-based predictors of kinks. Although kinks are somewhat predictable from sequence, kink formation appears to be driven predominantly by other factors. This study provides an improved view of the prevalence and architecture of kinks in helical membrane proteins and highlights the fundamental inaccuracy of the typical topological depiction of helical membrane proteins as a series of ideal helices.


Subjects
Algorithms; Computational Biology/methods; Membrane Proteins/chemistry; Databases, Protein; Internet; Models, Molecular; Protein Structure, Secondary
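A kink angle can be expressed as the angle between the helix axes fitted to the segments before and after a putative kink. MC-HELAN fits axes with a Monte Carlo procedure; the Python sketch below instead uses a simpler bisector construction: for consecutive Cα atoms i-1, i, i+1 the bisector points toward the axis, and the cross product of successive bisectors points along it. This is an illustrative alternative, not the published algorithm; the function names are hypothetical, and the ideal-helix parameters (2.3 Å radius, 1.5 Å rise, 100° per residue) are assumed values.

```python
import math
from typing import List, Sequence, Tuple

Vec = Tuple[float, float, float]


def _add(a: Vec, b: Vec) -> Vec:
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def _unit(a: Vec) -> Vec:
    n = math.sqrt(_dot(a, a))
    return (a[0] / n, a[1] / n, a[2] / n)


def helix_axis(ca: Sequence[Vec]) -> Vec:
    """Average axis direction of a helical segment from consecutive C-alpha bisectors."""
    if len(ca) < 4:
        raise ValueError("need at least 4 C-alpha positions")
    bisectors = [_add(_sub(ca[i - 1], ca[i]), _sub(ca[i + 1], ca[i])) for i in range(1, len(ca) - 1)]
    acc: Vec = (0.0, 0.0, 0.0)
    for b1, b2 in zip(bisectors, bisectors[1:]):
        acc = _add(acc, _unit(_cross(b1, b2)))  # each local estimate points along the axis
    return _unit(acc)


def kink_angle(before: Sequence[Vec], after: Sequence[Vec]) -> float:
    """Angle in degrees between the axes of the segments flanking a putative kink."""
    c = _dot(helix_axis(before), helix_axis(after))
    return math.degrees(math.acos(max(-1.0, min(1.0, c))))


def ideal_helix(n: int, radius: float = 2.3, rise: float = 1.5, twist_deg: float = 100.0) -> List[Vec]:
    """C-alpha trace of an ideal helix along +z (assumed geometry)."""
    t = math.radians(twist_deg)
    return [(radius * math.cos(i * t), radius * math.sin(i * t), rise * i) for i in range(n)]


def rotate_about_x(points: Sequence[Vec], deg: float) -> List[Vec]:
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [(x, c * y - s * z, s * y + c * z) for x, y, z in points]


if __name__ == "__main__":
    segment = ideal_helix(8)
    print(kink_angle(segment, rotate_about_x(segment, 30.0)))  # tilted copy of the same segment
```

For ideal helices the bisector estimate recovers the axis exactly, so the tilted copy above yields 30°. Real TM helices are irregular, which is one reason a more robust fitting procedure such as MC-HELAN's is needed.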
8.
J Biol Chem ; 285(12): 8605-14, 2010 Mar 19.
Article in English | MEDLINE | ID: mdl-20083605

ABSTRACT

Bacterial acyl carrier protein (ACP) is essential for the synthesis of fatty acids and serves as the major acyl donor for the formation of phospholipids and other lipid products. Acyl-ACP encloses attached fatty acyl groups in a hydrophobic pocket within a four-helix bundle, but must at least partially unfold to present the acyl chain to the active sites of its multiple enzyme partners. To further examine the constraints of ACP structure and function, we have constructed a cyclic version of Vibrio harveyi ACP, using split-intein technology to covalently join its closely apposed N and C termini. Cyclization stabilized ACP in a folded helical conformation as indicated by gel electrophoresis, circular dichroism, fluorescence, and mass spectrometry. Molecular dynamics simulations also indicated overall decreased polypeptide chain mobility in cyclic ACP, although no major conformational rearrangements over a 10-ns period were noted. In vivo complementation assays revealed that cyclic ACP can functionally replace the linear wild-type protein and support growth of an Escherichia coli ACP-null mutant strain. Cyclization of a folding-deficient ACP mutant (F50A) both restored its ability to adopt a folded conformation and enhanced complementation of growth. Our results thus suggest that ACP must be able to adopt a folded conformation for biological activity, and that its function does not require complete unfolding of the protein.


Subjects
Acyl Carrier Protein/chemistry; Inteins; Circular Dichroism; Escherichia coli/metabolism; Genetic Complementation Test; Models, Molecular; Molecular Conformation; Mutation; Phospholipids/chemistry; Protein Conformation; Protein Denaturation; Protein Folding; Protein Structure, Secondary; Tandem Mass Spectrometry/methods; Vibrio/metabolism
9.
Bioinformatics ; 25(23): 3093-8, 2009 Dec 01.
Article in English | MEDLINE | ID: mdl-19770262

ABSTRACT

MOTIVATION: Aligning protein sequences with the best possible accuracy requires sophisticated algorithms. Since the optimal alignment is not guaranteed to be the correct one, it is expected that even the best alignment will contain sites that do not respect the assumption of positional homology. Because formulating rules to identify these sites is difficult, it is common practice to remove them manually. Although considered necessary in some cases, manual editing is time-consuming and not reproducible. We present here an automated editing method based on the classification of 'valid' and 'invalid' sites. RESULTS: A support vector machine (SVM) classifier is trained to reproduce the decisions made during manual editing with an accuracy of 95.0%. This implies that manual editing can be made reproducible and applied to large-scale analyses. We further demonstrate that it is possible to retrain/extend the training of the classifier by providing examples of multiple sequence alignment (MSA) annotation. Near-optimal training can be achieved with only 1000 annotated sites, or roughly three samples of protein sequence alignments. AVAILABILITY: This method is implemented in the software MANUEL, licensed under the GPL. A web-based application for single and batch jobs is available at http://fester.cs.dal.ca/manuel. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Computational Biology/methods; Sequence Alignment/methods; Sequence Analysis, Protein/methods; Databases, Protein; Proteins/chemistry; Software
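The idea of training a classifier to reproduce manual 'valid'/'invalid' site decisions can be sketched as follows. This is not MANUEL's implementation: the per-column features (gap fraction and majority-residue fraction, plus a constant bias term) and all function names are hypothetical, and the linear SVM is trained with a simple Pegasos-style subgradient method on the L2-regularized hinge loss, using only the Python standard library.

```python
import random
from typing import List, Sequence


def column_features(column: str) -> List[float]:
    """Hypothetical features of one alignment column: gap fraction, majority fraction, bias."""
    n = len(column)
    residues = [ch for ch in column.upper() if ch != "-"]
    majority = max((residues.count(ch) for ch in set(residues)), default=0)
    return [column.count("-") / n, majority / n, 1.0]


def train_linear_svm(X: Sequence[Sequence[float]], y: Sequence[int],
                     lam: float = 0.01, epochs: int = 500, seed: int = 0) -> List[float]:
    """Pegasos-style subgradient descent on the regularized hinge loss (labels +1/-1)."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # gradient step on the regularizer
            if margin < 1.0:  # hinge loss active for this site
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def predict(w: Sequence[float], column: str) -> int:
    """+1 = keep the site ('valid'), -1 = remove it ('invalid')."""
    score = sum(wj * xj for wj, xj in zip(w, column_features(column)))
    return 1 if score >= 0.0 else -1


if __name__ == "__main__":
    # Toy manual annotations (made up): conserved columns kept, gappy columns removed.
    valid = ["AAAA", "AAAS", "LLLL", "VVIV"]
    invalid = ["A---", "--G-", "-K-W", "P-S-"]
    X = [column_features(c) for c in valid + invalid]
    y = [1] * len(valid) + [-1] * len(invalid)
    w = train_linear_svm(X, y)
    print(w, predict(w, "GGGG"), predict(w, "----"))
```

Retraining or extending the classifier, as the abstract describes, amounts in this sketch to appending newly annotated columns to X and y and calling train_linear_svm again.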
10.
Genome Res ; 19(10): 1896-904, 2009 Oct.
Article in English | MEDLINE | ID: mdl-19635847

ABSTRACT

The increasing availability of genetic sequence data associated with explicit geographic and ecological information is offering new opportunities to study the processes that shape biodiversity. The generation and testing of hypotheses using these data sets requires effective tools for mathematical and visual analysis that can integrate digital maps, ecological data, and large genetic, genomic, or metagenomic data sets. GenGIS is a free and open-source software package that supports the integration of digital map data with genetic sequences and environmental information from multiple sample sites. Essential bioinformatic and statistical tools are integrated into the software, allowing the user a wide range of analysis options for their sequence data. Data visualizations are combined with the cartographic display to yield a clear view of the relationship between geography and genomic diversity, with a particular focus on the hierarchical clustering of sites based on their similarity or phylogenetic proximity. Here we outline the features of GenGIS and demonstrate its application to georeferenced microbial metagenomic, HIV-1, and human mitochondrial DNA data sets.


Subjects
Databases, Genetic; Genomics/methods; Geographic Information Systems; Software; Africa; Biodiversity; Classification; DNA, Mitochondrial/analysis; DNA, Mitochondrial/genetics; Genetic Variation; HIV-1/classification; HIV-1/genetics; HIV-1/metabolism; Humans; Oceans and Seas; Phylogeny; Specimen Handling/methods
11.
Evol Bioinform Online ; 4: 17-27, 2008 Feb 09.
Article in English | MEDLINE | ID: mdl-19204804

ABSTRACT

The subtree prune and regraft distance (d(SPR)) between phylogenetic trees is important both as a general means of comparing phylogenetic tree topologies and as a measure of lateral gene transfer (LGT). Although there has been extensive study of the computation of d(SPR) and similar metrics between rooted trees, much less is known about SPR distances for unrooted trees, which often arise in practice when the root is unresolved. We show that unrooted SPR distance computation is NP-hard and verify which techniques from related work can and cannot be applied. We then present an efficient heuristic algorithm for this problem and benchmark it on a variety of synthetic datasets. Our algorithm computes the exact SPR distance between unrooted trees; the heuristic element concerns only the algorithm's computation time. Our method is a heuristic version of a fixed-parameter tractability (FPT) approach, and our experiments indicate that its running time behaves similarly to that of FPT algorithms. For real data sets, our algorithm was able to quickly compute d(SPR) for the majority of trees that were part of a study of LGT in 144 prokaryotic genomes. Our analysis of its performance, especially with respect to searching and reduction rules, is applicable to computing many related distance measures.

12.
BMC Bioinformatics ; 8: 444, 2007 Nov 15.
Article in English | MEDLINE | ID: mdl-18005425

ABSTRACT

BACKGROUND: In protein evolution, the mechanism by which novel protein domains emerge is still an open question. The incremental growth of protein variable regions, produced by stochastic insertions, has the potential to generate large and complex sub-structures. In this study, a deterministic methodology is proposed to reconstruct phylogenies from protein structures and to infer insertion events in protein evolution. The analysis was performed on a broad range of SCOP domain families. RESULTS: Phylogenies were reconstructed from protein 3D structural data. The phylogenetic trees were used to infer ancestral structures with a consensus method. From these ancestral reconstructions, 42.7% of the observed insertions are nested insertions, located in previous insert regions. The average size of inserts tends to increase with the insert rank or the total number of insertions in the variable regions. We found that the structures of some nested inserts show complex or even domain-like fold patterns with helices, strands, and loops. Furthermore, a basal level of structural innovation was found in inserts that displayed significant structural similarity exclusively to themselves. The beta-Lactamase/D-ala carboxypeptidase domain family is provided as an example to illustrate the inference of insertion events and how the incremental growth of a variable region can generate novel structural patterns. CONCLUSION: Using 3D data, we proposed a method to reconstruct phylogenies. We applied the method to reconstruct the sequence of insertion events leading to the emergence of potentially novel structural elements within existing protein domains. The results suggest that structural innovation is possible via the stochastic process of insertions and rapid evolution within variable regions where inserts tend to be nested. We also demonstrate that structure-based phylogeny enables the study of new questions relating to the evolution of protein domains and biological function.


Subjects
DNA Transposable Elements/genetics; Evolution, Molecular; Models, Chemical; Models, Genetic; Models, Molecular; Proteins; Sequence Analysis, Protein/methods; Amino Acid Sequence; Computer Simulation; Molecular Sequence Data; Phylogeny; Protein Conformation; Proteins/chemistry; Proteins/genetics; Proteins/ultrastructure; Structure-Activity Relationship
13.
J Mol Evol ; 64(1): 80-9, 2007 Jan.
Article in English | MEDLINE | ID: mdl-17160642

ABSTRACT

We examine the impact of likelihood surface characteristics on phylogenetic inference. Amino acid data sets simulated from topologies with branch length features chosen to represent varying degrees of difficulty for likelihood maximization are analyzed. We present situations where the tree found to achieve the global maximum in likelihood is often not equal to the true tree. We use the program covSEARCH to demonstrate how the use of adaptively sized pools of candidate trees that are updated using confidence tests results in solution sets that are highly likely to contain the true tree. This approach requires more computation than traditional maximum likelihood methods, hence covSEARCH is best suited to small to medium-sized alignments or large alignments with some constrained nodes. The majority rule consensus tree computed from the confidence sets also proves to be different from the generating topology. Although low phylogenetic signal in the input alignment can result in large confidence sets of trees, some biological information can still be obtained based on nodes that exhibit high support within the confidence set. Two real data examples are analyzed: mammal mitochondrial proteins and a small tubulin alignment. We conclude that the technique of confidence set optimization can significantly improve the robustness of phylogenetic inference at a reasonable computational cost. Additionally, when either very short internal branches or very long terminal branches are present, confident resolution of specific bipartitions or subtrees, rather than whole-tree phylogenies, may be the most realistic goal for phylogenetic methods.


Subjects
Algorithms; Models, Biological; Phylogeny; Ascomycota/genetics; Likelihood Functions; Mitochondrial Proteins/genetics; Sequence Alignment/methods; Tubulin
14.
J Mol Model ; 12(2): 221-8, 2006 Jan.
Article in English | MEDLINE | ID: mdl-16247602

ABSTRACT

In this study, a new ab initio method named CLOOP has been developed to build all-atom loop conformations. In this method, a loop main-chain conformation is generated by sampling main-chain dihedral angles from a restrained φ/ψ set, and the side-chain conformations are built randomly. The CHARMM all-atom force field was used to evaluate the loop conformations. Soft-core potentials were used to treat the non-bonded interactions, and a designed energy-minimization technique was used to close and optimize the loop conformations. These two strategies are shown to improve the computational efficiency and the loop-closure rate substantially compared with normal minimization methods. CLOOP was used to construct the conformations of 4-, 8-, and 12-residue loops in Fiser's test set. The average main-chain root-mean-square deviations obtained in 1,000 trials for the 10 different loops of each size are 0.33, 1.27, and 2.77 Å, respectively. CLOOP can build all-atom loop conformations with a sampling accuracy comparable to that of previous loop main-chain construction algorithms.


Subjects
Computational Biology/methods; Models, Molecular; Protein Conformation; Algorithms; Amino Acids/chemistry; Biochemistry/methods; Molecular Conformation; Molecular Structure
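Building a main-chain conformation from sampled dihedral angles, the first step described for CLOOP, requires placing each atom from the three preceding atoms plus a bond length, a bond angle, and a torsion. The Python sketch below uses the standard NeRF (natural extension reference frame) construction for that placement and samples (φ, ψ) pairs from a small restrained set. The restrained set, the ideal bond geometry, ω = 180°, and all function names are assumptions for illustration; side chains, the CHARMM energy, soft-core potentials, and loop closure are not modeled.

```python
import math
import random
from typing import List, Sequence, Tuple

Vec = Tuple[float, float, float]

# Assumed ideal backbone geometry: bond lengths in Å, bond angles in degrees.
N_CA, CA_C, C_N = 1.458, 1.525, 1.329
ANG_N_CA_C, ANG_CA_C_N, ANG_C_N_CA = 111.2, 116.2, 121.7
# Hypothetical restrained (phi, psi) set: rough alpha-R, beta, PPII and alpha-L centres.
RESTRAINED_SET = [(-57.0, -47.0), (-120.0, 130.0), (-75.0, 145.0), (57.0, 47.0)]


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def _unit(a: Vec) -> Vec:
    n = math.sqrt(_dot(a, a))
    return (a[0] / n, a[1] / n, a[2] / n)


def place_atom(a: Vec, b: Vec, c: Vec, length: float, angle: float, torsion: float) -> Vec:
    """NeRF: place D with |CD| = length, angle B-C-D = angle, dihedral A-B-C-D = torsion (deg)."""
    th, ph = math.radians(angle), math.radians(torsion)
    bc = _unit(_sub(c, b))
    n = _unit(_cross(_sub(b, a), bc))
    m = _cross(n, bc)
    d = (-length * math.cos(th), length * math.sin(th) * math.cos(ph), length * math.sin(th) * math.sin(ph))
    return tuple(c[k] + d[0] * bc[k] + d[1] * m[k] + d[2] * n[k] for k in range(3))


def dihedral(a: Vec, b: Vec, c: Vec, d: Vec) -> float:
    """Signed dihedral angle A-B-C-D in degrees."""
    b1 = _unit(_sub(c, b))
    b0, b2 = _sub(a, b), _sub(d, c)
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))  # components perpendicular to B-C
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    return math.degrees(math.atan2(_dot(_cross(b1, v), w), _dot(v, w)))


def build_backbone(phi_psi: Sequence[Tuple[float, float]], omega: float = 180.0) -> List[Vec]:
    """Return N, CA, C coordinates per residue; phi of the first residue is unused."""
    t = math.radians(ANG_N_CA_C)
    atoms = [(0.0, 0.0, 0.0), (N_CA, 0.0, 0.0), (N_CA - CA_C * math.cos(t), CA_C * math.sin(t), 0.0)]
    for i in range(1, len(phi_psi)):
        n_prev, ca_prev, c_prev = atoms[-3], atoms[-2], atoms[-1]
        n_i = place_atom(n_prev, ca_prev, c_prev, C_N, ANG_CA_C_N, phi_psi[i - 1][1])  # psi(i-1)
        ca_i = place_atom(ca_prev, c_prev, n_i, N_CA, ANG_C_N_CA, omega)
        c_i = place_atom(c_prev, n_i, ca_i, CA_C, ANG_N_CA_C, phi_psi[i][0])  # phi(i)
        atoms += [n_i, ca_i, c_i]
    return atoms


def sample_loop(n_res: int, rng: random.Random) -> List[Vec]:
    """Draw one (phi, psi) pair per residue from the restrained set and build the main chain."""
    return build_backbone([rng.choice(RESTRAINED_SET) for _ in range(n_res)])


def rmsd(p: Sequence[Vec], q: Sequence[Vec]) -> float:
    """Main-chain RMSD without superposition (both loops in the same frame)."""
    return math.sqrt(sum(_dot(d, d) for d in (_sub(a, b) for a, b in zip(p, q))) / len(p))
```

Comparing many sampled loops against a reference with rmsd gives a crude, sampling-only analogue of the accuracy figures quoted in the abstract; the closure and minimization steps that CLOOP adds are what make such samples usable.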
15.
Biochemistry ; 44(25): 9013-21, 2005 Jun 28.
Article in English | MEDLINE | ID: mdl-15966725

ABSTRACT

Mandelate racemase (MR, EC 5.1.2.2) from Pseudomonas putida catalyzes the Mg²⁺-dependent 1,1-proton transfer that interconverts the enantiomers of mandelate. Crystal structures of MR reveal that the phenyl group of all ground-state ligands is located within a hydrophobic cavity, remote from the site of proton abstraction. MR forms numerous electrostatic and H-bonding interactions with the α-OH and carboxyl groups of the substrate, suggesting that these polar groups may remain relatively fixed in position during catalysis while the phenyl group is free to move between two binding sites [i.e., the R-pocket and the S-pocket for binding the phenyl group of (R)-mandelate and (S)-mandelate, respectively]. We show that MR binds benzilate (Ki = 0.67 ± 0.12 mM) and (S)-cyclohexylphenylglycolate (Ki = 0.50 ± 0.03 mM) as competitive inhibitors with affinities similar to that which the enzyme exhibits for the substrate. Therefore, the active site can simultaneously accommodate two phenyl groups, consistent with the existence of an R-pocket and an S-pocket. Wild-type MR exhibits a slightly higher affinity for (S)-mandelate [i.e., Km for (S)-mandelate < Km for (R)-mandelate] but catalyzes the turnover of (R)-mandelate slightly more rapidly [i.e., kcat (R→S) > kcat (S→R)]. Upon introduction of steric bulk into the S-pocket using site-directed mutagenesis (i.e., the F52W, Y54W, and F52W/Y54W mutants), this catalytic preference is reversed. Although the catalytic efficiency (kcat/Km) of all the mutants was reduced (11- to 280-fold), all mutants exhibited a higher affinity for (R)-mandelate than for (S)-mandelate, and higher turnover numbers with (S)-mandelate as the substrate relative to (R)-mandelate. (R)- and (S)-2-hydroxybutyrate are expected to be less sensitive to the additional steric bulk in the S-pocket; unlike those for mandelate, the relative binding affinities for these substrate analogues are not reversed. These results are consistent with steric obstruction in the S-pocket and support the hypothesis that the phenyl group of the substrate may move between an R-pocket and an S-pocket during racemization. These conclusions were also supported by molecular dynamics simulations modeling the binary complexes of the wild-type and F52W/Y54W enzymes with the substrate analogues (R)- and (S)-atrolactate, and of wild-type MR with bound benzilate.


Subjects
Movement; Phenol/chemistry; Phenol/metabolism; Racemases and Epimerases/chemistry; Racemases and Epimerases/metabolism; Binding Sites; Catalysis; Computer Simulation; Hydrophobic and Hydrophilic Interactions; Hydroxybutyrates/chemistry; Hydroxybutyrates/pharmacology; Isomerism; Kinetics; Mandelic Acids/chemistry; Mandelic Acids/metabolism; Models, Molecular; Mutation/genetics; Phenylalanine/genetics; Phenylalanine/metabolism; Protein Structure, Tertiary; Pseudomonas putida/enzymology; Pseudomonas putida/genetics; Racemases and Epimerases/genetics
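For a competitive inhibitor such as benzilate, the Michaelis-Menten rate becomes v = Vmax[S] / (Km(1 + [I]/Ki) + [S]): the inhibitor raises the apparent Km without changing Vmax. The Python sketch below evaluates that expression; the function name is hypothetical, only the Ki of 0.67 mM for benzilate comes from the abstract, and the Vmax, Km, and concentrations in the example are made up.

```python
def competitive_rate(s: float, vmax: float, km: float,
                     i: float = 0.0, ki: float = float("inf")) -> float:
    """Michaelis-Menten rate with a competitive inhibitor (concentrations in mM)."""
    if s < 0 or i < 0 or km <= 0 or ki <= 0:
        raise ValueError("concentrations must be non-negative; Km and Ki must be positive")
    apparent_km = km * (1.0 + i / ki)  # competitive inhibition scales Km by (1 + [I]/Ki)
    return vmax * s / (apparent_km + s)


if __name__ == "__main__":
    KI_BENZILATE = 0.67   # mM, from the abstract
    km, vmax = 1.0, 10.0  # hypothetical Km (mM) and Vmax (arbitrary units)
    for inhibitor in (0.0, 0.67, 2.0):
        print(inhibitor, competitive_rate(1.0, vmax, km, i=inhibitor, ki=KI_BENZILATE))
```

With [S] = Km and [I] = Ki, the rate drops from Vmax/2 to Vmax/3, and because Vmax is unchanged, raising [S] can overcome the inhibition.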
16.
BMC Bioinformatics ; 6: 138, 2005 Jun 06.
Article in English | MEDLINE | ID: mdl-15938750

ABSTRACT

BACKGROUND: An increasing number of bioinformatics methods take into account the phylogenetic relationships between biological sequences. Implementing new methodologies using the maximum likelihood phylogenetic framework can be a time-consuming task. RESULTS: The bioinformatics library libcov is a collection of C++ classes that provides high- and low-level interfaces to maximum likelihood phylogenetics and sequence analysis, as well as a data structure for structural biological methods. libcov can be used to compute likelihoods, search tree topologies, estimate site rates, cluster sequences, manipulate tree structures, and compare phylogenies for a broad selection of applications. CONCLUSION: Using this library, it is possible to rapidly prototype applications that use the sophistication of phylogenetic likelihoods without getting involved in a major software engineering project. libcov is thus a potentially valuable building block for developing in-house methodologies in the field of protein phylogenetics.


Subjects
Computational Biology/instrumentation; Computational Biology/methods; Programming Languages; Sequence Alignment/methods; Sequence Analysis, Protein/methods; Software Design; Algorithms; Cluster Analysis; Computers; Data Interpretation, Statistical; Databases, Protein; Evolution, Molecular; Gene Library; Likelihood Functions; Phylogeny; Sequence Analysis, DNA; Software
17.
Protein Sci ; 13(3): 608-16, 2004 Mar.
Article in English | MEDLINE | ID: mdl-14978301

ABSTRACT

The rapidly evolving subsets of a protein are often evident in multiple sequence alignments as poorly defined, gap-containing regions. We investigated the 3D context of these regions in 28 protein structures containing a GTP-binding domain assumed to be homologous to the transforming factor p21-RAS. The phylogenetic depth of this data set is such that it is possible to observe lineages sharing a common protein core that diverged early in eukaryotic cell history. The sequence variability among these homologous proteins is directly linked to the structural variability of surface loops. We demonstrate that these regions are self-contained and thus mostly free of the evolutionary constraints imposed by the conserved core of the domain. Interactions within these loops can create stem-like structures. Interestingly, these stem-like structures can be observed in loops of varying size, up to the size of small protein domains. We propose a model under which the diversity of protein topologies observed in these loops can be the product of a stochastic sampling of sequence and conformational space in a near-neutral fashion, while the proximity of the functional features of the domain core allows novel beneficial traits to be fixed. Our comparative observations, limited here to the proteins containing the RAS-like GTP-binding domain, suggest that a stochastic process of insertion/deletion analogous to "budding" of loops is a likely mechanism of structural innovation. Such a framework could be experimentally exploited to investigate the folding of increasingly complex model inserts.


Subjects
Evolution, Molecular , GTP-Binding Proteins/chemistry , Amino Acid Sequence , Animals , Binding Sites/genetics , Eukaryotic Initiation Factor-2/chemistry , Eukaryotic Initiation Factor-2/genetics , GTP-Binding Protein alpha Subunits, Gs/chemistry , GTP-Binding Protein alpha Subunits, Gs/genetics , GTP-Binding Proteins/genetics , Gene Deletion , Humans , Models, Genetic , Models, Molecular , Molecular Sequence Data , Mutagenesis, Insertional , Phylogeny , Protein Conformation , Protein Structure, Secondary , Proteins/chemistry , Proteins/genetics , Proto-Oncogene Proteins p21(ras)/chemistry , Proto-Oncogene Proteins p21(ras)/genetics , Saccharomyces cerevisiae Proteins/chemistry , Saccharomyces cerevisiae Proteins/genetics , Sequence Alignment , Stochastic Processes , Structural Homology, Protein , rab GTP-Binding Proteins/chemistry , rab GTP-Binding Proteins/genetics
18.
Syst Biol ; 52(5): 594-603, 2003 Oct.
Article in English | MEDLINE | ID: mdl-14530128

ABSTRACT

Previous work has shown that it is often essential to account for variation in rates across sites in phylogenetic models to avoid phylogenetic artifacts such as long-branch attraction. In most current models, the gamma distribution is used as the rates-across-sites distribution and is implemented as an equal-probability discrete gamma. In this article, we introduce discrete distribution estimates with large numbers of equally spaced rate categories, which allow us to investigate the appropriateness of the gamma model. With large numbers of rate categories, these discrete estimates are flexible enough to approximate the shape of almost any distribution. We present likelihood ratio tests and a nonparametric bootstrap confidence-bound estimation procedure, both based on the discrete estimates, that can be used to test the fit of a parametric family. We applied the methodology to several protein data sets and found that, although the gamma model often provides a good parametric model for this type of data, rate estimates from an equal-probability discrete gamma model with a small number of categories tend to underestimate the largest rates. When the gamma assumption is in doubt, rate estimates from the discrete rate distribution estimate with a large number of rate categories provide a robust alternative to gamma estimates. We also propose an alternative implementation of the gamma distribution that, for equal numbers of rate categories, is computationally more efficient during optimization than the standard implementation and can provide more accurate estimates of site rates.
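The equal-probability discrete gamma mentioned in this abstract can be illustrated with a minimal sketch. It assumes the commonly used construction: a gamma distribution with shape alpha and mean 1 is cut into K categories, each with probability 1/K, and each category is represented by its conditional mean rate. If c_i is the i/K quantile of Gamma(alpha, 1) and P is the regularized lower incomplete gamma function, the mean rate of category i is K [P(alpha+1, c_i) - P(alpha+1, c_{i-1})]. This is not the authors' implementation. It covers neither their equally spaced discrete estimates nor their alternative gamma implementation. The function names are hypothetical, and P is computed with standard series and continued-fraction expansions so that only the Python standard library is needed.

```python
import math


def reg_lower_gamma(a: float, x: float) -> float:
    """Regularized lower incomplete gamma function P(a, x) for a > 0, x >= 0."""
    if x <= 0.0:
        return 0.0
    log_prefactor = -x + a * math.log(x) - math.lgamma(a)
    if x < a + 1.0:
        # Power series; converges quickly when x < a + 1.
        term = total = 1.0 / a
        denom = a
        for _ in range(10_000):
            denom += 1.0
            term *= x / denom
            total += term
            if abs(term) < abs(total) * 1e-15:
                break
        return total * math.exp(log_prefactor)
    # Continued fraction for the upper function Q(a, x) (modified Lentz); P = 1 - Q.
    tiny = 1e-300
    b = x + 1.0 - a
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, 10_000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        if abs(d) < tiny:
            d = tiny
        c = b + an / c
        if abs(c) < tiny:
            c = tiny
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-15:
            break
    return 1.0 - math.exp(log_prefactor) * h


def unit_gamma_quantile(a: float, p: float) -> float:
    """Return y with P(a, y) = p (the p-quantile of Gamma(shape=a, scale=1)) by bisection."""
    lo, hi = 0.0, max(1.0, a)
    while reg_lower_gamma(a, hi) < p:  # widen the bracket until it contains the quantile
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if reg_lower_gamma(a, mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def discrete_gamma_rates(alpha: float, k: int) -> list[float]:
    """Mean rate of each of k equal-probability categories of a mean-1 gamma (shape = rate = alpha)."""
    # Cut points on the Gamma(alpha, 1) scale; the 1/alpha rescaling is absorbed by using P(alpha + 1, .).
    cuts = [0.0] + [unit_gamma_quantile(alpha, i / k) for i in range(1, k)]
    shifted_cdf = [reg_lower_gamma(alpha + 1.0, c) for c in cuts] + [1.0]
    # Conditional mean of category i = partial expectation over the category / (1 / k).
    return [k * (shifted_cdf[i + 1] - shifted_cdf[i]) for i in range(k)]


if __name__ == "__main__":
    print(discrete_gamma_rates(0.5, 4))
```

With alpha = 0.5 and K = 4, the rates come out at roughly 0.033, 0.252, 0.820 and 2.894, averaging 1. Rerunning with K = 16 gives a larger top-category rate, because the top category then represents a thinner slice of the upper tail. This loosely illustrates why a few categories compress the largest rates; it does not reproduce the paper's analysis.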


Subjects
Evolution, Molecular , Models, Genetic , Phylogeny , HSP70 Heat-Shock Proteins/genetics , Likelihood Functions , Markov Chains
19.
Nucleic Acids Res ; 31(14): 4227-37, 2003 Jul 15.
Article in English | MEDLINE | ID: mdl-12853641

ABSTRACT

A number of recently published methods use phylogenetic information extracted from large multiple sequence alignments to detect sites whose properties have changed within related protein families. In this study we use such methods to assess functional divergence between eukaryotic EF-1alpha (eEF-1alpha), archaebacterial EF-1alpha (aEF-1alpha), and two eukaryote-specific EF-1alpha paralogs: eukaryotic release factor 3 (eRF3) and Hsp70 subfamily B suppressor 1 (HBS1). Overall, the evolutionary modes of aEF-1alpha, HBS1 and eRF3 appear to differ significantly from that of eEF-1alpha. However, functionally divergent (FD) sites detected between aEF-1alpha and eEF-1alpha overlap only weakly with sites implicated as putative EF-1beta or aminoacyl-tRNA (aa-tRNA) binding residues in EF-1alpha, as expected given the primary translational functions these two orthologs share by ancestry. In contrast, FD sites detected between eEF-1alpha and its paralogs overlap significantly with the putative EF-1beta and/or aa-tRNA binding sites in EF-1alpha. In eRF3 and HBS1, these sites appear to be released from functional constraints, indicating that neither protein binds eEF-1beta or aa-tRNA. These results are consistent with experimental observations that eRF3 does not bind aa-tRNA, but they do not support the 'EF-1alpha-like' function recently proposed for HBS1. We reassess the available genetic data for HBS1 in light of our analyses and propose that this protein may function in stop codon-independent peptide release.


Subjects
Eukaryotic Cells/metabolism , Peptide Elongation Factor 1/genetics , Amino Acid Sequence , Animals , Archaea/genetics , Bacteria/genetics , Binding Sites/genetics , DNA, Complementary/chemistry , DNA, Complementary/genetics , DNA, Protozoan/chemistry , DNA, Protozoan/genetics , Dictyostelium/genetics , Diplomonadida/genetics , Genetic Variation , Giardia lamblia/genetics , Molecular Sequence Data , Peptide Elongation Factor 1/chemistry , Phylogeny , Protein Conformation , Protein Structure, Tertiary , Sequence Alignment , Sequence Analysis, DNA , Sequence Homology, Amino Acid , Trichomonas vaginalis/genetics , Trypanosoma brucei brucei/genetics
20.
Nucleic Acids Res ; 31(2): 790-7, 2003 Jan 15.
Article in English | MEDLINE | ID: mdl-12527789

ABSTRACT

Comparative sequence analysis has been used for many years to study specific questions about the structure and function of proteins. Here we propose a knowledge-based framework in which the maximum likelihood rate of evolution at a site is used to quantify the level of constraint on that site's identity. Using datasets of rhodopsin-like G-protein-coupled receptors and of alpha- and beta-tubulins, we demonstrate that mapping site rates onto 3D structures provides an excellent tool for pinpointing the functional features shared between orthologous and paralogous proteins. In addition, functional divergence within protein families can be inferred by examining differences between aligned sites in their rates, in the chemical properties of their side chains, or in amino acid usage. Two novel analytical methods are introduced to characterize rate-independent functional divergence. These are tested using a dataset of two classes of HMG-CoA reductases, only one of which can perform both the forward and reverse reactions. We show that functionally divergent sites occur in a cluster of sites interacting with the catalytic residues, and that this information should facilitate the design of experiments that directly test the functional properties of these residues.


Subjects
Phylogeny , Protein Conformation , Proteins/genetics , Animals , Evolution, Molecular , GTP-Binding Proteins/metabolism , Genetic Variation , Humans , Hydroxymethylglutaryl CoA Reductases/chemistry , Hydroxymethylglutaryl CoA Reductases/genetics , Phosphopyruvate Hydratase/chemistry , Phosphopyruvate Hydratase/genetics , Proteins/chemistry , Receptors, Cell Surface/chemistry , Receptors, Cell Surface/genetics , Receptors, Cell Surface/metabolism , Rhodopsin/chemistry , Rhodopsin/genetics , Tubulin/chemistry , Tubulin/genetics