1.
Bioinformatics ; 39(Suppl 1): i357-i367, 2023 06 30.
Article in English | MEDLINE | ID: mdl-37387189

ABSTRACT

The tendency of an amino acid to adopt certain configurations in folded proteins is treated here as a statistical estimation problem. We model the joint distribution of the observed mainchain and sidechain dihedral angles (〈ϕ,ψ,χ1,χ2,…〉) of any amino acid by a mixture of a product of von Mises probability distributions. This mixture model maps any vector of dihedral angles to a point on a multi-dimensional torus. The continuous space it uses to specify the dihedral angles provides an alternative to the commonly used rotamer libraries. These rotamer libraries discretize the space of dihedral angles into coarse angular bins, and cluster combinations of sidechain dihedral angles (〈χ1,χ2,…〉) as a function of backbone 〈ϕ,ψ〉 conformations. A 'good' model is one that is both concise and explains (compresses) observed data. Competing models can be compared directly and in particular our model is shown to outperform the Dunbrack rotamer library in terms of model complexity (by three orders of magnitude) and its fidelity (on average 20% more compression) when losslessly explaining the observed dihedral angle data across experimental resolutions of structures. Our method is unsupervised (with parameters estimated automatically) and uses information theory to determine the optimal complexity of the statistical model, thus avoiding under/over-fitting, a common pitfall in model selection problems. Our models are computationally inexpensive to sample from and are geared to support a number of downstream studies, ranging from experimental structure refinement and de novo protein design to protein structure prediction. We call our collection of mixture models PhiSiCal (ϕψχal). AVAILABILITY AND IMPLEMENTATION: PhiSiCal mixture models and programs to sample from them are available for download at http://lcb.infotech.monash.edu.au/phisical.


Subjects
Data Compression, Libraries, Amino Acids, Gene Library, Information Theory
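A minimal sketch of the model family described in the abstract above: a mixture whose components are products of independent von Mises densities, one per dihedral angle. The weights, means and concentrations below are made-up illustrative values for a two-angle (ϕ, ψ) case, not parameters from PhiSiCal, and the function names are hypothetical.

```python
import numpy as np


def von_mises_pdf(theta, mu, kappa):
    """Von Mises density on the circle; angles in radians, kappa >= 0."""
    return np.exp(kappa * np.cos(theta - mu)) / (2.0 * np.pi * np.i0(kappa))


def mixture_pdf(angles, weights, means, kappas):
    """Density of a mixture of products of von Mises terms at one point.

    angles: shape (d,); weights: shape (k,); means, kappas: shape (k, d).
    """
    angles = np.asarray(angles, dtype=float)
    # Product over the d angles inside each component, then a weighted sum.
    per_component = np.prod(von_mises_pdf(angles[None, :], means, kappas), axis=1)
    return float(weights @ per_component)


def sample_mixture(n, weights, means, kappas, rng):
    """Draw n dihedral vectors: pick a component, then sample each angle."""
    component = rng.choice(len(weights), size=n, p=weights)
    return rng.vonmises(means[component], kappas[component])


# Illustrative (assumed) parameters: two components over (phi, psi).
WEIGHTS = np.array([0.6, 0.4])
MEANS = np.radians([[-63.0, -43.0], [-120.0, 130.0]])
KAPPAS = np.array([[8.0, 6.0], [4.0, 5.0]])

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    samples = sample_mixture(5, WEIGHTS, MEANS, KAPPAS, rng)
    print(np.degrees(samples).round(1))
```

Because every angle lives on a circle, the density is periodic in each coordinate, so no binning of the angular space is needed to evaluate or sample it.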
2.
Bioinformatics ; 38(Suppl 1): i255-i263, 2022 06 24.
Article in English | MEDLINE | ID: mdl-35758808

ABSTRACT

MOTIVATION: Alignments are correspondences between sequences. How reliable are alignments of amino acid sequences of proteins, and what inferences about protein relationships can be drawn? Using techniques not previously applied to these questions, by weighting every possible sequence alignment by its posterior probability we derive a formal mathematical expectation, and develop an efficient algorithm for computation of the distance between alternative alignments allowing quantitative comparisons of sequence-based alignments with corresponding reference structure alignments. RESULTS: By analyzing the sequences and structures of 1 million protein domain pairs, we report the variation of the expected distance between sequence-based and structure-based alignments, as a function of (Markov time of) sequence divergence. Our results clearly demarcate the 'daylight', 'twilight' and 'midnight' zones for interpreting residue-residue correspondences from sequence information alone. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Amino Acids, Proteins, Algorithms, Amino Acid Sequence, Proteins/chemistry, Reproducibility of Results, Sequence Alignment, Amino Acid Sequence Homology
3.
Proteins ; 88(12): 1557-1558, 2020 12.
Article in English | MEDLINE | ID: mdl-32662915

ABSTRACT

We have modeled modifications of a known ligand to the SARS-CoV-2 (COVID-19) protease, that can form a covalent adduct, plus additional ligand-protein hydrogen bonds.


Subjects
Antiviral Agents, Aphids, Coronavirus Infections, Insecticides, Pandemics, Viral Pneumonia, Acetylcholinesterase, Animals, Betacoronavirus, COVID-19, Cysteine Endopeptidases, Humans, Molecular Docking Simulation, Protease Inhibitors, SARS-CoV-2, Viral Nonstructural Proteins
4.
Front Mol Biosci ; 7: 612920, 2020.
Article in English | MEDLINE | ID: mdl-33996891

ABSTRACT

What is the architectural "basis set" of the observed universe of protein structures? Using information-theoretic inference, we answer this question with a dictionary of 1,493 substructures, called concepts, typically at a subdomain level, based on an unbiased subset of known protein structures. Each concept represents a topologically conserved assembly of helices and strands that make contact. Any protein structure can be dissected into instances of concepts from this dictionary. We dissected the Protein Data Bank and completely inventoried all the concept instances. This yields many insights, including correlations between concepts and catalytic activities or binding sites, useful for rational drug design; local amino-acid sequence-structure correlations, useful for ab initio structure prediction methods; and information supporting the recognition and exploration of evolutionary relationships, useful for structural studies. An interactive site, Proçodic, at http://lcb.infotech.monash.edu.au/prosodic (click), provides access to and navigation of the entire dictionary of concepts and their usages, and all associated information. This report is part of a continuing programme with the goal of elucidating fundamental principles of protein architecture, in the spirit of the work of Cyrus Chothia.

5.
Methods Mol Biol ; 1958: 123-131, 2019.
Article in English | MEDLINE | ID: mdl-30945216

ABSTRACT

We recently developed an unsupervised Bayesian inference methodology to automatically infer a dictionary of protein supersecondary structures (Subramanian et al., IEEE data compression conference proceedings (DCC), 340-349, 2017). Specifically, this methodology uses the information-theoretic framework of the minimum message length (MML) criterion for hypothesis selection (Wallace, Statistical and inductive inference by minimum message length, Springer Science & Business Media, New York, 2005). The best dictionary of supersecondary structures is the one that yields the most (lossless) compression on the source collection of folding patterns represented as tableaux (matrix representations that capture the essence of protein folding patterns; Lesk, J Mol Graph. 13:159-164, 1995). This book chapter outlines our MML methodology for inferring the supersecondary structure dictionary. The inferred dictionary is available at http://lcb.infotech.monash.edu.au/proteinConcepts/scop100/dictionary.html .


Subjects
Amino Acid Motifs, Computational Biology/methods, Proteins/chemistry, Algorithms, Bayes Theorem, Data Compression, Humans, Molecular Models, Protein Folding
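The selection criterion described in the abstract above can be written compactly. In Wallace's two-part formulation, a hypothesis H (here, a candidate dictionary) and the data D (here, the collection of tableaux) are encoded together, and the preferred hypothesis is the one that gives the shortest total message. The symbols H, D and I(·) are notation introduced here for illustration, not taken from the chapter.

```latex
\begin{align}
I(H, D) &= I(H) + I(D \mid H) = -\log_2 \Pr(H) - \log_2 \Pr(D \mid H), \\
H^{\ast} &= \arg\min_{H} \, I(H, D).
\end{align}
```

Measured against a null encoding of the data that uses no dictionary, with length I_null(D), the compression gained is I_null(D) − I(H, D) bits, so maximizing compression and minimizing I(H, D) select the same hypothesis.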
6.
Bioinformatics ; 33(7): 1005-1013, 2017 04 01.
Article in English | MEDLINE | ID: mdl-28065899

ABSTRACT

Motivation: Structural molecular biology depends crucially on computational techniques that compare protein three-dimensional structures and generate structural alignments (the assignment of one-to-one correspondences between subsets of amino acids based on atomic coordinates). Despite its importance, the structural alignment problem has not been formulated, much less solved, in a consistent and reliable way. To overcome these difficulties, we present here a statistical framework for the precise inference of structural alignments, built on the Bayesian and information-theoretic principle of Minimum Message Length (MML). The quality of any alignment is measured by its explanatory power: the amount of lossless compression achieved to explain the protein coordinates using that alignment. Results: We have implemented this approach in MMLigner, the first program able to infer statistically significant structural alignments. We also demonstrate the reliability of MMLigner's alignment results when compared with the state of the art. Importantly, MMLigner can also discover different structural alignments of comparable quality, a challenging problem for oligomers and protein complexes. Availability and Implementation: Source code, binaries and an interactive web version are available at http://lcb.infotech.monash.edu.au/mmligner . Contact: arun.konagurthu@monash.edu. Supplementary information: Supplementary data are available at Bioinformatics online.


Subjects
Data Compression, Statistical Models, Proteins/chemistry, Sequence Alignment, Algorithms, Bayes Theorem, Reproducibility of Results, Software
7.
Acta Crystallogr D Biol Crystallogr ; 70(Pt 3): 904-6, 2014 Mar.
Article in English | MEDLINE | ID: mdl-24598758

ABSTRACT

Atomic coordinates in the Worldwide Protein Data Bank (wwPDB) are generally reported to greater precision than the experimental structure determinations have actually achieved. By using information theory and data compression to study the compressibility of protein atomic coordinates, it is possible to quantify the amount of randomness in the coordinate data and thereby to determine the realistic precision of the reported coordinates. On average, the value of each Cα coordinate in a set of selected protein structures solved at a variety of resolutions is good to about 0.1 Å.


Subjects
Protein Databases/standards, User-Computer Interface, X-Ray Crystallography/standards, Chemical Dictionaries as Topic, Magnetic Resonance Spectroscopy/standards, Electron Microscopy/standards, Predictive Value of Tests, Random Allocation
8.
BMC Bioinformatics ; 14 Suppl 2: S7, 2013.
Article in English | MEDLINE | ID: mdl-23368093

ABSTRACT

Gene expression profiles can show significant changes when genetically diseased cells are compared with non-diseased cells. Biological networks are often used to identify active subnetworks (ASNs) of the diseases from the expression profiles to understand the reason behind the observed changes. Current methodologies for discovering ASNs mostly use undirected PPI networks and node-centric approaches. This can limit their ability to find meaningful ASNs when using integrated networks that carry more comprehensive information than the traditional protein-protein interaction networks. Using appropriate scoring functions to assess both genes and their interactions may allow the discovery of better ASNs. In this paper, we present CASNet, which aims to identify better ASNs using (i) integrated interaction networks (mixed graphs), (ii) directions of regulations of genes, and (iii) combined node and edge scores. We simplify and extend previous methodologies to incorporate edge evaluations and lessen their sensitivity to significance thresholds. We formulate our objective functions using mixed integer programming (MIP) and show that optimal solutions may be obtained. We compare the ASNs obtained by CASNet and other similar approaches to show that CASNet can often discover more meaningful and stable regulatory ASNs. Our analysis of a breast cancer dataset finds that the positive feedback loops across 7 genes, AR, ESR1, MYC, E2F2, PGR, BCL2 and CCND1, are conserved across the basal/triple negative subtypes in multiple datasets, which could potentially explain the aggressive nature of this cancer subtype. Furthermore, comparison of the basal subtype of breast cancer and the mesenchymal subtype of glioblastoma ASNs shows that an ASN in the vicinity of IL6 is conserved across the two subtypes. This result suggests that subtypes of different cancers can show molecular similarities, indicating that the therapeutic approaches in different types of cancers may be shared.


Subjects
Breast Neoplasms/genetics, Gene Expression Profiling/methods, Gene Regulatory Networks, Computer Simulation, Female, Neoplastic Gene Expression Regulation, Glioblastoma/genetics, Humans, Protein Interaction Maps
9.
Bioinformatics ; 27(23): 3315-6, 2011 Dec 01.
Article in English | MEDLINE | ID: mdl-21994221

ABSTRACT

SUMMARY: Protein topology diagrams are 2D representations of protein structure that are particularly useful in understanding and analysing complex protein folds. Generating such diagrams presents a major problem in graph drawing, with automatic approaches often resulting in errors or uninterpretable results. Here we apply a breakthrough in diagram layout to protein topology cartoons, providing clear, accurate, interactive and editable diagrams, which are also an interface to a structural search method. AVAILABILITY: Pro-origami is available via a web server at http://munk.csse.unimelb.edu.au/pro-origami CONTACT: a.stivala@pgrad.unimelb.edu.au; pjs@csse.unimelb.edu.au.


Subjects
Molecular Models, Proteins/chemistry, Automation, Protein Secondary Structure, Software
10.
Bioinformatics ; 27(13): i43-51, 2011 Jul 01.
Article in English | MEDLINE | ID: mdl-21685100

ABSTRACT

UNLABELLED: Simple and concise representations of protein-folding patterns provide powerful abstractions for visualizations, comparisons, classifications, searching and aligning structural data. Structures are often abstracted by replacing standard secondary structural features (that is, helices and strands of sheet) by vectors or linear segments. Relying solely on standard secondary structure may result in a significant loss of structural information. Further, traditional methods of simplification crucially depend on the consistency and accuracy of external methods to assign secondary structures to protein coordinate data. Although many methods exist to identify secondary structure automatically, the imprecision of definitions, along with errors and inconsistencies in experimental structure data, drastically limits their applicability to generate reliable simplified representations, especially for structural comparison. This article introduces a mathematically rigorous algorithm to delineate protein structure using the elegant statistical and inductive inference framework of minimum message length (MML). Our method generates consistent and statistically robust piecewise linear explanations of protein coordinate data, resulting in a powerful and concise representation of the structure. The delineation is completely independent of the approaches of using hydrogen-bonding patterns or inspecting local substructural geometry that the current methods use. Indeed, as is common with applications of the MML criterion, this method is free of parameters and thresholds, in striking contrast to the existing programs which are often beset by them. The analysis of results over a large number of proteins suggests that the method produces consistent delineation of structures that encompasses, among others, the segments corresponding to standard secondary structure. AVAILABILITY: http://www.csse.monash.edu.au/~karun/pmml.


Subjects
Algorithms, Proteins/chemistry, Clostridium/chemistry, Hydrogen Bonding, Molecular Models, Protein Folding, Protein Secondary Structure, Proteins/metabolism
11.
BMC Bioinformatics ; 11: 446, 2010 Sep 03.
Article in English | MEDLINE | ID: mdl-20813068

ABSTRACT

BACKGROUND: Searching a database of protein structures for matches to a query structure, or occurrences of a structural motif, is an important task in structural biology and bioinformatics. While there are many existing methods for structural similarity searching, faster and more accurate approaches are still required, and few current methods are capable of substructure (motif) searching. RESULTS: We developed an improved heuristic for tableau-based protein structure and substructure searching using simulated annealing that is as fast as or faster than some widely used existing methods, with comparable accuracy. Furthermore, we created a parallel implementation on a modern graphics processing unit (GPU). CONCLUSIONS: The GPU implementation achieves up to a 34-fold speedup over the CPU implementation of tableau-based structure search with simulated annealing, making it one of the fastest available methods. To the best of our knowledge, this is the first application of a GPU to the protein structural search problem.


Subjects
Computer Simulation, Proteins/chemistry, Computer Graphics, Protein Databases, Protein Tertiary Structure
12.
PLoS One ; 5(4): e10048, 2010 Apr 06.
Article in English | MEDLINE | ID: mdl-20386610

ABSTRACT

BACKGROUND: A central tenet of structural biology is that related proteins of common function share structural similarity. This has key practical consequences for the derivation and analysis of protein structures, and is exploited by the process of "molecular sieving" whereby a common core is progressively distilled from a comparison of two or more protein structures. This paper reports a novel web server for "sieving" of protein structures, based on the multiple structural alignment program MUSTANG. METHODOLOGY/PRINCIPAL FINDINGS: "Sieved" models are generated from MUSTANG-generated multiple alignment and superpositions by iteratively filtering out noisy residue-residue correspondences, until the resultant correspondences in the models are optimally "superposable" under a threshold of RMSD. This residue-level sieving is also accompanied by iterative elimination of the poorly fitting structures from the input ensemble. Therefore, by varying the thresholds of RMSD and the cardinality of the ensemble, multiple sieved models are generated for a given multiple alignment and superposition from MUSTANG. To aid the identification of structurally conserved regions of functional importance in an ensemble of protein structures, Lesk-Hubbard graphs are generated, plotting the number of residue correspondences in a superposition as a function of its corresponding RMSD. The conserved "core" (or typically active site) shows a linear trend, which becomes exponential as divergent parts of the structure are included into the superposition. CONCLUSIONS: The application addresses two fundamental problems in structural biology: first, the identification of common substructures among structurally related proteins--an important problem in characterization and prediction of function; second, generation of sieved models with demonstrated uses in protein crystallographic structure determination using the technique of Molecular Replacement.


Subjects
Computational Biology/methods, Software, Protein Structural Homology, Algorithms, X-Ray Crystallography, Proteins/chemistry
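The residue-level sieving loop described in the abstract above can be sketched for the simplest case of two structures. This is an assumption-laden simplification: the server works on MUSTANG multiple alignments and also eliminates poorly fitting structures, whereas the sketch below handles one pair of already-matched coordinate sets and uses a standard Kabsch least-squares superposition, which the abstract does not name. Function names and the `min_pairs` cutoff are hypothetical.

```python
import numpy as np


def superpose(P, Q):
    """Least-squares (Kabsch) fit of Q onto P; both are (n, 3) arrays.

    Returns per-pair deviations after the fit and the overall RMSD.
    """
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)
    U, _, Vt = np.linalg.svd(Qc.T @ Pc)
    # Guard against an improper rotation (reflection).
    d = np.sign(np.linalg.det(U @ Vt))
    R = U @ np.diag([1.0, 1.0, d]) @ Vt
    deviations = np.linalg.norm(Qc @ R - Pc, axis=1)
    return deviations, float(np.sqrt(np.mean(deviations ** 2)))


def sieve(P, Q, rmsd_threshold, min_pairs=3):
    """Drop the worst-fitting correspondence until RMSD <= threshold.

    Returns the indices of the retained pairs and their RMSD.
    """
    keep = np.arange(len(P))
    while True:
        deviations, rmsd = superpose(P[keep], Q[keep])
        if rmsd <= rmsd_threshold or len(keep) <= min_pairs:
            return keep, rmsd
        # Remove the single noisiest correspondence and refit.
        keep = np.delete(keep, np.argmax(deviations))
```

Recording the number of retained pairs and the RMSD at each iteration, rather than only at the end, would give the data points for a plot of correspondences against RMSD of the kind the abstract calls a Lesk-Hubbard graph.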
13.
Bioinformatics ; 26(2): 161-7, 2010 Jan 15.
Article in English | MEDLINE | ID: mdl-19933823

ABSTRACT

MOTIVATION: Cancer evolves through microevolution where random lesions that provide the biggest advantage to cancer stand out in their frequent occurrence in multiple samples. At the same time, a gene function can be changed by aberration of the corresponding gene or modification of microRNA (miRNA) expression, which attenuates the gene. In a large number of cancer samples, these two mechanisms might be distributed in a coordinated and almost mutually exclusive manner. Understanding this coordination may assist in identifying changes which significantly produce the same functional impact on cancer phenotype, and further identify genes that are universally required for cancer. Present methodologies for finding aberrations usually analyze single datasets, which cannot identify such pairs of coordinating genes and miRNAs. RESULTS: We have developed MIRAGAA, a statistical approach, to assess the coordinated changes of genome copy numbers and miRNA expression. We have evaluated MIRAGAA on The Cancer Genome Atlas (TCGA) Glioblastoma Multiforme datasets. In these datasets, a number of genome regions coordinating with different miRNAs are identified. Although well known for their biological significance, these genes and miRNAs would be left undetected for being less significant if the two datasets were analyzed individually. AVAILABILITY AND IMPLEMENTATION: The source code, implemented in R and Java, is available from our project web site at http://www.csse.unimelb.edu.au/~rgaire/MIRAGAA/index.html. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Human Genome, Genomic Structural Variation, Genomics/methods, MicroRNAs/metabolism, Neoplasms/genetics, Software, Computational Biology/methods, Gene Dosage, Glioblastoma/genetics, Humans, Neoplasms/metabolism
14.
BMC Bioinformatics ; 10: 153, 2009 May 19.
Article in English | MEDLINE | ID: mdl-19450287

ABSTRACT

BACKGROUND: Searching for proteins that contain similar substructures is an important task in structural biology. The exact solution of most formulations of this problem, including a recently published method based on tableaux, is too slow for practical use in scanning a large database. RESULTS: We developed an improved method for detecting substructural similarities in proteins using tableaux. Tableaux are compared efficiently by solving the quadratic program (QP) corresponding to the quadratic integer program (QIP) formulation of the extraction of maximally-similar tableaux. We compare the accuracy of the method in classifying protein folds with some existing techniques. CONCLUSION: We find that including constraints based on the separation of secondary structure elements increases the accuracy of protein structure search using maximally-similar subtableau extraction, to a level where it has comparable or superior accuracy to existing techniques. We demonstrate that our implementation is able to search a structural database in a matter of hours on a standard PC.


Subjects
Protein Secondary Structure, Proteins/chemistry, Proteomics/methods, Software, Area Under Curve, Protein Databases, Molecular Models, Statistical Models, ROC Curve
15.
Bioinformatics ; 24(5): 645-51, 2008 Mar 01.
Article in English | MEDLINE | ID: mdl-18175768

ABSTRACT

UNLABELLED: Comparison and classification of folding patterns from a database of protein structures is crucial to understand the principles of protein architecture, evolution and function. Current search methods for proteins with similar folding patterns are slow and computationally intensive. The sharp growth in the number of known protein structures poses severe challenges for methods of structural comparison. There is a need for methods that can search the database of structures accurately and rapidly. We provide several methods to search for similar folding patterns using a concise tableau representation of proteins that encodes the relative geometry of secondary structural elements. Our first approach allows the extraction of identical and very closely-related protein folding patterns in constant-time (per hit). Next, we address the hard computational problem of extraction of maximally-similar subtableaux, when comparing two tableaux. We solve the problem using Quadratic and Linear integer programming formulations and demonstrate their power to identify subtle structural similarities, especially when protein structures significantly diverge. Finally, we describe a rapid and accurate method for comparing a query structure against a database of protein domains, TableauSearch. TableauSearch is rapid enough to search the entire structural database in seconds on a standard desktop computer. Our analysis of TableauSearch on many queries shows that the method is very accurate in identifying similarities of folding patterns, even between distantly related proteins. AVAILABILITY: A web server implementing the TableauSearch is available from http://hollywood.bx.psu.edu/TabSearch.


Subjects
Information Storage and Retrieval, Protein Folding, Protein Secondary Structure, Molecular Models
16.
Proteins ; 64(3): 559-74, 2006 Aug 15.
Article in English | MEDLINE | ID: mdl-16736488

ABSTRACT

Multiple structural alignment is a fundamental problem in structural genomics. In this article, we define a reliable and robust algorithm, MUSTANG (MUltiple STructural AligNment AlGorithm), for the alignment of multiple protein structures. Given a set of protein structures, the program constructs a multiple alignment using the spatial information of the Cα atoms in the set. Broadly based on the progressive pairwise heuristic, this algorithm gains accuracy through novel and effective refinement phases. MUSTANG reports the multiple sequence alignment and the corresponding superposition of structures. Alignments generated by MUSTANG are compared with several hand-curated alignments in the literature as well as with the benchmark alignments of 1033 alignment families from the HOMSTRAD database. The performance of MUSTANG was compared with DALI at a pairwise level, and with other multiple structural alignment tools such as POSA, CE-MC, MALECON, and MultiProt. MUSTANG performs comparably to popular pairwise and multiple structural alignment tools for closely related proteins, and performs more reliably than other multiple structural alignment methods on hard data sets containing distantly related proteins or proteins that show conformational changes.


Subjects
Algorithms, Sequence Alignment/methods, Amino Acid Sequence, Computational Biology, Protein Databases, Globins/chemistry, Molecular Models, Molecular Sequence Data, Protein Secondary Structure, Protein Tertiary Structure, Reproducibility of Results, Amino Acid Sequence Homology, Serine Endopeptidases/chemistry, Software, Protein Structural Homology
17.
J Comput Biol ; 13(3): 668-85, 2006 Apr.
Article in English | MEDLINE | ID: mdl-16706718

ABSTRACT

Alignment of sequences is an important routine in various areas of science, notably molecular biology. Multiple sequence alignment is a computationally hard optimization problem which involves the consideration of different possible alignments in order to find an optimal one, given a measure of goodness of alignments. Dynamic programming algorithms are generally well suited for the search of optimal alignments, but are constrained by unwieldy space requirements for large numbers of sequences. Carrillo and Lipman devised a method that helps to reduce the search space for an optimal alignment under a sum-of-pairs measure using bounds on the scores of its pairwise projections. In this paper, we generalize Carrillo and Lipman bounds and demonstrate a novel approach for finding optimal sum-of-pairs multiple alignments that allows incremental pruning of the optimal alignment search space. This approach can result in a drastic pruning of the final search space polytope (where we search for the optimal alignment) when compared to Carrillo and Lipman's approach and hence allows many runs that are not feasible with the original method.


Subjects
Algorithms, Sequence Alignment, Protein Sequence Analysis
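A minimal sketch of the original Carrillo and Lipman bound that the paper above generalizes; it does not implement the generalized, incremental pruning, which the abstract does not spell out. If L is the sum-of-pairs score of any known multiple alignment, then the projection of an optimal alignment onto a pair (i, j) must score at least L minus the sum of the optimal pairwise scores of all other pairs. The scoring scheme (match +1, mismatch -1, linear gap -2) and all function names are assumptions chosen for illustration.

```python
from itertools import combinations


def nw_score(a, b, match=1, mismatch=-1, gap=-2):
    """Optimal global pairwise alignment score (Needleman-Wunsch, linear gaps)."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap]
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            cur.append(max(prev[j - 1] + sub, prev[j] + gap, cur[j - 1] + gap))
        prev = cur
    return prev[-1]


def carrillo_lipman_thresholds(seqs, lower_bound):
    """Minimum score each pairwise projection of an optimal alignment must reach.

    lower_bound: sum-of-pairs score of any valid multiple alignment of seqs.
    """
    optimal = {(i, j): nw_score(seqs[i], seqs[j])
               for i, j in combinations(range(len(seqs)), 2)}
    total = sum(optimal.values())
    thresholds = {pair: lower_bound - (total - score)
                  for pair, score in optimal.items()}
    return thresholds, optimal


def ungapped_sp_score(seqs, match=1, mismatch=-1):
    """Sum-of-pairs score of stacking equal-length sequences without gaps."""
    return sum(match if x == y else mismatch
               for a, b in combinations(seqs, 2) for x, y in zip(a, b))
```

In a full implementation, a cell of the pairwise dynamic-programming lattice for (i, j) is kept only if the best pairwise alignment passing through it reaches the threshold for that pair; a tighter lower bound L therefore prunes more of the search space.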
18.
J Bioinform Comput Biol ; 2(4): 719-45, 2004 Dec.
Article in English | MEDLINE | ID: mdl-15617163

ABSTRACT

In this paper we demonstrate a practical approach to construct progressive multiple alignments using sequence triplet optimizations rather than a conventional pairwise approach. Using the sequence triplet alignments progressively provides a scope for the synthesis of a three-residue exchange amino acid substitution matrix. We develop such a 20 × 20 × 20 matrix for the first time and demonstrate how its use in optimal sequence triplet alignments increases the sensitivity of building multiple alignments. Various comparisons were made between alignments generated using the progressive triplet methods and the conventional progressive pairwise procedure. The assessment of these data reveals that, in general, the triplet-based approaches generate more accurate sequence alignments than the traditional pairwise-based procedures, especially between more divergent sets of sequences.


Subjects
Algorithms, Chemical Models, Molecular Models, Proteins/chemistry, Proteins/classification, Sequence Alignment/methods, Protein Sequence Analysis/methods, Amino Acid Sequence, Amino Acids/analysis, Amino Acids/chemistry, Statistical Models, Molecular Sequence Data, Protein Conformation, Proteins/analysis
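A minimal sketch of an optimal three-sequence (triplet) alignment score by dynamic programming, the building block the abstract above refers to. The column scoring function is pluggable: in the paper's setting it would look up a three-residue exchange value in the 20 × 20 × 20 matrix, which is not reproduced here, so the sketch substitutes a simple sum-of-pairs stand-in. The scoring values and function names are assumptions for illustration.

```python
from itertools import product


def sp_column_score(column, match=1.0, mismatch=-1.0, gap=-2.0):
    """Stand-in column score: sum over the three residue pairs in a column."""
    x, y, z = column
    total = 0.0
    for p, q in ((x, y), (x, z), (y, z)):
        if p == "-" and q == "-":
            continue  # a gap aligned with a gap contributes nothing
        total += gap if "-" in (p, q) else (match if p == q else mismatch)
    return total


def triplet_align_score(a, b, c, col_score):
    """Best global alignment score of three sequences under col_score."""
    n, m, p = len(a), len(b), len(c)
    neg_inf = float("-inf")
    D = [[[neg_inf] * (p + 1) for _ in range(m + 1)] for _ in range(n + 1)]
    D[0][0][0] = 0.0
    # Seven moves: each column consumes a residue from at least one sequence.
    moves = [mv for mv in product((0, 1), repeat=3) if any(mv)]
    for i in range(n + 1):
        for j in range(m + 1):
            for k in range(p + 1):
                for di, dj, dk in moves:
                    pi, pj, pk = i - di, j - dj, k - dk
                    if pi < 0 or pj < 0 or pk < 0:
                        continue
                    column = (a[pi] if di else "-",
                              b[pj] if dj else "-",
                              c[pk] if dk else "-")
                    candidate = D[pi][pj][pk] + col_score(column)
                    if candidate > D[i][j][k]:
                        D[i][j][k] = candidate
    return D[n][m][p]
```

The table has (n+1)(m+1)(p+1) cells with seven predecessors each, so time and memory grow with the product of the three sequence lengths; this cost is why triplets are a practical unit for a progressive scheme while exact alignment of many sequences at once is not.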