Your browser doesn't support javascript.
loading
Mostrar: 20 | 50 | 100
Resultados 1 - 18 de 18
Filtrar
Mais filtros










Base de dados
Intervalo de ano de publicação
1.
Nat Comput ; 16(2): 261-284, 2017.
Artigo em Inglês | MEDLINE | ID: mdl-28690474

RESUMO

A major goal of natural computing is to design biomolecules, such as nucleic acid sequences, that can be used to perform computations. We design sequences of nucleic acids that are "guaranteed" to have long folding pathways relative to their length. This particular sequences with high probability follow low-barrier folding pathways that visit a large number of distinct structures. Long folding pathways are interesting, because they demonstrate that natural computing can potentially support long and complex computations. Formally, we provide the first scalable designs of molecules whose low-barrier folding pathways, with respect to a simple, stacked pair energy model, grow superlinearly with the molecule length, but for which all significantly shorter alternative folding pathways have an energy barrier that is [Formula: see text] times that of the low-barrier pathway for any [Formula: see text] and a sufficiently long sequence.

2.
Artigo em Inglês | MEDLINE | ID: mdl-26887011

RESUMO

Genome mapping algorithms aim at computing an ordering of a set of genomic markers based on local ordering information such as adjacencies and intervals of markers. In most genome mapping models, markers are assumed to occur uniquely in the resulting map. We introduce algorithmic questions that consider repeats, i.e., markers that can have several occurrences in the resulting map. We show that, provided with an upper bound on the copy number of repeated markers and with intervals that span full repeat copies, called repeat spanning intervals, the problem of deciding if a set of adjacencies and repeat spanning intervals admits a genome representation is tractable if the target genome can contain linear and/or circular chromosomal fragments. We also show that extracting a maximum cardinality or weight subset of repeat spanning intervals given a set of adjacencies that admits a genome realization is NP-hard but fixed-parameter tractable in the maximum copy number and the number of adjacent repeats, and tractable if intervals contain a single repeated marker.


Assuntos
Algoritmos , Mapeamento Cromossômico/métodos , Genômica/métodos , Sequências Repetitivas de Ácido Nucleico/genética , Software
3.
Nat Comput ; 13(4): 499-516, 2014.
Artigo em Inglês | MEDLINE | ID: mdl-25400535

RESUMO

Chemical reaction networks (CRNs) and DNA strand displacement systems (DSDs) are widely-studied and useful models of molecular programming. However, in order for some DSDs in the literature to behave in an expected manner, the initial number of copies of some reagents is required to be fixed. In this paper we show that, when multiple copies of all initial molecules are present, general types of CRNs and DSDs fail to work correctly if the length of the shortest sequence of reactions needed to produce any given molecule exceeds a threshold that grows polynomially with attributes of the system.

4.
Interface Focus ; 2(4): 512-21, 2012 Aug 06.
Artigo em Inglês | MEDLINE | ID: mdl-22649584

RESUMO

We study the potential for molecule recycling in chemical reaction systems and their DNA strand displacement realizations. Recycling happens when a product of one reaction is a reactant in a later reaction. Recycling has the benefits of reducing consumption, or waste, of molecules and of avoiding fuel depletion. We present a binary counter that recycles molecules efficiently while incurring just a moderate slowdown compared with alternative counters that do not recycle strands. This counter is an n-bit binary reflecting Gray code counter that advances through 2(n) states. In the strand displacement realization of this counter, the waste-total number of nucleotides of the DNA strands consumed-is polynomial in n, the number of bits of the counter, while the waste of alternative counters grows exponentially in n. We also show that our n-bit counter fails to work correctly when many (Θ(n)) copies of the species that represent the bits of the counter are present initially. The proof applies more generally to show that in chemical reaction systems where all but one reactant of each reaction are catalysts, computations longer than a polynomial function of the size of the system are not possible when there are polynomially many copies of the system present.

5.
J Comput Biol ; 19(4): 439-54, 2012 Apr.
Artigo em Inglês | MEDLINE | ID: mdl-22468709

RESUMO

The problem of determining haplotypes from genotypes has gained considerable prominence in the research community. Here the focus is on determining sets of SNP values on individual chromosomes since such information captures the genetic causes of diseases. The most efficient algorithmic tool for haplotyping is based on perfect phylogenetic trees. A drawback of this method is that it cannot be applied in situations when the data contains homoplasies (multiple mutations of the same character) or recombinations. Recently, Song et al. ( 2005 ) studied the two cases: haplotyping via imperfect phylogenies with a single homoplasy and via galled-tree networks with one gall. In Gupta et al. ( 2010 ), we have shown that the haplotyping via galled-tree networks is NP-hard, even if we restrict to the case when every gall contains at most 3 mutations. We present a polynomial algorithm for haplotyping via galled-tree networks with simple galls (each having two mutations) for genotype matrices which satisfy a natural condition which is implied by presence of at least one 1 in each column that contains a 2. In the end, we give the experimental results comparing our algorithm with PHASE on simulated data.


Assuntos
Algoritmos , Haplótipos , Filogenia , Biologia Computacional/métodos , Simulação por Computador , Humanos , Modelos Genéticos , Polimorfismo de Nucleotídeo Único
6.
BMC Bioinformatics ; 13 Suppl 19: S11, 2012.
Artigo em Inglês | MEDLINE | ID: mdl-23281593

RESUMO

BACKGROUND: Recovering the structure of ancestral genomes can be formalized in terms of properties of binary matrices such as the Consecutive-Ones Property (C1P). The Linearization Problem asks to extract, from a given binary matrix, a maximum weight subset of rows that satisfies such a property. This problem is in general intractable, and in particular if the ancestral genome is expected to contain only linear chromosomes or a unique circular chromosome. In the present work, we consider a relaxation of this problem, which allows ancestral genomes that can contain several chromosomes, each either linear or circular. RESULT: We show that, when restricted to binary matrices of degree two, which correspond to adjacencies, the genomic characters used in most ancestral genome reconstruction methods, this relaxed version of the Linearization Problem is polynomially solvable using a reduction to a matching problem. This result holds in the more general case where columns have bounded multiplicity, which models possibly duplicated ancestral genes. We also prove that for matrices with rows of degrees 2 and 3, without multiplicity and without weights on the rows, the problem is NP-complete, thus tracing sharp tractability boundaries. CONCLUSION: As it happened for the breakpoint median problem, also used in ancestral genome reconstruction, relaxing the definition of a genome turns an intractable problem into a tractable one. The relaxation is adapted to some biological contexts, such as bacterial genomes with several replicons, possibly partially assembled. Algorithms can also be used as heuristics for hard variants. More generally, this work opens a way to better understand linearization results for ancestral genome structure inference.


Assuntos
Cromossomos/genética , Evolução Molecular , Genoma/genética , Algoritmos , Genômica/métodos
7.
Proteome Sci ; 9 Suppl 1: S6, 2011 Oct 14.
Artigo em Inglês | MEDLINE | ID: mdl-22165948

RESUMO

BACKGROUND: Complex intracellular signaling networks monitor diverse environmental inputs to evoke appropriate and coordinated effector responses. Defective signal transduction underlies many pathologies, including cancer, diabetes, autoimmunity and about 400 other human diseases. Therefore, there is high impetus to define the composition and architecture of cellular communications networks in humans. The major components of intracellular signaling networks are protein kinases and protein phosphatases, which catalyze the reversible phosphorylation of proteins. Here, we have focused on identification of kinase-substrate interactions through prediction of the phosphorylation site specificity from knowledge of the primary amino acid sequence of the catalytic domain of each kinase. RESULTS: The presented method predicts 488 different kinase catalytic domain substrate specificity matrices in 478 typical and 4 atypical human kinases that rely on both positive and negative determinants for scoring individual phosphosites for their suitability as kinase substrates. This represents a marked advancement over existing methods such as those used in NetPhorest (179 kinases in 76 groups) and NetworKIN (123 kinases), which consider only positive determinants for kinase substrate prediction. Comparison of our predicted matrices with experimentally-derived matrices from about 9,000 known kinase-phosphosite substrate pairs revealed a high degree of concordance with the established preferences of about 150 well studied protein kinases. Furthermore for many of the better known kinases, the predicted optimal phosphosite sequences were more accurate than the consensus phosphosite sequences inferred by simple alignment of the phosphosites of known kinase substrates. CONCLUSIONS: Application of this improved kinase substrate prediction algorithm to the primary structures of over 23, 000 proteins encoded by the human genome has permitted the identification of about 650, 000 putative phosphosites, which are posted on the open source PhosphoNET website (http://www.phosphonet.ca).

8.
J Comput Biol ; 18(9): 1023-39, 2011 Sep.
Artigo em Inglês | MEDLINE | ID: mdl-21899413

RESUMO

In comparative genomics, differences or similarities of gene orders are determined to predict functional relations of genes or phylogenetic relations of genomes. For this purpose, various combinatorial models can be used to specify gene clusters--groups of genes that are co-located in a set of genomes. Several approaches have been proposed to reconstruct putative ancestral gene clusters based on the gene order of contemporary species. One prevalent and natural reconstruction criterion is consistency: For a set of reconstructed gene clusters, there should exist a gene order that comprises all given clusters. For permutation-based gene cluster models, efficient methods exist to verify this condition. In this article, we discuss the consistency problem for different gene cluster models on sequences with restricted gene multiplicities. Our results range from linear-time algorithms for the simple model of adjacencies to NP-completeness proofs for more complex models like common intervals.


Assuntos
Modelos Genéticos , Família Multigênica , Análise de Sequência de DNA/métodos , Algoritmos , Sequência de Bases , Simulação por Computador , Evolução Molecular
9.
J Comput Biol ; 18(9): 1243-53, 2011 Sep.
Artigo em Inglês | MEDLINE | ID: mdl-21899429

RESUMO

The Gapped Consecutive-Ones Property (C1P) Problem, or the (k, δ)-C1P Problem is: given a binary matrix M and integers k and δ, decide if the columns of M can be ordered such that each row contains at most k blocks of 1's, and no two neighboring blocks of 1's are separated by a gap of more than δ 0's. This problem was introduced by Chauve et al. ( 2009b ). The classical polynomial-time solvable C1P Problem is equivalent to the (1, 0)-C1P problem. It has been shown that, for every unbounded or bounded k ≥ 2 and unbounded or bounded δ ≥ 1, except when (k, δ) = (2, 1), the (k, δ)-C1P Problem is NP-complete (Manuch et al., 2011 ; Goldberg et al., 1995 ). In this article, we study the Gapped C1P Problem with a third parameter d, namely the bound on the maximum number of 1's in any row of M, or the bound on the maximum degree of M. This is motivated by the reconstruction of ancestral genomes (Ma et al., 2006 ; Chauve and Tannier, 2008 ), where, in binary matrices obtained from the experiments of Chauve and Tannier ( 2008 ), we have observed that the majority of the rows have low degree, while each high degree row contains many rows of low degree. The (d, k, δ)-C1P Problem has been shown to be polynomial-time solvable when all three parameters are fixed (Chauve et al., 2009b ). Since fixing d also fixes k (k ≤ d), the only case left to consider is the case when δ is unbounded, or the (d, k, ∞)-C1P Problem. Here we show that for every d > k ≥ 2, the (d, k, ∞)-C1P Problem is NP-complete.


Assuntos
Modelos Genéticos , Modelos Estatísticos , Algoritmos , Simulação por Computador , Evolução Molecular , Mutação
10.
J Comput Biol ; 17(10): 1435-49, 2010 Oct.
Artigo em Inglês | MEDLINE | ID: mdl-20937016

RESUMO

The problem of determining haplotypes from genotypes has gained considerable prominence in the research community since the beginning of the HapMap project. Here the focus is on determining the sets of SNP values of individual chromosomes (haplotypes), since such information better captures the genetic causes of diseases. One of the main algorithmic tools for haplotyping is based on the assumption that the evolutionary history for the original haplotypes satisfies perfect phylogeny. This tool can be applied only on individual blocks of chromosomes, in which it is assumed that recombinations do not happen. However, exact determination of blocks is usually not possible. It would be desirable to develop a method for haplotyping which can account for recombinations, and thus can be applied on multiblock sections of chromosomes. A natural candidate for such a method is haplotyping via phylogenetic networks (which model recombinations) or their simplified version: galled-tree networks. However, even haplotyping via galled-tree networks appears hard, as the efficient algorithms exist only for very special cases: the galled-tree network has either a single gall or only small galls with two mutations each. Building on our previous results, we show that, in general, haplotyping via galled-tree networks is NP-complete, and it remains NP-complete when galls are allowed to have at most k mutations, for any k ≥ 3.


Assuntos
Algoritmos , Haplótipos , Modelos Genéticos , Filogenia , Evolução Biológica , Genótipo , Humanos , Polimorfismo de Nucleotídeo Único
11.
J Comput Biol ; 17(6): 841-52, 2010 Jun.
Artigo em Inglês | MEDLINE | ID: mdl-20500097

RESUMO

Using the Tile Assembly Model proposed by Rothemund and Winfree, we give two lower bounds on the minimum number of tile types needed to uniquely assemble a shape at temperature 1 under a natural assumption that there are no binding domain mismatches (any two adjacent tiles either form a bond or else both touching sides of the tiles are without glues). Rothemund and Winfree showed that uniquely assembling a full N x N square (a square where there is a bond between any two adjacent tiles) at temperature 1 requires N(2) distinct tile types, and conjectured that the minimum number of tile types needed to self-assemble an N x N square (not a full square) is 2N - 1. Our lower bounds imply that a tile system that uniquely assembles an N x N square without binding domains mismatches, requires at least 2N - 1 tile types.


Assuntos
Modelos Moleculares , Temperatura , DNA/química
12.
Pac Symp Biocomput ; : 108-19, 2010.
Artigo em Inglês | MEDLINE | ID: mdl-19908363

RESUMO

We make two new contributions to the problem of calculating pseudoknot-free folding pathways with minimum energy barrier between pairs (Α,Β) of RNA secondary structures. Our first contribution pertains to a problem posed by Morgan and Higgs: find a min-barrier direct folding pathway for a simple energy model in which each base pair contributes -1. In a direct folding pathway, intermediate structures contain only base pairs in Α and Β and are of length |AΔB| (the size of the symmetric difference of the two structures). We show how to solve this problem exactly, using techniques for deconstructing bipartite graphs. The problem is NP-hard and so our algorithm requires exponential time in the worst case but performs quite well empirically on pairs of structures that are hundreds of nucleotides long. Our second contribution shows that for the simple energy model, repeatedly adding or removing a base pair from A U B along a pathway is not useful in minimizing the energy barrier. Two consequences of this result are that (i) the problem of determining the min-barrier pseudoknot-free folding pathway from the space of direct pathways with repeats is NP-hard and (ii) our new algorithm finds the min-barrier pathway not only from the space of direct folding pathways but in fact from the space of direct pathways with repeats.


Assuntos
Algoritmos , Conformação de Ácido Nucleico , RNA/química , Biologia Computacional , Modelos Moleculares , Termodinâmica
13.
J Comput Biol ; 16(6): 769-802, 2009 Jun.
Artigo em Inglês | MEDLINE | ID: mdl-19522663

RESUMO

The inverse protein folding problem is that of designing an amino acid sequence which has a prescribed native protein fold. This problem arises in drug design where a particular structure is necessary to ensure proper protein-protein interactions. Previously, tubular structures for a three-dimensional (3D) hexagonal prism lattice were introduced and their stability was formally proved for simple instances under the hydrophobic-polar (HP) model of Dill. In this article, we generalize the design of tubular structures to allow for much larger variety of designable structures by allowing branching of tubes. Our generalized design could be used to roughly approximate given 3D shapes in the considered lattice. Although the generalized tubular structures are not stable under the HP model, we can prove that a simple instance of generalized tubular structures is structurally stable (all native folds have the designed shape) under a refined version of the HP model, called the HPC model. We conjecture that there is a way to choose which hydrophobic monomers are cysteines in all generalized tubular structures such that the designed proteins are structurally stable under the HPC model.


Assuntos
Interações Hidrofóbicas e Hidrofílicas , Modelos Químicos , Dobramento de Proteína , Proteínas/química , Proteínas/metabolismo
14.
J Comput Biol ; 16(1): 19-30, 2009 Jan.
Artigo em Inglês | MEDLINE | ID: mdl-19072580

RESUMO

The inverse protein folding problem is that of designing an amino acid sequence which folds into a prescribed conformation/structure. This problem arises in drug design where a particular structure is necessary to ensure proper protein-protein interactions. Gupta et al. (2005) introduced a design in the two-dimensional (2D) hydrophobic-polar (HP) model of Dill that can be used to approximate any given (2D) shape. They conjectured that the protein sequences of their design are stable but only proved the stability for an infinite class of very basic structures. We introduce a refinement of the HP model, in which the cysteine and non-cysteine hydrophobic monomers are distinguished and SS-bridges, which two cysteines can form, are taken into account in the energy function. We call this model the HPC model. We consider a subclass of linear structures designed in Gupta et al. (2005) which is rich enough to approximate (although more coarsely) any given structure. We refine these structures for the HPC model by setting approximately a half of H amino acids to cysteine ones and call them snake structures. We first prove that the proteins of the snake structures are stable under the strong HPC model in which we make an additional assumption that non-cysteine amino acids act as cysteine ones, i.e., they can form their own bridges to reduce the energy. Then we consider a subclass of snake structures called wave structures that can still approximate any given shape and prove that their proteins are stable under the proper HPC model. This partially confirms the conjecture stated in Gupta et al. (2005). To prove the above results we developed a computational tool, called 2DHPSolver, which we used to perform large case analysis required for the proofs. We conjecture that the proteins of snake structures are stable under the proper HPC model.


Assuntos
Simulação por Computador , Cisteína/química , Modelos Moleculares , Dobramento de Proteína , Estrutura Secundária de Proteína , Sequência de Aminoácidos , Interações Hidrofóbicas e Hidrofílicas
15.
J Bioinform Comput Biol ; 6(1): 93-106, 2008 Feb.
Artigo em Inglês | MEDLINE | ID: mdl-18324748

RESUMO

It is known that folding a protein chain into a cubic lattice is an NP-complete problem. We consider a seemingly easier problem: given a three-dimensional (3D) fold of a protein chain (coordinates of its C(alpha) atoms), we want to find the closest lattice approximation of this fold. This problem has been studied under names such as "lattice approximation of a protein chain", "the protein chain fitting problem", and "building of protein lattice models". We show that this problem is NP-complete for the cubic lattice with side close to 3.8 A and coordinate root mean square deviation.


Assuntos
Modelos Químicos , Modelos Moleculares , Proteínas/química , Proteínas/ultraestrutura , Alinhamento de Sequência/métodos , Análise de Sequência de Proteína/métodos , Sequência de Aminoácidos , Simulação por Computador , Dados de Sequência Molecular , Conformação Proteica
16.
J Bioinform Comput Biol ; 4(6): 1309-28, 2006 Dec.
Artigo em Inglês | MEDLINE | ID: mdl-17245816

RESUMO

In this paper, we give a complete characterization of the existence of a galled-tree network in the form of simple sufficient and necessary conditions for both root-known and root-unknown cases. As a by-product we obtain a simple algorithm for constructing galled-tree networks. We also introduce a new necessary condition for the existence of a galled-tree network similar to bi-convexity.


Assuntos
Algoritmos , Mapeamento Cromossômico/métodos , Frequência do Gene , Haplótipos/genética , Modelos Genéticos , Polimorfismo de Nucleotídeo Único/genética , Locos de Características Quantitativas/genética , Simulação por Computador
17.
J Comput Biol ; 12(10): 1328-45, 2005 Dec.
Artigo em Inglês | MEDLINE | ID: mdl-16379538

RESUMO

The inverse protein folding problem is that of designing an amino acid sequence which has a particular native protein fold. This problem arises in drug design where a particular structure is necessary to ensure proper protein-protein interactions. In this paper, we show that in the 2D HP model of Dill it is possible to solve this problem for a broad class of structures. These structures can be used to closely approximate any given structure. One of the most important properties of a good protein (in drug design) is its stability--the aptitude not to fold simultaneously into other structures. We show that for a number of basic structures, our sequences have a unique fold.


Assuntos
Dobramento de Proteína , Biologia Computacional , Desenho de Fármacos , Interações Hidrofóbicas e Hidrofílicas , Modelos Moleculares , Conformação Proteica , Termodinâmica
18.
Artigo em Inglês | MEDLINE | ID: mdl-16448024

RESUMO

The inverse protein folding problem is that of designing an amino acid sequence which has a particular native protein fold. This problem arises in drug design where a particular structure is necessary to ensure proper protein-protein interactions. In this paper we show that in the 2D HP model of Dill it is possible to solve this problem for a broad class of structures. These structures can be used to closely approximate any given structure. One of the most important properties of a good protein is its stability -- the aptitude not to fold simultanously into other structures. We show that for a number of basic structures, our sequences have a unique fold.


Assuntos
Algoritmos , Desenho de Fármacos , Modelos Químicos , Modelos Moleculares , Alinhamento de Sequência/métodos , Análise de Sequência de Proteína/métodos , Simulação por Computador , Ligação de Hidrogênio , Conformação Proteica , Desnaturação Proteica , Dobramento de Proteína
SELEÇÃO DE REFERÊNCIAS
DETALHE DA PESQUISA
...