Search | BVS Regional Portal

CollHaps: a heuristic approach to haplotype inference by parsimony.

Tininini, Leonardo; Bertolazzi, Paola; Godi, Alessandra; Lancia, Giuseppe.

IEEE/ACM Trans Comput Biol Bioinform ; 7(3): 511-23, 2010.

Article in English | MEDLINE | ID: mdl-20671321

ABSTRACT

Haplotype data play an important role in several kinds of genetic studies, e.g., mapping of complex disease genes, drug design, and evolutionary studies on populations. However, the experimental determination of haplotypes is expensive and time-consuming. This motivates the increasing interest in techniques for inferring haplotype data from genotypes, which can instead be obtained quickly and economically. Several such techniques are based on the maximum parsimony principle, which has been justified by both experimental results and theoretical arguments. However, the problem of haplotype inference by parsimony was shown to be NP-hard, which limits the applicability of exact parsimony-based techniques to relatively small data sets. In this paper, we introduce the collapse rule, a generalization of the well-known Clark's rule, and describe a new heuristic algorithm for haplotype inference (implemented in a program called CollHaps), based on parsimony and the iterative application of collapse rules. The performance of CollHaps is tested on several data sets. The experiments show that CollHaps enables the user to process large data sets, obtaining very "parsimonious" solutions in short processing times. They also show a correlation, especially for large data sets, between parsimony and correct reconstruction, supporting the validity of the parsimony principle as a way to produce accurate solutions.

Subjects

Algorithms, Computational Biology/methods, Haplotypes/genetics, Software, Genotype, Humans, Single Nucleotide Polymorphism/genetics, DNA Sequence Analysis/methods
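
The abstract above names Clark's rule as the starting point that the collapse rule generalizes. The following is a minimal sketch of Clark's rule only, not of CollHaps or the collapse rule, whose details the abstract does not give. It assumes a common encoding that is not stated in the record: genotypes are strings over 0, 1, 2 (2 marks a heterozygous site) and haplotypes are strings over 0, 1. The function names are hypothetical.

```python
def is_compatible(hap, gen):
    """A haplotype can explain a genotype if it agrees on every homozygous site."""
    return len(hap) == len(gen) and all(g == "2" or g == h for h, g in zip(hap, gen))


def complement(hap, gen):
    """Return the haplotype that, paired with `hap`, yields `gen` (None if impossible)."""
    if not is_compatible(hap, gen):
        return None
    flip = {"0": "1", "1": "0"}
    return "".join(flip[h] if g == "2" else g for h, g in zip(hap, gen))


def clark_inference(genotypes):
    """Repeatedly resolve ambiguous genotypes using already known haplotypes."""
    known, unresolved = set(), []
    for g in genotypes:
        if g.count("2") == 0:        # homozygous everywhere: one haplotype
            known.add(g)
        elif g.count("2") == 1:      # one heterozygous site: unique resolution
            known.update({g.replace("2", "0"), g.replace("2", "1")})
        else:
            unresolved.append(g)

    resolved = {}
    progress = True
    while progress and unresolved:
        progress = False
        for g in list(unresolved):
            for h in sorted(known):  # sorted only to make the sketch deterministic
                other = complement(h, g)
                if other is not None:
                    resolved[g] = (h, other)
                    known.add(other)
                    unresolved.remove(g)
                    progress = True
                    break
    return known, resolved, unresolved


known, resolved, unresolved = clark_inference(["0011", "0211", "2212"])
print(resolved)
```

The result depends on the order in which genotypes and known haplotypes are tried, and some genotypes may stay unresolved; that order dependence is one reason heuristics built on such rules can return solutions of different sizes.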

Haplotyping for disease association: a combinatorial approach.

Lancia, Giuseppe; Ravi, R; Rizzi, Romeo.

IEEE/ACM Trans Comput Biol Bioinform ; 5(2): 245-51, 2008.

Article in English | MEDLINE | ID: mdl-18451433

ABSTRACT

We consider a combinatorial problem derived from haplotyping a population with respect to a genetic disease, either recessive or dominant. Given a set of individuals, partitioned into healthy and diseased, and the corresponding sets of genotypes, we want to infer "bad" and "good" haplotypes to account for these genotypes and for the disease. Assume, for example, that the disease is recessive. Then, the resolving haplotypes must consist of bad and good haplotypes, so that (i) each genotype belonging to a diseased individual is explained by a pair of bad haplotypes and (ii) each genotype belonging to a healthy individual is explained by a pair of haplotypes of which at least one is good. We prove that the associated decision problem is NP-complete. However, we also prove that there is a simple solution, provided the data satisfy a very weak requirement.

Subjects

Inborn Genetic Diseases/genetics, Haplotypes/genetics, Genetic Models, Computational Biology, Female, Genetic Predisposition to Disease, Genotype, Humans, Male, Mathematics, Single Nucleotide Polymorphism
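
Conditions (i) and (ii) in the abstract above can be checked mechanically once a candidate solution is given. The sketch below only verifies a proposed solution for the recessive case; it does not search for one, which is the hard part. The 0/1/2 genotype encoding (2 for a heterozygous site) and all names are assumptions made for illustration.

```python
def explains(h1, h2, genotype):
    """True if the haplotype pair (h1, h2) yields `genotype` site by site."""
    if not (len(h1) == len(h2) == len(genotype)):
        return False
    return all((g == a) if a == b else (g == "2")
               for a, b, g in zip(h1, h2, genotype))


def consistent_with_recessive_disease(diseased, healthy, bad):
    """Check conditions (i) and (ii) for a candidate set of bad haplotypes.

    `diseased` and `healthy` are lists of (genotype, (h1, h2)) tuples, one per
    individual; every haplotype not in `bad` counts as good.
    """
    for genotype, (h1, h2) in diseased:
        # (i) a diseased individual carries two bad haplotypes
        if not explains(h1, h2, genotype) or h1 not in bad or h2 not in bad:
            return False
    for genotype, (h1, h2) in healthy:
        # (ii) a healthy individual carries at least one good haplotype
        if not explains(h1, h2, genotype) or (h1 in bad and h2 in bad):
            return False
    return True


diseased = [("12", ("10", "11"))]
healthy = [("22", ("10", "01")), ("00", ("00", "00"))]
print(consistent_with_recessive_disease(diseased, healthy, {"10", "11"}))
```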

The approximability of the String Barcoding problem.

Lancia, Giuseppe; Rizzi, Romeo.

Algorithms Mol Biol ; 1: 12, 2006 Aug 08.

Article in English | MEDLINE | ID: mdl-16895600

ABSTRACT

The String Barcoding (SBC) problem, introduced by Rash and Gusfield (RECOMB, 2002), consists in finding a minimum set of substrings that can be used to distinguish between all members of a set of given strings. In a computational biology context, the given strings represent a set of known viruses, while the substrings can be used as probes for a hybridization experiment via microarray. Ultimately, one aims to classify new strings (unknown viruses) through the result of the hybridization experiment. In this paper we show that SBC is as hard to approximate as Set Cover. Furthermore, we show that the constrained version of SBC (with probes of bounded length) is also hard to approximate. These negative results are tight.
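
Because the abstract above relates SBC to Set Cover, a natural illustration is the classic greedy Set Cover heuristic applied to barcoding: each candidate probe "covers" the pairs of strings it separates. This is a sketch under that assumption, not the method of the paper, which proves hardness results. The bound `max_len` on probe length and the function names are hypothetical.

```python
from itertools import combinations


def candidate_probes(strings, max_len):
    """All substrings of the input strings with length at most max_len."""
    return {s[i:j]
            for s in strings
            for i in range(len(s))
            for j in range(i + 1, min(len(s), i + max_len) + 1)}


def separated(probe, pairs, strings):
    """Pairs of string indices where the probe occurs in exactly one string."""
    return {(a, b) for a, b in pairs if (probe in strings[a]) != (probe in strings[b])}


def greedy_barcode(strings, max_len=3):
    """Pick probes greedily until every pair of strings is distinguished."""
    pairs = set(combinations(range(len(strings)), 2))
    candidates = sorted(candidate_probes(strings, max_len))
    chosen = []
    while pairs:
        best = max(candidates, key=lambda p: len(separated(p, pairs, strings)))
        covered = separated(best, pairs, strings)
        if not covered:
            raise ValueError("some strings cannot be distinguished with these probes")
        chosen.append(best)
        pairs -= covered
    return chosen


def signature(s, probes):
    """The barcode of a string: which probes hybridize (occur) in it."""
    return tuple(p in s for p in probes)


strings = ["ACGT", "ACGA", "TCGT", "GGGG"]
probes = greedy_barcode(strings)
print(probes, [signature(s, probes) for s in strings])
```

Each chosen probe separates at least one pair that was still undistinguished, so the greedy result never needs more than one probe fewer than the number of strings; it is not guaranteed to be minimum.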

1001 optimal PDB structure alignments: integer programming methods for finding the maximum contact map overlap.

Caprara, Alberto; Carr, Robert; Istrail, Sorin; Lancia, Giuseppe; Walenz, Brian.

J Comput Biol ; 11(1): 27-52, 2004.

Article in English | MEDLINE | ID: mdl-15072687

ABSTRACT

Protein structure comparison is a fundamental problem for structural genomics, with applications to drug design, fold prediction, protein clustering, and evolutionary studies. Despite its importance, there are very few rigorous methods and widely accepted similarity measures known for this problem. In this paper we describe the last few years of developments on the study of an emerging measure, the contact map overlap (CMO), for protein structure comparison. A contact map is a list of pairs of residues which lie in three-dimensional proximity in the protein's native fold. Although this measure is in principle computationally hard to optimize, we show how it can in fact be computed with great accuracy for related proteins by integer linear programming techniques. These methods have the advantage of providing certificates of near-optimality by means of upper bounds to the optimal alignment value. We also illustrate effective heuristics, such as local search and genetic algorithms. We were able to obtain for the first time optimal alignments for large similar proteins (about 1,000 residues and 2,000 contacts) and used the CMO measure to cluster proteins in families. The clusters obtained were compared to SCOP classification in order to validate the measure. Extensive computational experiments showed that alignments which are off by at most 10% from the optimal value can be computed in a short time. Further experiments showed how this measure reacts to the choice of the threshold defining a contact and how to choose this threshold in a sensible way.

Subjects

Protein Databases, Molecular Models, Proteins/chemistry, Sequence Alignment, Software, Algorithms, Computer Simulation, Protein Conformation
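
To make the contact map overlap measure from the abstract above concrete, the sketch below builds a contact map from residue coordinates with a distance threshold and counts how many contacts a given alignment preserves. It only scores a fixed alignment; finding the alignment that maximizes the score is the hard optimization problem the paper addresses with integer programming. The coordinate format, the threshold value, and the function names are assumptions for illustration.

```python
from itertools import combinations
from math import dist


def contact_map(coords, threshold):
    """Pairs (i, j), i < j, of residues whose coordinates lie within `threshold`."""
    return {(i, j)
            for i, j in combinations(range(len(coords)), 2)
            if dist(coords[i], coords[j]) <= threshold}


def overlap(contacts_a, contacts_b, alignment):
    """Count contacts of protein A that map onto contacts of protein B.

    `alignment` maps residue indices of A to residue indices of B and is
    assumed to be one-to-one and order-preserving.
    """
    shared = 0
    for i, j in contacts_a:
        if i in alignment and j in alignment:
            u, v = alignment[i], alignment[j]
            if (min(u, v), max(u, v)) in contacts_b:
                shared += 1
    return shared


coords_a = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0)]
coords_b = [(0, 0, 0), (1, 0, 0), (5, 0, 0)]
map_a = contact_map(coords_a, threshold=1.5)
map_b = contact_map(coords_b, threshold=1.5)
print(overlap(map_a, map_b, {0: 0, 1: 1, 3: 2}))
```

Changing `threshold` changes both contact maps and therefore the score, which is the sensitivity the abstract's final sentence refers to.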

Haplotyping as perfect phylogeny: a direct approach.

Bafna, Vineet; Gusfield, Dan; Lancia, Giuseppe; Yooseph, Shibu.

J Comput Biol ; 10(3-4): 323-40, 2003.

Article in English | MEDLINE | ID: mdl-12935331

ABSTRACT

A full haplotype map of the human genome will prove extremely valuable as it will be used in large-scale screens of populations to associate specific haplotypes with specific complex genetic-influenced diseases. A haplotype map project has been announced by NIH. The biological key to that project is the surprising fact that some human genomic DNA can be partitioned into long blocks where genetic recombination has been rare, leading to strikingly fewer distinct haplotypes in the population than previously expected (Helmuth, 2001; Daly et al., 2001; Stephens et al., 2001; Friss et al., 2001). In this paper we explore the algorithmic implications of the no-recombination in long blocks observation, for the problem of inferring haplotypes in populations. This assumption, together with the standard population-genetic assumption of infinite sites, motivates a model of haplotype evolution where the haplotypes in a population are assumed to evolve along a coalescent, which as a rooted tree is a perfect phylogeny. We consider the following algorithmic problem, called the perfect phylogeny haplotyping problem (PPH), which was introduced by Gusfield (2002) - given n genotypes of length m each, does there exist a set of at most 2n haplotypes such that each genotype is generated by a pair of haplotypes from this set, and such that this set can be derived on a perfect phylogeny? The approach taken by Gusfield (2002) to solve this problem reduces it to established, deep results and algorithms from matroid and graph theory. Although that reduction is quite simple and the resulting algorithm nearly optimal in speed, taken as a whole that approach is quite involved, and in particular, challenging to program. Moreover, anyone wishing to fully establish, by reading existing literature, the correctness of the entire algorithm would need to read several deep and difficult papers in graph and matroid theory. 
However, as stated by Gusfield (2002), many simplifications are possible and the list of "future work" in Gusfield (2002) began with the task of developing a simpler, more direct, yet still efficient algorithm. This paper accomplishes that goal, for both the rooted and unrooted PPH problems. It establishes a simple, easy-to-program, O(nm^2)-time algorithm that determines whether there is a PPH solution for input genotypes and produces a linear-space data structure to represent all of the solutions. The approach allows complete, self-contained proofs. In addition to algorithmic simplicity, the approach here makes the representation of all solutions more intuitive than in Gusfield (2002), and solves another goal from that paper, namely, to prove a nontrivial upper bound on the number of PPH solutions, showing that that number is vastly smaller than the number of haplotype solutions (each solution being a set of n pairs of haplotypes that can generate the genotypes) when the perfect phylogeny requirement is not imposed.

Subjects

Haplotypes, Phylogeny, Algorithms, Statistical Data Interpretation, Genetic Predisposition to Disease
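
The PPH question in the abstract above can be stated in a few lines of code for very small inputs. The sketch below is a brute-force check, exponential in the number of heterozygous sites, and is not the O(nm^2) algorithm of the paper. It relies on the standard four-gamete test: a set of binary haplotypes fits an unrooted perfect phylogeny exactly when no pair of sites shows all four combinations 00, 01, 10, 11. The 0/1/2 genotype encoding and the function names are assumptions.

```python
from itertools import combinations, product


def passes_four_gamete_test(haplotypes):
    """True if no pair of sites exhibits all four gametes 00, 01, 10, 11."""
    sites = range(len(haplotypes[0]))
    return all(len({(h[c1], h[c2]) for h in haplotypes}) < 4
               for c1, c2 in combinations(sites, 2))


def resolutions(genotype):
    """Yield every unordered haplotype pair that explains the genotype."""
    het = [i for i, g in enumerate(genotype) if g == "2"]
    if not het:
        yield genotype, genotype
        return
    # Fix the first heterozygous site of h1 to "0" to skip mirrored pairs.
    for bits in product("01", repeat=len(het) - 1):
        h1, h2 = list(genotype), list(genotype)
        for pos, b in zip(het, ("0",) + bits):
            h1[pos] = b
            h2[pos] = "1" if b == "0" else "0"
        yield "".join(h1), "".join(h2)


def has_pph_solution(genotypes):
    """Try every combination of resolutions; feasible only for tiny inputs."""
    options = [list(resolutions(g)) for g in genotypes]
    for choice in product(*options):
        haplotypes = [h for pair in choice for h in pair]
        if passes_four_gamete_test(haplotypes):
            return True
    return False


print(has_pph_solution(["22", "02", "20"]))
```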

Algorithmic strategies for the single nucleotide polymorphism haplotype assembly problem.

Lippert, Ross; Schwartz, Russell; Lancia, Giuseppe; Istrail, Sorin.

Brief Bioinform ; 3(1): 23-31, 2002 Mar.

Article in English | MEDLINE | ID: mdl-12002221

ABSTRACT

With the consensus human genome sequenced and many other sequencing projects at varying stages of completion, greater attention is being paid to the genetic differences among individuals and the abilities of those differences to predict phenotypes. A significant obstacle to such work is the difficulty and expense of determining haplotypes--sets of variants genetically linked because of their proximity on the genome--for large numbers of individuals for use in association studies. This paper presents some algorithmic considerations in a new approach for haplotype determination: inferring haplotypes from localised polymorphism data gathered from short genome 'fragments.' Formalised models of the biological system under consideration are examined, given a variety of assumptions about the goal of the problem and the character of optimal solutions. Some theoretical results and algorithms for handling haplotype assembly given the different models are then sketched. The primary conclusion is that some important simplified variants of the problem yield tractable problems while more general variants tend to be intractable in the worst case.

Subjects

Algorithms, Haplotypes, Single-Stranded Conformational Polymorphism, Base Sequence, DNA, Theoretical Models
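
One simple way to picture the haplotype assembly setting in the abstract above is as a fragment-partitioning problem. The sketch assumes a formalisation that the abstract does not spell out: each fragment is a string over 0, 1 with "-" for SNP sites it does not cover, two fragments conflict when they disagree at a site both cover, and error-free fragments can be split into two conflict-free groups, one per haplotype. Under that assumption the split is a two-coloring of the conflict graph. All names are hypothetical.

```python
from collections import deque


def conflict(f, g):
    """Two fragments conflict if they disagree at a site that both cover."""
    return any(a != "-" and b != "-" and a != b for a, b in zip(f, g))


def partition_fragments(fragments):
    """Assign each fragment to haplotype 0 or 1, or return None if impossible.

    This is a breadth-first two-coloring of the conflict graph; None means the
    graph is not bipartite, i.e. the data cannot be error-free under this model.
    """
    n = len(fragments)
    side = [None] * n
    for start in range(n):
        if side[start] is not None:
            continue
        side[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in range(n):
                if v == u or not conflict(fragments[u], fragments[v]):
                    continue
                if side[v] is None:
                    side[v] = 1 - side[u]
                    queue.append(v)
                elif side[v] == side[u]:
                    return None
    return side


print(partition_fragments(["01--", "-10-", "10--", "--01"]))
```

When the function returns None, some fragments or sites would have to be discarded or corrected to restore a valid split; choosing what to remove is where the harder variants mentioned in the abstract arise.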
