Your browser doesn't support javascript.
loading
Show: 20 | 50 | 100
Results 1 - 5 de 5
Filter
Add more filters










Database
Language
Publication year range
1.
BMC Bioinformatics ; 13 Suppl 19: S9, 2012.
Article in English | MEDLINE | ID: mdl-23282028

ABSTRACT

Many cancer genome sequencing efforts are underway with the goal of identifying the somatic mutations that drive cancer progression. A major difficulty in these studies is that tumors are typically heterogeneous, with individual cells in a tumor having different complements of somatic mutations. However, nearly all DNA sequencing technologies sequence DNA from multiple cells, thus resulting in measurement of mutations from a mixture of genomes. Genome rearrangements are a major class of somatic mutations in many tumors, and the novel adjacencies (i.e. breakpoints) resulting from these rearrangements are readily detected from DNA sequencing reads. However, the assignment of each rearrangement, or adjacency, to an individual cancer genome in the mixture is not known. Moreover, the quantity of DNA sequence reads may be insufficient to measure all rearrangements in all genomes in the tumor. Motivated by this application, we formulate the k-minimum completion problem (k-MCP). In this problem, we aim to reconstruct k genomes derived from a single reference genome, given partial information about the adjacencies present in the mixture of these genomes. We show that the 1-MCP is solvable in linear time in the cases where: (i) the measured, incomplete genome has a single circular or linear chromosome; (ii) there are no restrictions on the chromosomal content of the measured, incomplete genome. We also show that the k-MCP problem, for k ≥ 3 in general, and the 2-MCP problem with the double-cut-and-join (DCJ) distance are NP-complete, when there are no restriction on the chromosomal structure of the measured, incomplete genome. These results lay the foundation for future algorithmic studies of the k-MCP and the application of these algorithms to real cancer sequencing data.


Subject(s)
Contig Mapping/methods , Gene Rearrangement , Genome, Human/genetics , Neoplasms/genetics , Algorithms , Base Sequence , Humans , Mutation , Sequence Analysis, DNA
2.
Bioinformatics ; 26(18): i446-52, 2010 Sep 15.
Article in English | MEDLINE | ID: mdl-20823306

ABSTRACT

MOTIVATION: Segmental duplications > 1 kb in length with >or= 90% sequence identity between copies comprise nearly 5% of the human genome. They are frequently found in large, contiguous regions known as duplication blocks that can contain mosaic patterns of thousands of segmental duplications. Reconstructing the evolutionary history of these complex genomic regions is a non-trivial, but important task. RESULTS: We introduce parsimony and likelihood techniques to analyze the evolutionary relationships between duplication blocks. Both techniques rely on a generic model of duplication in which long, contiguous substrings are copied and reinserted over large physical distances, allowing for a duplication block to be constructed by aggregating substrings of other blocks. For the likelihood method, we give an efficient dynamic programming algorithm to compute the weighted ensemble of all duplication scenarios that account for the construction of a duplication block. Using this ensemble, we derive the probabilities of various duplication scenarios. We formalize the task of reconstructing the evolutionary history of segmental duplications as an optimization problem on the space of directed acyclic graphs. We use a simulated annealing heuristic to solve the problem for a set of segmental duplications in the human genome in both parsimony and likelihood settings. AVAILABILITY: Supplementary information is available at http://www.cs.brown.edu/people/braphael/supplements/.


Subject(s)
Evolution, Molecular , Segmental Duplications, Genomic , Algorithms , Genome, Human , Humans , Models, Genetic , Models, Statistical , Probability
3.
Algorithms Mol Biol ; 5(1): 11, 2010 Jan 04.
Article in English | MEDLINE | ID: mdl-20047668

ABSTRACT

BACKGROUND: Segmental duplications, or low-copy repeats, are common in mammalian genomes. In the human genome, most segmental duplications are mosaics comprised of multiple duplicated fragments. This complex genomic organization complicates analysis of the evolutionary history of these sequences. One model proposed to explain this mosaic patterns is a model of repeated aggregation and subsequent duplication of genomic sequences. RESULTS: We describe a polynomial-time exact algorithm to compute duplication distance, a genomic distance defined as the most parsimonious way to build a target string by repeatedly copying substrings of a fixed source string. This distance models the process of repeated aggregation and duplication. We also describe extensions of this distance to include certain types of substring deletions and inversions. Finally, we provide a description of a sequence of duplication events as a context-free grammar (CFG). CONCLUSION: These new genomic distances will permit more biologically realistic analyses of segmental duplications in genomes.

4.
Pac Symp Biocomput ; : 126-37, 2009.
Article in English | MEDLINE | ID: mdl-19213134

ABSTRACT

Segmental duplications are abundant in the human genome, but their evolutionary history is not well-understood. The mystery surrounding them is due in part to their complex organization; many segmental duplications are mosaic patterns of smaller repeated segments, or duplicons. A two-step model of duplication has been proposed to explain these mosaic patterns. In this model, duplicons are copied and aggregated into primary duplication blocks that subsequently seed secondary duplications. Here, we formalize the problem of computing a duplication scenario that is consistent with the two-step model. We first describe a dynamic programming algorithm to compute the duplication distance between two strings. We then use this distance as the cost function in an integer linear program to obtain the most parsimonious duplication scenario. We apply our method to derive putative ancestral relationships between segmental duplications in the human genome.


Subject(s)
Gene Duplication , Models, Genetic , Algorithms , Biological Evolution , Biometry , Genome, Human , Humans
5.
Bioinformatics ; 24(16): i133-8, 2008 Aug 15.
Article in English | MEDLINE | ID: mdl-18689814

ABSTRACT

MOTIVATION: Segmental duplications are common in mammalian genomes, but their evolutionary origins remain mysterious. A major difficulty in analyzing segmental duplications is that many duplications are complex mosaics of fragments of numerous other segmental duplications. RESULTS: We introduce a novel measure called duplication distance that describes the minimum number of duplications necessary to create a target string by repeated insertions of fragments of a source string. We derive an efficient algorithm to compute duplication distance, and we use the algorithm to analyze segmental duplications in the human genome. Our analysis reveals possible ancestral relationships between segmental duplications including numerous examples of duplications that contain multiple, nested insertions of fragments from one or more other duplications. Using duplication distance, we also identify a small number of segmental duplications that appear to have seeded many other duplications in the genome, lending support to a two-step model of segmental duplication in the genome. AVAILABILITY: Software for computing duplication distance is available upon request.


Subject(s)
Chromosome Mapping/methods , Genetic Linkage/genetics , Genome, Human/genetics , Sequence Analysis, DNA/methods , Base Sequence , Gene Duplication , Humans , Molecular Sequence Data
SELECTION OF CITATIONS
SEARCH DETAIL
...