Search | VHL Regional Portal

The Clark phaseable sample size problem: long-range phasing and loss of heterozygosity in GWAS.

Halldórsson, Bjarni V; Aguiar, Derek; Tarpine, Ryan; Istrail, Sorin.

J Comput Biol ; 18(3): 323-33, 2011 Mar.

Article in English | MEDLINE | ID: mdl-21385037

ABSTRACT

A phase transition is taking place today. The amount of data generated by genome resequencing technologies is so large that in some cases it is now less expensive to repeat the experiment than to store the information it generates. In the next few years, it is quite possible that millions of Americans will have been genotyped. This raises the question of how best to use this information to jointly estimate the haplotypes of all these individuals. The premise of this article is that long shared genomic regions (or tracts) are unlikely unless the haplotypes are identical by descent. These tracts can be used as input to a Clark-like phasing method to obtain a phasing solution for the sample. We show on simulated data that the algorithm obtains an almost perfect solution when the number of genotyped individuals is large enough, and that its accuracy grows with the number of genotyped individuals. We also study a related problem that connects copy number variation with the success of phasing algorithms. A loss of heterozygosity (LOH) event occurs when, by the laws of Mendelian inheritance, an individual should be heterozygous but, because of a deletion polymorphism, is not. Such polymorphisms are difficult to detect with existing algorithms but play an important role in the genetics of disease and will confuse haplotype phasing algorithms if not accounted for. We present an algorithm for detecting LOH regions across the genomes of thousands of individuals. The design of the long-range phasing algorithm and the loss of heterozygosity inference algorithms was inspired by our analysis of the Multiple Sclerosis (MS) GWAS dataset of the International Multiple Sclerosis Genetics Consortium. We present results similar to those obtained from the MS data.
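Clark's inference rule, on which the phasing step above is modeled, can be sketched on a short window of unrelated genotypes. This is a minimal illustration of the classic rule only, not the authors' long-range, tract-based method. The 0/1/2 genotype encoding (count of alternate alleles), seeding from genotypes with at most one heterozygous site, and the names `compatible`, `complement`, and `clark_phase` are assumptions made for this sketch.

```python
from typing import Dict, List, Set, Tuple

Genotype = Tuple[int, ...]   # per site: 0 = hom. reference, 1 = heterozygous, 2 = hom. alternate
Haplotype = Tuple[int, ...]  # per site: 0 = reference allele, 1 = alternate allele


def compatible(h: Haplotype, g: Genotype) -> bool:
    """True if h can be one of the two haplotypes explaining genotype g."""
    # Heterozygous sites accept either allele; homozygous sites force h_i = g_i / 2.
    return all(gi == 1 or 2 * hi == gi for hi, gi in zip(h, g))


def complement(h: Haplotype, g: Genotype) -> Haplotype:
    """The second haplotype implied by g once h is fixed (allele counts add up to g)."""
    return tuple(gi - hi for hi, gi in zip(h, g))


def clark_phase(
    genotypes: List[Genotype],
) -> Tuple[Dict[int, Tuple[Haplotype, Haplotype]], List[int]]:
    known: List[Haplotype] = []  # resolved haplotypes, in discovery order
    seen: Set[Haplotype] = set()
    phased: Dict[int, Tuple[Haplotype, Haplotype]] = {}

    def remember(h: Haplotype) -> None:
        if h not in seen:
            seen.add(h)
            known.append(h)

    # Seed: genotypes with at most one heterozygous site have an unambiguous phase.
    for i, g in enumerate(genotypes):
        if sum(gi == 1 for gi in g) <= 1:
            h1 = tuple(gi // 2 for gi in g)  # heterozygous site (if any) gets allele 0
            h2 = complement(h1, g)
            phased[i] = (h1, h2)
            remember(h1)
            remember(h2)

    # Repeatedly resolve any genotype explained by an already known haplotype.
    progress = True
    while progress:
        progress = False
        for i, g in enumerate(genotypes):
            if i in phased:
                continue
            for h in known:
                if compatible(h, g):
                    c = complement(h, g)
                    phased[i] = (h, c)
                    remember(c)  # the complement becomes a new known haplotype
                    progress = True
                    break  # stop scanning `known` right after it may have grown
    unresolved = [i for i in range(len(genotypes)) if i not in phased]
    return phased, unresolved
```

On the toy input `[(0, 1, 2), (1, 1, 2), (1, 1, 0)]`, the first genotype seeds haplotypes `(0, 0, 1)` and `(0, 1, 1)`, the second is resolved as `(0, 0, 1)` plus its complement `(1, 1, 1)`, and the third stays unresolved because no known haplotype carries the reference allele at the last site.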

Subject(s)

Genome-Wide Association Study/methods , Genomics/methods , Loss of Heterozygosity , Algorithms , Computer Simulation , Genotype , Haplotypes , Humans , Models, Genetic , Sample Size

Practical computational methods for regulatory genomics: a cisGRN-Lexicon and cisGRN-browser for gene regulatory networks.

Istrail, Sorin; Tarpine, Ryan; Schutter, Kyle; Aguiar, Derek.

Methods Mol Biol ; 674: 369-99, 2010.

Article in English | MEDLINE | ID: mdl-20827603

ABSTRACT

The CYRENE Project focuses on the study of cis-regulatory genomics and gene regulatory networks (GRN) and has three components: a cisGRN-Lexicon, a cisGRN-Browser, and the Virtual Sea Urchin software system. The project was carried out in collaboration with Eric Davidson and is deeply inspired by his experimental work in genomic regulatory systems and gene regulatory networks. The current CYRENE cisGRN-Lexicon contains the regulatory architecture of 200 transcription factor-encoding genes and 100 other regulatory genes in eight species: human, mouse, fruit fly, sea urchin, nematode, rat, chicken, and zebrafish, with higher priority on the first five species. The only regulatory genes included in the cisGRN-Lexicon (CYRENE genes) are those whose regulatory architecture is validated by what we call the Davidson Criterion: they contain sites functionally authenticated by site-specific mutagenesis, conducted in vivo and followed by gene transfer and a functional test. This is recognized as the most stringent experimental validation criterion to date for such a genomic regulatory architecture. The CYRENE cisGRN-Browser is a full genome browser tailored for cis-regulatory annotation and investigation. It began as a branch of the Celera Genome Browser (available as open source at http://sourceforge.net/projects/celeragb/) and has been transformed into a genome browser fully devoted to regulatory genomics. Its access paradigm for genomic data is zoom-to-the-DNA-base in real time. A more recent component of the CYRENE project is the Virtual Sea Urchin system (VSU), an interactive visualization tool that provides a four-dimensional (spatial and temporal) map of the gene regulatory networks of the sea urchin embryo.

Subject(s)

Gene Regulatory Networks , Genomics/methods , Internet , Software , Animals , Humans , Mice , Rats , Regulatory Sequences, Nucleic Acid/genetics , Sea Urchins/genetics , Trans-Activators/metabolism , User-Computer Interface

The imperfect ancestral recombination graph reconstruction problem: upper bounds for recombination and homoplasy.

Lam, Fumei; Tarpine, Ryan; Istrail, Sorin.

J Comput Biol ; 17(6): 767-81, 2010 Jun.

Article in English | MEDLINE | ID: mdl-20583925

ABSTRACT

One of the central problems in computational biology is the reconstruction of evolutionary histories. While models incorporating recombination and homoplasy have been studied separately, a missing component in the theory is a robust and flexible unifying model that incorporates both of these major biological events shaping genetic diversity. In this article, we introduce the first such unifying model and develop algorithms to find the optimal ancestral recombination graph incorporating recombination and homoplasy events. The power of our framework lies in the connection between our formulation and the Directed Steiner Arborescence Problem in combinatorial optimization. We implement linear programming techniques as well as heuristics for the Directed Steiner Arborescence Problem, and use our methods to construct evolutionary histories for both simulated and real data sets.
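The Directed Steiner Arborescence Problem asks for a minimum-cost directed tree, rooted at a given node, that reaches every node in a set of terminals. As a minimal illustration of what a heuristic for it looks like, the sketch below uses the generic shortest-path heuristic: grow the tree by repeatedly attaching the terminal that is cheapest to reach from it. This textbook heuristic is chosen for illustration and is not necessarily one the article implements; it also does not model recombination or homoplasy. The adjacency-list graph format and the name `shortest_path_heuristic` are assumptions.

```python
import heapq
import itertools
import math
from typing import Dict, Hashable, Iterable, List, Set, Tuple

# Directed graph: node -> list of (successor, non-negative edge cost)
Graph = Dict[Hashable, List[Tuple[Hashable, float]]]


def shortest_path_heuristic(
    graph: Graph, root: Hashable, terminals: Iterable[Hashable]
) -> List[Tuple[Hashable, Hashable]]:
    """Return the edges of a (not necessarily optimal) Steiner arborescence rooted at `root`."""
    tree_nodes: Set[Hashable] = {root}
    tree_edges: List[Tuple[Hashable, Hashable]] = []
    remaining = set(terminals) - tree_nodes

    while remaining:
        # Multi-source Dijkstra: every node already in the tree starts at distance 0.
        dist: Dict[Hashable, float] = {v: 0.0 for v in tree_nodes}
        pred: Dict[Hashable, Hashable] = {}
        tie = itertools.count()  # tiebreaker so the heap never compares nodes directly
        heap = [(0.0, next(tie), v) for v in tree_nodes]
        heapq.heapify(heap)
        while heap:
            d, _, u = heapq.heappop(heap)
            if d > dist[u]:
                continue  # stale heap entry
            for v, w in graph.get(u, []):
                if d + w < dist.get(v, math.inf):
                    dist[v] = d + w
                    pred[v] = u
                    heapq.heappush(heap, (d + w, next(tie), v))

        reachable = [t for t in remaining if t in dist]
        if not reachable:
            raise ValueError("some terminals are unreachable from the root")
        # Attach the terminal that is cheapest to reach from the current tree.
        target = min(reachable, key=lambda t: (dist[t], repr(t)))
        v = target
        while v not in tree_nodes:  # walk back until the path meets the tree
            u = pred[v]
            tree_edges.append((u, v))
            tree_nodes.add(v)
            v = u
        remaining -= tree_nodes
    return tree_edges
```

Because each new path starts at the nearest tree node, every added node receives exactly one incoming edge, so the result is always an arborescence, although its cost can exceed the optimum that an exact method such as linear programming would find.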

Subject(s)

Computational Biology/methods , Models, Genetic , Phylogeny , Recombination, Genetic , Algorithms , Animals , Drosophila/genetics , Evolution, Molecular

Functional cis-regulatory genomics for systems biology.

Nam, Jongmin; Dong, Ping; Tarpine, Ryan; Istrail, Sorin; Davidson, Eric H.

Proc Natl Acad Sci U S A ; 107(8): 3930-5, 2010 Feb 23.

Article in English | MEDLINE | ID: mdl-20142491

ABSTRACT

Gene expression is controlled by interactions between trans-regulatory factors and cis-regulatory DNA sequences, and these interactions constitute the essential functional linkages of gene regulatory networks (GRNs). Validation of GRN models requires experimental cis-regulatory tests of predicted linkages to authenticate their identities and proposed functions. However, cis-regulatory analysis is at present a severe bottleneck in genomic systems biology because of the demanding experimental methodologies currently used to discover cis-regulatory modules (CRMs) in the genome and to measure their activities. Here we demonstrate a high-throughput approach to both discovery and quantitative characterization of CRMs. Its distinguishing feature is the use of DNA sequence tags to "barcode" CRM expression constructs, which can then be mixed, injected together into sea urchin eggs, and subsequently deconvolved. This method has increased the rate of cis-regulatory analysis by >100-fold compared with conventional one-by-one reporter assays. The utility of the DNA-tag reporters was demonstrated by the rapid discovery of 81 active CRMs from 37 previously unexplored sea urchin genes. We then obtained simultaneous high-resolution temporal characterization of the regulatory activities of more than 80 CRMs. On average, 2-3 CRMs were discovered per gene. Comparison of endogenous gene expression profiles with those of the CRMs recovered from each gene showed that, for most cases, at least one CRM is active in each phase of endogenous expression, suggesting that CRM recovery was comprehensive. This approach will qualitatively alter the practice of GRN construction as well as validation, and will impact many additional areas of regulatory systems biology.
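The deconvolution of pooled, barcoded constructs can be illustrated with a minimal sketch. It assumes that, for each pooled sample, the abundance of every DNA tag is measured both in transcripts (RNA) and in incorporated reporter DNA, and that a CRM's activity is estimated as the ratio of the two. The counting technology, this normalization, and the names `deconvolve`, `tag_to_crm`, and `min_dna` are assumptions of the sketch rather than details taken from the abstract.

```python
from collections import defaultdict
from typing import Dict


def deconvolve(
    tag_to_crm: Dict[str, str],
    rna_counts: Dict[str, float],
    dna_counts: Dict[str, float],
    min_dna: float = 1.0,
) -> Dict[str, float]:
    """Estimate per-CRM reporter activity from pooled, barcoded constructs (assumed RNA/DNA ratio)."""
    rna_by_crm: Dict[str, float] = defaultdict(float)
    dna_by_crm: Dict[str, float] = defaultdict(float)
    # Each tag identifies exactly one CRM construct; counts are pooled if a CRM carries several tags.
    for tag, crm in tag_to_crm.items():
        rna_by_crm[crm] += rna_counts.get(tag, 0.0)
        dna_by_crm[crm] += dna_counts.get(tag, 0.0)
    # Normalize transcript counts by incorporated DNA; skip constructs with too little DNA to measure.
    return {
        crm: rna_by_crm[crm] / dna
        for crm, dna in dna_by_crm.items()
        if dna >= min_dna
    }
```

Applying the same function to samples taken at successive developmental time points would yield one activity profile per CRM, the kind of temporal profile that can then be compared with endogenous gene expression.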

Subject(s)

Gene Expression Regulation , Genomics/methods , High-Throughput Screening Assays , Systems Biology/methods , Animals , Gene Expression Profiling , Genes, Reporter , Genetic Complementation Test , Humans , Ovum , Sea Urchins
