Pesquisa | Portal Regional da BVS (teste)

1.

Deep statistical modelling of nanopore sequencing translocation times reveals latent non-B DNA structures.

Hosseini, Marjan; Palmer, Aaron; Manka, William; Grady, Patrick G S; Patchigolla, Venkata; Bi, Jinbo; O'Neill, Rachel J; Chi, Zhiyi; Aguiar, Derek.

Bioinformatics ; 39(39 Suppl 1): i242-i251, 2023 06 30.

Artigo em Inglês | MEDLINE | ID: mdl-37387144

RESUMO

MOTIVATION: Non-canonical (or non-B) DNA are genomic regions whose three-dimensional conformation deviates from the canonical double helix. Non-B DNA play an important role in basic cellular processes and are associated with genomic instability, gene regulation, and oncogenesis. Experimental methods are low-throughput and can detect only a limited set of non-B DNA structures, while computational methods rely on non-B DNA base motifs, which are necessary but not sufficient indicators of non-B structures. Oxford Nanopore sequencing is an efficient and low-cost platform, but it is currently unknown whether nanopore reads can be used for identifying non-B structures. RESULTS: We build the first computational pipeline to predict non-B DNA structures from nanopore sequencing. We formalize non-B detection as a novelty detection problem and develop the GoFAE-DND, an autoencoder that uses goodness-of-fit (GoF) tests as a regularizer. A discriminative loss encourages non-B DNA to be poorly reconstructed and optimizing Gaussian GoF tests allows for the computation of P-values that indicate non-B structures. Based on whole genome nanopore sequencing of NA12878, we show that there exist significant differences between the timing of DNA translocation for non-B DNA bases compared with B-DNA. We demonstrate the efficacy of our approach through comparisons with novelty detection methods using experimental data and data synthesized from a new translocation time simulator. Experimental validations suggest that reliable detection of non-B DNA from nanopore sequencing is achievable. AVAILABILITY AND IMPLEMENTATION: Source code is available at https://github.com/bayesomicslab/ONT-nonb-GoFAE-DND.

Assuntos

Sequenciamento por Nanoporos , Humanos , DNA , Carcinogênese , Transformação Celular Neoplásica , Genômica

2.

Evaluating molecular fingerprint-based models of drug side effects against a statistical control.

Alpay, Berk A; Gosink, Mark; Aguiar, Derek.

Drug Discov Today ; 27(11): 103364, 2022 Nov.

Artigo em Inglês | MEDLINE | ID: mdl-36115633

RESUMO

There are many machine learning models that use molecular fingerprints of drugs to predict side effects. Characterizing their skill is necessary for understanding their usefulness in pharmaceutical development. Here, we analyze a statistical control of side effect prediction skill, develop a pipeline for benchmarking models, and evaluate how well existing models predict side effects identified in pharmaceutical documentation. We demonstrate that molecular fingerprints are useful for ranking drugs by their likelihood to cause a given side effect. However, the predictions for one or more drugs overall benefit only marginally from molecular fingerprints when ranking the likelihoods of many possible side effects, and display at most modest overall skill at identifying the side effects that do and do not occur.

3.

AtheroSpectrum Reveals Novel Macrophage Foam Cell Gene Signatures Associated With Atherosclerotic Cardiovascular Disease Risk.

Li, Chuan; Qu, Lili; Matz, Alyssa J; Murphy, Patrick A; Liu, Yongmei; Manichaikul, Ani W; Aguiar, Derek; Rich, Stephen S; Herrington, David M; Vu, David; Johnson, W Craig; Rotter, Jerome I; Post, Wendy S; Vella, Anthony T; Rodriguez-Oquendo, Annabelle; Zhou, Beiyan.

Circulation ; 145(3): 206-218, 2022 01 18.

Artigo em Inglês | MEDLINE | ID: mdl-34913723

RESUMO

BACKGROUND: Whereas several interventions can effectively lower lipid levels in people at risk for atherosclerotic cardiovascular disease (ASCVD), cardiovascular event risks remain, suggesting an unmet medical need to identify factors contributing to cardiovascular event risk. Monocytes and macrophages play central roles in atherosclerosis, but studies have yet to provide a detailed view of macrophage populations involved in increased ASCVD risk. METHODS: A novel macrophage foaming analytics tool, AtheroSpectrum, was developed using 2 quantitative indices depicting lipid metabolism and the inflammatory status of macrophages. A machine learning algorithm was developed to analyze gene expression patterns in the peripheral monocyte transcriptome of MESA participants (Multi-Ethnic Study of Atherosclerosis; set 1; n=911). A list of 30 genes was generated and integrated with traditional risk factors to create an ASCVD risk prediction model (30-gene cardiovascular disease risk score [CR-30]), which was subsequently validated in the remaining MESA participants (set 2; n=228); performance of CR-30 was also tested in 2 independent human atherosclerotic tissue transcriptome data sets (GTEx [Genotype-Tissue Expression] and GSE43292). RESULTS: Using single-cell transcriptomic profiles (GSE97310, GSE116240, GSE97941, and FR-FCM-Z23S), AtheroSpectrum detected 2 distinct programs in plaque macrophages-homeostatic foaming and inflammatory pathogenic foaming-the latter of which was positively associated with severity of atherosclerosis in multiple studies. A pool of 2209 pathogenic foaming genes was extracted and screened to select a subset of 30 genes correlated with cardiovascular event in MESA set 1. A cardiovascular disease risk score model (CR-30) was then developed by incorporating this gene set with traditional variables sensitive to cardiovascular event in MESA set 1 after cross-validation generalizability analysis. The performance of CR-30 was then tested in MESA set 2 (P=2.60×10-4; area under the receiver operating characteristic curve, 0.742) and 2 independent data sets (GTEx: P=7.32×10-17; area under the receiver operating characteristic curve, 0.664; GSE43292: P=7.04×10-2; area under the receiver operating characteristic curve, 0.633). Model sensitivity tests confirmed the contribution of the 30-gene panel to the prediction model (likelihood ratio test; df=31, P=0.03). CONCLUSIONS: Our novel computational program (AtheroSpectrum) identified a specific gene expression profile associated with inflammatory macrophage foam cells. A subset of 30 genes expressed in circulating monocytes jointly contributed to prediction of symptomatic atherosclerotic vascular disease. Incorporating a pathogenic foaming gene set with known risk factors can significantly strengthen the power to predict ASCVD risk. Our programs may facilitate both mechanistic investigations and development of therapeutic and prognostic strategies for ASCVD risk.

Assuntos

Aterosclerose/terapia , Doenças Cardiovasculares/terapia , Células Espumosas/citologia , Macrófagos/citologia , Idoso , Idoso de 80 Anos ou mais , Aterosclerose/etiologia , Aterosclerose/genética , Doenças Cardiovasculares/complicações , Doença da Artéria Coronariana/complicações , Doença da Artéria Coronariana/genética , Doença da Artéria Coronariana/terapia , Feminino , Humanos , Masculino , Pessoa de Meia-Idade , Placa Aterosclerótica/complicações , Placa Aterosclerótica/genética , Placa Aterosclerótica/terapia , Curva ROC , Risco , Calcificação Vascular/complicações , Calcificação Vascular/genética , Calcificação Vascular/terapia

4.

Combinatorial and statistical prediction of gene expression from haplotype sequence.

Alpay, Berk A; Demetci, Pinar; Istrail, Sorin; Aguiar, Derek.

Bioinformatics ; 36(Suppl_1): i194-i202, 2020 07 01.

Artigo em Inglês | MEDLINE | ID: mdl-32657373

RESUMO

MOTIVATION: Genome-wide association studies (GWAS) have discovered thousands of significant genetic effects on disease phenotypes. By considering gene expression as the intermediary between genotype and disease phenotype, expression quantitative trait loci studies have interpreted many of these variants by their regulatory effects on gene expression. However, there remains a considerable gap between genotype-to-gene expression association and genotype-to-gene expression prediction. Accurate prediction of gene expression enables gene-based association studies to be performed post hoc for existing GWAS, reduces multiple testing burden, and can prioritize genes for subsequent experimental investigation. RESULTS: In this work, we develop gene expression prediction methods that relax the independence and additivity assumptions between genetic markers. First, we consider gene expression prediction from a regression perspective and develop the HAPLEXR algorithm which combines haplotype clusterings with allelic dosages. Second, we introduce the new gene expression classification problem, which focuses on identifying expression groups rather than continuous measurements; we formalize the selection of an appropriate number of expression groups using the principle of maximum entropy. Third, we develop the HAPLEXD algorithm that models haplotype sharing with a modified suffix tree data structure and computes expression groups by spectral clustering. In both models, we penalize model complexity by prioritizing genetic clusters that indicate significant effects on expression. We compare HAPLEXR and HAPLEXD with three state-of-the-art expression prediction methods and two novel logistic regression approaches across five GTEx v8 tissues. HAPLEXD exhibits significantly higher classification accuracy overall; HAPLEXR shows higher prediction accuracy on approximately half of the genes tested and the largest number of best predicted genes (r2>0.1) among all methods. We show that variant and haplotype features selected by HAPLEXR are smaller in size than competing methods (and thus more interpretable) and are significantly enriched in functional annotations related to gene regulation. These results demonstrate the importance of explicitly modeling non-dosage dependent and intragenic epistatic effects when predicting expression. AVAILABILITY AND IMPLEMENTATION: Source code and binaries are freely available at https://github.com/rapturous/HAPLEX. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Assuntos

Estudo de Associação Genômica Ampla , Polimorfismo de Nucleotídeo Único , Expressão Gênica , Haplótipos , Fenótipo , Locos de Características Quantitativas

5.

Bayesian nonparametric discovery of isoforms and individual specific quantification.

Aguiar, Derek; Cheng, Li-Fang; Dumitrascu, Bianca; Mordelet, Fantine; Pai, Athma A; Engelhardt, Barbara E.

Nat Commun ; 9(1): 1681, 2018 04 27.

Artigo em Inglês | MEDLINE | ID: mdl-29703885

RESUMO

Most human protein-coding genes can be transcribed into multiple distinct mRNA isoforms. These alternative splicing patterns encourage molecular diversity, and dysregulation of isoform expression plays an important role in disease etiology. However, isoforms are difficult to characterize from short-read RNA-seq data because they share identical subsequences and occur in different frequencies across tissues and samples. Here, we develop BIISQ, a Bayesian nonparametric model for isoform discovery and individual specific quantification from short-read RNA-seq data. BIISQ does not require isoform reference sequences but instead estimates an isoform catalog shared across samples. We use stochastic variational inference for efficient posterior estimates and demonstrate superior precision and recall for simulations compared to state-of-the-art isoform reconstruction methods. BIISQ shows the most gains for low abundance isoforms, with 36% more isoforms correctly inferred at low coverage versus a multi-sample method and 170% more versus single-sample methods. We estimate isoforms in the GEUVADIS RNA-seq data and validate inferred isoforms by associating genetic variants with isoform ratios.

Assuntos

Processamento Alternativo/genética , RNA Mensageiro/genética , Análise de Sequência de RNA/métodos , Transcriptoma/genética , Teorema de Bayes , Simulação por Computador , Conjuntos de Dados como Assunto , Perfilação da Expressão Gênica , Humanos , Isoformas de Proteínas/genética , Software , Estatísticas não Paramétricas

6.

Transcriptome of American oysters, Crassostrea virginica, in response to bacterial challenge: insights into potential mechanisms of disease resistance.

McDowell, Ian C; Nikapitiya, Chamilani; Aguiar, Derek; Lane, Christopher E; Istrail, Sorin; Gomez-Chiarri, Marta.

PLoS One ; 9(8): e105097, 2014.

Artigo em Inglês | MEDLINE | ID: mdl-25122115

RESUMO

The American oyster Crassostrea virginica, an ecologically and economically important estuarine organism, can suffer high mortalities in areas in the Northeast United States due to Roseovarius Oyster Disease (ROD), caused by the gram-negative bacterial pathogen Roseovarius crassostreae. The goals of this research were to provide insights into: 1) the responses of American oysters to R. crassostreae, and 2) potential mechanisms of resistance or susceptibility to ROD. The responses of oysters to bacterial challenge were characterized by exposing oysters from ROD-resistant and susceptible families to R. crassostreae, followed by high-throughput sequencing of cDNA samples from various timepoints after disease challenge. Sequence data was assembled into a reference transcriptome and analyzed through differential gene expression and functional enrichment to uncover genes and processes potentially involved in responses to ROD in the American oyster. While susceptible oysters experienced constant levels of mortality when challenged with R. crassostreae, resistant oysters showed levels of mortality similar to non-challenged oysters. Oysters exposed to R. crassostreae showed differential expression of transcripts involved in immune recognition, signaling, protease inhibition, detoxification, and apoptosis. Transcripts involved in metabolism were enriched in susceptible oysters, suggesting that bacterial infection places a large metabolic demand on these oysters. Transcripts differentially expressed in resistant oysters in response to infection included the immune modulators IL-17 and arginase, as well as several genes involved in extracellular matrix remodeling. The identification of potential genes and processes responsible for defense against R. crassostreae in the American oyster provides insights into potential mechanisms of disease resistance.

Assuntos

Ostreidae/genética , Rhodobacteraceae/patogenicidade , Transcriptoma , Animais , Regulação da Expressão Gênica , Ostreidae/microbiologia

7.

Tumor haplotype assembly algorithms for cancer genomics.

Aguiar, Derek; Wong, Wendy S W; Istrail, Sorin.

Pac Symp Biocomput ; : 3-14, 2014.

Artigo em Inglês | MEDLINE | ID: mdl-24297529

RESUMO

The growing availability of inexpensive high-throughput sequence data is enabling researchers to sequence tumor populations within a single individual at high coverage. But, cancer genome sequence evolution and mutational phenomena like driver mutations and gene fusions are difficult to investigate without first reconstructing tumor haplotype sequences. Haplotype assembly of single individual tumor populations is an exceedingly difficult task complicated by tumor haplotype heterogeneity, tumor or normal cell sequence contamination, polyploidy, and complex patterns of variation. While computational and experimental haplotype phasing of diploid genomes has seen much progress in recent years, haplotype assembly in cancer genomes remains uncharted territory. In this work, we describe HapCompass-Tumor a computational modeling and algorithmic framework for haplotype assembly of copy number variable cancer genomes containing haplotypes at different frequencies and complex variation. We extend our polyploid haplotype assembly model and present novel algorithms for (1) complex variations, including copy number changes, as varying numbers of disjoint paths in an associated graph, (2) variable haplotype frequencies and contamination, and (3) computation of tumor haplotypes using simple cycles of the compass graph which constrain the space of haplotype assembly solutions. The model and algorithm are implemented in the software package HapCompass-Tumor which is available for download from http://www.brown.edu/Research/Istrail_Lab/.

Assuntos

Algoritmos , Haplótipos , Neoplasias/genética , Biologia Computacional , Variações do Número de Cópias de DNA , Genoma Humano , Genômica/estatística & dados numéricos , Humanos , Modelos Genéticos , Poliploidia , Translocação Genética

8.

Haplotype assembly in polyploid genomes and identical by descent shared tracts.

Aguiar, Derek; Istrail, Sorin.

Bioinformatics ; 29(13): i352-60, 2013 Jul 01.

Artigo em Inglês | MEDLINE | ID: mdl-23813004

RESUMO

MOTIVATION: Genome-wide haplotype reconstruction from sequence data, or haplotype assembly, is at the center of major challenges in molecular biology and life sciences. For complex eukaryotic organisms like humans, the genome is vast and the population samples are growing so rapidly that algorithms processing high-throughput sequencing data must scale favorably in terms of both accuracy and computational efficiency. Furthermore, current models and methodologies for haplotype assembly (i) do not consider individuals sharing haplotypes jointly, which reduces the size and accuracy of assembled haplotypes, and (ii) are unable to model genomes having more than two sets of homologous chromosomes (polyploidy). Polyploid organisms are increasingly becoming the target of many research groups interested in the genomics of disease, phylogenetics, botany and evolution but there is an absence of theory and methods for polyploid haplotype reconstruction. RESULTS: In this work, we present a number of results, extensions and generalizations of compass graphs and our HapCompass framework. We prove the theoretical complexity of two haplotype assembly optimizations, thereby motivating the use of heuristics. Furthermore, we present graph theory-based algorithms for the problem of haplotype assembly using our previously developed HapCompass framework for (i) novel implementations of haplotype assembly optimizations (minimum error correction), (ii) assembly of a pair of individuals sharing a haplotype tract identical by descent and (iii) assembly of polyploid genomes. We evaluate our methods on 1000 Genomes Project, Pacific Biosciences and simulated sequence data. AVAILABILITY AND IMPLEMENTATION: HapCompass is available for download at http://www.brown.edu/Research/Istrail_Lab/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Assuntos

Genoma Humano , Haplótipos , Poliploidia , Análise de Sequência de DNA/métodos , Algoritmos , Genômica/métodos , Humanos

9.

A quantitative reference transcriptome for Nematostella vectensis early embryonic development: a pipeline for de novo assembly in emerging model systems.

Tulin, Sarah; Aguiar, Derek; Istrail, Sorin; Smith, Joel.

Evodevo ; 4: 16, 2013.

Artigo em Inglês | MEDLINE | ID: mdl-23731568

RESUMO

BACKGROUND: The de novo assembly of transcriptomes from short shotgun sequences raises challenges due to random and non-random sequencing biases and inherent transcript complexity. We sought to define a pipeline for de novo transcriptome assembly to aid researchers working with emerging model systems where well annotated genome assemblies are not available as a reference. To detail this experimental and computational method, we used early embryos of the sea anemone, Nematostella vectensis, an emerging model system for studies of animal body plan evolution. We performed RNA-seq on embryos up to 24 h of development using Illumina HiSeq technology and evaluated independent de novo assembly methods. The resulting reads were assembled using either the Trinity assembler on all quality controlled reads or both the Velvet and Oases assemblers on reads passing a stringent digital normalization filter. A control set of mRNA standards from the National Institute of Standards and Technology (NIST) was included in our experimental pipeline to invest our transcriptome with quantitative information on absolute transcript levels and to provide additional quality control. RESULTS: We generated >200 million paired-end reads from directional cDNA libraries representing well over 20 Gb of sequence. The Trinity assembler pipeline, including preliminary quality control steps, resulted in more than 86% of reads aligning with the reference transcriptome thus generated. Nevertheless, digital normalization combined with assembly by Velvet and Oases required far less computing power and decreased processing time while still mapping 82% of reads. We have made the raw sequencing reads and assembled transcriptome publically available. CONCLUSIONS: Nematostella vectensis was chosen for its strategic position in the tree of life for studies into the origins of the animal body plan, however, the challenge of reference-free transcriptome assembly is relevant to all systems for which well annotated gene models and independently verified genome assembly may not be available. To navigate this new territory, we have constructed a pipeline for library preparation and computational analysis for de novo transcriptome assembly. The gene models defined by this reference transcriptome define the set of genes transcribed in early Nematostella development and will provide a valuable dataset for further gene regulatory network investigations.

10.

DELISHUS: an efficient and exact algorithm for genome-wide detection of deletion polymorphism in autism.

Aguiar, Derek; Halldórsson, Bjarni V; Morrow, Eric M; Istrail, Sorin.

Bioinformatics ; 28(12): i154-62, 2012 Jun 15.

Artigo em Inglês | MEDLINE | ID: mdl-22689755

RESUMO

MOTIVATION: The understanding of the genetic determinants of complex disease is undergoing a paradigm shift. Genetic heterogeneity of rare mutations with deleterious effects is more commonly being viewed as a major component of disease. Autism is an excellent example where research is active in identifying matches between the phenotypic and genomic heterogeneities. A considerable portion of autism appears to be correlated with copy number variation, which is not directly probed by single nucleotide polymorphism (SNP) array or sequencing technologies. Identifying the genetic heterogeneity of small deletions remains a major unresolved computational problem partly due to the inability of algorithms to detect them. RESULTS: In this article, we present an algorithmic framework, which we term DELISHUS, that implements three exact algorithms for inferring regions of hemizygosity containing genomic deletions of all sizes and frequencies in SNP genotype data. We implement an efficient backtracking algorithm-that processes a 1 billion entry genome-wide association study SNP matrix in a few minutes-to compute all inherited deletions in a dataset. We further extend our model to give an efficient algorithm for detecting de novo deletions. Finally, given a set of called deletions, we also give a polynomial time algorithm for computing the critical regions of recurrent deletions. DELISHUS achieves significantly lower false-positive rates and higher power than previously published algorithms partly because it considers all individuals in the sample simultaneously. DELISHUS may be applied to SNP array or sequencing data to identify the deletion spectrum for family-based association studies. AVAILABILITY: DELISHUS is available at http://www.brown.edu/Research/Istrail_Lab/.

Assuntos

Algoritmos , Transtorno Autístico/genética , Estudo de Associação Genômica Ampla , Polimorfismo de Nucleotídeo Único , Biologia Computacional/métodos , Variações do Número de Cópias de DNA , Genótipo , Humanos , Padrões de Herança , Fenótipo , Deleção de Sequência

11.

HapCompass: a fast cycle basis algorithm for accurate haplotype assembly of sequence data.

Aguiar, Derek; Istrail, Sorin.

J Comput Biol ; 19(6): 577-90, 2012 Jun.

Artigo em Inglês | MEDLINE | ID: mdl-22697235

RESUMO

Genome assembly methods produce haplotype phase ambiguous assemblies due to limitations in current sequencing technologies. Determining the haplotype phase of an individual is computationally challenging and experimentally expensive. However, haplotype phase information is crucial in many bioinformatics workflows such as genetic association studies and genomic imputation. Current computational methods of determining haplotype phase from sequence data--known as haplotype assembly--have difficulties producing accurate results for large (1000 genomes-type) data or operate on restricted optimizations that are unrealistic considering modern high-throughput sequencing technologies. We present a novel algorithm, HapCompass, for haplotype assembly of densely sequenced human genome data. The HapCompass algorithm operates on a graph where single nucleotide polymorphisms (SNPs) are nodes and edges are defined by sequence reads and viewed as supporting evidence of co-occurring SNP alleles in a haplotype. In our graph model, haplotype phasings correspond to spanning trees. We define the minimum weighted edge removal optimization on this graph and develop an algorithm based on cycle basis local optimizations for resolving conflicting evidence. We then estimate the amount of sequencing required to produce a complete haplotype assembly of a chromosome. Using these estimates together with metrics borrowed from genome assembly and haplotype phasing, we compare the accuracy of HapCompass, the Genome Analysis ToolKit, and HapCut for 1000 Genomes Project and simulated data. We show that HapCompass performs significantly better for a variety of data and metrics. HapCompass is freely available for download (www.brown.edu/Research/Istrail_Lab/).

Assuntos

Algoritmos , Mapeamento Cromossômico/métodos , Biologia Computacional/métodos , Genoma Humano , Análise de Sequência de DNA/métodos , Alelos , Haplótipos , Sequenciamento de Nucleotídeos em Larga Escala , Humanos , Polimorfismo de Nucleotídeo Único

12.

The Clark phaseable sample size problem: long-range phasing and loss of heterozygosity in GWAS.

Halldórsson, Bjarni V; Aguiar, Derek; Tarpine, Ryan; Istrail, Sorin.

J Comput Biol ; 18(3): 323-33, 2011 Mar.

Artigo em Inglês | MEDLINE | ID: mdl-21385037

RESUMO

A phase transition is taking place today. The amount of data generated by genome resequencing technologies is so large that in some cases it is now less expensive to repeat the experiment than to store the information generated by the experiment. In the next few years, it is quite possible that millions of Americans will have been genotyped. The question then arises of how to make the best use of this information and jointly estimate the haplotypes of all these individuals. The premise of this article is that long shared genomic regions (or tracts) are unlikely unless the haplotypes are identical by descent. These tracts can be used as input for a Clark-like phasing method to obtain a phasing solution of the sample. We show on simulated data that the algorithm will get an almost perfect solution if the number of individuals being genotyped is large enough and the correctness of the algorithm grows with the number of individuals being genotyped. We also study a related problem that connects copy number variation with phasing algorithm success. A loss of heterozygosity (LOH) event is when, by the laws of Mendelian inheritance, an individual should be heterozygote but, due to a deletion polymorphism, is not. Such polymorphisms are difficult to detect using existing algorithms, but play an important role in the genetics of disease and will confuse haplotype phasing algorithms if not accounted for. We will present an algorithm for detecting LOH regions across the genomes of thousands of individuals. The design of the long-range phasing algorithm and the loss of heterozygosity inference algorithms was inspired by our analysis of the Multiple Sclerosis (MS) GWAS dataset of the International Multiple Sclerosis Genetics Consortium. We present similar results to those obtained from the MS data.

Assuntos

Estudo de Associação Genômica Ampla/métodos , Genômica/métodos , Perda de Heterozigosidade , Algoritmos , Simulação por Computador , Genótipo , Haplótipos , Humanos , Modelos Genéticos , Tamanho da Amostra

13.

Haplotype phasing by multi-assembly of shared haplotypes: phase-dependent interactions between rare variants.

Halldórsson, Bjarni V; Aguiar, Derek; Istrail, Sorin.

Pac Symp Biocomput ; : 88-99, 2011.

Artigo em Inglês | MEDLINE | ID: mdl-21121036

RESUMO

In this paper we propose algorithmic strategies, Lander-Waterman-like statistical estimates, and genome-wide software for haplotype phasing by multi-assembly of shared haplotypes. Specifically, we consider four types of results which together provide a comprehensive workflow of GWAS data sets: (1) statistics of multi-assembly of shared haplotypes (2) graph theoretic algorithms for haplotype assembly based on conflict graphs of sequencing reads (3) inference of pedigree structure through haplotype sharing via tract finding algorithms and (4) multi-assembly of shared haplotypes of cases, controls, and trios. The input for the workflows that we consider are any of the combination of: (A) genotype data (B) next generation sequencing (NGS) (C) pedigree information. (1) We present Lander-Waterman-like statistics for NGS projects for the multi-assembly of shared haplotypes. Results are presented in Sec. 2. (2) In Sec. 3, we present algorithmic strategies for haplotype assembly using NGS, NGS + genotype data, and NGS + pedigree information. (3) This work builds on algorithms presented in Halldórsson et al. and are part of the same library of tools co-developed for GWAS workflows. (4) Section 3.3.1 contains algorithmic strategies for multi-assembly of GWAS data. We present algorithms for assembling large data sets and for determining and using shared haplotypes to more reliably assemble and phase the data. Workflows 1-4 provide a set of rigorous algorithms which have the potential to identify phase-dependent interactions between rare variants in linkage equilibrium which are associated with cases. They build on our extensive work on haplotype phasing, haplotype assembly, and whole genome assembly comparison.

Assuntos

Variação Genética , Haplótipos , Algoritmos , Biologia Computacional , Estudo de Associação Genômica Ampla/estatística & dados numéricos , Humanos , Polimorfismo de Nucleotídeo Único , Software

14.

Practical computational methods for regulatory genomics: a cisGRN-Lexicon and cisGRN-browser for gene regulatory networks.

Istrail, Sorin; Tarpine, Ryan; Schutter, Kyle; Aguiar, Derek.

Methods Mol Biol ; 674: 369-99, 2010.

Artigo em Inglês | MEDLINE | ID: mdl-20827603

RESUMO

The CYRENE Project focuses on the study of cis-regulatory genomics and gene regulatory networks (GRN) and has three components: a cisGRN-Lexicon, a cisGRN-Browser, and the Virtual Sea Urchin software system. The project has been done in collaboration with Eric Davidson and is deeply inspired by his experimental work in genomic regulatory systems and gene regulatory networks. The current CYRENE cisGRN-Lexicon contains the regulatory architecture of 200 transcription factors encoding genes and 100 other regulatory genes in eight species: human, mouse, fruit fly, sea urchin, nematode, rat, chicken, and zebrafish, with higher priority on the first five species. The only regulatory genes included in the cisGRN-Lexicon (CYRENE genes) are those whose regulatory architecture is validated by what we call the Davidson Criterion: they contain functionally authenticated sites by site-specific mutagenesis, conducted in vivo, and followed by gene transfer and functional test. This is recognized as the most stringent experimental validation criterion to date for such a genomic regulatory architecture. The CYRENE cisGRN-Browser is a full genome browser tailored for cis-regulatory annotation and investigation. It began as a branch of the Celera Genome Browser (available as open source at http://sourceforge.net/projects/celeragb /) and has been transformed to a genome browser fully devoted to regulatory genomics. Its access paradigm for genomic data is zoom-to-the-DNA-base in real time. A more recent component of the CYRENE project is the Virtual Sea Urchin system (VSU), an interactive visualization tool that provides a four-dimensional (spatial and temporal) map of the gene regulatory networks of the sea urchin embryo.

Assuntos

Redes Reguladoras de Genes , Genômica/métodos , Internet , Software , Animais , Humanos , Camundongos , Ratos , Sequências Reguladoras de Ácido Nucleico/genética , Ouriços-do-Mar/genética , Transativadores/metabolismo , Interface Usuário-Computador

RESUMO

Assuntos

RESUMO

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

ENVIAR RESULTADO:

SELEÇÃO DE REFERÊNCIAS

DETALHE DA PESQUISA