Results 1 - 5 of 5
2.
Bioinformatics ; 39(3), 2023 Mar 01.
Article in English | MEDLINE | ID: mdl-36847450

ABSTRACT

SUMMARY: Leveraging local ancestry and haplotype information in genome-wide association studies and downstream analyses can improve the utility of genomics for individuals from diverse and recently admixed ancestries. However, most existing simulation, visualization and variant analysis frameworks are based on variant-level analysis and do not automatically handle these features. We present haptools, an open-source toolkit for performing local ancestry aware and haplotype-based analysis of complex traits. Haptools supports fast simulation of admixed genomes, visualization of admixture tracks, simulation of haplotype- and local ancestry-specific phenotype effects and a variety of file operations and statistics computed in a haplotype-aware manner. AVAILABILITY AND IMPLEMENTATION: Haptools is freely available at https://github.com/cast-genomics/haptools. DOCUMENTATION: Detailed documentation is available at https://haptools.readthedocs.io. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Genome-Wide Association Study , Software , Haplotypes , Genomics , Genome
3.
Nat Mach Intell ; 3(2): 172-180, 2021 Feb.
Article in English | MEDLINE | ID: mdl-33796819

ABSTRACT

Transcription factors (TFs) bind DNA by recognizing specific sequence motifs, typically of length 6-12 bp. A motif can occur many thousands of times in the human genome, but only a subset of those sites are actually bound. Here we present a machine learning framework leveraging existing convolutional neural network architectures and model interpretation techniques to identify and interpret sequence context features most important for predicting whether a particular motif instance will be bound. We apply our framework to predict binding at motifs for 38 TFs in a lymphoblastoid cell line, score the importance of context sequences at base-pair resolution, and characterize context features most predictive of binding. We find that the choice of training data heavily influences classification accuracy and the relative importance of features such as open chromatin. Overall, our framework enables novel insights into features predictive of TF binding and is likely to inform future deep learning applications to interpret non-coding genetic variants.

4.
BMC Bioinformatics ; 22(1): 201, 2021 Apr 20.
Article in English | MEDLINE | ID: mdl-33879052

ABSTRACT

BACKGROUND: A major challenge in evaluating quantitative ChIP-seq analyses, such as peak calling and differential binding, is a lack of reliable ground truth data. Accurate simulation of ChIP-seq data can mitigate this challenge, but existing frameworks are either too cumbersome to apply genome-wide or unable to model a number of important experimental conditions in ChIP-seq. RESULTS: We present ChIPs, a toolkit for rapidly simulating ChIP-seq data using statistical models of key experimental steps. We demonstrate how ChIPs can be used for a range of applications, including benchmarking analysis tools and evaluating the impact of various experimental parameters. ChIPs is implemented as a standalone command-line program written in C++ and is available from https://github.com/gymreklab/chips. CONCLUSIONS: ChIPs is an efficient ChIP-seq simulation framework that generates realistic datasets over a flexible range of experimental conditions. It can serve as an important component in various ChIP-seq analyses where ground truth data are needed.


Subject(s)
Chromatin Immunoprecipitation Sequencing , Software , Computer Simulation , Genome , High-Throughput Nucleotide Sequencing , Models, Statistical , Sequence Analysis, DNA
5.
Nature ; 589(7841): 246-250, 2021 Jan.
Article in English | MEDLINE | ID: mdl-33442040

ABSTRACT

Autism spectrum disorder (ASD) is an early-onset developmental disorder characterized by deficits in communication and social interaction and restrictive or repetitive behaviours [1,2]. Family studies demonstrate that ASD has a substantial genetic basis with contributions both from inherited and de novo variants [3,4]. It has been estimated that de novo mutations may contribute to 30% of all simplex cases, in which only a single child is affected per family [5]. Tandem repeats (TRs), defined here as sequences of 1 to 20 base pairs in size repeated consecutively, comprise one of the major sources of de novo mutations in humans [6]. TR expansions are implicated in dozens of neurological and psychiatric disorders [7]. Yet, de novo TR mutations have not been characterized on a genome-wide scale, and their contribution to ASD remains unexplored. Here we develop new bioinformatics methods for identifying and prioritizing de novo TR mutations from sequencing data and perform a genome-wide characterization of de novo TR mutations in ASD-affected probands and unaffected siblings. We infer specific mutation events and their precise changes in repeat number, and primarily focus on more prevalent stepwise copy number changes rather than large expansions. Our results demonstrate a significant genome-wide excess of TR mutations in ASD probands. Mutations in probands tend to be larger, enriched in fetal brain regulatory regions, and are predicted to be more evolutionarily deleterious. Overall, our results highlight the importance of considering repeat variants in future studies of de novo mutations.


Subject(s)
Autism Spectrum Disorder/genetics , DNA Repeat Expansion/genetics , Genetic Predisposition to Disease , Adolescent , Adult , Autism Spectrum Disorder/pathology , Brain/metabolism , Child , DNA Copy Number Variations/genetics , Female , Fetus/metabolism , Germ-Line Mutation/genetics , Humans , Least-Squares Analysis , Male , Middle Aged , Paternal Age , Young Adult